This tutorial is based on essential background material described in the soilDB and related NASIS ODBC connection tutorials. If you have not reviewed those tutorials, please do so before proceeding.

The functions and documentation described in this tutorial are still a work in progress.

Introduction

The soilDB package contains a number of convenience functions (usually having a prefix of "fetch") for extracting data from commonly used databases, performing basic quality control checks, and returning the result as a SoilProfileCollection object. In order to simplify the process of stitching-together the many linked (and sometimes nested) tables associated with data in NASIS, convenience functions (e.g. fetchNASIS() or fetchNASIS_component_data()) must make a number of assumptions about the data. This document describes how those assumptions can be adjusted and how they may affect subsequent data analysis.

Pedon Data

Please see the related tutorial on loading NASIS pedon data into R.

Component Data

Load up your local database and selected set with some components. NASIS queries should "target" the tables:

The most commonly used data from the data map unit, component, component horizon tables can be quickly loaded using the fetchNASIS_component_data() function.

# load required libraries
library(soilDB)
library(ggplot2)
library(plyr)

# get component + component horizon data as a SoilProfileCollection, using defaults
nc <- fetchNASIS_component_data()

The result is a SoilProfileCollection object and can be subset or visualized as such. Lets try extracting just the components with names that match the pattern "Amador".

# filter out just those components named "Amador"
idx <- grep('Amador', nc$compname, ignore.case = TRUE)
nc.sub <- nc[idx, ]

# plot
par(mar=c(2,0,3,0))
plot(nc.sub, color='claytotal_r', label='dmudesc')
title(sub='Components named "Amador"', line=0)

There appear to be many duplicate components. This isn't surprising as a single component template is commonly shared among many map units. Lets filter out a unique subset of components based on horizon depth, horizon designation, and clay content.

# generate an index to unique components and subset
idx <- unique(nc.sub, vars=c('hzdept_r', 'hzdepb_r', 'claytotal_r', 'hzname'))
nc.sub.unique <- nc.sub[idx, ]

# plot the subset
par(mar=c(2,0,3,0))
plot(nc.sub.unique, color='claytotal_r', print.id=FALSE, n=length(nc.sub))
title(sub='Unique components named "Amador"', line=0)

Accessing Additional Types of "Component" Data

Linked Pedons

Sometimes it is convenient to know which pedons are linked to components. The following function returns a data.frame (table) of data containing component record IDs, pedon record IDs, and pedon IDs. This can be used to link the results of fetchNASIS_component_data() to the results of fetchNASIS() (pedon data). Lets assemble a list of all pedons linked to a component named "Amador", using the subset from above.

# get the full set of linked pedon data from the selected set
nc.linked.pedons <- get_copedon_from_NASIS_db()
# check format
head(nc.linked.pedons)
##     coiid  peiid     pedon_id representative
## 1 1710250 144163 2004CA630008              1
## 2 1413310 157549   04CA632028              0
## 3 1413310 157988 S04CA099-001              1
## 4 1297559 158468 S04CA099-003              1
## 5 1297566 158468 S04CA099-003              1
## 6 1297567 158467 S04CA099-004              1
# list just those pedons that are linked to our subset of components
# note that we are using the component record ID (coiid) to connect the two data sources
idx <- which(nc.linked.pedons$coiid %in% site(nc.sub)$coiid)
nc.linked.pedons[idx, ]
##       coiid  peiid      pedon_id representative
## 4   1297559 158468  S04CA099-003              1
## 5   1297566 158468  S04CA099-003              1
## 573 2331927 542170 2012CA6303043              1
## 577 2073965 620526 2012CA6304005              1

Map Unit, Correlation, Legend, Area Tables

The NASIS data elements that exist at levels higher than the data map unit table must be loaded with a different function because there isn't always a 1:1 mapping between data map unit objects and map unit / legend objects.

nc.correlation <- get_component_correlation_data_from_NASIS_db()
head(nc.correlation, 1)
##     muiid musym nationalmusym                                           muname  mukind mustatus
## 1 1612048 101sa         1r3gk Amador-Gillender complex, 2 to 15 percent slopes Complex        3
##   muacres          farmlndcl repdmu dmuiid areasymbol                areaname ssastatus    cordate
## 1     113 Not prime farmland      1 291184      CA628 Amador Area, California Published 1963-03-01

Lets tabulate the occurrence of components named "Amador" by soil survey area and component kind, using data loaded in the above examples.

# combine basic site data + correlation / legend data
# note that there is a 1:many relationship
nc.combined <- join(site(nc.sub), nc.correlation, by='dmuiid')

# cross-tabulate
xtabs( ~ areasymbol + compkind, data=nc.combined)
##           compkind
## areasymbol Family Taxon above family Miscellaneous area Series Taxadjunct Variant
##      CA067      0                  0                  0      2          0       0
##      CA077      0                  0                  0      1          0       0
##      CA628      0                  0                  0      1          0       0
##      CA630      0                  0                  0      6          0       0
##      CA632      0                  0                  0      0          2       0
##      CA644      0                  0                  0      4          0       0
##      CA648      0                  0                  0      3          0       0

Component Month Table

It can be convenient to join elements from the DMU, component, and comonth tables into a single data.frame. Lets start by loading the DMU/component records and creating a new ID that combines the DMU description and component name.

# load only the component table
comp <- get_component_data_from_NASIS_db()
# make a new ID by combining DMU name + comp name
comp$id <- paste0(comp$dmudesc, '-', comp$compname)

The comonth records are indexed by coiid values. In some cases there are no records populated in this table. You can "fill" missing records with NA by using the fill=TRUE argument.

# load comonth as-is: there will be missing records
cm <- get_comonth_from_NASIS_db()
nrow(cm)
## [1] 5724
# load comonth after filling missing months
cm <- get_comonth_from_NASIS_db(fill = TRUE)
nrow(cm)
## [1] 13032
# there should be the same number of "coiid" values after filling missing comonth records
length(comp$coiid) == length(unique(cm$coiid))
## [1] TRUE
# join component table to "filled" comonth table
x <- join(comp, cm, by='coiid')
nrow(x)
## [1] 13032

Try plotting some of the combined data, for select DMU and components. NULL values are displayed as grey squares.

ggplot(subset(x, subset= dmudesc == 'CA6305015'), aes(month, factor(id), flodfreqcl)) + geom_tile(aes(fill = flodfreqcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Flooding Frequency') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')  

ggplot(subset(x, subset= compname == 'Gillender'), aes(month, factor(id), flodfreqcl)) + geom_tile(aes(fill = flodfreqcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Flooding Frequency') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')  

Try plotting those components with flooding / ponding frequency classes that are something other than "None" (or NULL).

ggplot(subset(x, subset= coiid %in% x$coiid[x$flodfreqcl != 'None']), aes(month, factor(id), flodfreqcl)) + geom_tile(aes(fill = flodfreqcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Flooding Frequency') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')

ggplot(subset(x, subset= coiid %in% x$coiid[x$floddurcl != 'None']), aes(month, factor(id), floddurcl)) + geom_tile(aes(fill = floddurcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Flooding Duration') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')

ggplot(subset(x, subset=coiid %in% x$coiid[x$pondfreqcl != 'None']), aes(month, factor(id), pondfreqcl)) + geom_tile(aes(fill = pondfreqcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Ponding Frequency') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')

ggplot(subset(x, subset= coiid %in% x$coiid[x$ponddurcl != 'None']), aes(month, factor(id), ponddurcl)) + geom_tile(aes(fill = ponddurcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Ponding Duration') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')