This tutorial is based on essential background material described in the soilDB and related NASIS ODBC connection tutorials. If you have not reviewed those tutorials, please do so before proceeding.
The functions and documentation described in this tutorial are still a work in progress.
The soilDB
package contains a number of convenience functions (usually having a prefix of "fetch") for extracting data from commonly used databases, performing basic quality control checks, and returning the result as a SoilProfileCollection object. In order to simplify the process of stitching-together the many linked (and sometimes nested) tables associated with data in NASIS, convenience functions (e.g. fetchNASIS()
) must make a number of assumptions about the data. This document describes how those assumptions can be adjusted and how they may affect subsequent data analysis.
Please see the related tutorial on loading NASIS pedon data into R.
Load up your local database and selected set with some components. NASIS queries should "target" the tables:
The most commonly used data from the data map unit, component, component horizon tables can be quickly loaded using the fetchNASIS()
function.
# load required libraries
library(soilDB)
library(ggplot2)
library(plyr)
# get component + component horizon data as a SoilProfileCollection, using defaults
nc <- fetchNASIS(from = 'components')
The result is a SoilProfileCollection object and can be subset or visualized as such. Lets try extracting just the components with names that match the pattern "Amador".
# filter out just those components named "Amador"
idx <- grep('Amador', nc$compname, ignore.case = TRUE)
nc.sub <- nc[idx, ]
# plot
par(mar=c(2,0,3,0))
plot(nc.sub, color='claytotal_r', label='dmudesc')
title(sub='Components named "Amador"', line=0)
There appear to be many duplicate components. This isn't surprising as a single component template is commonly shared among many map units. Lets filter out a unique subset of components based on horizon depth, horizon designation, and clay content.
# generate an index to unique components and subset
idx <- unique(nc.sub, vars=c('hzdept_r', 'hzdepb_r', 'claytotal_r', 'hzname'))
nc.sub.unique <- nc.sub[idx, ]
# plot the subset
par(mar=c(2,0,3,0))
plot(nc.sub.unique, color='claytotal_r', label='dmudesc', n=length(nc.sub))
title(sub='Unique components named "Amador"', line=0)
Sometimes it is convenient to know which pedons are linked to components. The following function returns a data.frame
(table) of data containing component record IDs, pedon record IDs, and pedon IDs. Lets assemble a list of all pedons linked to a component named "Amador", using the subset from above.
# get the full set of linked pedon data from the selected set
nc.linked.pedons <- get_copedon_from_NASIS_db()
# check format
head(nc.linked.pedons)
## coiid peiid pedon_id representative
## 1 2073965 207247 06JCR002 1
## 2 2076762 530760 11BJM046 1
## 3 2045840 620506 11DEB041 0
## 4 2528931 839210 2012CA6302054 1
## 5 2278328 978613 2014CA6304027 1
## 6 2528901 1209611 2017CA6303009 1
# list just those pedons that are linked to our subset of components
# note that we are using the component record ID (coiid) to connect the two data sources
idx <- which(nc.linked.pedons$coiid %in% site(nc.sub)$coiid)
nc.linked.pedons[idx, ]
## coiid peiid pedon_id representative
## 1 2073965 207247 06JCR002 1
## 4 2528931 839210 2012CA6302054 1
## 5 2278328 978613 2014CA6304027 1
## 6 2528901 1209611 2017CA6303009 1
## 960 2331930 620494 S2011DEB094N 1
## 966 2331929 978594 2013CA6304020 1
The NASIS data elements that exist at levels higher than the data map unit table must be loaded with a different function because there isn't always a 1:1 mapping between data map unit objects and map unit / legend objects.
nc.correlation <- get_component_correlation_data_from_NASIS_db()
head(nc.correlation, 1)
## muiid musym nationalmusym
## 3 1865918 3046 20mmx
## muname mukind mutype
## 3 Goldwall, intermound-Goldwall, mound-Rock outcrop complex, 0 to 8 percent slopes complex <NA>
## mustatus muacres farmlndcl repdmu dmuiid areasymbol
## 3 provisional 5561 <NA> 1 524297 CA630
## areaname ssastatus
## 3 Central Sierra Foothills Area, California, Parts of Calaveras and Tuolumne Counties initial
## cordate
## 3 <NA>
Lets tabulate the occurrence of components named "Amador" by soil survey area and component kind, using data loaded in the above examples.
# combine basic site data + correlation / legend data
# note that there is a 1:many relationship
nc.combined <- join(site(nc.sub), nc.correlation, by='dmuiid')
# cross-tabulate
xtabs( ~ areasymbol + compkind, data=nc.combined)
## compkind
## areasymbol series taxadjunct
## CA630 6 1
It can be convenient to join elements from the DMU, component, and comonth tables into a single data.frame
. Lets start by loading the DMU/component records and creating a new ID that combines the DMU description and component name.
# load only the component table
comp <- get_component_data_from_NASIS_db()
# make a new ID by combining DMU name + comp name
comp$id <- paste0(comp$dmudesc, '-', comp$compname)
The comonth records are indexed by coiid
values. In some cases there are no records populated in this table. You can "fill" missing records with NA
by using the fill=TRUE
argument.
# load comonth as-is: there will be missing records
cm <- get_comonth_from_NASIS_db()
nrow(cm)
## [1] 6718
# load comonth after filling missing months
cm <- get_comonth_from_NASIS_db(fill = TRUE)
nrow(cm)
## [1] 11231
# there should be the same number of "coiid" values after filling missing comonth records
length(comp$coiid) == length(unique(cm$coiid))
## [1] FALSE
# join component table to "filled" comonth table
x <- join(comp, cm, by='coiid')
nrow(x)
## [1] 8556
Try plotting some of the combined data, for select DMU and components. NULL values are displayed as grey squares.
ggplot(subset(x, subset= dmudesc == 'CA6305015'), aes(month, factor(id), flodfreqcl)) + geom_tile(aes(fill = flodfreqcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Flooding Frequency') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')
ggplot(subset(x, subset= compname == 'Gillender'), aes(month, factor(id), flodfreqcl)) + geom_tile(aes(fill = flodfreqcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Flooding Frequency') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')
Try plotting those components with flooding / ponding frequency classes that are something other than "None" (or NULL).
idx <- which(x$flodfreqcl != 'none')
ggplot(subset(x, subset= coiid %in% x$coiid[idx]), aes(month, factor(id), flodfreqcl)) + geom_tile(aes(fill = flodfreqcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Flooding Frequency') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')
idx <- which(x$floddurcl != 'none')
ggplot(subset(x, subset= coiid %in% x$coiid[idx]), aes(month, factor(id), floddurcl)) + geom_tile(aes(fill = floddurcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Flooding Duration') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')
idx <- which(x$pondfreqcl != 'none')
ggplot(subset(x, subset=coiid %in% x$coiid[idx]), aes(month, factor(id), pondfreqcl)) + geom_tile(aes(fill = pondfreqcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Ponding Frequency') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')
idx <- which(x$ponddurcl != 'none')
ggplot(subset(x, subset= coiid %in% x$coiid[idx]), aes(month, factor(id), ponddurcl)) + geom_tile(aes(fill = ponddurcl), color='white', lwd=1) + scale_fill_brewer(palette = "Spectral", drop=FALSE, na.value='grey80', name='Ponding Duration') + scale_x_discrete(drop=FALSE) + theme_minimal() + xlab('') + ylab('')
Details pending.
nc.esd <- get_component_esd_data_from_NASIS_db()
head(nc.esd)
## [1] coiid ecositeid ecositenm ecositeorigin ecositetype ecositemlra
## [7] ecositelru ecositenumber ecositestate
## <0 rows> (or 0-length row.names)
These examples are still a work in progress. Detailed descriptions pending...
There are some cases where a single DMU may be linked to multiple map units (such as...).
library(igraph)
# re-load relevant data from NASIS:
# just MU/DMU objects with components named "Amador"
nc <- get_component_data_from_NASIS_db()
idx <- grep('Amador', nc$compname, ignore.case = TRUE)
nc.sub <- nc[idx, ]
nc.correlation <- get_component_correlation_data_from_NASIS_db()
nc.combined <- join(nc.sub, nc.correlation, by='dmuiid')
# extract edges of graph linking DMU --> MU
e <- na.omit(nc.combined[, c('dmuiid', 'nationalmusym')])
# create vertex attributes
v <- rbind(cbind(name=e[, 1], color='RoyalBlue', label.cex=0.75), cbind(name=e[, 2], color='Orange', label.cex=0.85))
v <- as.data.frame(v, stringsAsFactors = FALSE)
v$label.font <- 2
v$label.cex <- as.numeric(v$label.cex)
# optionally flag DMUs that connect to multiple MU
if(exists('multiple.mu.per.dmu', envir=soilDB.env)) {
dupe.dmu <- get('multiple.mu.per.dmu', envir=soilDB.env)
v$color[which(v$name %in% dupe.dmu)] <- 'Yellow'
}
# create graph
g <- graph.data.frame(e, directed=TRUE, vertices = unique(v))
V(g)$size <- 5
# check
par(mar=c(0,0,0,0))
set.seed(10101)
plot(g, vertex.label.color='black', vertex.label.family='sans', edge.arrow.size=0.5)
legend('bottomleft', legend = c('MU (national musym)', 'DMU (dmu description)', 'flagged DMU (dmu description)'), pt.bg=c('Orange', 'RoyalBlue', 'Yellow'), pch=21, pt.cex=2, cex=0.85, bty='n')
Rank data map units that contain a major component named "Amador" based on the degree to which component horizon data has been populated.
this needs work
library(lattice)
# re-load relevant data from NASIS:
# get the component data
nc <- get_component_data_from_NASIS_db()
# subset columns to those we are using
nc <- nc[, c('dmuiid', 'dmudesc', 'coiid', 'compname', 'comppct_r', 'majcompflag')]
# just MU/DMU objects with components named "Amador"
idx <- grep('Amador', nc$compname, ignore.case = TRUE)
nc <- nc[idx, ]
# look only at major components for now
nc <- nc[which(nc$majcompflag == 1), ]
# get component horizon data
nc.hz <- fetchNASIS(from='components')
# subset the unique set of component IDs that have horizon data
nc.hz.coiids <- site(nc.hz)$coiid
# make a new column and fill with '0' when no horizon data present, '1' when horizon data present
nc$hz.data.present <- 0
nc$hz.data.present[which(nc$coiid %in% nc.hz.coiids)] <- 1
# get correlation data and join to component data
nc.correlation <- get_component_correlation_data_from_NASIS_db()
nc.combined <- join(nc, nc.correlation, by='dmuiid', type='inner')
# tabulate components w/o hz data, w/ hz data, and total components
addmargins(table(nc.combined$hz.data.present))
##
## 1 Sum
## 5 5
# compute weighted fraction of hz data populated using component pct
p <- ddply(nc.combined, 'dmudesc', function(i) {
with(i, sum((comppct_r / 100) * hz.data.present), na.rm=TRUE)
})
names(p) <- c('dmudesc', 'frac.populated')
# re-order musym labels in increasing order of hz data populated
p$dmudesc <- factor(p$dmudesc, levels=p$dmudesc[order(p$frac.populated)])
dotplot(dmudesc ~ frac.populated, data=p, cex=1.25, ylab='DMU Description', xlab='Fraction Component Hz Data Populated', main='Major Components Named "Amador"', scales=list(x=list(alternating=TRUE, tick.number=10)), subset=dmudesc != 'NOTCOM National DMU', panel=function(...){
panel.abline(v=seq(0, 1, by=0.1), col='grey')
panel.dotplot(...)
})
Just kidding, this stuff matters.
As part of the quality control process, a number of possible messages are printed to the console when running fetchNASIS()
. These messages and their meaning are described below:
-> QC: horizon errors detected ...
Horizon depths are checked for gaps, overlap, or bottom depths than are shallower than top depths. In most cases pedons flagged as having errors suggests a quality control issue. However, combination horizons (e.g. B/C) that have been entered as two distinct horizons sharing the same horizon depths will trigger this warning.
-> QC: DMUs assigned to multiple MU ...
This message means that some data map units in the selected set have been used within several different map units.
-> QC: duplicate muiids: multiple 'representative' DMU / MU ...
This message means that there may be several "representative" DMU assigned to a single map unit.
Some of the assumptions made by fetchNASIS()
can be adjusted using arguments to the function:
rmHzErrors = TRUE
Setting this value to TRUE
(the default) will enable checks for horizon depth consistency. Consider setting this argument to FALSE
if you aren't concerned about horizon depth errors, or know that your selected set contains many combination horizons. Note that any records flagged as having horizon depths errors (rmHzErrors = TRUE
) will be omitted from the data returned by fetchNASIS()
.
fill = FALSE
Should missing "month" rows in the comonth table be filled with NA (FALSE)?
An increased level of verbosity in the quality control messages can be enabled by setting the option: options(soilDB.verbose=TRUE)
. This is not generally suggested, but may be useful if you are having difficulty loading data from NASIS.
This document is based on aqp
version 1.15.2 and soilDB
version 2.0.