1 Introduction

This document demonstrates how to use the soilDB package to download KSSL data from SoilWeb. These data are from the September 2018 snapshot, and will be updated as future snapshots are released. Comparisons are made via graphical summaries of key soil properties with depth, using data structures and functions from the aqp package.

1.1 Installation

With a recent version of R (>= 3.5), it is possible to get all of the packages that this tutorial depends on via:

# stable packages from CRAN
install.packages('aqp', dep = TRUE)
install.packages('soilDB', dep = TRUE)
install.packages('Gmedian', dep = TRUE)
install.packages('tactile', dep = TRUE)
install.packages('maps', dep = TRUE)

2 Quick Example

KSSL and NASIS data are from the September 2018 snapshot. Details pending.

library(aqp)
library(soilDB)

# get lab and morphologic data
# simplify colors to 1 color (moist and dry) per horizon via mixing
# see ?mix_and_clean_colors for details.
s <- fetchKSSL(series = 'auburn', returnMorphologicData = TRUE, simplifyColors = TRUE)

# extract pedons into SoilProfileCollection
pedons <- s$SPC

The result is a list, check it out.

str(s, 2)
## List of 2
##  $ SPC  :Formal class 'SoilProfileCollection' [package "aqp"] with 9 slots
##  $ morph:List of 6
##   ..$ phcolor       :'data.frame':   105 obs. of  7 variables:
##   ..$ phfrags       :'data.frame':   27 obs. of  10 variables:
##   ..$ phpores       :'data.frame':   71 obs. of  5 variables:
##   ..$ phstructure   :'data.frame':   40 obs. of  7 variables:
##   ..$ pediagfeatures:'data.frame':   24 obs. of  5 variables:
##   ..$ rmf           : logi [1, 1] FALSE
# check out the "raw" morphologic data:
lapply(s$morph, head)
## $phcolor
##    phiid labsampnum colorpct colorhue colorvalue colorchroma colormoistst
## 1 820051   05N02182       NA     10YR          4           4          Dry
## 2 820051   05N02182       NA     10YR          3           3        Moist
## 3 820052   05N02183       NA    7.5YR          3           3        Moist
## 4 820052   05N02183       NA    7.5YR          4           6          Dry
## 5 820053   05N02184       NA      5YR          3           3        Moist
## 6 820053   05N02184       NA      5YR          4           4          Dry
## 
## $phfrags
##    phiid labsampnum fragvol                   fragkind fragsize_l fragsize_r fragsize_h fragshp
## 1 820051   05N02182      14 Metamorphic rock fragments          2          5         76      NA
## 2 820052   05N02183      10 Metamorphic rock fragments          2         20         76      NA
## 3 820053   05N02184      18 Metamorphic rock fragments          2          5         76      NA
## 4 570101   83P04043       5                       <NA>          2         NA         75      NA
## 5 570102   83P04044       5                       <NA>         75         NA        250      NA
## 6 570102   83P04044      10                       <NA>          2         NA         75      NA
##    fraground  fraghard
## 1 Subrounded Indurated
## 2 Subrounded Indurated
## 3 Subangular Indurated
## 4       <NA>      <NA>
## 5       <NA>      <NA>
## 6       <NA>      <NA>
## 
## $phpores
##    phiid labsampnum poreqty           poresize           poreshp
## 1 820051   05N02182       5               Fine      Interstitial
## 2 820051   05N02182       7             Medium           Tubular
## 3 820052   05N02183       5          Very fine Dendritic tubular
## 4 820052   05N02183       6             Medium           Tubular
## 5 820053   05N02184       5          Very fine Dendritic tubular
## 6 519185   40A23590      21 Very fine and fine              <NA>
## 
## $phstructure
##    phiid labsampnum structgrade structsize        structtype structid structpartsto
## 1 820051   05N02182    Moderate       Fine          Granular       NA            NA
## 2 820052   05N02183    Moderate     Medium Subangular blocky       NA            NA
## 3 820053   05N02184    Moderate     Medium Subangular blocky       NA            NA
## 4 519185   40A23590        <NA>       <NA>           Massive       NA            NA
## 5 518423   40A23592        <NA>       <NA>           Massive       NA            NA
## 6 518425   40A23594        <NA>       <NA>           Massive       NA            NA
## 
## $pediagfeatures
##   pedon_key  peiid featdept featdepb        featkind
## 1     11634 122235        0        5 Ochric epipedon
## 2     11634 122235        5       43  Cambic horizon
## 3     11634 122235       43        0  Lithic contact
## 4     32586 158467        0       18 Ochric epipedon
## 5     32586 158467       10       33  Cambic horizon
## 6     32586 158467       33       41  Lithic contact
## 
## $rmf
##       [,1]
## [1,] FALSE

Graphically check the results with soil profile sketches. The Auburn soil is typically shallow to bedrock, limit sketches to 60cm.

par(mar = c(1,0,0,2))
plotSPC(pedons, color = 'moist_soil_color', print.id = FALSE, name = 'hzn_desgn', width = 0.3, name.style = 'center-center', max.depth = 60)

Simplification of fragment data is still a manual process.

# simplify fragment data
s.frags <- simplifyFragmentData(s$morph$phfrags, id.var = 'labsampnum')

# merge fragment data into SPC via "left" join
horizons(pedons) <- s.frags

# check
par(mar = c(0, 0, 3, 2))
plotSPC(pedons, color = 'total_frags_pct', print.id = FALSE, name = 'hzn_desgn', width = 0.3, max.depth = 60)
addVolumeFraction(pedons, 'total_frags_pct', pch = 1, cex.min = 0.25, cex.max = 0.5)

2.1 Fragment Volume >= 100%

There are cases where errors in the source data (NASIS Pedon Horizon Fragments table) suggest coarse fragment volume > 100%. As of soilDB 2.0.2 and aqp 1.15.4, the simplifyFragmentData() function will report such errors and the addVolumeFraction() function will fail gracefully by truncating fragment volume at 100%.

Give it a try with an example (thanks to Brian Gardner for finding it).

# get lab and morphologic data
s <- fetchKSSL(series = 'kettenbach', returnMorphologicData = TRUE)

# extract pedons into SoilProfileCollection
pedons <- s$SPC

# simplify fragment data
s.frags <- simplifyFragmentData(s$morph$phfrags, id.var = 'labsampnum')

# merge fragment data into SPC via left join
horizons(pedons) <- s.frags

# flag horizons with >= 100% total fragment volume
pedons$fragment_errors <- factor(pedons$total_frags_pct >= 100)

# label pedons with user pedon ID
# label horizons with total fragment %
par(mar = c(0, 0, 3, 5))
plotSPC(pedons, color = 'fragment_errors', name = 'total_frags_pct', label = 'pedon_id', cex.name = 0.85)
addVolumeFraction(pedons, colname = 'total_frags_pct', pch = 1, cex.min = 0.25, cex.max = 0.5)
## Warning: 160 is >= 100, likely a data error
## Warning: 180 is >= 100, likely a data error

Check the original fragment data, based on the warnings issued by simplifyFragmentData(). Looks like multiple versions of the same data were accidentally entered. Errors like this can be corrected in NASIS, but may require assistance from the regional staff depending on KSSL pedon ownership. fetchKSSL() is based on a snapshot of NASIS and LIMS data, therefore corrections may not be available until after the next release cycle.

# identify and print affected rows
idx <- which(s$morph$phfrags$labsampnum %in% c('82P02200', '82P02204'))
s$morph$phfrags[idx, ]
##      phiid labsampnum fragvol         fragkind fragsize_l fragsize_r fragsize_h fragshp  fraground
## 8   570064   82P02200      10             <NA>         76         NA        250    <NA>       <NA>
## 9   570064   82P02200      70             <NA>          2         NA         76    <NA>       <NA>
## 10 4384673   82P02200      10 Basalt fragments         75         NA        250 Nonflat Subangular
## 11 4384673   82P02200      70 Basalt fragments          2         NA         75 Nonflat Subangular
## 24  571809   82P02204      40             <NA>          2         NA         76    <NA>       <NA>
## 25  571809   82P02204      50             <NA>         76         NA        250    <NA>       <NA>
## 26 4384681   82P02204      40 Basalt fragments          2         NA         75 Nonflat Subangular
## 27 4384681   82P02204      50 Basalt fragments         75         NA        250 Nonflat Subangular
##     fraghard
## 8       <NA>
## 9       <NA>
## 10 Indurated
## 11 Indurated
## 24      <NA>
## 25      <NA>
## 26 Indurated
## 27 Indurated

3 Fetch Data by ID

As of soilDB version 2.5.3, fetchKSSL() is fully vectorized with the exception of bbox. Note that it is much more efficient to work with the KSSL snapshot when attempting to load large collections of data by ID. Note that pedon 04N0610 appears to have an error in the horizon depths near 150cm. This is due to the presence of a “grab sample” (sub-sample of the Cg3 horizon) that was submitted for characterization.

# pedons indexed by "pedlabsampnum" ID
pls <- c("04N0610", "04N0611", "04N0612", "04N0613")

# vectorization = implicit iteration
res <- fetchKSSL(pedlabsampnum = pls, returnMorphologicData = TRUE, simplifyColors = TRUE)

# check: OK
str(res, 2)

# extract SPC
pedons <- res$SPC

par(mar = c(1, 0, 3, 2))
plotSPC(pedons, color='estimated_ph_h2o', name='hzn_desgn', label = 'pedlabsampnum', cex.id = 1)

plotSPC(pedons, color='moist_soil_color', name='hzn_desgn', label = 'pedlabsampnum', cex.id = 1)

4 Redoximorphic Features

NEW as of 2020-10-01

library(stringr)

# get data + morphology associated with the Drummer soil series
x <- fetchKSSL(series = 'drummer', returnMorphologicData = TRUE)

# convenience copy of the pedon/horizon/rmf data
rmf <- x$morph$rmf

# keep only moist colors
rmf <- rmf[which(rmf$colormoistst == 'Moist'), ]

# convert Munsell colors to hex notation
rmf$color <- with(rmf, 
     munsell2rgb(the_hue = colorhue, the_value = colorvalue, the_chroma = colorchroma)
     )

# quick check: OK
par(mar = c(0, 0, 0, 0))
previewColors(rmf$color)

# sort RMF kind of frequency
tab <- sort(table(rmf$rdxfeatkind), decreasing = TRUE)
knitr::kable(tab)
Var1 Freq
Masses of oxidized iron (Fe+3) accumulation 36
Masses of iron-manganese accumulation 7
Iron-manganese concretions 5
Iron depletions 4
# set levels for later use
rmf$rdxfeatkind <- factor(rmf$rdxfeatkind, levels = names(tab))

# iterate over RMF kind and compute color quantiles
z <- split(rmf, rmf$rdxfeatkind)

cq <- lapply(z, function(i) {
  colorQuantiles(soilColors = i$color)
})

# extract the L1 median colors
L1 <- lapply(cq, '[[', 'L1')
L1 <- do.call('rbind', L1)

# make clean labels
txt <- gsub(pattern = ' accumulation', replacement = '', x = row.names(L1), fixed = TRUE)
chp <- str_split_fixed(string = L1$L1_chip, pattern = '\n', n = 2)
txt <- sprintf("%s\n%s", chp[, 1], txt)

# L1 color chips + annotation
par(mar = c(0, 0, 3, 0))
soilPalette(L1$L1_color, lab = txt)
title('Drummer Redoximorphic Feature Summary\nL1 Median Colors and Closest Munsell Chip')

5 Example Application

Data can be queried by “taxonname” (typically a soil series), using a geographic bounding-box in WGS84 referenced coordinates, or by MLRA. Taxonname matching is case-insensitive and based on the most current taxonname in NASIS. See ?fetchKSSL() or this GitHub repository for details.

In this example, we are downloading KSSL data for the musick, chaix, and holland soils; commonly found on granitic rocks of the Sierra Nevada Mountain region of California.

# load libraries
library(aqp)
library(soilDB)
library(lattice)
library(tactile)
library(maps)

# fetch KSSL data by series name
sn <- c('musick', 'holland', 'chaix')
g <- fetchKSSL(series = sn, progress = FALSE)

# compute weighted-mean particle diameter for later
g$wmpd <- with(horizons(g), ((vcs * 1.5) + (cs * 0.75) + (ms * 0.375) + (fs * 0.175) + (vfs *0.075) + (silt * 0.026) + (clay * 0.0015)) / (vcs + cs + ms + fs + vfs + silt + clay))

# estimate soil depth based on horizon designations
sdc <- getSoilDepthClass(g, name = 'hzn_desgn')

# splice-into site data
site(g) <- sdc

# viz with box-whisker plot
# note variation in letter case
bwplot(taxonname ~ depth, data = as.data.frame(g), par.settings = tactile.theme(), xlab = 'Soil Depth (cm)')

Check for variation in taxonname and normalize accordingly. In this case we can just convert to all lower-case.

# check
table(g$taxonname)
## 
##   Chaix   CHAIX Holland HOLLAND  Musick  MUSICK 
##       8       4      15       6      10       3
# normalize via lower-case
g$taxonname <- tolower(g$taxonname)

# OK
table(g$taxonname)
## 
##   chaix holland  musick 
##      12      21      13

Sometimes it is useful to split a collection into individual objects. Do this now by subset-ing on taxonname.

musick <- subset(g, taxonname == 'musick')
holland <- subset(g, taxonname == 'holland')
chaix <- subset(g, taxonname == 'chaix')
# generate a basemap of northern California, with county boundaries
map('county', 'California', xlim=c(-123.25, -118.75), ylim=c(36.5, 41))
# add long/lat axes
map.axes()
# add locations of musick
points(y ~ x, data=site(musick), pch=21, bg='RoyalBlue')
# add locations of holland
points(y ~ x, data=site(holland), pch=21, bg='DarkRed')
# add locations of chaix
points(y ~ x, data=site(chaix), pch=21, bg='DarkGreen')
# add a simple legend
legend('topright', pch=21, pt.bg=c('RoyalBlue', 'DarkRed', 'DarkGreen'), legend=c('musick', 'holland', 'chaix'), xpd = NA, inset = -0.1, bg 
 = 'white')

5.1 Profile Plots

Generate “thematic” profile sketches by coloring horizons according to clay content, and gouped by taxonname. Truncate sketches at 200cm.

# convert taxonname to a factor for grouping
g$taxonname <- factor(g$taxonname)

# reset margins
par(mar = c(0, 0, 3, 2))

# set all subsequent sketch style via options
options(.aqp.plotSPC.args = list(color = 'clay', print.id = FALSE, name = NA, max.depth = 200, depth.axis = list(cex = 1, line = -4)))

groupedProfilePlot(g, groups = 'taxonname', group.name.cex = 1)

groupedProfilePlot(g, groups = 'taxonname', group.name.cex = 1)

groupedProfilePlot(g, groups = 'taxonname', group.name.cex = 1)

# reset sketch args
options(.aqp.plotSPC.args = NULL)

5.2 Aggregation

The slab() function is used to aggregate selected variables within collections of soil profiles along depth-slices. In this case, we aggregate clay, pH (1:1 H2O, estimated from saturated paste pH when missing), and base saturation at pH 8.2 along 1-cm thick slices and within groups defined by the variable ‘taxonname’. See ?slab() for details on how this function can be used.

g.slab <- slab(g, taxonname ~ clay + estimated_ph_h2o + bs82 + wmpd)

# inspect stacked data structure
str(g.slab)
## 'data.frame':    4452 obs. of  10 variables:
##  $ variable             : Factor w/ 4 levels "clay","estimated_ph_h2o",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ taxonname            : Factor w/ 3 levels "chaix","holland",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ p.q5                 : num  5.65 5.65 5.65 5.65 5.65 ...
##  $ p.q25                : num  9.7 9.7 9.7 9.7 9.7 ...
##  $ p.q50                : num  12.4 12.4 12.4 12.4 12.4 ...
##  $ p.q75                : num  13.8 13.8 13.8 13.8 13.8 ...
##  $ p.q95                : num  19.2 19.2 19.2 19.2 19.2 ...
##  $ contributing_fraction: num  0.524 0.524 0.524 0.524 0.524 ...
##  $ top                  : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ bottom               : int  1 2 3 4 5 6 7 8 9 10 ...

5.3 Refine Labels

It is convenient to know how many pedons there are within each collection–therefore, we append this value to the series name. Using the same approach, we can rename the soil properties with more useful descriptions and units of measure.

# re-name soils with series name + number of pedons-- order is critical !
new.levels <- c('musick', 'holland', 'chaix')
new.labels <- paste(new.levels, ' [', c(length(musick), length(holland), length(chaix)), ' pedons]', sep = '')
g.slab$taxonname <- factor(g.slab$taxonname, levels = new.levels, labels = new.labels)

# new names should match the order in:
levels(g.slab$variable)
## [1] "clay"             "estimated_ph_h2o" "bs82"             "wmpd"
# re-name soil property labels--order is critical !
levels(g.slab$variable) <- c('Clay (%)', 'pH 1:1 H2O', 'Base Saturation at pH 8.2 (%)', 'WMPD (mm)')

5.4 Graphically Compare

The slice-wise median and 25th/75th percentiles are reasonable estimations of central tendency and spread. “Contributing fraction” values (% of pedons with data at a given depth) are printed along the right-hand side of each panel. These values provide both an indication of how deep the soils are, and, how much confidence can be placed in the aggregate data at any given depth.

# define plotting style
tps <- list(superpose.line = list(col = c('RoyalBlue', 'DarkRed', 'DarkGreen'), lwd = 2))

# plot grouped, aggregate data
xyplot(top ~ p.q50 | variable, groups=taxonname, data=g.slab, ylab='Depth',
       xlab='median bounded by 25th and 75th percentiles',
       lower=g.slab$p.q25, upper=g.slab$p.q75, ylim=c(155,-5),
       panel=panel.depth_function, alpha=0.25, sync.colors=TRUE,
       prepanel=prepanel.depth_function,
       cf=g.slab$contributing_fraction,
       par.strip.text=list(cex=0.8),
       strip=strip.custom(bg=grey(0.85)),
       layout=c(4,1), scales=list(x=list(alternating=1, relation='free'), y=list(alternating=3)),
       par.settings=tps,
       auto.key=list(columns=3, lines=TRUE, points=FALSE)
)


This document is based on aqp version 2.0.2 and soilDB version 2.7.8.