Prologue

The soil series concept is a practical grouping of soil/landscape patches that require a unique set of management / conservation practices. The revision of old, establishment of new, and correlation among series requires data. A consistent set of data promotes consistency in the definition and use of series concepts. There are over 24,000 soil series in the US. Do we need them all? Do we need more? Which ones need work? How do we keep track of data and specific exemplars used to define the series? These are all questions that USDA-NRCS soil scientists struggle with. A more clearly defined soil series concept directly impacts our ability to communicate a “package” of soil / landscape parameters to other scientists and, most importantly, to those who work with the soil. Soil survey is much more than making maps and little sketches. It is a systematic catalog of our best attempt at figuring out how soils vary on the landscape, so that we can make the best use of the soil resource. Why this matters is not, but should be obvious: food, fiber, fuel.

Practical Matters

This is rough outline for evalutating potential differences between soil series within the same family, e.g. competing series. For this example, start the process by loading data within the same family as the Drummer series; fine-silty, mixed, superactive, mesic typic endoaquolls.

Why would a soil scientist do this?

Setup

Load relevant packages and define some helper functions.

# you will need the latest version of soilDB
library(soilDB)
library(aqp)
library(sharpshootR)
library(latticeExtra)
library(RColorBrewer)
library(reshape2)
library(cluster)
library(ape)
library(scales)
library(maps)
library(tactile)

Get Data

Get extended summaries for the Drummer soil series. As of soilDB 2.3 this includes a list of “competing” soil series.

soil <- 'DRUMMER'
s <- fetchOSD(soil, extended = TRUE)

Get extended summaries for Drummer and competing series.

# get competing series OSD data
spc <- fetchOSD(c(soil, s$competing$competing))

# this will only work for established series, e.g. those that have been "mapped" somewhere
idx <- which(spc$series_status == 'established')
spc <- spc[idx, ]

# save family taxa and set of series names for later
fm.name <- unique(na.omit(spc$family))
s.names <- unique(site(spc)$id)

First Look

Plot sketches of the soils in this family, data are from OSDs via SoilWeb.

# plot
par(mar=c(0.25,0,1,1))
plotSPC(spc, name.style = 'center-center', width = 0.3, max.depth = 210)
mtext(fm.name, side = 3, at = 0.5, adj = 0, line = -1, font=4)
mtext('source: Official Series Descriptions', side = 1, at = 0.5, adj = 0, line = -1, font=3, cex=1)

Climate Summaries

Compare annual climate summaries using divisive hierarchical clustering of median values.

# get OSD + extended summaries for all competing series
s.competing.data <- fetchOSD(s.names, extended = TRUE)

Evaluate individual climate variables using select percentiles. Soil series labels are sorted according to divisive hierarchical clustering. The two main groupings are clearly visible in those climate varibles linked to mean annual air temperature.

# control color like this
trellis.par.set(plot.line=list(col='RoyalBlue'))

# control centers symbol and size here
res <- vizAnnualClimate(s.competing.data$climate.annual, s=soil, IQR.cex = 1.1, cex=1.1, pch=18)

print(res$fig)

Visualize pair-wise distances using a dendrogram and profile sketches. What do the two main groupings tell us about this family?

par(mar=c(0,0,1,1))
plotProfileDendrogram(s.competing.data$SPC, clust = res$clust, scaling.factor = 0.075, width = 0.2, y.offset = 0.5)
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1.5, font=4)
mtext('sorted by annual climate summaries', side = 3, at = 0.5, adj = 0, line = -1.5, font=3)

Monthly Climate Summaries

library(ggplot2)
# reasonable colors for a couple of groups
cols <- brewer.pal(9, 'Set1') 
cols <- cols[c(1:5,7,9)]

idx <- which(s.competing.data$climate.monthly$series %in% c('DRUMMER', 'MARCUS', 'PATTON', 'MAXCREEK') & s.competing.data$climate.monthly$variable == 'Precipitation (mm)')
s.sub <- s.competing.data$climate.monthly[idx, ]

ggplot(s.sub, aes(x = month, group=series)) + 
  geom_ribbon(aes(ymin = q25, ymax = q75, fill=series)) + 
  geom_line(aes(month, q25)) + 
  geom_line(aes(month, q75)) + 
  geom_abline(intercept=0, slope = 0, lty=2) +
  xlab('') + ylab('mm') + 
  ggtitle('Monthly IQR') +
  scale_fill_manual(values=alpha(cols, 0.75)) +
  facet_wrap(vars(variable), scales = 'free_y') +
  theme_bw()

Geomorphic Summaries

Series are sorted according to hierarchical clustering of proportions and relative hydrologic position within an idealized landform (e.g. top to bottom). Most soil series (SSURGO components) are associated with a hillslope position and one or more landform-specific positions: hills, mountain slopes, terraces, and/or flats. Proportions can be interpreted as an aggregate representation of geomorphic membership. The values printed to the left (number of component records) and right (Shannon entropy) of stacked bars can be used to judge the reliability of trends. Shannon entropy values close to 0 represent soil series with relatively consistent geomorphic association, while values close to 1 suggest lack thereof. Source: SSURGO component records.

NCCPI

National Commodity Crop Productivity Index user manual.

nccpi <- s.competing.data$NCCPI
nccpi$series <- factor(nccpi$series, levels = nccpi$series[order(nccpi$nccpi_q50)])

segplot(
  series ~ nccpi_q25 + nccpi_q75, 
  data = nccpi, 
  centers = nccpi_q50, 
  xlab = 'NCCPI: 25th-50th-75th Percentiles',
  sub = 'FY23 SSURGO Snapshot',
  main = fm.name,
  col.symbol = 'black',
  band.height = 0.75,
  par.settings = tactile.theme(),
  scales = list(x = list(tick.number = 10)),
  panel = function(...) {
    panel.grid(-1, -1)
    panel.segplot(...)
  }
)

Color Signatures

Develop a color signature for this family.

# extract a local copy of the SPC
spc <- s.competing.data$SPC

# convert moist Munsell colors -> sRGB
s.rgb <- munsell2rgb(spc$hue, spc$value, spc$chroma, return_triplets = TRUE)

# copy over to hz-level attributes
spc$r <- s.rgb$r
spc$g <- s.rgb$g
spc$b <- s.rgb$b

# get color signature for each series
pig <- soilColorSignature(spc, RescaleLightnessBy = 5, method = 'pam', pam.k = 3)

# move row names over for distance matrix
row.names(pig) <- pig[, 1]
d <- daisy(pig[, -1])
dd <- diana(d)

Hang profile sketches from divisive hierarchical clustering of soil color signatures.

par(mar = c(1,0,1,1))
plotProfileDendrogram(spc, dd, dend.y.scale = max(d) * 2, scaling.factor = 0.28, y.offset = 2, width=0.2, cex.names=0.45)

# annotate with family name
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1, font=4)

Re-make, this time as a work of art. Note that the Elvira sketch is missing a color for the Ap horizon due to a minor formatting error in the OSD (10YR2/1, e.g. missing a space). I adjusted the default color used by plotSPC to dark grey (   ) as an act of artistic license.

# adjust margins and default colors
par(mar = c(0,0,0,0), bg = 'black', fg = 'white')

# hang profiles from dendrogram
# note customization via arguments passed to plotSPC
plotProfileDendrogram(spc, dd, dend.y.scale = max(d) * 2, scaling.factor = 0.28, y.offset = 2, width=0.4, cex.names=0.45, dend.color = 'white', dend.width = 2, divide.hz=FALSE, print.id=FALSE, depth.axis = FALSE, name=NA, default.color = grey(0.125), max.depth = 210)

# annotate with family name
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1, font=4)

Morphologic Differences

There isn’t much morphological data available in the parsed OSD data delivered by fetchOSD, apart from:

We can add indicators of soil moisture dynamics and potential stratification with a little regular expression applied to horizon designations.

# flag horizons with significant gleying
spc$gley <- factor(grepl('g', spc$hzname))

# flag discontinuities ~ stratifcation or abrupt change in parent material origin
spc$discontinuity <- factor(grepl('^[0-9]', spc$hzname))

Thematic profile sketches illustrate the main differences. Horizon designations are omitted to highlight major differences within the family. Make additional vertial space for annotation by increasing the maximum depth within the figure to 220cm.

par(mar = c(0,0,3,1))
plotSPC(spc, color='gley', col.label='Gleyed Horizons', name='', max.depth = 220)
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1.5, font=4)

plotSPC(spc, color='discontinuity', col.label='Discontinuity', name='', max.depth = 220)
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1.5, font=4)

plotSPC(spc, color='texture_class', col.label='Texture Class', name='', max.depth = 220)
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1.5, font=4)

plotSPC(spc, color='pH_class', col.label='pH Class', name='', max.depth = 220)
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1.5, font=4)

It is possible to quantitatively compare how these qualitative (and categorical) data vary with depth using aqp::NCSP(). This algorithm evaluates pair-wise distance between profiles along 1-cm intervals. The resulting distance matrix can be visualized using various ordination or clustering-based approaches. Divisive hierarchical clustering generates reasonable dendrograms, from which we can “hang” some profile sketches.

# compute pair-wise distances using 4 morphologic indicators
# to a maximum depth of 200cm
pd <- NCSP(spc, vars = c('texture_class', 'pH_class', 'gley', 'discontinuity'), maxDepth = 200)

# divisive hierarchical clustering of the distance matrix
dd <- diana(pd)

Sketches made from moist colors, arranged according to select indices of morphology as extracted from the OSDs. Profiles are truncated at 200cm.

par(mar = c(1,0,1,1))
plotProfileDendrogram(spc, dd, dend.y.scale = max(d) * 4, scaling.factor = 0.5, y.offset = 5, width=0.2, cex.names=0.45, max.depth = 200)
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1, font = 4)

Thematic sketches, arranged according to select indices of morphology as extracted from the OSDs. Trends are clear

par(mar = c(1,0,3,1))
plotProfileDendrogram(spc, dd, dend.y.scale = max(d) * 4, scaling.factor = 0.5, y.offset = 5, width=0.2, cex.names=0.45, color='texture_class', col.label='Texture Class', max.depth = 200)
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1, font=4)

plotProfileDendrogram(spc, dd, dend.y.scale = max(d) * 4, scaling.factor = 0.5, y.offset = 5, width=0.2, cex.names=0.45, color='pH_class', col.label='pH Class', max.depth = 200)
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1, font=4)

plotProfileDendrogram(spc, dd, dend.y.scale = max(d) * 4, scaling.factor = 0.5, y.offset = 5, width=0.2, cex.names=0.45, color='gley', col.label='Gleyed Horizons', max.depth = 200)
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1, font=4)

plotProfileDendrogram(spc, dd, dend.y.scale = max(d) * 4, scaling.factor = 0.5, y.offset = 5, width=0.2, cex.names=0.45, color='discontinuity', col.label='Discontinuity', max.depth = 200)
mtext(fm.name, side = 1, at = 0.5, adj = 0, line = -1, font=4)

KSSL Data

Get KSSL data from these series via fetchKSSL.

# get associated KSSL data for these series
kssl.data <- fetchKSSL(series = s.names)

# normalize soil series names via pattern matching and conversion to upper case
for(i in s.names) {
  kssl.data$taxonname[grep(i, kssl.data$taxonname, ignore.case = TRUE)] <- toupper(i)
}

Tabulate number of KSSL pedons / series name.

kableExtra::kable_styling(knitr::kable(table(kssl.data$taxonname), format = 'html'), full_width = FALSE)
Var1 Freq
CHALMERS 11
DOLBEE 1
DRUMMER 41
DUNHAM 4
ELPASO 2
ELVIRA 2
GARWIN 2
HARTSBURG 8
MADELIA 3
MARCUS 1
MAXCREEK 1
MAXFIELD 4
OSSIAN 10
PATTON 35
PELLA 23
SABLE 50

Aggregate over all data within this family via slab, and display median bounded by inter-quartile range.

# aggregate over entire collection, marginal quantiles of select properties
g.slab <- slab(kssl.data, ~ clay + sand + estimated_ph_h2o + cec7 + bs82)
levels(g.slab$variable) <- c('Clay (%)', 'Sand (%)', 'pH 1:1 H2O', 'CEC at pH 7 (cmol[+]/kg)' ,'Base Saturation at pH 8.2 (%)')

# define plotting style
tps <- list(superpose.line=list(col=c('RoyalBlue', 'DarkRed', 'DarkGreen'), lwd=2))

xyplot(top ~ p.q50 | variable, data=g.slab, ylab='Depth', main=fm.name, sub='source: KSSL',
       xlab='median bounded by 25th and 75th percentiles',
       lower=g.slab$p.q25, upper=g.slab$p.q75, ylim=c(205,-5),
       panel=panel.depth_function, alpha=0.25, sync.colors=TRUE,
       prepanel=prepanel.depth_function,
       cf=g.slab$contributing_fraction,
       par.strip.text=list(cex=0.8),
       strip=strip.custom(bg=grey(0.85)),
       layout=c(5,1), scales=list(x=list(alternating=1, relation='free'), y=list(alternating=3)),
       par.settings=tps,
       auto.key=list(columns=3, lines=TRUE, points=FALSE)
)

Aggregate just Drummer data and compare with entire family. Note that Drummer records make up about 20% of the data within this family.

# compare a single profile to the group-level aggregate values
a <- slab(kssl.data[which(kssl.data$taxonname == soil), ], ~ clay + sand + estimated_ph_h2o + cec7 + bs82)
levels(a$variable) <- c('Clay (%)', 'Sand (%)', 'pH 1:1 H2O', 'CEC at pH 7 (cmol[+]/kg)' ,'Base Saturation at pH 8.2 (%)')


# manually update the group column
a$group <- soil
# a$p.q25 <- NA
# a$p.q75 <- NA
g.slab$group <- 'ALL'

# combine into a single data.frame:
g <- rbind(g.slab, a)
g$group <- factor(g$group)

xyplot(top ~ p.q50 | variable, groups=group, data=g, ylab='Depth', main=fm.name,
       xlab='median bounded by 25th and 75th percentiles', sub='source: KSSL' ,
       lower=g$p.q25, upper=g$p.q75, ylim=c(205,-5),
       panel=panel.depth_function, alpha=0.25, sync.colors=TRUE,
       prepanel=prepanel.depth_function,
       # cf=g$contributing_fraction,
       par.strip.text=list(cex=0.8),
       strip=strip.custom(bg=grey(0.85)),
       layout=c(5,1), scales=list(x=list(alternating=1, relation='free'), y=list(alternating=3)),
       par.settings=tps,
       auto.key=list(columns=2, lines=TRUE, points=FALSE)
)

Investigate Spatial Patterns

Finish this

Make a rough sketch of the joint extent of the series in this family.

# define some nice colors
cols <- brewer.pal('Set1', n=3)

## TODO: split into groups identified above
# cutree(s.ca.h, 2)

# get list of extents
l <- lapply(unique(spc$id), seriesExtent)

# map of CONUS
par(mar=c(1 ,1,1,1))
map('state')

# rough estimation of joint density via transparency  / overlay
lapply(l, function(i) {
  plot(i, border=NA, col=alpha(cols[2], 0.33), add=TRUE)
  })

# add queried series
idx <- match(soil, sapply(l, function(i) i$series[1]))
plot(l[[idx]], border=NA, col=alpha(cols[1], 0.5), add=TRUE)

# legend
legend('topright', legend=soil, col=alpha(cols[1], 0.5), pch=15, bty='n')

# finish map
box()
title(main=fm.name, line=1.25)

MLRA and Parent Material Summaries

This needs more work.

Parent Material

Quickly tabulate parent material kind.

kableExtra::kable_styling(kable(sort(table(s.competing.data$pmkind$pmkind), decreasing = TRUE), format = 'html'))
Var1 Freq
Loess 17
Lacustrine deposits 9
Till 8
Alluvium 7
Glaciolacustrine deposits 5
Outwash 5
Glaciofluvial deposits 4
Subglacial till 4
Residuum 3
Eolian deposits 2
Drift 1
Eolian sands 1
Slope alluvium 1
Supraglacial till 1

Quickly tabulate parent material origin.

kable(sort(table(s.competing.data$pmorigin$pmorigin), decreasing = TRUE), format = 'html')
Var1 Freq
Calcareous shale 2
Dolomite 1
Limestone 1
Limestone and dolomite 1

MLRA

Summarize acreage by MLRA.

plyr::ddply(s.competing.data$mlra, 'mlra', .fun=plyr::summarize, total_ac=sum(area_ac))

This document is based on aqp version 2.0 and soilDB version 2.7.8.