An Example

Consider this situation: you have a collection of pedons that have been correlated to a named soil series (or component) and would like to objectively compute a range in characteristics ("low-rv-high" values) and horizon depths. As with most collections of pedon data, there may be considerable variation in description style and horizons used, horizon depths, and number of horizons described:

alt text

alt text

In this scenario, there are several obvious “micro-correlation” decisions that need to be made before horizons can be grouped for aggregation. For example, what horizonation prototype scheme (e.g., A-Bt1-Bt2-Bt3-Cr-R) best conveys the concept of this soil series or soil component? Does it make sense to group [Bt3, Bt4, BCt, CBt] horizons for aggregation? Or what about grouping [Bt3, 2Bt3] horizons? Do [BA, AB] horizons occur frequently enough to be included in the horizonation prototype?

Based on your knowledge of the area, pedon 2 might be a good "typical" pedon to use in developing a horizonation prototype. After careful review of the data and consultation with your crew, a new set of labels are assigned to each horizon (red labels in figure above) that define groups over which soil properties will be aggregated. These new labels define functionally similar groups that may span multiple genetic horizons.

Formalized Approach

This document describes an aggregation strategy based on the use of "generalized horizon labels" (GHL) through a combination of narrative, R code, and figures. You can follow along in an R session by copying code from the grey boxes and pasting it into the R console.

Here is a basic outline of the process:

  1. Select a set of GHL that best represents a group of pedons to be aggregated. This could be based on series descriptions, expert knowledge, or even inspection of the most frequently occurring horizon designations.

  2. Assign GHL to each horizon using whatever information is available for grouping horizons. This micro-correlation of horizon designations will likely require slightly different "rules" in each instance. Careful inspection of horizon designation and observed properties is critical.

  3. Evaluate GHL assignments and manually refine as needed.

  4. Keep track of final GHL assignments in NASIS or text file.

  5. Estimate a most likely horizonation e.g., top and bottom depths for each generalized horizon label.

  6. Compute range in characteristics, aka low-rv-high values, for clay, sand, pH, etc. over GHL. (next document in this series)

Setup R Envionment

If you have never used the aqp or soildb packages before, you will likely need to install them. This only needs to be done once.

# stable version from CRAN + dependencies
install.packages('ape', dep=TRUE) 
install.packages('latticeExtra', dep=TRUE)
install.packages('plyr', dep=TRUE) 
install.packages('aqp', dep=TRUE) 
install.packages('soilDB', dep=TRUE)
install.packages('sharpshootR', dep=TRUE)

Once you have all of the R packages on which this document depends, it is a good idea to load them. R packages must be installed anytime you change versions of R (e.g., after an upgrade) and loaded anytime you want to access functions from within those packages.


Sample Data

While the methods outlined in this document can be applied to any collection of pedons, it is convenient to work with a standardized set of data. You can follow along with the analysis by copying code from the following blocks and running it in your R session. The sample data used in this document is based on 15 soil profiles that have been correlated to the Loafercreek soil series from the Sierra Nevada Foothill Region of California. Note that the internal structure of the loafercreek data is identical to the structure returned by fetchNASIS() from the soilDB package. All horizon-level values are pulled from the pedon horizon table of the pedons being analyzed.

# load sample data from the soilDB package
data(loafercreek, package = 'soilDB')
# keep only the first 15 pedons
pedons <- loafercreek[1:15, ]
# plot profile sketches
plot(pedons, name='hzname',, cex.names=0.75, axis.line.offset=-4)

15 pedons correlated to the Loafercreek soil series.

Optional: Follow Along with Your Data

The following code block demonstrates how to pull data in using the fetchNASIS() function from the soilDB package.

# first load the desired data set within NASIS into your NASIS selected set
# then load data from the NASIS selected set into R
pedons <- fetchNASIS(from='pedons')
# optionally subset the data by taxon name - enter your taxon name
pedons <- pedons[grep(pattern='ENTER_YOUR_TAXON_NAME', pedons$taxonname,, ]


Selection of Generalized Horizon Labels

Generalized horizon labels represent an expert-guided selection of designations that were consistently observed in the field and are meaningful in terms of soil morphology and management. The very first step in this process is to tabulate the number of times each horizon designation occurs.

sort(table(pedons$hzname), decreasing=TRUE)
##    A  Bt1  Bt2  Bt3   Cr    R  BCt  Crt   Oi  BAt   Bw 2Bt3  2Cr   BA   Bt 2BCt 2Bt4  Bt4  CBt 
##   15   14   14    9    9    9    5    4    4    3    3    2    2    2    2    1    1    1    1

In this case, [A, Bt1, Bt2, Bt3] horizon designations appear to be a good starting point. However, relying on field experience and expert knowledge of these soils, cross-checking against the OSD, and talking with other soil scientists may be more important than the previous tabulation.

Constructing a graph of transitions from one horizon to the next ("transition probabilities") can highlight those horizon designations that are most commonly used together. A follow-up tutorial on transition probability matrix interpretation is planned.

tp <- hzTransitionProbabilities(pedons, 'hzname')
plotSoilRelationGraph(tp, graph.mode = 'directed', edge.arrow.size=0.5, edge.scaling.factor=2, vertex.label.cex=0.75,'sans')

Graph constructed from transition probabilities.

A quick summary of horizon depth mid-points (e.g., average depth of horizon) can help in organizing the various designations and possibly give some clues as to how they can be grouped. The following plot is called a box and whisker plot.

# compute horizon mid-points
pedons$mid <- with(horizons(pedons), (hzdept + hzdepb) / 2)

# sort horizon designation by group-wise median values <- names(sort(tapply(pedons$mid, pedons$hzname, median)))

# plot the distribution of horizon mid-points by designation
bwplot(mid ~ factor(hzname,, 
       ylim=c(155, -5), ylab='Horizon Mid-Point Depth (cm)', 
       panel=function(...) {
  panel.abline(h=seq(0, 140, by=10), v=1:length(, col=grey(0.8), lty=3)

Range in horizon depth mid-point for original horizon designations.

Next, a similar summary of soil properties (clay content and total rock fragment volume) is presented. The goal is to determine which horizons designations can be grouped and which generalized horizon labels to assign to each group.

# box and wisker plot by clay content
bwplot(clay ~ factor(hzname,, 
       ylab='Clay Content (%)', 
       panel=function(...) {
  panel.abline(h=seq(0, 100, by=5), v=1:length(, col=grey(0.8), lty=3)

Range in clay content for original horizon designations.

# box and wisker plot by total rock fragment volume
bwplot(total_frags_pct ~ factor(hzname,, 
       ylab='Total Rock Fragment Volume (%)', 
       panel=function(...) {
  panel.abline(h=seq(0, 100, by=10), v=1:length(, col=grey(0.8), lty=3)

Range in total rock fragment content for original horizon designations.

Sometimes looking at thematic soil profile sketches can be informative.

# color horizons by clay content
plot(pedons, name='hzname',, cex.names=0.75, axis.line.offset=-4, color='clay')

Horizon colors are based on clay content.

Assignment of Generalized Horizon Labels

Once a set of generalized horizon labels have been determined, a corresponding set of regular expression (REGEX) rules are developed to convert field-described designations into GHL. Pattern matching with REGEX will typically assign useful GHL; however, there will always be cases where manual intervention is required. More on that will follow.

From the above analysis and the OSD, it seems like the following sequence of GHL is appropriate: (A, Bt1, Bt2, Bt3, Cr, R) -- an A horizon, followed by 3 Bt horizons, then Cr, and finally R. For each GHL, a corresponding REGEX rule is needed. For example, '^A$|Ad|Ap' will match 'A', 'Ad', and 'Ap'.

# save our GHL
n <- c('A','Bt1','Bt2','Bt3','Cr','R')
# REGEX rules
p <- c('^A$|Ad|Ap',

Apply GHL pattern-matching rules, save to a new column called genhz, and cross-tabulate the occurrence of GHL and original designations.

pedons$genhz <- generalize.hz(pedons$hzname, n, p)
# cross-tabulate original horizon designations and GHL
tab <- table(pedons$genhz, pedons$hzname)
##            2BCt 2Bt3 2Bt4 2Cr   A  BA BAt BCt  Bt Bt1 Bt2 Bt3 Bt4  Bw CBt  Cr Crt  Oi   R Sum
##   A           0    0    0   0  15   0   0   0   0   0   0   0   0   0   0   0   0   0   0  15
##   Bt1         0    0    0   0   0   0   0   0   0  14   0   0   0   0   0   0   0   0   0  14
##   Bt2         0    0    0   0   0   0   0   0   0   0  14   0   0   0   0   0   0   0   0  14
##   Bt3         1    2    1   0   0   0   0   5   0   0   0   9   1   0   1   0   0   0   0  20
##   Cr          0    0    0   2   0   0   0   0   0   0   0   0   0   0   0   9   4   0   0  15
##   R           0    0    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   9   9
##   not-used    0    0    0   0   0   2   3   0   2   0   0   0   0   3   0   0   0   4   0  14
##   Sum         1    2    1   2  15   2   3   5   2  14  14   9   1   3   1   9   4   4   9 101

In the above cross-tabulation, we can see that a couple of original designations were not matched (not-used in the table) by our REGEX rules, namely, BA, Bw, and Oi horizons. For this example, we are going to assume that those horizons are not common enough for inclusion in our set of GHL.

It is also possible to display GHL assignment as a network graph.

# convert contingency table -> adj. matrix
m <- genhzTableToAdjMat(tab)
# plot using a function from the sharpshootR package
plotSoilRelationGraph(m, graph.mode = 'directed', edge.arrow.size=0.5)

Evaluation of Generalized Horizon Labels

An initial evaluation of our GHL assignment can be accomplished by plotting profile sketches with horizons colored by GHL. The result looks correct, but further investigation may be warranted.

# make a palette of colors, last color is for not-used class
cols <- c(grey(0.33), 'orange', 'orangered', 'chocolate', 'green', 'blue', 'yellow')
# assign a color to each generalized horizon label
hz.names <- levels(pedons$genhz)
pedons$genhz.soil_color <- cols[match(pedons$genhz, hz.names)]
# plot generalized horizons via color and add a legend
plot(pedons, name='hzname',, cex.names=0.75, axis.line.offset=-4, color='genhz.soil_color')
legend('bottomleft', legend=hz.names,, pch=22, bty='n', cex=1)

Horizon colors are based on assigned GHL.

A box and whisker plot of the depth-ranges for each of the generalized horizon labels helps to visualize the degree of overlap. Slicing the collection of profiles along 1-cm intervals generates a more precise figure.

# slice profile collection from 0-150 cm
s <- slice(pedons, 0:150 ~ genhz)
# convert horizon name back to factor, using original levels
s$genhz <- factor(s$genhz, levels = hz.names)
# plot depth-ranges of generalized horizon slices
bwplot(hzdept ~ genhz, data=horizons(s), 
       ylim=c(155, -5), ylab='Generalized Horizon Depth (cm)', 
       scales=list(y=list(tick.number=10)), asp=1, 
       panel=function(...) {
          panel.abline(h=seq(0, 140, by=10), v=1:length(hz.names),col=grey(0.8), lty=3)