Consider this situation: you have a collection of pedons that have been correlated to a named soil series (or component) and would like to objectively compute a range in characteristics (“low-rv-high” values) and horizon depths. As with most collections of pedon data, there may be considerable variation in description style and horizons used, horizon depths, and number of horizons described:
In this scenario, there are several obvious “micro-correlation” decisions that need to be made before horizons can be grouped for aggregation. For example, what horizonation prototype scheme (e.g., A-Bt1-Bt2-Bt3-Cr-R) best conveys the concept of this soil series or soil component? Does it make sense to group [Bt3, Bt4, BCt, CBt] horizons for aggregation? Or what about grouping [Bt3, 2Bt3] horizons? Do [BA, AB] horizons occur frequently enough to be included in the horizonation prototype?
Based on your knowledge of the area, pedon 2 might be a good “typical” pedon to use in developing a horizonation prototype. After careful review of the data and consultation with your crew, a new set of labels are assigned to each horizon (red labels in figure above) that define groups over which soil properties will be aggregated. These new labels define functionally similar groups that may span multiple genetic horizons.
This document describes an aggregation strategy based on the use of “generalized horizon labels” (GHL) through a combination of narrative, R code, and figures. You can follow along in an R session by copying code from the grey boxes and pasting it into the R console.
Here is a basic outline of the process:
Select a set of GHL that best represents a group of pedons to be aggregated. This could be based on series descriptions, expert knowledge, or even inspection of the most frequently occurring horizon designations.
Assign GHL to each horizon using whatever information is available for grouping horizons. This micro-correlation of horizon designations will likely require slightly different “rules” in each instance. Careful inspection of horizon designation and observed properties is critical.
Evaluate GHL assignments and manually refine as needed.
Keep track of final GHL assignments in NASIS or text file.
Estimate a most likely horizonation e.g., top and bottom depths for each generalized horizon label.
Compute range in characteristics, aka low-rv-high values, for clay, sand, pH, etc. over GHL. (next document in this series)
If you have never used the aqp or soildb packages before, you will likely need to install them. This only needs to be done once.
# stable version from CRAN + dependencies
install.packages('ape', dep=TRUE)
install.packages('latticeExtra', dep=TRUE)
install.packages('plyr', dep=TRUE)
install.packages('aqp', dep=TRUE)
install.packages('soilDB', dep=TRUE)
install.packages('sharpshootR', dep=TRUE)
Once you have all of the R packages on which this document depends, it is a good idea to load them. R packages must be installed anytime you change versions of R (e.g., after an upgrade) and loaded anytime you want to access functions from within those packages.
library(aqp)
library(soilDB)
library(sharpshootR)
library(igraph)
library(ape)
library(latticeExtra)
library(lattice)
library(cluster)
library(MASS)
While the methods outlined in this document can be applied to any
collection of pedons, it is convenient to work with a standardized set
of data. You can follow along with the analysis by copying code from the
following blocks and running it in your R session. The
sample data used in this document is based on 15 soil profiles that have
been correlated to the Loafercreek
soil series from the Sierra Nevada Foothill Region of California. Note
that the internal structure of the loafercreek
data is
identical to the structure returned by fetchNASIS()
from the soilDB package. All horizon-level values are pulled from
the pedon horizon table of the pedons being analyzed.
# load sample data from the soilDB package
data(loafercreek, package = 'soilDB')
# keep only the first n pedons
pedons <- loafercreek[1:20, ]
# tighter margins
par(mar = c(0, 0, 0, 1))
# profile sketches with customization of style
plotSPC(pedons, print.id = FALSE, max.depth = 150, cex.names = 0.66, width = 0.35, name.style = 'center-center', depth.axis = list(style = 'compact', line = -2))
The following code block demonstrates how to pull data in using the
fetchNASIS()
function from the soilDB package.
# first load the desired data set within NASIS into your NASIS selected set
# then load data from the NASIS selected set into R
pedons <- fetchNASIS(from='pedons')
# optionally subset the data by taxon name - enter your taxon name
pedons <- pedons[grep(pattern='ENTER_YOUR_TAXON_NAME', pedons$taxonname, ignore.case=TRUE), ]
Generalized horizon labels represent an expert-guided selection of designations that were consistently observed in the field and are meaningful in terms of soil morphology and management. The very first step in this process is to tabulate the number of times each horizon designation occurs.
sort(table(pedons$hzname), decreasing=TRUE)
##
## A Bt1 Bt2 BA Cr R Bt3 Bw 2Bt2 A1 A2 BAt BCt Bt 2Bt1 2Bt3 2R Bt4 C 2BC
## 16 15 14 10 9 8 7 4 3 3 3 3 3 3 2 2 2 2 2 1
## 2BCt 2Cr 2Crt BC Bw1 Bw2 Crt H1 Oi
## 1 1 1 1 1 1 1 1 1
In this case, [A, Bt1, Bt2, BA] horizon designations appear to be a good starting point. However, relying on field experience and expert knowledge of these soils, cross-checking against the OSD, and talking with other soil scientists may be more important than the previous tabulation.
Constructing a graph of transitions from one horizon to the next (“transition probabilities”) can highlight those horizon designations that are most commonly used together. A follow-up tutorial on transition probability matrix interpretation is planned.
tp <- hzTransitionProbabilities(pedons, 'hzname')
par(mar = c(1, 1, 1, 1))
plotSoilRelationGraph(tp, graph.mode = 'directed', edge.arrow.size = 0.5, edge.scaling.factor = 2, vertex.label.cex = 0.75, vertex.label.family = 'sans')
A quick summary of horizon depth mid-points (e.g., average depth of horizon) can help in organizing the various designations and possibly give some clues as to how they can be grouped. The following plot is called a box and whisker plot.
# compute horizon mid-points
pedons$mid <- with(horizons(pedons), (hzdept + hzdepb) / 2)
# sort horizon designation by group-wise median values
hz.designation.by.median.depths <- names(sort(tapply(pedons$mid, pedons$hzname, median)))
# plot the distribution of horizon mid-points by designation
bwplot(mid ~ factor(hzname, levels=hz.designation.by.median.depths),
data=horizons(pedons),
ylim=c(155, -5), ylab='Horizon Mid-Point Depth (cm)',
scales=list(y=list(tick.number=10)),
panel=function(...) {
panel.abline(h=seq(0, 140, by=10), v=1:length(hz.designation.by.median.depths), col=grey(0.8), lty=3)
panel.bwplot(...)
})
Next, a similar summary of soil properties (clay content and total rock fragment volume) is presented. The goal is to determine which horizons designations can be grouped and which generalized horizon labels to assign to each group.
# box and wisker plot by clay content
bwplot(clay ~ factor(hzname, levels=hz.designation.by.median.depths),
data=horizons(pedons),
ylab='Clay Content (%)',
scales=list(y=list(tick.number=10)),
panel=function(...) {
panel.abline(h=seq(0, 100, by=5), v=1:length(hz.designation.by.median.depths), col=grey(0.8), lty=3)
panel.bwplot(...)
})
# box and wisker plot by total rock fragment volume
bwplot(total_frags_pct ~ factor(hzname, levels=hz.designation.by.median.depths),
data=horizons(pedons),
ylab='Total Rock Fragment Volume (%)',
scales=list(y=list(tick.number=10)),
panel=function(...) {
panel.abline(h=seq(0, 100, by=10), v=1:length(hz.designation.by.median.depths), col=grey(0.8), lty=3)
panel.bwplot(...)
})
Sometimes looking at thematic soil profile sketches can be informative.
# color horizons by clay content
par(mar = c(0, 0, 3, 1))
plotSPC(pedons, print.id = FALSE, max.depth = 150, cex.names = 0.66, width = 0.35, name.style = 'center-center', depth.axis = list(style = 'compact', line = -2), color='clay')
Once a set of generalized horizon labels have been determined, a corresponding set of regular expression (REGEX) rules are developed to convert field-described designations into GHL. Pattern matching with REGEX will typically assign useful GHL; however, there will always be cases where manual intervention is required. More on that will follow.
From the above analysis and the OSD,
it seems like the following sequence of GHL is appropriate: (A, Bt1,
Bt2, Bt3, Cr, R) – an A horizon, followed by 3 Bt horizons, then Cr, and
finally R. For each GHL, a corresponding REGEX rule is needed. For
example, '^A$|Ad|Ap'
will match ‘A’, ‘Ad’, and ‘Ap’.
# save our GHL
n <- c('A','Bt1','Bt2','Bt3','Cr','R')
# REGEX rules
p <- c('^A$|Ad|Ap|A1|A2',
'Bt1$|^Bt$',
'^Bt2$',
'^Bt3|^Bt4|CBt$|BCt$|2Bt|2CB$|^C$|2BC',
'Cr',
'R')
Apply GHL pattern-matching rules, save to a new column called
genhz
, and cross-tabulate the occurrence of GHL and
original designations.
pedons$genhz <- generalize.hz(pedons$hzname, n, p)
# cross-tabulate original horizon designations and GHL
tab <- table(pedons$genhz, pedons$hzname)
addmargins(tab)
##
## 2BC 2BCt 2Bt1 2Bt2 2Bt3 2Cr 2Crt 2R A A1 A2 BA BAt BC BCt Bt Bt1 Bt2 Bt3 Bt4 Bw
## A 0 0 0 0 0 0 0 0 16 3 3 0 0 0 0 0 0 0 0 0 0
## Bt1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 15 0 0 0 0
## Bt2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0
## Bt3 1 1 2 3 2 0 0 0 0 0 0 0 0 0 3 0 0 0 7 2 0
## Cr 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## R 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## not-used 0 0 0 0 0 0 0 0 0 0 0 10 3 1 0 0 0 0 0 0 4
## Sum 1 1 2 3 2 1 1 2 16 3 3 10 3 1 3 3 15 14 7 2 4
##
## Bw1 Bw2 C Cr Crt H1 Oi R Sum
## A 0 0 0 0 0 0 0 0 22
## Bt1 0 0 0 0 0 0 0 0 18
## Bt2 0 0 0 0 0 0 0 0 14
## Bt3 0 0 2 0 0 0 0 0 23
## Cr 0 0 0 9 1 0 0 0 12
## R 0 0 0 0 0 0 0 8 10
## not-used 1 1 0 0 0 1 1 0 22
## Sum 1 1 2 9 1 1 1 8 121
In the above cross-tabulation, we can see that a couple of original
designations were not matched (not-used
in the table) by
our REGEX rules, namely, BA, Bw, and Oi horizons. For this example, we
are going to assume that those horizons are not common enough for
inclusion in our set of GHL.
It is also possible to display GHL assignment as a network graph.
# convert contingency table -> adj. matrix
m <- genhzTableToAdjMat(tab)
# plot using a function from the sharpshootR package
par(mar = c(1,1,1,1))
plotSoilRelationGraph(m, graph.mode = 'directed', edge.arrow.size = 0.5)
An initial evaluation of our GHL assignment can be accomplished by plotting profile sketches with horizons colored by GHL. The result looks correct, but further investigation may be warranted.
# make a palette of colors, last color is for not-used class
cols <- c(grey(0.33), 'orange', 'orangered', 'chocolate', 'green', 'blue', 'yellow')
# assign a color to each generalized horizon label
hz.names <- levels(pedons$genhz)
pedons$genhz.soil_color <- cols[match(pedons$genhz, hz.names)]
# plot generalized horizons via color and add a legend
par(mar = c(2,1,0,2))
plotSPC(pedons, print.id = FALSE, max.depth = 150, cex.names = 0.66, width = 0.35, name.style = 'center-center', depth.axis = list(style = 'compact', line = -2), color = 'genhz.soil_color')
legend('bottomleft', legend=hz.names, pt.bg=c(cols), pch=22, bty='n', cex=1)
A box and whisker plot of the depth-ranges for each of the generalized horizon labels helps to visualize the degree of overlap. Slicing the collection of profiles along 1-cm intervals generates a more precise figure.
# slice profile collection from 0-150 cm
s <- dice(pedons, 0:150 ~ genhz)
# convert horizon name back to factor, using original levels
s$genhz <- factor(s$genhz, levels = hz.names)
# plot depth-ranges of generalized horizon slices
bwplot(hzdept ~ genhz, data=horizons(s),
ylim=c(155, -5), ylab='Generalized Horizon Depth (cm)',
scales=list(y=list(tick.number=10)), asp=1,
panel=function(...) {
panel.abline(h=seq(0, 140, by=10), v=1:length(hz.names),col=grey(0.8), lty=3)
panel.bwplot(...)
}
)
The evaluation of generalized horizon labels typically requires a review of more information than field-described horizon labels and depth. For example, clay content, sand content, pH, total rock fragment volume, and horizon mid-points are soil properties that can be used to differentiate horizons– as long as the data are populated. In this document, clay content, total rock fragment volume, and horizon mid-points are used. Multivariate comparisons are commonly based on the concept of pair-wise dissimilarity and subsequent reduction of the resulting distance matrix into a simpler representation. Non-metric multidimensional scaling (nMDS) is one method for reducing the distance matrix into a new set of coordinates in two-dimensional space. A related document explores this idea further.
# store the column names of our variables of interest
vars <- c('clay', 'mid', 'total_frags_pct')
# result is a list of several items
hz.eval <- evalGenHZ(pedons, 'genhz', vars)
# extract MDS coords
pedons$mds.1 <- hz.eval$horizons$mds.1
pedons$mds.2 <- hz.eval$horizons$mds.2
# extract silhouette widths and neighbor
pedons$sil.width <- hz.eval$horizons$sil.width
pedons$neighbor <- hz.eval$horizons$neighbor
In the following figure, generalized horizon labels are plotted at nMDS coordinates as colored circles, along with original horizon designations and pedon IDs. Ideally, groupings of GHL should form relatively homogeneous clusters in this figure; however, there will always be some overlap at the edges. Heterogeneous regions should be inspected in cases where the assigned GHL may not make sense. In this example, the original designation of “Bt3” for pedon ID 09CKS036 does not seem to fit into the “Bt3” generalized horizon. According to the soil properties used (clay content, total rock fragments, horizon mid-point), the horizon may better fit the “Bt2” generalized horizon. In addition, it appears that “Bw” and “BA” horizons can be included into the “A” generalized horizon. Decisions on issues such as this can only be made based on field experience or at least some level of expert knowledge about the soils in question. Once a decision has been made, additions to the GHL rules (e.g. adding “Bw” and “BA” to the “A” GHL) and possibly manual adjustment can be used to accomodate the changes.
# convert pedons to a data.frame
pedons.df <- as(pedons, 'data.frame')
# plot generalized horizon labels at MDS coordinates
mdsplot <- xyplot(mds.2 ~ mds.1, groups=genhz, data=pedons.df,
xlab='', ylab='', aspect=1,
scales=list(draw=FALSE),
auto.key=list(columns=length(levels(pedons.df$genhz))),
par.settings=list(
superpose.symbol=list(pch=16, cex=3, alpha=0.5)
)
)
# annotate with original hzname and pedon ID
mdsplot +
layer(panel.abline(h=0, v=0, col='grey', lty=3)) +
layer(panel.text(pedons.df$mds.1, pedons.df$mds.2, pedons.df$hzname, cex=0.85, font=2, pos=3)) +
layer(panel.text(pedons.df$mds.1, pedons.df$mds.2, pedons.df$pedon_id, cex=0.55, font=1, pos=1))
The silhouette width metric is a useful indicator of how consistently group labels partition observations in a dissimilarity matrix. Silhouette widths closer to 1 indicate excellent partitioning, while values closer to 0 indicate less effective partitioning. Silhouette widths less than 0 are often associated with inappropriate label assignment. A sketch of profiles with horizons colored by silhouette width can indicate possible horizons with inappropriate GHL assignment or a pedon that does not fit within the concept of Loafercreek. For example, many of the horizons associated with pedon ID 09CKS036 do not appear to fit within the collection of soil properties associated with generalized horizon labels “Bt1”, “Bt2”, and “Bt3”. Keep in mind that this evaluation is only based on three soil properties– expert judgement will be required for a final assignment of GHL.
# plot silhouette width metric
par(mar = c(0, 0, 3, 1))
plot(pedons, label='pedon_id', cex.id = 0.5, max.depth = 150, cex.names = 0.66, width = 0.35, name.style = 'center-center', depth.axis = list(style = 'compact', line = -2), color='sil.width')
The following table contains information on those horizons with silhouette widths less than 0. The “neighbor” column contains the next closest GHL (in terms of the soil properties used in the pair-wise dissimilarity calculation), which may be more appropriate to use.
# index those horizons with silhouette widths less than 0
check.idx <- which(pedons.df$sil.width < 0)
# sort this index based on min sil.width
check.idx.sorted <- check.idx[order(pedons.df$sil.width[check.idx])]
# list those pedons/horizons that may need some further investigation
pedons.df[check.idx.sorted, c('peiid', 'pedon_id', 'hzname', 'genhz', 'neighbor', 'sil.width', vars)]
## peiid pedon_id hzname genhz neighbor sil.width clay mid total_frags_pct
## 5 115595 300J Bt2 Bt2 Bt3 -0.304496846 35 62.5 0
## 72 338022 07RJV089 Bt2 Bt2 Bt1 -0.260129476 25 38.0 10
## 65 338021 07RJV086 Bt2 Bt2 Bt1 -0.250695590 22 29.0 30
## 60 307963 07SKC014 Bt Bt1 Bt3 -0.243816842 29 66.0 8
## 85 338024 07RJV105 Bt1 Bt1 A -0.232022190 20 24.0 5
## 86 338024 07RJV105 Bt2 Bt2 Bt1 -0.226833882 20 38.0 25
## 66 338021 07RJV086 Bt3 Bt3 Bt2 -0.206187598 25 44.5 35
## 16 115827 S2000CA007012 Bt1 Bt1 A -0.196307412 20 22.5 10
## 55 307961 07SKC016 Bt2 Bt2 Bt3 -0.191380187 32 54.5 15
## 37 268820 07SKC003 Bt2 Bt2 Bt1 -0.174281799 32 28.0 5
## 10 115819 S2000CA007009 Bt2 Bt2 Bt1 -0.163204476 29 39.5 5
## 112 338028 07RJV115 Bt2 Bt2 Bt3 -0.156297021 32 43.0 5
## 73 338022 07RJV089 Bt3 Bt3 Bt2 -0.140498159 25 52.0 25
## 118 338029 07RJV119 2Bt1 Bt3 Bt1 -0.127800749 32 32.0 10
## 71 338022 07RJV089 Bt1 Bt1 A -0.121385623 22 24.0 5
## 105 338027 07RJV111 Bt2 Bt2 Bt1 -0.109751782 32 35.5 5
## 120 338029 07RJV119 2BC Bt3 Bt2 -0.097935655 30 57.0 40
## 36 268820 07SKC003 Bt1 Bt1 A -0.090766960 19 16.5 24
## 27 207242 S2007CA109005 BCt Bt3 Bt2 -0.090679589 16 96.5 45
## 9 115819 S2000CA007009 Bt1 Bt1 A -0.085728747 25 19.0 5
## 74 338022 07RJV089 Bt4 Bt3 Bt2 -0.085165952 30 66.0 45
## 17 115827 S2000CA007012 Bt2 Bt2 Bt1 -0.056866078 28 44.0 25
## 39 268820 07SKC003 2BCt Bt3 Bt2 -0.042981244 28 56.0 23
## 104 338027 07RJV111 Bt1 Bt1 Bt2 -0.037714883 30 20.0 50
## 26 207242 S2007CA109005 Bt2 Bt2 Bt3 -0.033069807 18 89.0 20
## 111 338028 07RJV115 Bt1 Bt1 Bt2 -0.024421743 25 25.5 50
## 80 338023 07RJV096 Bt2 Bt2 Bt1 -0.013245810 25 40.5 50
## 25 207242 S2007CA109005 Bt1 Bt1 Bt2 -0.007499182 18 66.0 15
The folowing table contains mean and standard deviations (values in parenthesis) of soil properties (clay content, total rock fragments, horizon mid-point) and silhouette widths summarized by GHL. Generalized horizons with the largest silhouette widths are the most internally consistent, while those with silhouette widths close to 0 are the least consistent.
hz.eval$stats
## genhz clay mid total_frags_pct sil.width
## 1 A 16.19 (3.08) 3.91 (2.35) 7.73 (7.67) 0.59 (0.16)
## 2 Bt1 24.38 (4) 29.39 (15.3) 15.89 (15.48) -0.01 (0.14)
## 3 Bt2 28.36 (5.51) 43.75 (15.96) 22.64 (21.1) -0.13 (0.11)
## 4 Bt3 34.67 (9.84) 54.87 (13.3) 22.17 (26.89) 0.08 (0.16)
## 5 Cr NaN (NA) 86.96 (21.86) 0 (0) NaN (NA)
## 6 R NaN (NA) 131.75 (9.41) 0 (0) NaN (NA)
## 7 not-used 19.7 (3.57) 16.75 (13.06) 13.18 (18.12) NaN (NA)
Plot profiles with only those horizons with silhouette widths less than 0 flagged.
# add a column containing a color (red) that flags horizons with silhouette width less than 0
pedons$sil.flag <- ifelse(pedons$sil.width < 0, 'red', 'white')
par(mar = c(0, 0, 1, 1))
plotSPC(pedons, label = 'pedon_id', cex.id = 0.5, max.depth = 150, cex.names = 0.66, width = 0.35, name.style = 'center-center', depth.axis = list(style = 'compact', line = -2), color='sil.flag')
GHL assignment as described thus far applies only to
in-memory objects in the current R session. In
order to use these GHL in further analysis or perform manual
adjustments, the GHL need to be saved externally. If you are a NASIS
user (working with data derived from NASIS) then the following code will
create a text file that can be read by NASIS and stored in the
dspcomplayerid
field of the pedon horizon table. The
“Update horizon group aggregations” calculation (“Calculations” ->
“Pedon Horizon” -> “Update horizon group aggregations”) expects a
text file at the C:/data/horizon_agg.txt
location on your
system, containing phiid and generalized horizon labels.
Note that some people prefer to adjust GHL assignments in R while others prefer to make adjustments after loading the data into NASIS.
# clear-out any existing files
rules.file <- 'C:/data/horizon_agg.txt'
write.table(data.frame(), file=rules.file, row.names=FALSE, quote=FALSE, na='', col.names=FALSE, sep='|')
# extract horizon data
h <- horizons(pedons)
# strip-out 'not-used' genhz labels and retain horizon ID and genhz assignment
h <- h[which(h$genhz != 'not-used'), c('phiid', 'genhz')]
# append to NASIS import file
write.table(h, file=rules.file, row.names=FALSE, quote=FALSE, na='', col.names=FALSE, sep='|', append=TRUE)
In the next document, we will work with pedon data from NASIS and use our GHL to generate estimates of “low-rv-high” style range in characteristics. An alternative approach to assignment of GHL can be performed with the “Update comp layer id by hzname (horizon group Aggregation)” calculation. This calculation, however, performs only basic pattern matching and likely requires considerable manual intervention.
This document is based on aqp
version 2.0 and
soilDB
version 2.7.8.