This tutorial outlines a simple method for summarizing soil color
from a collection of soil profiles using the aqp
and
sharpshootR
packages for R. Suppose you
have a collection of soil profiles and would like to compare the
relative frequencies of soil color among different horizons, depths,
hillslope positions, or bedrock. How would you determine the most
frequent color or a reasonable range in colors for a given group? One
solution to this problem is presented in the figure below. In this case,
relative proportions of soil color are arranged by genetic horizon.
Larger “color bars” are the most frequently described colors, both in
terms of number of observed horizons and the thickness of those
horizons. The small numbers in parenthesis denote the number of horizons
associated with any given color. See examples below for more ideas. The
manual page for aggregateColor
and
aggregateColorPlot
contain additional details.
If you have never used the aqp or sharpshootR packages before, you will likely need to install them. This only needs to be done once.
# stable version from CRAN and all dependencies
install.packages('aqp', dep=TRUE)
install.packages('soilDB', dep=TRUE)
install.packages('sharpshootR', dep=TRUE)
Now that you have all of the R packages that this document depends on, it would be a good idea to load them. R packages must be installed anytime you change versions of R (e.g. after an upgrade), and loaded anytime you want to access functions from within those packages.
library(soilDB)
library(aqp)
library(sharpshootR)
library(cluster)
While the methods outlined in this document can be applied to any
collection of pedons, it is convenient to work with a standardized set
of data. You can follow along with the analysis by copying code from the
following blocks and running it in your R session. The
sample data used in this document is based on 30 soil profiles that have
been correlated to the Loafercreek
soil series from the Sierra Nevada Foothill Region of California. Note
that the internal structure of the loafercreek
data is
identical to the structure returned by fetchNASIS()
from the soilDB package. All horizon-level values are pulled from
the pedon horizon table of the pedons being analyzed.
# load sample data from the soilDB package
data(loafercreek, package = 'soilDB')
# graphical check
par(mar=c(0,0,0,4))
plotSPC(loafercreek, name = NA, print.id = FALSE, cex.names = 0.8, axis.line.offset = -1, max.depth = 100, divide.hz = FALSE)
These examples can be readily adapted to your own data. Note that you
may need to adjust object (change loafercreek to the name of
your SoilProfileCollection
object) and column
(genhz, etc.) names. Data loaded from NASIS via
fetchNASIS
can be used without modification (except for
object name) in these examples.
Typically color summaries are grouped by some kind of “horizon-level” attribute: depth slice, horizon designation, generalized horizon label, or within a diagnostic feature. However, it is possible to group colors by a “site-level” attribute such as bedrock kind, taxonname, or hillslope position. Note that grouping colors by a site-level attribute will pool colors from all depths.
# slice color data at select depths
s <- slice(loafercreek, c(5, 10, 15, 25, 50, 75) ~ soil_color, strict = FALSE)
# make horizon labels based on slice depth
s$slice <- paste0(s$hzdept, ' cm')
s$slice <- factor(s$slice, levels=guessGenHzLevels(s, 'slice')$levels)
par(mar=c(4.5,2.5,4.5,0))
aggregateColorPlot(aggregateColor(s, 'slice'), label.cex=0.65, main='Loafercreek Dry Colors\nDepth Slices', print.n.hz = TRUE)
# generalize horizon names using REGEX rules
n <- c('Oi', 'A', 'BA','Bt1','Bt2','Bt3','Cr','R')
p <- c('O', '^A$|Ad|Ap|AB','BA$|Bw', 'Bt1$|^B$','^Bt$|^Bt2$','^Bt3|^Bt4|CBt$|BCt$|2Bt|2CB$|^C$','Cr','R')
loafercreek$genhz <- generalize.hz(loafercreek$hzname, n, p)
# remove non-matching generalized horizon names
loafercreek$genhz[loafercreek$genhz == 'not-used'] <- NA
loafercreek$genhz <- factor(loafercreek$genhz)
aggregateColorPlot(aggregateColor(loafercreek, 'genhz'), main='Loafercreek Series Dry Colors\nGeneralized Horizon Labels', print.n.hz = TRUE, label.cex = 0.65)
par(mar=c(4.5,4,4.5,0))
aggregateColorPlot(aggregateColor(loafercreek, 'hillslopeprof'), main='Loafercreek Series Dry Colors\nHillslope Position')
par(mar=c(4.5,8,4.5,0))
aggregateColorPlot(aggregateColor(loafercreek, 'bedrckkind'), main='Loafercreek Series Dry Colors\nBedrock Kind')
Note that when using data from NASIS, you must first establish a selected set of site and pedon data.
In this example, we are generating color signatures for groups of horizons defined by k-medoids clustering of clay content, CEC, and natural log transform of total carbon.
# use a sample dataset from aqp
data(sp3)
depths(sp3) <- id ~ top + bottom
# define function for repeated k-medoids fitting
pam1 <- function(x,k) list(cluster = pam(x, k, cluster.only=TRUE, stand=TRUE))
d <- horizons(sp3)[, c('clay', 'cec', 'ln_tc')]
# empirically determine optimal number of clusters
g <- clusGap(d, FUN=pam1, K.max=10)
plot(g)
# cluster data
sp3.clara <- clara(d, k=6, stand=TRUE)
plot(silhouette(sp3.clara))
# check clustering vector on profile sketches
sp3$genhz <- sp3.clara$clustering
plotSPC(sp3, name='genhz')
# color signatures by cluster of horizon
aggregateColorPlot(aggregateColor(sp3, 'genhz'))
## get data from NASIS or similar source
# f <- soilDB::fetchNASIS(rmHzErrors = TRUE)
# # filter out select series
# soils <- c('ahwahnee', 'auberry', 'musick', 'holland', 'shaver', 'chaix', 'canisrocks')
#
# # extract these soils and normalize taxonname
# f.sub <- f[grep(paste0(soils, collapse = '|'), f$taxonname, ignore.case = TRUE), ]
# for(x in soils) {
# f.sub$taxonname[grep(x, f.sub$taxonname, ignore.case = TRUE)] <- x
# }
# # reset levels to order specified above
# f.sub$taxonname <- factor(f.sub$taxonname, levels=soils)
#
# # re-name for cached copy
# sierra_transect <- f.sub
# diorite <- f[grep('diorite', f$bedrckkind, ignore.case = TRUE), ]
#
# save(sierra_transect, diorite, file='cached-pedons-agg-color-example.rda')
#
load('cached-pedons-agg-color-example.rda')
par(mar=c(4.5,5,4.5,0))
aggregateColorPlot(aggregateColor(sierra_transect, 'taxonname'), label.cex=0.65, main='Soil Color Signatures\nAll Colors', print.n.hz = TRUE, rect.border = NA, print.label = FALSE, horizontal.borders = TRUE)
par(mar=c(4.5,5,4.5,0))
aggregateColorPlot(aggregateColor(sierra_transect, 'taxonname', k=8), label.cex=0.65, main='Soil Color Signatures\n8 Colors', print.n.hz = TRUE, rect.border = NA, print.label = FALSE, horizontal.borders = TRUE)
# geology + depth slices
diorite <- slice(diorite, c(5, 10, 15, 25, 50, 100, 150) ~ soil_color)
# make fake label
diorite$slice <- paste0(diorite$hzdept, ' cm')
diorite$slice <- factor(diorite$slice, levels=guessGenHzLevels(diorite, 'slice')$levels)
aggregateColorPlot(aggregateColor(diorite, 'slice'), label.cex=0.65, main='Soils Formed on "Diorite"')
will post code soon
Color “bar” proportions are calculated using a combination of horizon thickness and number of observations, normalized to sum to 1 within each group. Within each group weights are computed:
\[ w_i = \sqrt{\sum{d_i}} * n_i \]
where \(w_i\) is the weight or width of color bar \(i\), \(d_i\) is the thickness of all horizon data associated with color \(i\), and \(n\) is the number of horizons in which color \(i\) has been described. Thicker “bars” suggest more frequent and larger (total) thickness of soil material with any given color.
The following block of code demonstrates how the
aggregateColor
works and the type of data that are
returned.
# load some example data
data(sp1, package='aqp')
# upgrade to SoilProfileCollection and convert Munsell colors
sp1$soil_color <- with(sp1, munsell2rgb(hue, value, chroma))
depths(sp1) <- id ~ top + bottom
site(sp1) <- ~ group
# generalize horizon names
n <- c('O', 'A', 'B', 'C')
p <- c('O', 'A', 'B', 'C')
sp1$genhz <- generalize.hz(sp1$name, n, p)
# inspect the results
a <- aggregateColor(sp1, 'genhz')
# str(a)
soil_color | weight | n.hz | munsell | .id |
---|---|---|---|---|
#3C2C22FF | 0.678 | 2 | 7.5YR 2/2 | O |
#3A2D20FF | 0.177 | 1 | 10YR 2/2 | O |
#564436FF | 0.145 | 1 | 7.5YR 3/2 | O |
soil_color | weight | n.hz | munsell | .id |
---|---|---|---|---|
#3A2D20FF | 0.351 | 4 | 10YR 2/2 | A |
#564436FF | 0.234 | 3 | 7.5YR 3/2 | A |
#745C40FF | 0.107 | 1 | 10YR 4/3 | A |
#3C2C22FF | 0.075 | 2 | 7.5YR 2/2 | A |
#715D4EFF | 0.062 | 1 | 7.5YR 4/2 | A |
#775B44FF | 0.053 | 1 | 7.5YR 4/3 | A |
#544535FF | 0.049 | 1 | 10YR 3/2 | A |
#5B422EFF | 0.047 | 1 | 7.5YR 3/3 | A |
#604026FF | 0.020 | 1 | 7.5YR 3/4 | A |
soil_color | weight | n.hz | munsell | .id |
---|---|---|---|---|
#564436FF | 0.284 | 5 | 7.5YR 3/2 | B |
#745C40FF | 0.203 | 3 | 10YR 4/3 | B |
#544535FF | 0.126 | 3 | 10YR 3/2 | B |
#58432CFF | 0.082 | 2 | 10YR 3/3 | B |
#412B1AFF | 0.055 | 2 | 7.5YR 2/3 | B |
#5B422EFF | 0.053 | 2 | 7.5YR 3/3 | B |
#5D4131FF | 0.052 | 2 | 5YR 3/3 | B |
#715D4EFF | 0.027 | 1 | 7.5YR 4/2 | B |
#7F583FFF | 0.023 | 1 | 5YR 4/4 | B |
#633F2AFF | 0.021 | 1 | 5YR 3/4 | B |
#584338FF | 0.020 | 1 | 5YR 3/2 | B |
#3A2D20FF | 0.018 | 1 | 10YR 2/2 | B |
#8E775BFF | 0.018 | 1 | 10YR 5/3 | B |
#725C50FF | 0.016 | 1 | 5YR 4/2 | B |
soil_color | weight | n.hz | munsell | .id |
---|---|---|---|---|
#564436FF | 0.279 | 3 | 7.5YR 3/2 | C |
#725C50FF | 0.152 | 2 | 5YR 4/2 | C |
#8E775BFF | 0.131 | 2 | 10YR 5/3 | C |
#795B36FF | 0.131 | 2 | 10YR 4/4 | C |
#745C40FF | 0.089 | 1 | 10YR 4/3 | C |
#715D4EFF | 0.066 | 1 | 7.5YR 4/2 | C |
#58432CFF | 0.060 | 1 | 10YR 3/3 | C |
#5B422EFF | 0.054 | 1 | 7.5YR 3/3 | C |
#A89173FF | 0.038 | 1 | 10YR 6/3 | C |
knitr::kable(a$aggregate.data, row.names=FALSE, digits=3)
genhz | hue | value | chroma | munsell | distance | col | n | H |
---|---|---|---|---|---|---|---|---|
O | 7.5YR | 2 | 2 | 7.5YR 2/2 | 1.164 | #3C2C22FF | 3 | 1.226 |
A | 7.5YR | 3 | 2 | 7.5YR 3/2 | 2.203 | #564436FF | 9 | 2.657 |
B | 7.5YR | 3 | 3 | 7.5YR 3/3 | 2.737 | #5B422EFF | 14 | 3.136 |
C | 7.5YR | 4 | 3 | 7.5YR 4/3 | 2.876 | #775B44FF | 9 | 2.914 |
This document is based on aqp
version 1.43,
soilDB
version 2.7.3, and sharpshootR
version
1.10.