Introduction

This tutorial outlines a simple method for summarizing soil color from a collection of soil profiles using the aqp and sharpshootR packages for R. Suppose you have a collection of soil profiles and would like to compare the relative frequencies of soil color among different horizons, depths, hillslope positions, or bedrock. How would you determine the most frequent color or a reasonable range in colors for a given group? One solution to this problem is presented in the figure below. In this case, relative proportions of soil color are arranged by genetic horizon. Larger “color bars” are the most frequently described colors, both in terms of number of observed horizons and the thickness of those horizons. The small numbers in parenthesis denote the number of horizons associated with any given color. See examples below for more ideas. The manual page for aggregateColor and aggregateColorPlot contain additional details.

Setup R Envionment

If you have never used the aqp or sharpshootR packages before, you will likely need to install them. This only needs to be done once.

# stable version from CRAN and all dependencies
install.packages('aqp', dep=TRUE) 
install.packages('soilDB', dep=TRUE)
install.packages('sharpshootR', dep=TRUE)

Now that you have all of the R packages that this document depends on, it would be a good idea to load them. R packages must be installed anytime you change versions of R (e.g. after an upgrade), and loaded anytime you want to access functions from within those packages.

library(soilDB)
library(aqp)
library(sharpshootR)
library(cluster)

Sample Data

While the methods outlined in this document can be applied to any collection of pedons, it is convenient to work with a standardized set of data. You can follow along with the analysis by copying code from the following blocks and running it in your R session. The sample data used in this document is based on 30 soil profiles that have been correlated to the Loafercreek soil series from the Sierra Nevada Foothill Region of California. Note that the internal structure of the loafercreek data is identical to the structure returned by fetchNASIS() from the soilDB package. All horizon-level values are pulled from the pedon horizon table of the pedons being analyzed.

# load sample data from the soilDB package
data(loafercreek, package = 'soilDB')
# graphical check
par(mar=c(0,0,0,4))
plotSPC(loafercreek, name = NA, print.id = FALSE, cex.names = 0.8, axis.line.offset = -1, max.depth = 100, divide.hz = FALSE)

Examples

These examples can be readily adapted to your own data. Note that you may need to adjust object (change loafercreek to the name of your SoilProfileCollection object) and column (genhz, etc.) names. Data loaded from NASIS via fetchNASIS can be used without modification (except for object name) in these examples.

Typically color summaries are grouped by some kind of “horizon-level” attribute: depth slice, horizon designation, generalized horizon label, or within a diagnostic feature. However, it is possible to group colors by a “site-level” attribute such as bedrock kind, taxonname, or hillslope position. Note that grouping colors by a site-level attribute will pool colors from all depths.

Depth Slices

# slice color data at select depths
s <- slice(loafercreek, c(5, 10, 15, 25, 50, 75) ~ soil_color, strict = FALSE)

# make horizon labels based on slice depth
s$slice <- paste0(s$hzdept, ' cm')
s$slice <- factor(s$slice, levels=guessGenHzLevels(s, 'slice')$levels)

par(mar=c(4.5,2.5,4.5,0))
aggregateColorPlot(aggregateColor(s, 'slice'), label.cex=0.65, main='Loafercreek Dry Colors\nDepth Slices', print.n.hz = TRUE)

Generalized Horizon Labels

# generalize horizon names using REGEX rules
n <- c('Oi', 'A', 'BA','Bt1','Bt2','Bt3','Cr','R')
p <- c('O', '^A$|Ad|Ap|AB','BA$|Bw', 'Bt1$|^B$','^Bt$|^Bt2$','^Bt3|^Bt4|CBt$|BCt$|2Bt|2CB$|^C$','Cr','R')
loafercreek$genhz <- generalize.hz(loafercreek$hzname, n, p)

# remove non-matching generalized horizon names
loafercreek$genhz[loafercreek$genhz == 'not-used'] <- NA
loafercreek$genhz <- factor(loafercreek$genhz)

aggregateColorPlot(aggregateColor(loafercreek, 'genhz'), main='Loafercreek Series Dry Colors\nGeneralized Horizon Labels', print.n.hz = TRUE, label.cex = 0.65)

Site-Level Data

par(mar=c(4.5,4,4.5,0))
aggregateColorPlot(aggregateColor(loafercreek, 'hillslopeprof'), main='Loafercreek Series Dry Colors\nHillslope Position')

par(mar=c(4.5,8,4.5,0))
aggregateColorPlot(aggregateColor(loafercreek, 'bedrckkind'), main='Loafercreek Series Dry Colors\nBedrock Kind')

Additional Examples

Note that when using data from NASIS, you must first establish a selected set of site and pedon data.

Color Signatures by Cluster

In this example, we are generating color signatures for groups of horizons defined by k-medoids clustering of clay content, CEC, and natural log transform of total carbon.

# use a sample dataset from aqp
data(sp3)
depths(sp3) <- id ~ top + bottom

# define function for repeated k-medoids fitting
pam1 <- function(x,k) list(cluster = pam(x, k, cluster.only=TRUE, stand=TRUE))
d <- horizons(sp3)[, c('clay', 'cec', 'ln_tc')]

# empirically determine optimal number of clusters
g <- clusGap(d, FUN=pam1, K.max=10)
plot(g)

# cluster data
sp3.clara <- clara(d, k=6, stand=TRUE)
plot(silhouette(sp3.clara))

# check clustering vector on profile sketches
sp3$genhz <- sp3.clara$clustering
plotSPC(sp3, name='genhz')

# color signatures by cluster of horizon
aggregateColorPlot(aggregateColor(sp3, 'genhz'))

Soil Color Signatures

## get data from NASIS or similar source
# f <- soilDB::fetchNASIS(rmHzErrors = TRUE)

# # filter out select series
# soils <- c('ahwahnee', 'auberry', 'musick', 'holland', 'shaver', 'chaix', 'canisrocks')
# 
# # extract these soils and normalize taxonname
# f.sub <- f[grep(paste0(soils, collapse = '|'), f$taxonname, ignore.case = TRUE), ]
# for(x in soils) {
#   f.sub$taxonname[grep(x, f.sub$taxonname, ignore.case = TRUE)] <- x
# }
# # reset levels to order specified above
# f.sub$taxonname <- factor(f.sub$taxonname, levels=soils)
# 
# # re-name for cached copy
# sierra_transect <- f.sub
# diorite <- f[grep('diorite', f$bedrckkind, ignore.case = TRUE), ]
# 
# save(sierra_transect, diorite, file='cached-pedons-agg-color-example.rda')
# 


load('cached-pedons-agg-color-example.rda')

par(mar=c(4.5,5,4.5,0))
aggregateColorPlot(aggregateColor(sierra_transect, 'taxonname'), label.cex=0.65, main='Soil Color Signatures\nAll Colors', print.n.hz = TRUE, rect.border = NA, print.label = FALSE, horizontal.borders = TRUE)

par(mar=c(4.5,5,4.5,0))
aggregateColorPlot(aggregateColor(sierra_transect, 'taxonname', k=8), label.cex=0.65, main='Soil Color Signatures\n8 Colors', print.n.hz = TRUE, rect.border = NA, print.label = FALSE, horizontal.borders = TRUE)

Many Soils Associated with Common Bedrock

# geology + depth slices

diorite <- slice(diorite, c(5, 10, 15, 25, 50, 100, 150) ~ soil_color)

# make fake label
diorite$slice <- paste0(diorite$hzdept, ' cm')
diorite$slice <- factor(diorite$slice, levels=guessGenHzLevels(diorite, 'slice')$levels)

aggregateColorPlot(aggregateColor(diorite, 'slice'), label.cex=0.65, main='Soils Formed on "Diorite"')

OSD Colors from MLRAs 13, 133A, 137, 130B, 148

will post code soon

How Does it Work?

Color “bar” proportions are calculated using a combination of horizon thickness and number of observations, normalized to sum to 1 within each group. Within each group weights are computed:

\[ w_i = \sqrt{\sum{d_i}} * n_i \]

where \(w_i\) is the weight or width of color bar \(i\), \(d_i\) is the thickness of all horizon data associated with color \(i\), and \(n\) is the number of horizons in which color \(i\) has been described. Thicker “bars” suggest more frequent and larger (total) thickness of soil material with any given color.

The following block of code demonstrates how the aggregateColor works and the type of data that are returned.

# load some example data
data(sp1, package='aqp')

# upgrade to SoilProfileCollection and convert Munsell colors
sp1$soil_color <- with(sp1, munsell2rgb(hue, value, chroma))
depths(sp1) <- id ~ top + bottom
site(sp1) <- ~ group

# generalize horizon names
n <- c('O', 'A', 'B', 'C')
p <- c('O', 'A', 'B', 'C')
sp1$genhz <- generalize.hz(sp1$name, n, p)

# inspect the results
a <- aggregateColor(sp1, 'genhz')

# str(a)
soil_color weight n.hz munsell .id
#3C2C22FF 0.678 2 7.5YR 2/2 O
#3A2D20FF 0.177 1 10YR 2/2 O
#564436FF 0.145 1 7.5YR 3/2 O


soil_color weight n.hz munsell .id
#3A2D20FF 0.351 4 10YR 2/2 A
#564436FF 0.234 3 7.5YR 3/2 A
#745C40FF 0.107 1 10YR 4/3 A
#3C2C22FF 0.075 2 7.5YR 2/2 A
#715D4EFF 0.062 1 7.5YR 4/2 A
#775B44FF 0.053 1 7.5YR 4/3 A
#544535FF 0.049 1 10YR 3/2 A
#5B422EFF 0.047 1 7.5YR 3/3 A
#604026FF 0.020 1 7.5YR 3/4 A


soil_color weight n.hz munsell .id
#564436FF 0.284 5 7.5YR 3/2 B
#745C40FF 0.203 3 10YR 4/3 B
#544535FF 0.126 3 10YR 3/2 B
#58432CFF 0.082 2 10YR 3/3 B
#412B1AFF 0.055 2 7.5YR 2/3 B
#5B422EFF 0.053 2 7.5YR 3/3 B
#5D4131FF 0.052 2 5YR 3/3 B
#715D4EFF 0.027 1 7.5YR 4/2 B
#7F583FFF 0.023 1 5YR 4/4 B
#633F2AFF 0.021 1 5YR 3/4 B
#584338FF 0.020 1 5YR 3/2 B
#3A2D20FF 0.018 1 10YR 2/2 B
#8E775BFF 0.018 1 10YR 5/3 B
#725C50FF 0.016 1 5YR 4/2 B


soil_color weight n.hz munsell .id
#564436FF 0.279 3 7.5YR 3/2 C
#725C50FF 0.152 2 5YR 4/2 C
#8E775BFF 0.131 2 10YR 5/3 C
#795B36FF 0.131 2 10YR 4/4 C
#745C40FF 0.089 1 10YR 4/3 C
#715D4EFF 0.066 1 7.5YR 4/2 C
#58432CFF 0.060 1 10YR 3/3 C
#5B422EFF 0.054 1 7.5YR 3/3 C
#A89173FF 0.038 1 10YR 6/3 C


knitr::kable(a$aggregate.data, row.names=FALSE, digits=3)
genhz hue value chroma munsell distance col n H
O 7.5YR 2 2 7.5YR 2/2 1.164 #3C2C22FF 3 1.226
A 7.5YR 3 2 7.5YR 3/2 2.203 #564436FF 9 2.657
B 7.5YR 3 3 7.5YR 3/3 2.737 #5B422EFF 14 3.136
C 7.5YR 4 3 7.5YR 4/3 2.876 #775B44FF 9 2.914

This document is based on aqp version 1.43, soilDB version 2.7.3, and sharpshootR version 1.10.