D.E. Beaudette, J.M. Skovlin
2024-03-05
This document is
based on aqp
version 2.0.3 and soilDB
version
2.8.1.
Consider this situation: you have a collection of pedons that have been correlated to a named soil series (or component) and would like to objectively compute a range in characteristics (“low-rv-high” values) and horizon depths. As with most collections of pedon data, there may be considerable variation in description style and horizons used, horizon depths, and number of horizons described:
In this scenario, there are several obvious “micro-correlation” decisions that need to be made before horizons can be grouped for aggregation. For example, what horizonation prototype scheme (e.g., A-Bt1-Bt2-Bt3-Cr-R) best conveys the concept of this soil series or soil component? Does it make sense to group [Bt3, Bt4, BCt, CBt] horizons for aggregation? Or what about grouping [Bt3, 2Bt3] horizons? Do [BA, AB] horizons occur frequently enough to be included in the horizonation prototype?
Based on your knowledge of the area, pedon 2 might be a good “typical” pedon to use in developing a horizonation prototype. After careful review of the data and consultation with your crew, a new set of labels are assigned to each horizon (red labels in figure above) that define groups over which soil properties will be aggregated. These new labels define functionally similar groups that may span multiple genetic horizons.
Once you have assigned generalized horizon labels (GHL) to your pedons it is possible to generate a set of “most-likely” horizon depths for each GHL. An evaluation GHL probability along 1-cm depth slices results in a set of probability depth functions; the points at which these curves cross are good estimates of “most-likely” horizon depths. For example, in the figure below approximate horizon depths of 8, 28, 50, and 90cm can be readily estimated from the probability depth functions. It is also possible to visually determine when less common horizons (such as the BA horizon in the figure below) may not be common enough to include in a representative profile.
If you have never used the aqp or soildb packages before, you will likely need to install them. This only needs to be done once.
# stable version from CRAN + dependencies
install.packages('aqp')
install.packages('soilDB')
Once you have all of the R packages on which this document depends, it is a good idea to load them. R packages must be installed anytime you change versions of R (e.g., after an upgrade) and loaded anytime you want to access functions from within those packages.
library(aqp)
library(soilDB)
library(latticeExtra)
library(rms)
While the methods outlined in this document can be applied to any
collection of pedons, it is convenient to work with a standardized set
of data. You can follow along with the analysis by copying code from the
following blocks and running it in your R session. The
sample data used in this document is based on 15 soil profiles that have
been correlated to the Loafercreek
soil series from the Sierra Nevada Foothill Region of California. Note
that the internal structure of the loafercreek
data is
identical to the structure returned by fetchNASIS()
from the soilDB package. All horizon-level values are pulled from
the pedon horizon table of the pedons being analyzed.
# load sample data from the soilDB package
data(loafercreek, package = 'soilDB')
pedons <- loafercreek
# plot the first 15 profiles
par(mar=c(0,0,0,0))
plot(pedons[1:15, ], name = 'hzname', print.id = FALSE, cex.names = 0.8, depth.axis = list(line = -4))
The following code block demonstrates how to pull data in using the
fetchNASIS()
function from the soilDB package.
# first load the desired data set within NASIS into your NASIS selected set
# then load data from the NASIS selected set into R
pedons <- fetchNASIS()
# optionally subset the data by taxon name - enter your taxon name
pedons <- pedons[grep(pattern='ENTER_YOUR_TAXON_NAME', pedons$taxonname, ignore.case=TRUE), ]
# apply GHL rules
n <- c("A", "BA", "Bt1", "Bt2", "Bt3", "Cr", "R")
p <- c("^A$|Ad|Ap|^ABt$", "AB$|BA$|Bw", "Bt1$|^Bt$", "^Bt2$", "^Bt3|^Bt4|CBt$|BCt$|2Bt|2CB$|^C$", "Cr", "R")
pedons$genhz <- generalize.hz(pedons$hzname, n, p)
# remove non-matching generalized horizon names
pedons$genhz[pedons$genhz == "not-used"] <- NA
pedons$genhz <- factor(pedons$genhz)
# keep track of generalized horizon names for later
hz.names <- levels(pedons$genhz)
# slice GHL into 1cm intervals: no aggregation
max.depth <- 150
slice.resolution <- 1
slice.vect <- seq(from = 0, to = max.depth, by = slice.resolution)
s <- dice(pedons, slice.vect ~ genhz)
# convert GHL to factor
s$genhz <- factor(s$genhz, levels = hz.names)
# plot depth-ranges of generalized horizon slices
bwplot(hzdept ~ genhz, data = horizons(s), ylim = c(155, -5), ylab = "Generalized Horizon Depth (cm)",
scales = list(y = list(tick.number = 10)), asp = 1, panel = function(...) {
panel.abline(h = seq(0, 140, by = 10), v = 1:length(hz.names), col = grey(0.8), lty = 3)
panel.bwplot(...)
})
# proportional-odds logistics regression: fits well, ignore standard errors using sliced data
# properly weights observations... but creates optimistic SE rcs required when we include depths >
# 100 cm... should we use penalized PO-LR? see pentrace()
dd <- datadist(horizons(s))
options(datadist = "dd")
(l.genhz <- orm(genhz ~ rcs(hzdept), data = horizons(s), x = TRUE, y = TRUE))
## Frequencies of Missing Values Due to Each Variable
## genhz hzdept
## 4222 0
##
## Logistic (Proportional Odds) Ordinal Regression Model
##
## orm(formula = genhz ~ rcs(hzdept), data = horizons(s), x = TRUE,
## y = TRUE)
##
##
## Frequencies of Responses
##
## A BA Bt1 Bt2 Bt3 Cr R
## 668 599 1879 1812 1986 1568 3272
##
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 11784 LR chi2 19864.93 R2 0.837 rho 0.912
## Distinct Y 7 d.f. 4 R2(4,11784) 0.815
## Median Y 5 Pr(> chi2) <0.0001 R2(4,11353.3) 0.826
## max |deriv| 5e-05 Score chi2 18397.03 |Pr(Y>=median)-0.5| 0.405
## Pr(> chi2) <0.0001
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=BA -0.8805 0.0674 -13.07 <0.0001
## y>=Bt1 -2.2527 0.0741 -30.41 <0.0001
## y>=Bt2 -5.2425 0.1102 -47.57 <0.0001
## y>=Bt3 -7.6052 0.1269 -59.93 <0.0001
## y>=Cr -10.1042 0.1358 -74.42 <0.0001
## y>=R -11.9939 0.1418 -84.59 <0.0001
## hzdept 0.1762 0.0040 43.80 <0.0001
## hzdept' -0.1991 0.0208 -9.59 <0.0001
## hzdept'' 0.4051 0.0634 6.39 <0.0001
## hzdept''' -0.4686 0.0911 -5.15 <0.0001
# predict along same depths: columns are the class-wise probability fitted.ind --> return all
# probability estimates
p <- data.frame(predict(l.genhz, data.frame(hzdept = slice.vect), type = "fitted.ind"))
# re-name, rms model output give funky names
names(p) <- hz.names
# add depths
p$top <- slice.vect
p.ml <- get.ml.hz(p, o.names = hz.names)
p.ml
## hz top bottom confidence pseudo.brier mean.H
## 1 A 0 10 47 0.3651021 1.4962620
## 2 Bt1 10 32 53 0.3228127 1.6427389
## 3 Bt2 32 47 49 0.3791655 1.6904646
## 4 Bt3 47 71 50 0.3619597 1.7297704
## 5 Cr 71 85 42 0.4887502 1.7022853
## 6 R 85 151 81 0.0854379 0.7510338