Consider this situation: you have a collection of pedons that have been correlated to a named soil series (or component) and would like to *objectively* compute a range in characteristics ("low-rv-high" values) and horizon depths. As with most collections of pedon data, there may be considerable variation in description style and horizons used, horizon depths, and number of horizons described:

In this scenario, there are several obvious “micro-correlation” decisions that need to be made before horizons can be grouped for aggregation. For example, what horizonation prototype scheme (e.g., A-Bt1-Bt2-Bt3-Cr-R) best conveys the concept of this soil series or soil component? Does it make sense to group [Bt3, Bt4, BCt, CBt] horizons for aggregation? Or what about grouping [Bt3, 2Bt3] horizons? Do [BA, AB] horizons occur frequently enough to be included in the horizonation prototype?

Based on your knowledge of the area, pedon 2 might be a good "typical" pedon to use in developing a horizonation prototype. After careful review of the data and consultation with your crew, a new set of labels are assigned to each horizon (red labels in figure above) that define groups over which soil properties will be aggregated. These new labels define *functionally similar* groups that may span multiple genetic horizons.

This document describes an aggregation strategy based on the use of "generalized horizon labels" (GHL) through a combination of narrative, **R** code, and figures. You can follow along in an **R** session by copying code from the grey boxes and pasting it into the **R** console.

Here is a basic outline of the process:

Select a set of GHL that best represents a group of pedons to be aggregated. This could be based on series descriptions, expert knowledge, or even inspection of the most frequently occurring horizon designations.

Assign GHL to each horizon using whatever information is available for grouping horizons. This micro-correlation of horizon designations will likely require slightly different "rules" in each instance. Careful inspection of horizon designation and observed properties is critical.

Evaluate GHL assignments and manually refine as needed.

Keep track of final GHL assignments in NASIS or text file.

Estimate a most likely horizonation e.g., top and bottom depths for each generalized horizon label.

Compute range in characteristics, aka low-rv-high values, for clay, sand, pH, etc. over GHL. (next document in this series)

If you have never used the aqp or soildb packages before, you will likely need to install them. This only needs to be done once.

```
# stable version from CRAN + dependencies
install.packages('ape', dep=TRUE)
install.packages('latticeExtra', dep=TRUE)
install.packages('plyr', dep=TRUE)
install.packages('aqp', dep=TRUE)
install.packages('soilDB', dep=TRUE)
install.packages('sharpshootR', dep=TRUE)
```

Once you have all of the R packages on which this document depends, it is a good idea to load them. R packages must be **installed** anytime you change versions of R (e.g., after an upgrade) and **loaded** anytime you want to access functions from within those packages.

```
library(aqp)
library(soilDB)
library(sharpshootR)
library(igraph)
library(ape)
library(latticeExtra)
library(plyr)
library(lattice)
library(cluster)
library(MASS)
```

While the methods outlined in this document can be applied to any collection of pedons, it is convenient to work with a standardized set of data. You can follow along with the analysis by copying code from the following blocks and running it in your **R** session. The sample data used in this document is based on 15 soil profiles that have been correlated to the Loafercreek soil series from the Sierra Nevada Foothill Region of California. Note that the internal structure of the `loafercreek`

data is identical to the structure returned by `fetchNASIS()`

from the soilDB package. All horizon-level values are pulled from the pedon horizon table of the pedons being analyzed.

```
# load sample data from the soilDB package
data(loafercreek, package = 'soilDB')
# keep only the first 15 pedons
pedons <- loafercreek[1:15, ]
# plot profile sketches
par(mar=c(0,0,0,0))
plot(pedons, name='hzname', print.id=FALSE, cex.names=0.75, axis.line.offset=-4)
```

The following code block demonstrates how to pull data in using the `fetchNASIS()`

function from the soilDB package.

```
# first load the desired data set within NASIS into your NASIS selected set
# then load data from the NASIS selected set into R
pedons <- fetchNASIS(from='pedons')
# optionally subset the data by taxon name - enter your taxon name
pedons <- pedons[grep(pattern='ENTER_YOUR_TAXON_NAME', pedons$taxonname, ignore.case=TRUE), ]
```

Generalized horizon labels represent an expert-guided selection of designations that were consistently observed in the field and are meaningful in terms of soil morphology and management. The very first step in this process is to tabulate the number of times each horizon designation occurs.

`sort(table(pedons$hzname), decreasing=TRUE)`

```
##
## A Bt1 Bt2 Cr Bt3 R BA BAt A1 A2 Bt Bw BCt Crt Oi 2BCt 2Bt1 2Bt2 2Bt3 2Cr
## 12 12 12 8 6 6 5 4 3 3 3 3 2 2 2 1 1 1 1 1
## 2Crt BC Bt4 Bw1 Bw2 C
## 1 1 1 1 1 1
```

In this case, [A, Bt1, Bt2, Cr] horizon designations appear to be a good starting point. However, relying on field experience and expert knowledge of these soils, cross-checking against the OSD, and talking with other soil scientists may be more important than the previous tabulation.

Constructing a graph of transitions from one horizon to the next ("transition probabilities") can highlight those horizon designations that are most commonly used together. A follow-up tutorial on transition probability matrix interpretation is planned.

```
tp <- hzTransitionProbabilities(pedons, 'hzname')
par(mar=c(1,1,1,1))
plotSoilRelationGraph(tp, graph.mode = 'directed', edge.arrow.size=0.5, edge.scaling.factor=2, vertex.label.cex=0.75, vertex.label.family='sans')
```

A quick summary of horizon depth mid-points (e.g., average depth of horizon) can help in organizing the various designations and possibly give some clues as to how they can be grouped. The following plot is called a box and whisker plot.

```
# compute horizon mid-points
pedons$mid <- with(horizons(pedons), (hzdept + hzdepb) / 2)
# sort horizon designation by group-wise median values
hz.designation.by.median.depths <- names(sort(tapply(pedons$mid, pedons$hzname, median)))
# plot the distribution of horizon mid-points by designation
bwplot(mid ~ factor(hzname, levels=hz.designation.by.median.depths),
data=horizons(pedons),
ylim=c(155, -5), ylab='Horizon Mid-Point Depth (cm)',
scales=list(y=list(tick.number=10)),
panel=function(...) {
panel.abline(h=seq(0, 140, by=10), v=1:length(hz.designation.by.median.depths), col=grey(0.8), lty=3)
panel.bwplot(...)
})
```

Next, a similar summary of soil properties (clay content and total rock fragment volume) is presented. The goal is to determine which horizons designations can be grouped and which generalized horizon labels to assign to each group.

```
# box and wisker plot by clay content
bwplot(clay ~ factor(hzname, levels=hz.designation.by.median.depths),
data=horizons(pedons),
ylab='Clay Content (%)',
scales=list(y=list(tick.number=10)),
panel=function(...) {
panel.abline(h=seq(0, 100, by=5), v=1:length(hz.designation.by.median.depths), col=grey(0.8), lty=3)
panel.bwplot(...)
})
```

```
# box and wisker plot by total rock fragment volume
bwplot(total_frags_pct ~ factor(hzname, levels=hz.designation.by.median.depths),
data=horizons(pedons),
ylab='Total Rock Fragment Volume (%)',
scales=list(y=list(tick.number=10)),
panel=function(...) {
panel.abline(h=seq(0, 100, by=10), v=1:length(hz.designation.by.median.depths), col=grey(0.8), lty=3)
panel.bwplot(...)
})
```

Sometimes looking at thematic soil profile sketches can be informative.

```
# color horizons by clay content
par(mar=c(0,0,3,0))
plot(pedons, name='hzname', print.id=FALSE, cex.names=0.75, axis.line.offset=-4, color='clay')
```

Once a set of generalized horizon labels have been determined, a corresponding set of regular expression (REGEX) rules are developed to convert field-described designations into GHL. Pattern matching with REGEX will typically assign useful GHL; however, there will always be cases where manual intervention is required. More on that will follow.

From the above analysis and the OSD, it seems like the following sequence of GHL is appropriate: (A, Bt1, Bt2, Bt3, Cr, R) -- an A horizon, followed by 3 Bt horizons, then Cr, and finally R. For each GHL, a corresponding REGEX rule is needed. For example, `'^A$|Ad|Ap'`

will match 'A', 'Ad', and 'Ap'.

```
# save our GHL
n <- c('A','Bt1','Bt2','Bt3','Cr','R')
# REGEX rules
p <- c('^A$|Ad|Ap',
'Bt1$',
'^Bt2$',
'^Bt3|^Bt4|CBt$|BCt$|2Bt|2CB$|^C$',
'Cr',
'R')
```

Apply GHL pattern-matching rules, save to a new column called `genhz`

, and cross-tabulate the occurrence of GHL and original designations.

```
pedons$genhz <- generalize.hz(pedons$hzname, n, p)
# cross-tabulate original horizon designations and GHL
tab <- table(pedons$genhz, pedons$hzname)
addmargins(tab)
```

```
##
## 2BCt 2Bt1 2Bt2 2Bt3 2Cr 2Crt A A1 A2 BA BAt BC BCt Bt Bt1 Bt2 Bt3 Bt4 Bw Bw1 Bw2 C Cr
## A 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Bt1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0
## Bt2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0
## Bt3 1 1 1 1 0 0 0 0 0 0 0 0 2 0 0 0 6 1 0 0 0 1 0
## Cr 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8
## R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## not-used 0 0 0 0 0 0 0 3 3 5 4 1 0 3 0 0 0 0 3 1 1 0 0
## Sum 1 1 1 1 1 1 12 3 3 5 4 1 2 3 12 12 6 1 3 1 1 1 8
##
## Crt Oi R Sum
## A 0 0 0 12
## Bt1 0 0 0 12
## Bt2 0 0 0 12
## Bt3 0 0 0 14
## Cr 2 0 0 12
## R 0 0 6 6
## not-used 0 2 0 26
## Sum 2 2 6 94
```

In the above cross-tabulation, we can see that a couple of original designations were not matched (`not-used`

in the table) by our REGEX rules, namely, BA, Bw, and Oi horizons. For this example, we are going to assume that those horizons are not common enough for inclusion in our set of GHL.

It is also possible to display GHL assignment as a network graph.

```
# convert contingency table -> adj. matrix
m <- genhzTableToAdjMat(tab)
# plot using a function from the sharpshootR package
par(mar=c(1,1,1,1))
plotSoilRelationGraph(m, graph.mode = 'directed', edge.arrow.size=0.5)
```