1 Introduction

This is a very basic introduction to the SoilProfileCollection class object defined in the aqp package for R. The SoilProfileCollection class was designed to simplify the process of working with the collection of data associated with soil profiles: site-level data, horizon-level data, spatial data, diagnostic horizon data, metadata, etc. Examples listed below are meant to be copied/pasted from this document and interactively run within R. Comments (green text) briefly describe what the code in each line does. This document assumes a basic level of proficiency with R which can be gained by reviewing some of the material in tutorials like this. Further documentation on objects and functions from the aqp package can be accessed by typing help(aqp) (or more generally, ?function_name) at the R console.

2 Object Creation

SoilProfileCollection objects are typically created by “promoting” data.frame objects (rectangular tables of data) that contain at least three essential columns:

  1. an ID column uniquely identifying groups of horizons (e.g. pedons)
  2. horizon top boundaries
  3. horizon bottom boundaries

The data.frame should be pre-sorted according to the profile ID and horizon top boundary. Formula notation is used to define the columns used to promote a data.frame object:

idcolumn ~ hz_top_column + hz_bottom_column

In this tutorial we will use some sample data included with the aqp package, based on characterization data from 10 soils sampled on serpentinitic parent material as described in McGahan et al, 2009.

# load required packages, you may have to install these if missing:
# install.packages('aqp', dep = TRUE)
# remotes::install_github("ncss-tech/aqp", dependencies=FALSE, upgrade=FALSE, build=FALSE)
library(aqp)
library(Hmisc)
library(lattice)
library(MASS)

# load sample data set, a data.frame object with horizon-level data from 10 profiles
data(sp4)
str(sp4)
## 'data.frame':    30 obs. of  13 variables:
##  $ id         : chr  "colusa" "colusa" "colusa" "colusa" ...
##  $ name       : chr  "A" "ABt" "Bt1" "Bt2" ...
##  $ top        : int  0 3 8 30 0 9 0 4 13 0 ...
##  $ bottom     : int  3 8 30 42 9 34 4 13 40 6 ...
##  $ K          : num  0.3 0.2 0.1 0.1 0.2 0.3 0.2 0.6 0.8 0.4 ...
##  $ Mg         : num  25.7 23.7 23.2 44.3 21.9 18.9 12.1 12.1 17.7 16.4 ...
##  $ Ca         : num  9 5.6 1.9 0.3 4.4 4.5 1.4 7 4.4 24.1 ...
##  $ CEC_7      : num  23 21.4 23.7 43 18.8 27.5 23.7 18 20 31.1 ...
##  $ ex_Ca_to_Mg: num  0.35 0.23 0.08 0.01 0.2 0.2 0.58 0.51 0.25 1.47 ...
##  $ sand       : int  46 42 40 27 54 49 43 36 27 43 ...
##  $ silt       : int  33 31 28 18 20 18 55 49 45 42 ...
##  $ clay       : int  21 27 32 55 25 34 3 15 27 15 ...
##  $ CF         : num  0.12 0.27 0.27 0.16 0.55 0.84 0.5 0.75 0.67 0.02 ...
# optionally read about it...
# ?sp4

# upgrade to SoilProfileCollection
# 'id' is the name of the column containing the profile ID
# 'top' is the name of the column containing horizon upper boundaries
# 'bottom' is the name of the column containing horizon lower boundaries
depths(sp4) <- id ~ top + bottom

# register horizon designation column
hzdesgnname(sp4) <- 'name'

# check it out:
class(sp4)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
print(sp4)
## SoilProfileCollection with 10 profiles and 30 horizons
## profile ID: id  |  horizon ID: hzID 
## Depth range: 16 - 49 cm
## 
## ----- Horizons (6 / 30 rows  |  10 / 14 columns) -----
##  name     id hzID top bottom   K   Mg  Ca CEC_7 ex_Ca_to_Mg
##     A colusa    1   0      3 0.3 25.7 9.0  23.0        0.35
##   ABt colusa    2   3      8 0.2 23.7 5.6  21.4        0.23
##   Bt1 colusa    3   8     30 0.1 23.2 1.9  23.7        0.08
##   Bt2 colusa    4  30     42 0.1 44.3 0.3  43.0        0.01
##     A  glenn    5   0      9 0.2 21.9 4.4  18.8        0.20
##    Bt  glenn    6   9     34 0.3 18.9 4.5  27.5        0.20
## [... more horizons ...]
## 
## ----- Sites (6 / 10 rows  |  1 / 1 columns) -----
##         id
##     colusa
##      glenn
##      kings
##   mariposa
##  mendocino
##       napa
## [... more sites ...]
## 
## Spatial Data:
## [EMPTY]

3 Accessing, Setting, and Replacing Data

“Accessor” functions are used to extract specific components from within SoilProfileCollection objects.

  • Methods that return a column name. These are useful for extracting depths, horizon designations, IDs, etc. before taking an SPC apart for a specific task.

    • idname(sp4): extract profile ID name (column name used to init SPC)
    • hzidname(sp4): horizon ID name (typically automatically built at init time)
    • horizonDepths(sp4): horizon top / bottom depth names (used to init SPC)
    • hzdesgnname(sp4): horizon designation name (if set)
  • Methods that return a vector of values.

    • profile_id(sp4): profile IDs, in order
    • hzID(sp4): horizon IDs, in order
    • hzDesgn(sp4): horizon designations, in order
  • Methods that return site/horizon attribute column names.

    • names(sp4): site + horizon names concatenated into a single vector
    • horizonNames(sp4): horizon names
    • siteNames(sp4): site names
  • Profile and horizon totals.

    • length(sp4): number of profiles in collection
    • nrow(sp4): number of horizons in collection
  • Other methods.

    • depth_units(sp4): defaults to ‘cm’ at SPC creation
    • metadata(sp4)

4 Horizon and Site Data

Typically, horizon and site data are the most important components of SoilProfileCollection objects. Both are internally stored as data.frame, data.table, or tbl_df objects; with one or more rows (per profile ID) in the horizon table and one row (per profile ID) in the site table. Columns from either table can be accessed with the $ style notation of data.frames. New data can be assigned to either table in the same manner, as long as the length of the new data is either:

  1. same length as the number of profiles in the collection (target is the site table)
  2. same length as the number of horizons in the collection (target is the horizon table)
# assignment of new data to existing or new attributes
sp4$elevation <- rnorm(n=length(sp4), mean=1000, sd=150) # site-level, based on length of assigned data
sp4$thickness <- sp4$bottom - sp4$top # horizon-level

# extraction of specific attributes by name
sp4$clay # vector of clay content (horizon data)
##  [1] 21 27 32 55 25 34  3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
sp4$elevation # vector of simulated elevation (site data)
##  [1]  932.6315  952.2398 1098.0799 1130.0458  998.7517 1002.1140 1076.7609 1177.8125  933.6698
## [10] 1094.3532
# assign a single single value into horizon-level attributes
sp4$constant <- rep(1, times=nrow(sp4))

# promote horizon-level data to site-level data (when it makes sense to do so)
# note that this _moves_ the named column from horizon to site
site(sp4) <- ~ constant 

Horizon and site data can also be modified via extraction to data.frame followed by replacement (horizon data) or join (site data). Note that while this approach gives the most flexibility, it is also the most dangerous– replacement of horizon data with new data that don’t exactly conform to the original sorting may corrupt your SoilProfileCollection.

# extract horizon data to data.frame
h <- horizons(sp4)

# add a new column and save back to original object
h$random.numbers <- rnorm(n=nrow(h), mean=0, sd=1)

# _replace_ original horizon data with modified version
# ! row-order should not be altered !
horizons(sp4) <- h

# extract site data to data.frame
s <- site(sp4)

# add a fake group to the site data
s$group <- factor(rep(c('A', 'B'), length.out=nrow(s)))

# join new site data with previous data: old data are _not_ replaced
site(sp4) <- s

# check:
sp4
## SoilProfileCollection with 10 profiles and 30 horizons
## profile ID: id  |  horizon ID: hzID 
## Depth range: 16 - 49 cm
## 
## ----- Horizons (6 / 30 rows  |  10 / 16 columns) -----
##  name     id hzID top bottom   K   Mg  Ca CEC_7 ex_Ca_to_Mg
##     A colusa    1   0      3 0.3 25.7 9.0  23.0        0.35
##   ABt colusa    2   3      8 0.2 23.7 5.6  21.4        0.23
##   Bt1 colusa    3   8     30 0.1 23.2 1.9  23.7        0.08
##   Bt2 colusa    4  30     42 0.1 44.3 0.3  43.0        0.01
##     A  glenn    5   0      9 0.2 21.9 4.4  18.8        0.20
##    Bt  glenn    6   9     34 0.3 18.9 4.5  27.5        0.20
## [... more horizons ...]
## 
## ----- Sites (6 / 10 rows  |  4 / 4 columns) -----
##         id elevation constant group
##     colusa  932.6315        1     A
##      glenn  952.2398        1     B
##      kings 1098.0799        1     A
##   mariposa 1130.0458        1     B
##  mendocino  998.7517        1     A
##       napa 1002.1140        1     B
## [... more sites ...]
## 
## Spatial Data:
## [EMPTY]

5 Diagnostic Horizons

Diagnostic horizons typically span several genetic horizons and may or may not be present in all profiles. To accommodate the wide range of possibilities, diagnostic horizon data are stored as a data.frame in “long format”: each row corresponds to a diagnostic horizon, identified with a column matching the ID column used to initialize the SoilProfileCollection object.

# manually create some diagnostic horizon data
# there is no restrictions on data format, as long as each row has an ID that exists within the collection
# be sure to use the ID column name that was used to initialize the SoilProfileCollection object
# check via: idname(sp4)
dh <- data.frame(id='colusa', kind='argillic', top=8, bottom=42, stringsAsFactors=FALSE)

# overwrite any existing diagnostic horizon data
diagnostic_hz(sp4) <- dh

# append to diagnostic horizon data
dh <- diagnostic_hz(sp4)
dh.new <- data.frame(id='napa', kind='argillic', top=6, bottom=20, stringsAsFactors=FALSE)

# overwrite existing diagnostic horizon data with appended data
diagnostic_hz(sp4) <- rbind(dh, dh.new)

6 Root Restrictive Features

Features that restrict root entry (fine or very fine roots) are commonly used to estimate functional soil depth. Restrictive features include duripans, fragipans, paralithic matrials, lithic contact, or an abrupt change in chemical property. Not all soils have restrictive features, therefore these data are stored as a data.frame in “long format”. Each row corresponds to a restrictive feature, associated depths, and identified by profile_id(). There may be more than one restrictive feature per soil profile.

The example data sp4 does not contain restrictive features, so we will simulate some at the bottom of each profile + 20cm.

# get the depth of each profile
rf.top <- profileApply(sp4, max)
rf.bottom <- rf.top + 20

# the profile IDs can be extracted from the names attribute
pIDs <- names(rf.top)

# carefully make data.frame
# note: profile IDs must be stored in a column named for idname(sp4) -> 'id'
rf <- data.frame(
  id = pIDs, 
  top = rf.top, 
  bottom = rf.bottom, 
  kind='fake', 
  stringsAsFactors=FALSE
)

# overwrite any existing diagnostic horizon data
restrictions(sp4) <- rf

# check
restrictions(sp4)
##                id top bottom kind
## 1          colusa  42     62 fake
## 2           glenn  34     54 fake
## 3           kings  40     60 fake
## 4        mariposa  49     69 fake
## 5       mendocino  30     50 fake
## 6            napa  20     40 fake
## 7      san benito  20     40 fake
## 8          shasta  40     60 fake
## 9  shasta-trinity  40     60 fake
## 10         tehama  16     36 fake

7 Spatial Data

Spatial data can be explicitly stored within a SoilProfileCollection object and accessed with methods imported from the sp package. The use of sp classes (in this case SpatialPoints and SpatialPointsDataFrame objects) simplifies operations such as plotting spatial data, coordinate system transformations, and spatial queries.

# generate some fake coordinates as site level attributes
sp4$x <- rnorm(n = length(sp4), mean = 354000, sd = 100)
sp4$y <- rnorm(n = length(sp4), mean = 4109533, sd = 100)

# initialize spatial coordinates
coordinates(sp4) <- ~ x + y

# extract coordinates as matrix
coordinates(sp4)
##           x       y
## 1  354027.6 4109391
## 2  354150.6 4109509
## 3  353998.8 4109520
## 4  353901.5 4109661
## 5  354020.4 4109620
## 6  353807.6 4109473
## 7  354003.5 4109514
## 8  354119.0 4109347
## 9  354269.9 4109525
## 10 354021.0 4109523
# get/set spatial reference system using PROJ4 syntax
proj4string(sp4) <- '+proj=utm +zone=11 +datum=NAD83'
proj4string(sp4)
## [1] "+proj=utm +zone=11 +datum=NAD83 +units=m +no_defs"
# extract spatial data + site level attribtutes
# see ?SpatialPointsDataFrame for details
sp4.sp <- as(sp4, 'SpatialPointsDataFrame')

# plot the fake coordinates
plot(sp4.sp)
box()

# in the presence of spatial data, subsetting can either result in 
# 1. a new SoilProfileCollection (> 1 horizon / profile)
# 2. a new SpatialPointsDataFrame (1 horizon / profile)
class(sp4[1, ]) # profile 1 from the collection
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
class(sp4[, 1]) # horizon 1 from each profile
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"

8 Subsetting SoilProfileCollection Objects

SoilProfileCollection objects can be subset using the familiar [-style notation used by matrix and data.frame objects, such that: spc[i, j] will return profiles identified by the integer vector i, and horizons identified by the integer vector j. Omitting either index will result in all profiles (i omitted) or all horizons (j omitted). Typically, site-level attributes will be used as the subsetting criteria. Functions that return an index to matches (such as grep() or which()) provide the link between attributes and an index to matching profiles. Some examples:

# explicit string matching
idx <- which(sp4$group == 'A')

# numerical expressions
idx <- which(sp4$elevation < 1000)

# regular expression, matches any profile ID containing 'shasta'
idx <- grep('shasta', profile_id(sp4), ignore.case=TRUE)

# perform subset based on index
sp4[idx, ]

Simpler subsetting with subset:

subset(sp4, group == 'A')
subset(sp4, elevation < 1000)
subset(sp4, grepl('shasta', id, ignore.case = TRUE))

9 Splitting, Duplication, and Selection of Unique Profiles

SoilProfileCollection objects are combined by passing a list of objects to the combine function. Ideally all objects share the same internal structure, profile ID, horizon ID, depth units, and other parameters of a SoilProfileCollection. Manually subset the example data into 3 pieces, compile into a list, and then combine back together.

# subset data into chunks
s1 <- sp4[1:2, ]
s2 <- sp4[4, ]
s3 <- sp4[c(6, 8, 9), ]

# combine subsets
s <- combine(list(s1, s2, s3))

# double-check result
plotSPC(s)

It is possible to combine SoilProfileCollection objects with different internal structure. The final object will contain the all site and horizon columns from the inputs, possibly creating sparse tables. IDs and horizon depth names are taken from the first object.

# sample data as data.frame objects
data(sp1)
data(sp3)

# rename IDs horizon top / bottom columns
sp3$newid <- sp3$id
sp3$hztop <- sp3$top
sp3$hzbottom <- sp3$bottom

# remove originals
sp3$id <- NULL
sp3$top <- NULL
sp3$bottom <- NULL

# promote to SoilProfileCollection
depths(sp1) <- id ~ top + bottom
depths(sp3) <- newid ~ hztop + hzbottom

# label each group via site-level attribute
site(sp1)$g <- 'sp1'
site(sp3)$g <- 'sp3'

# combine
x <- combine(list(sp1, sp3))

# make grouping variable into a factor for groupedProfilePlot
x$g <- factor(x$g)

# check results
str(x)

# graphical check
# convert character horizon IDs into numeric
x$.horizon_ids_numeric <- as.numeric(hzID(x))

par(mar = c(0, 0, 3, 1))
plotSPC(x, color='.horizon_ids_numeric', col.label = 'Horizon ID')
groupedProfilePlot(x, 'g', color='.horizon_ids_numeric', col.label = 'Horizon ID', group.name.offset = -15)

9.1 Splitting

The inverse of combine is split: subsets of the SoilProfileCollection are split into list elements, each containing a new SoilProfileCollection.

# continuing from above
# split subsets of x into a list of SoilProfileCollection objcts using site-level attribute 'g'
res <- split(x, 'g')
str(res, 1)

9.2 Duplication

d <- duplicate(sp4[1, ], times = 8)
par(mar=c(0,2,0,4))
plotSPC(d, color = 'ex_Ca_to_Mg')

9.3 Selecting Unique Profiles

# an example soil profile  
x <- data.frame(
id = 'A',
name = c('A', 'E', 'Bhs', 'Bt1', 'Bt2', 'BC', 'C'),
top = c(0, 10, 20, 30, 40, 50, 100),
bottom = c(10, 20, 30, 40, 50, 100, 125),
z = c(8, 5, 3, 7, 10, 2, 12)
)

# init SPC
depths(x) <- id ~ top + bottom

# horizon depth variability for simulation
horizons(x)$.sd <- 2

# duplicate several times
x.dupes <- duplicate(x, times = 5)

# simulate some new profiles based on example
x.sim <- perturb(x, n = 5, thickness.attr = '.sd')

# graphical check
par(mar=c(0,2,0,4))

# inspect unique results
plotSPC(unique(x.dupes, vars = c('top', 'bottom')), name.style = 'center-center')

# uniqueness is a function of variable selection
plotSPC(unique(x.sim, vars = c('top', 'bottom')), name.style = 'center-center')

plotSPC(unique(x.sim, vars = c('name')), name.style = 'center-center')

10 Plotting SoilProfileCollection Objects

The plotSPC() method for SoilProfileCollection objects generates sketches of profiles within the collection based on horizon boundaries, vertically aligned to an integer sequence from 1 to the number of profiles. Horizon names are automatically extracted from a horizon-level attribute name (if present), or via an alternate attributed given as an argument: name='column.name'. Horizon colors are automatically generated from the horizon-level attribute soil_color, or any other attribute of R-compatible color description given as an argument: color='column.name'. This function is highly customizable, therefore, it is prudent to consult help(plotSPC) from time to time. Soil colors in Munsell notation can be converted to R-compatible colors via munsell2rgb().

10.1 Making Adjustments

The explainPlotSPC() function is helpful for adjusting some of the more obscure arguments to plotSPC().

A basic plot with debugging information overlayed.

par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name')

Make sketches wider.

par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', width=0.3)

Move soil surface at 0cm “down” 5cm.

par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', y.offset=5)

Move soil surface at 0cm “up” 10cm; useful for sketches of shallow profiles.

par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', y.offset=-10)

Scale depths by 50%.

par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', scaling.factor=0.5)

A graphical explanation of how profiles are re-arranged via plot.order argument.

par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', plot.order=length(sp4):1)

Leave room for an additional 2 profile sketches.

par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', n=length(sp4) + 2)