This is a very basic introduction to the SoilProfileCollection
class object defined in the aqp
package for R. The SoilProfileCollection
class was designed to simplify the process of working with the collection of data associated with soil profiles: site-level data, horizon-level data, spatial data, diagnostic horizon data, metadata, etc. Examples listed below are meant to be copied/pasted from this document and interactively run within R. Comments (green text) briefly describe what the code in each line does. This document assumes a basic level of proficiency with R which can be gained by reviewing some of the material in tutorials like this. Further documentation on objects and functions from the aqp
package can be accessed by typing help(aqp)
(or more generally, ?function_name
) at the R console.
SoilProfileCollection
objects are typically created by “promoting” data.frame
objects (rectangular tables of data) that contain at least three essential columns:
The data.frame
should be pre-sorted according to the profile ID and horizon top boundary. Formula notation is used to define the columns used to promote a data.frame
object:
idcolumn ~ hz_top_column + hz_bottom_column
In this tutorial we will use some sample data included with the aqp
package, based on characterization data from 10 soils sampled on serpentinitic parent material as described in McGahan et al, 2009.
# load required packages, you may have to install these if missing:
# install.packages('aqp', dep = TRUE)
# remotes::install_github("ncss-tech/aqp", dependencies=FALSE, upgrade=FALSE, build=FALSE)
library(aqp)
library(Hmisc)
library(lattice)
library(MASS)
# load sample data set, a data.frame object with horizon-level data from 10 profiles
data(sp4)
str(sp4)
## 'data.frame': 30 obs. of 13 variables:
## $ id : chr "colusa" "colusa" "colusa" "colusa" ...
## $ name : chr "A" "ABt" "Bt1" "Bt2" ...
## $ top : int 0 3 8 30 0 9 0 4 13 0 ...
## $ bottom : int 3 8 30 42 9 34 4 13 40 6 ...
## $ K : num 0.3 0.2 0.1 0.1 0.2 0.3 0.2 0.6 0.8 0.4 ...
## $ Mg : num 25.7 23.7 23.2 44.3 21.9 18.9 12.1 12.1 17.7 16.4 ...
## $ Ca : num 9 5.6 1.9 0.3 4.4 4.5 1.4 7 4.4 24.1 ...
## $ CEC_7 : num 23 21.4 23.7 43 18.8 27.5 23.7 18 20 31.1 ...
## $ ex_Ca_to_Mg: num 0.35 0.23 0.08 0.01 0.2 0.2 0.58 0.51 0.25 1.47 ...
## $ sand : int 46 42 40 27 54 49 43 36 27 43 ...
## $ silt : int 33 31 28 18 20 18 55 49 45 42 ...
## $ clay : int 21 27 32 55 25 34 3 15 27 15 ...
## $ CF : num 0.12 0.27 0.27 0.16 0.55 0.84 0.5 0.75 0.67 0.02 ...
# optionally read about it...
# ?sp4
# upgrade to SoilProfileCollection
# 'id' is the name of the column containing the profile ID
# 'top' is the name of the column containing horizon upper boundaries
# 'bottom' is the name of the column containing horizon lower boundaries
depths(sp4) <- id ~ top + bottom
# register horizon designation column
hzdesgnname(sp4) <- 'name'
# check it out:
class(sp4)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
print(sp4)
## SoilProfileCollection with 10 profiles and 30 horizons
## profile ID: id | horizon ID: hzID
## Depth range: 16 - 49 cm
##
## ----- Horizons (6 / 30 rows | 10 / 14 columns) -----
## name id hzID top bottom K Mg Ca CEC_7 ex_Ca_to_Mg
## A colusa 1 0 3 0.3 25.7 9.0 23.0 0.35
## ABt colusa 2 3 8 0.2 23.7 5.6 21.4 0.23
## Bt1 colusa 3 8 30 0.1 23.2 1.9 23.7 0.08
## Bt2 colusa 4 30 42 0.1 44.3 0.3 43.0 0.01
## A glenn 5 0 9 0.2 21.9 4.4 18.8 0.20
## Bt glenn 6 9 34 0.3 18.9 4.5 27.5 0.20
## [... more horizons ...]
##
## ----- Sites (6 / 10 rows | 1 / 1 columns) -----
## id
## colusa
## glenn
## kings
## mariposa
## mendocino
## napa
## [... more sites ...]
##
## Spatial Data:
## [EMPTY]
“Accessor” functions are used to extract specific components from within SoilProfileCollection
objects.
Methods that return a column name. These are useful for extracting depths, horizon designations, IDs, etc. before taking an SPC apart for a specific task.
idname(sp4)
: extract profile ID name (column name used to init SPC)hzidname(sp4)
: horizon ID name (typically automatically built at init time)horizonDepths(sp4)
: horizon top / bottom depth names (used to init SPC)hzdesgnname(sp4)
: horizon designation name (if set)Methods that return a vector of values.
profile_id(sp4)
: profile IDs, in orderhzID(sp4)
: horizon IDs, in orderhzDesgn(sp4)
: horizon designations, in orderMethods that return site/horizon attribute column names.
names(sp4)
: site + horizon names concatenated into a single vectorhorizonNames(sp4)
: horizon namessiteNames(sp4)
: site namesProfile and horizon totals.
length(sp4)
: number of profiles in collectionnrow(sp4)
: number of horizons in collectionOther methods.
depth_units(sp4)
: defaults to ‘cm’ at SPC creationmetadata(sp4)
Typically, horizon and site data are the most important components of SoilProfileCollection
objects. Both are internally stored as data.frame
, data.table
, or tbl_df
objects; with one or more rows (per profile ID) in the horizon table and one row (per profile ID) in the site table. Columns from either table can be accessed with the $
style notation of data.frames
. New data can be assigned to either table in the same manner, as long as the length of the new data is either:
# assignment of new data to existing or new attributes
sp4$elevation <- rnorm(n=length(sp4), mean=1000, sd=150) # site-level, based on length of assigned data
sp4$thickness <- sp4$bottom - sp4$top # horizon-level
# extraction of specific attributes by name
sp4$clay # vector of clay content (horizon data)
## [1] 21 27 32 55 25 34 3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
sp4$elevation # vector of simulated elevation (site data)
## [1] 890.5822 867.3535 997.8802 1030.1389 1035.8318 1280.7510 974.4870 935.6453 822.4480
## [10] 1205.5147
# assign a single single value into horizon-level attributes
sp4$constant <- rep(1, times=nrow(sp4))
# promote horizon-level data to site-level data (when it makes sense to do so)
# note that this _moves_ the named column from horizon to site
site(sp4) <- ~ constant
Horizon and site data can also be modified via extraction to data.frame
followed by replacement (horizon data) or join (site data). Note that while this approach gives the most flexibility, it is also the most dangerous– replacement of horizon data with new data that don’t exactly conform to the original sorting may corrupt your SoilProfileCollection
.
# extract horizon data to data.frame
h <- horizons(sp4)
# add a new column and save back to original object
h$random.numbers <- rnorm(n=nrow(h), mean=0, sd=1)
# _replace_ original horizon data with modified version
# ! row-order should not be altered !
horizons(sp4) <- h
# extract site data to data.frame
s <- site(sp4)
# add a fake group to the site data
s$group <- factor(rep(c('A', 'B'), length.out=nrow(s)))
# join new site data with previous data: old data are _not_ replaced
site(sp4) <- s
# check:
sp4
## SoilProfileCollection with 10 profiles and 30 horizons
## profile ID: id | horizon ID: hzID
## Depth range: 16 - 49 cm
##
## ----- Horizons (6 / 30 rows | 10 / 16 columns) -----
## name id hzID top bottom K Mg Ca CEC_7 ex_Ca_to_Mg
## A colusa 1 0 3 0.3 25.7 9.0 23.0 0.35
## ABt colusa 2 3 8 0.2 23.7 5.6 21.4 0.23
## Bt1 colusa 3 8 30 0.1 23.2 1.9 23.7 0.08
## Bt2 colusa 4 30 42 0.1 44.3 0.3 43.0 0.01
## A glenn 5 0 9 0.2 21.9 4.4 18.8 0.20
## Bt glenn 6 9 34 0.3 18.9 4.5 27.5 0.20
## [... more horizons ...]
##
## ----- Sites (6 / 10 rows | 4 / 4 columns) -----
## id elevation constant group
## colusa 890.5822 1 A
## glenn 867.3535 1 B
## kings 997.8802 1 A
## mariposa 1030.1389 1 B
## mendocino 1035.8318 1 A
## napa 1280.7510 1 B
## [... more sites ...]
##
## Spatial Data:
## [EMPTY]
Diagnostic horizons typically span several genetic horizons and may or may not be present in all profiles. To accommodate the wide range of possibilities, diagnostic horizon data are stored as a data.frame
in “long format”: each row corresponds to a diagnostic horizon, identified with a column matching the ID column used to initialize the SoilProfileCollection
object.
# manually create some diagnostic horizon data
# there is no restrictions on data format, as long as each row has an ID that exists within the collection
# be sure to use the ID column name that was used to initialize the SoilProfileCollection object
# check via: idname(sp4)
dh <- data.frame(id='colusa', kind='argillic', top=8, bottom=42, stringsAsFactors=FALSE)
# overwrite any existing diagnostic horizon data
diagnostic_hz(sp4) <- dh
# append to diagnostic horizon data
dh <- diagnostic_hz(sp4)
dh.new <- data.frame(id='napa', kind='argillic', top=6, bottom=20, stringsAsFactors=FALSE)
# overwrite existing diagnostic horizon data with appended data
diagnostic_hz(sp4) <- rbind(dh, dh.new)
Features that restrict root entry (fine or very fine roots) are commonly used to estimate functional soil depth. Restrictive features include duripans, fragipans, paralithic matrials, lithic contact, or an abrupt change in chemical property. Not all soils have restrictive features, therefore these data are stored as a data.frame
in “long format”. Each row corresponds to a restrictive feature, associated depths, and identified by profile_id()
. There may be more than one restrictive feature per soil profile.
The example data sp4
does not contain restrictive features, so we will simulate some at the bottom of each profile + 20cm.
# get the depth of each profile
rf.top <- profileApply(sp4, max)
rf.bottom <- rf.top + 20
# the profile IDs can be extracted from the names attribute
pIDs <- names(rf.top)
# carefully make data.frame
# note: profile IDs must be stored in a column named for idname(sp4) -> 'id'
rf <- data.frame(
id = pIDs,
top = rf.top,
bottom = rf.bottom,
kind='fake',
stringsAsFactors=FALSE
)
# overwrite any existing diagnostic horizon data
restrictions(sp4) <- rf
# check
restrictions(sp4)
## id top bottom kind
## 1 colusa 42 62 fake
## 2 glenn 34 54 fake
## 3 kings 40 60 fake
## 4 mariposa 49 69 fake
## 5 mendocino 30 50 fake
## 6 napa 20 40 fake
## 7 san benito 20 40 fake
## 8 shasta 40 60 fake
## 9 shasta-trinity 40 60 fake
## 10 tehama 16 36 fake
Spatial data can be explicitly stored within a SoilProfileCollection
object and accessed with methods imported from the sp package. The use of sp
classes (in this case SpatialPoints
and SpatialPointsDataFrame
objects) simplifies operations such as plotting spatial data, coordinate system transformations, and spatial queries.
# generate some fake coordinates as site level attributes
sp4$x <- rnorm(n = length(sp4), mean = 354000, sd = 100)
sp4$y <- rnorm(n = length(sp4), mean = 4109533, sd = 100)
# initialize spatial coordinates
coordinates(sp4) <- ~ x + y
# extract coordinates as matrix
coordinates(sp4)
## x y
## 1 354043.1 4109368
## 2 354138.5 4109509
## 3 353866.4 4109507
## 4 353812.7 4109486
## 5 353837.2 4109565
## 6 354011.2 4109571
## 7 354055.2 4109561
## 8 353923.1 4109481
## 9 354074.4 4109555
## 10 353920.5 4109555
# get/set spatial reference system using PROJ4 syntax
proj4string(sp4) <- '+proj=utm +zone=11 +datum=NAD83'
proj4string(sp4)
## [1] "+proj=utm +zone=11 +datum=NAD83 +units=m +no_defs"
# extract spatial data + site level attribtutes
# see ?SpatialPointsDataFrame for details
sp4.sp <- as(sp4, 'SpatialPointsDataFrame')
# in the presence of spatial data, subsetting can either result in
# 1. a new SoilProfileCollection (> 1 horizon / profile)
# 2. a new SpatialPointsDataFrame (1 horizon / profile)
class(sp4[1, ]) # profile 1 from the collection
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
class(sp4[, 1]) # horizon 1 from each profile
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
SoilProfileCollection
objects can be subset using the familiar [
-style notation used by matrix and data.frame
objects, such that: spc[i, j]
will return profiles identified by the integer vector i
, and horizons identified by the integer vector j
. Omitting either index will result in all profiles (i
omitted) or all horizons (j
omitted). Typically, site-level attributes will be used as the subsetting criteria. Functions that return an index to matches (such as grep()
or which()
) provide the link between attributes and an index to matching profiles. Some examples:
Using the i index, select one or more profiles by numeric index. An index greater than the number of profiles will return an empty SoilProfileCollection
object.
# explicit string matching
idx <- which(sp4$group == 'A')
# numerical expressions
idx <- which(sp4$elevation < 1000)
# regular expression, matches any profile ID containing 'shasta'
idx <- grep('shasta', profile_id(sp4), ignore.case = TRUE)
# perform subset based on index
sp4[idx, ]
In an interactive session, it is often simpler to use subset()
directly:
subset(sp4, group == 'A')
subset(sp4, elevation < 1000)
subset(sp4, grepl('shasta', id, ignore.case = TRUE))
Using the j index, select the 2nd horizon from each profile in the collection. Use with caution, asking for a horizon beyond the length of any profile in the collection will result in an error.
sp4[, 2]
Using the k index, select the .FIRST
first or .LAST
horizons (by profile) within a collection. Note that .FIRST
and .LAST
are special keywords used by the SoilProfileCollection
subset methods.
sp4[, , .FIRST]
sp4[, , .LAST]
SoilProfileCollection
objects are combined by passing a list of objects to the combine
function. Ideally all objects share the same internal structure, profile ID, horizon ID, depth units, and other parameters of a SoilProfileCollection
. Manually subset the example data into 3 pieces, compile into a list, and then combine
back together.
# subset data into chunks
s1 <- sp4[1:2, ]
s2 <- sp4[4, ]
s3 <- sp4[c(6, 8, 9), ]
# combine subsets
s <- combine(list(s1, s2, s3))
# double-check result
plotSPC(s)
It is possible to combine SoilProfileCollection
objects with different internal structure. The final object will contain the all site and horizon columns from the inputs, possibly creating sparse tables. IDs and horizon depth names are taken from the first object.
# sample data as data.frame objects
data(sp1)
data(sp3)
# rename IDs horizon top / bottom columns
sp3$newid <- sp3$id
sp3$hztop <- sp3$top
sp3$hzbottom <- sp3$bottom
# remove originals
sp3$id <- NULL
sp3$top <- NULL
sp3$bottom <- NULL
# promote to SoilProfileCollection
depths(sp1) <- id ~ top + bottom
depths(sp3) <- newid ~ hztop + hzbottom
# label each group via site-level attribute
site(sp1)$g <- 'sp1'
site(sp3)$g <- 'sp3'
# combine
x <- combine(list(sp1, sp3))
# make grouping variable into a factor for groupedProfilePlot
x$g <- factor(x$g)
# check results
str(x)
# graphical check
# convert character horizon IDs into numeric
x$.horizon_ids_numeric <- as.numeric(hzID(x))
par(mar = c(0, 0, 3, 1))
plotSPC(x, color='.horizon_ids_numeric', col.label = 'Horizon ID')
groupedProfilePlot(x, 'g', color='.horizon_ids_numeric', col.label = 'Horizon ID', group.name.offset = -15)
The inverse of combine
is split
: subsets of the SoilProfileCollection
are split into list elements, each containing a new SoilProfileCollection
.
# continuing from above
# split subsets of x into a list of SoilProfileCollection objects using site-level attribute 'g'
res <- split(x, 'g')
str(res, 1)
Duplicate the first profile in sp4
(Colusa) 8 times, resulting in a new SoilProfileCollection
object containing unique profile IDs.
d <- duplicate(sp4[1, ], times = 8)
par(mar = c(0,2,0,1))
plotSPC(d, color = 'ex_Ca_to_Mg')
# an example soil profile
x <- data.frame(
id = 'A',
name = c('A', 'E', 'Bhs', 'Bt1', 'Bt2', 'BC', 'C'),
top = c(0, 10, 20, 30, 40, 50, 100),
bottom = c(10, 20, 30, 40, 50, 100, 125),
z = c(8, 5, 3, 7, 10, 2, 12)
)
# init SPC
depths(x) <- id ~ top + bottom
hzdesgnname(x) <- 'name'
# horizon depth variability for simulation
horizons(x)$.sd <- 2
# duplicate several times
x.dupes <- duplicate(x, times = 5)
# simulate some new profiles based on example
x.sim <- perturb(x, n = 5, thickness.attr = '.sd')
# graphical check
par(mar=c(0,2,0,4))
# inspect unique results
plotSPC(unique(x.dupes, vars = c('top', 'bottom')), name.style = 'center-center', width = 0.15)
# uniqueness is a function of variable selection
plotSPC(unique(x.sim, vars = c('top', 'bottom')), name.style = 'center-center')
plotSPC(unique(x.sim, vars = c('name')), name.style = 'center-center', width = 0.15)
The plotSPC()
method for SoilProfileCollection
objects generates sketches of profiles within the collection based on horizon boundaries, vertically aligned to an integer sequence from 1 to the number of profiles. Horizon names are automatically extracted from a horizon-level attribute name
(if present), or via an alternate attributed given as an argument: name='column.name'
. Horizon colors are automatically generated from the horizon-level attribute soil_color
, or any other attribute of R-compatible color description given as an argument: color='column.name'
. This function is highly customizable, therefore, it is prudent to consult help(plotSPC)
from time to time. Soil colors in Munsell notation can be converted to R-compatible colors via munsell2rgb()
.
The explainPlotSPC()
function is helpful for adjusting some of the more obscure arguments to plotSPC()
.
A basic plot with debugging information overlayed.
par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name')
Make sketches wider.
par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', width=0.3)
Move soil surface at 0cm “down” 5cm.
par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', y.offset=5)
Move soil surface at 0cm “up” 10cm; useful for sketches of shallow profiles.
par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', y.offset=-10)
Scale depths by 50%.
par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', scaling.factor=0.5)
A graphical explanation of how profiles are re-arranged via plot.order
argument.
par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', plot.order=length(sp4):1)
Leave room for an additional 2 profile sketches.
par(mar=c(4,3,2,2))
explainPlotSPC(sp4, name='name', n=length(sp4) + 2)
Making quality figures with fewer than 5 soil profiles usually requires more customization of the basic call to plotSPC
. In general, the following are a good starting point:
par
png()
) dimensions and resolutioncex.names
width
, typically within 0.15-0.35axis.line.offset
valuesGet some example data from the Official Series Descriptions.
library(soilDB)
x <- fetchOSD(c('appling', 'cecil', 'bonneau'))
Using 5x6 inch output device.
# set margins and turn off clipping
par(mar = c(0,2,0,4), xpd = NA)
plotSPC(x[1, ], cex.names = 1)
Using 7x6 inch output device, slight adjustments to width
usually required.
# set margins and turn off clipping
par(mar = c(0,2,0,4), xpd = NA)
plotSPC(x[1:2, ], cex.names = 1, width = 0.25)
Using 8x6 inch output device, slight adjustments to axis.line.offset
and width
are usually required.
# set margins and turn off clipping
par(mar = c(0,2,0,4), xpd = NA)
plotSPC(x, cex.names = 1, axis.line.offset = -0.1, width = 0.3)
Horizon depths can be labeled for each profile as an alternative to a single depth axis. As of aqp version 1.41, it is possible to “fix” overlapping horizon depth labels with the new fixLabelCollisions
argument. This approach to labeling depths works best when moving horizon designations with the name.style
argument. Note that setting fixLabelCollisions = TRUE
may incur a significant performance penalty when plotting large SoilProfileCollection
objects.
# set margins and turn off clipping
par(mar = c(0, 0, 1, 1))
plotSPC(x, cex.names = 1, name.style = 'center-center', width = 0.3, plot.depth.axis = FALSE, hz.depths = TRUE, fixLabelCollisions = TRUE, hz.depths.offset = 0.08)
Relative horizontal positioning via relative.pos
argument. Can be used with plot.order
but be careful: relative.pos
must be specified in the final ordering of profiles. See ?plotSPC
for details.
par(mar=c(4,3,2,2))
pos <- jitter(1:length(sp4))
explainPlotSPC(sp4, name = 'name', relative.pos = pos)
Relative positioning works well when the vector of positions is close to the default spacing along an integer sequence, but not when positions are closer than the width of a profile sketch.
par(mar = c(4,3,2,2))
pos <- c(1, 1.2, 3, 4, 5, 5.2, 7, 8, 9, 10)
explainPlotSPC(sp4, name = 'name', relative.pos = pos)
The fixOverlap()
function can be used to find a suitable arrangement of profiles based on a compromise between the suggested relative positions and minimization of overlap. This is an iterative procedure, based on random perturbations of overlapping profiles, therefore it is possible for the algorithm to stop at a sub-optimal configuration. Results can be controlled using set.seed()
.
par(mar = c(4,3,2,2))
new.pos <- fixOverlap(pos)
explainPlotSPC(sp4, name = 'name', relative.pos = new.pos)
There are several parameters available for optimizing horizontal position in the presence of overlap. See ?fixOverlap
for details and further examples. The SPC plotting ideas tutorial contains several practical examples.
par(mar = c(4,3,2,2))
new.pos <- fixOverlap(pos, thresh = 0.7)
explainPlotSPC(sp4, name = 'name', relative.pos = new.pos)
Horizon-level attributes can be symbolized with color, in this case using the horizon-level attribute “clay”:
# plot again, this time using new colors
par(mar=c(0,0,3,0)) # tighter figure margins
plotSPC(sp4, name='name', color='clay', col.label='Clay Content (%)')
Use a different set of colors.
# plot again, this time using new colors
par(mar=c(0,0,3,0)) # tighter figure margins
plotSPC(sp4, name='name', color='clay', col.palette=viridis::viridis(10), col.label='Clay Content (%)')
Categorical properties can also be used to make a thematic sketch. Colors are interpolated when there are more classes than colors provided by col.palette
.
par(mar=c(0,0,3,0)) # tighter figure margins
plotSPC(sp4, name='name', color='name', col.palette=RColorBrewer::brewer.pal(5, 'Set1'), col.label='Original Horizon Name')
Try with generalized horizon labels.
par(mar=c(0,0,3,0)) # tighter figure margins
# generalize horizon names into 3 groups
sp4$genhz <- generalize.hz(sp4$name, new = c('A', 'AB', 'Bt'), pat=c('A[0-9]?', 'AB', '^Bt'))
plotSPC(sp4, name='name', color='genhz', col.palette=RColorBrewer::brewer.pal(3, 'Spectral'), col.label='Generalized Horizon Name')
Horizon-level attributes that represent a volume fraction (e.g. coarse-fragment percentage) can be added to an existing figure. See ?addVolumeFraction
for additional customization ideas.
# tighter figure margins
par(mar=c(0,0,3,0))
# convert fraction -> percent
sp4$frag_pct <- sp4$CF * 100
# label horizons with fragment percent
plotSPC(sp4, name = 'frag_pct', color = 'frag_pct')
# symbolize volume fraction data
addVolumeFraction(sp4, colname = 'frag_pct')
Annotation of depth-intervals can be accomplished using “brackets”:
# extract top/bottom depths associated with all A horizons
# see ?DepthOf
tops <- profileApply(sp4, FUN = minDepthOf, pattern = '^A', hzdesgn = 'name', top = TRUE)
bottoms <- profileApply(sp4, FUN = maxDepthOf, pattern = '^A', hzdesgn = 'name', top = FALSE)
IDs <- profile_id(sp4)
# assemble data.frame
a <- data.frame(id = IDs, top = tops, bottom = bottoms)
# check: OK
a
## id top bottom
## colusa colusa 0 8
## glenn glenn 0 9
## kings kings 0 4
## mariposa mariposa 0 3
## mendocino mendocino 0 2
## napa napa 0 6
## san benito san benito 0 8
## shasta shasta 0 3
## shasta-trinity shasta-trinity 0 12
## tehama tehama 0 3
par(mar = c(0,0,0,0))
plotSPC(sp4)
# annotate with brackets
addBracket(a, col='red', offset = -0.4)
Labels.
# add a label for each bracket
a$label <- site(sp4)$id
# tighter figure margins
par(mar=c(0,0,0,0))
plotSPC(sp4, name='name')
# note that depth brackets "know which profiles to use"
addBracket(a, col='red', label.cex = 0.75, missing.bottom.depth = 25, offset = -0.4)
It is possible to arrange profile sketches by site-level grouping variable:
# use improvised site-level attribute 'group'
par(mar=c(0,0,0,0)) # tighter figure margins
groupedProfilePlot(sp4, groups='group')
# note that depth brackets "know" which profiles to use
addBracket(a, col='red', offset = -0.4)
There need not be brackets for all profiles in a collection:
# add a label for each bracket
a.sub <- a[1:4, ]
# tighter figure margins
par(mar=c(0,0,0,0))
groupedProfilePlot(sp4, groups='group')
# note that depth brackets "know which profiles to use"
addBracket(a.sub, col='red', offset = -0.4)
When bottom depths are missing an arrow is used:
a$bottom <- NA
# tighter figure margins
par(mar=c(0,0,0,0))
groupedProfilePlot(sp4, groups='group')
# note that depth brackets "know which profiles to use"
addBracket(a, col='red', offset = -0.4)
Manually define bottom depth
# tighter figure margins
par(mar=c(0,0,0,0))
groupedProfilePlot(sp4, groups='group')
# note that depth brackets "know which profiles to use"
addBracket(a, col='red', label.cex = 0.75, missing.bottom.depth = 25, offset = -0.4)
Further customization of brackets:
# copy root-restricting features
a <- restrictions(sp4)
# add a label for each bracket, using restrictive feature 'kind'
# addBracket() looks for a column called `label`
a$label <- a$kind
# plot with tighter margings
par(mar=c(0,0,0,0))
plotSPC(sp4, max.depth = 75)
# add restrictions using vertical bars
addBracket(a, col='red', label.cex = 0.75, tick.length = 0, lwd=3, offset = -0.4)
These functions (addBracket
, addDiagnosticBracket
, and addVolumeFraction
) will automatically compensate for alternative sketch ordering or relative positioning.
Sketches for use in layout tools such as Adobe Illustrator or Inkscape should be stored in a vector file format. SVG is compatible with most software titles. WMF output is compatible with MS Office tools and can be written with the win.metafile()
function.
library(svglite)
svglite(filename = 'e:/temp/fig.svg', width = 7, height = 6, pointsize = 12)
par(mar = c(0,2,0,4), xpd = NA)
plotSPC(x, cex.names=1, axis.line.offset = -0.2, width=0.3)
dev.off()
The profileApply()
function is an extension of the familiar apply() family of functions that operate on vectors (sapply
and tapply
), matrices (apply
), and lists (lapply
)– extended to SoilProfileCollection
objects. The function named in the FUN
argument is evaluated once for each profile in the collection, typically returning a single value per profile. In this case, the ordering of the results would match the ordering of values in the site level attribute table.
# max() returns the depth of a soil profile
sp4$soil.depth <- profileApply(sp4, FUN = max)
# max() with additional argument give max depth to non-missing 'clay'
sp4$soil.depth.clay <- profileApply(sp4, FUN = max, v = 'clay')
# nrow() returns the number of horizons
sp4$n.hz <- profileApply(sp4, FUN = nrow)
# compute the mean clay content by profile using an inline function
sp4$mean.clay <- profileApply(sp4, FUN=function(i) mean(i$clay))
# estimate soil depth based on horizon designation
sp4$soil.depth <- profileApply(sp4, estimateSoilDepth, name = 'name')
When FUN
returns a vector of the same length as the number of horizons in a profile, profileApply()
can be used to create new horizon-level attributes. For example, the change in clay content with depth could be calculated via:
# save as horizon-level attribute
sp4$delta.clay <- profileApply(sp4, FUN=function(i) c(NA, diff(i$clay)))
# check results:
horizons(sp4)[1:6, c('id', 'top', 'bottom', 'clay', 'delta.clay')]
## id top bottom clay delta.clay
## 1 colusa 0 3 21 NA
## 2 colusa 3 8 27 6
## 3 colusa 8 30 32 5
## 4 colusa 30 42 55 23
## 5 glenn 0 9 25 NA
## 6 glenn 9 34 34 9
More complex summaries can be generated by writing a custom function that is then called by profileApply()
. Note that each profile is passed into this function and accessed via a temporary variable (i
), which is a SoilProfileCollection
object containing a single profile. A list of SoilProfileCollection
objects returned from a custom function can be re-constituted into a single SoilProfileCollection
object via combine
. See help(profileApply)
for details.
# compute hz-thickness weighted mean exchangeable-Ca:Mg
wt.mean.ca.mg <- function(i) {
# use horizon thickness as a weight
thick <- i$bottom - i$top
# compute the weighted mean, accounting for the possibility of missing data
m <- wtd.mean(i$ex_Ca_to_Mg, weights=thick, na.rm=TRUE)
return(m)
}
# apply our custom function and save results as a site-level attribute
sp4$wt.mean.ca.to.mg <- profileApply(sp4, wt.mean.ca.mg)
We can now use our some of our new site-level attributes to order the profiles when plotting. In this case profiles are ordered based on the horizon-thickness weighted mean, exchangeable Ca:Mg values. Horizons are colored by exchangeable Ca:Mg values.
# the result is an index of rank
new.order <- order(sp4$wt.mean.ca.to.mg)
# plot the data using our new order based on Ca:Mg
par(mar=c(4,0,3,0)) # tighten figure margins
plotSPC(sp4, name='name', color='ex_Ca_to_Mg', plot.order=new.order)
# add an axis labeled with the sorting criteria
axis(1, at=1:length(sp4), labels=round(sp4$wt.mean.ca.to.mg, 3), cex.axis=0.75)
mtext(1, line=2.25, text='Horizon Thickness Weighted Mean Ex. Ca:Mg', cex=0.75)
Collections of soil profiles can be sliced (or re-sampled) into regular depth-intervals with the dice()
function. The slicing structure and variables of interest are defined via formula notation:
# slice select horizon-level attributes
seq ~ var.1 + var.2 + var.3 + ...
# slice all horizon-level attributes
seq ~ .
where seq
is a sequence of integers (e.g. 0:15
, c(5,10,15,20)
, etc.) and var.1 + var.2 + var.3 + ...
are horizon-level attributes to slice. Both continuous and categorical variables can be named on the right-hand-side of the formula. The results returned by dice()
is either a SoilProfileCollection
, or data.frame
when called with the optional argument SPC = FALSE
. For example, to slice our sample data set into 1-cm intervals, from 0–15 cm:
# resample to 1cm slices
s <- dice(sp4, fm= 0:15 ~ sand + silt + clay + name + ex_Ca_to_Mg)
# check the result
class(s)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
# plot sliced data
par(mar = c(0,0,3,0)) # tighten figure margins
plotSPC(s, color = 'ex_Ca_to_Mg')
Once soil profile data have been sliced, it is simple to extract “chunks” of data by depth interval via [
-subsetting:
# slice from 0 to max depth in the collection
s <- dice(sp4, fm= 0:max(sp4) ~ sand + silt + clay + name + ex_Ca_to_Mg)
# extract all data over the range of 5--10 cm:
plotSPC(s[, 5:10])
# extract all data over the range of 25--50 cm:
plotSPC(s[, 25:50])
# extract all data over the range of 10--20 and 40--50 cm:
plotSPC(s[, c(10:20, 40:50)])
glom
Select all horizons that overlap the interval of 5-15cm.
# truncate to the interval 0-15cm
clods <- profileApply(sp4, glom, z1 = 5, z2 = 15)
clods <- combine(clods)
# check
par(mar=c(0,0,3,0)) # tighten figure margins
plotSPC(clods, name='name', color='ex_Ca_to_Mg')
rect(xleft = 0.5, ybottom = 15, xright = length(sp4) + 0.5, ytop = 0)
trunc
Truncate the SPC to the interval of 5-15cm.
# truncate to the interval 0-15cm
sp4.truncated <- trunc(sp4, 0, 15)
# check
par(mar=c(0,0,3,0)) # tighten figure margins
plotSPC(sp4.truncated, name='name', color='ex_Ca_to_Mg')
Depth-wise summary of horizon-level attributes is performed with the slab()
function. Profile grouping criteria and horizon attribute selection is parametrized via formula: either groups ~ var1 + var2 + var3
where named variables are aggregated within groups
OR where named variables are aggregated across the entire collection ~ var1 + var2 + var3
. The default summary function (slab.fun
) computes the 5th, 25th, 50th, 75th, and 95th percentiles via Harrell-Davis quantile estimator.
The depth structure (“slabs”) over which summaries are computed is defined with the slab.structure
argument using:
10
): data are aggregated over a regular sequence of 10-unit thickness slabsc(50, 60)
): data are aggregated over depths spanning 50–60 unitsc(0, 5, 10, 50, 100)
): data are aggregated over the depths spanning 0–5, 5–10, 10–50, 50–100 units# aggregate a couple of the horizon-level attributes,
# across the entire collection,
# from 0--10 cm
# computing the mean value ignoring missing data
slab(sp4, fm = ~ sand + silt + clay, slab.structure = c(0,10), slab.fun = mean, na.rm = TRUE)
## variable all.profiles value top bottom contributing_fraction
## 1 sand 1 47.63 0 10 1
## 2 silt 1 31.15 0 10 1
## 3 clay 1 21.11 0 10 1
# again, this time within groups defined by a site-level attribute:
slab(sp4, fm = group ~ sand + silt + clay, slab.structure=c(0,10), slab.fun=mean, na.rm = TRUE)
## variable group value top bottom contributing_fraction
## 1 sand A 48.26 0 10 1
## 2 silt A 31.52 0 10 1
## 3 clay A 20.30 0 10 1
## 4 sand B 47.00 0 10 1
## 5 silt B 30.78 0 10 1
## 6 clay B 21.92 0 10 1
# again, this time over several depth ranges
slab(sp4, fm = ~ sand + silt + clay, slab.structure = c(0,10,25,40), slab.fun=mean, na.rm = TRUE)
## variable all.profiles value top bottom contributing_fraction
## 1 sand 1 47.63000 0 10 1.0000000
## 2 sand 1 42.38931 10 25 0.8733333
## 3 sand 1 32.14607 25 40 0.5933333
## 4 silt 1 31.15000 0 10 1.0000000
## 5 silt 1 29.41221 10 25 0.8733333
## 6 silt 1 31.34831 25 40 0.5933333
## 7 clay 1 21.11000 0 10 1.0000000
## 8 clay 1 28.10687 10 25 0.8733333
## 9 clay 1 36.26966 25 40 0.5933333
# again, this time along 1-cm slices, computing quantiles
agg <- slab(sp4, fm= ~ Mg + Ca + ex_Ca_to_Mg + CEC_7 + clay)
# see ?slab for details on the default aggregate function
head(agg)
## variable all.profiles p.q5 p.q25 p.q50 p.q75 p.q95 top bottom contributing_fraction
## 1 Mg 1 6.015 12.175 14.60 21.125 27.13 0 1 1
## 2 Mg 1 6.015 12.175 14.60 21.125 27.13 1 2 1
## 3 Mg 1 6.015 12.175 19.15 25.650 27.76 2 3 1
## 4 Mg 1 6.195 13.175 21.05 25.050 30.73 3 4 1
## 5 Mg 1 6.195 13.175 21.05 25.050 30.73 4 5 1
## 6 Mg 1 6.195 13.175 21.05 26.250 31.72 5 6 1
# plot median +/i bounds defined by the 25th and 75th percentiles
# this is lattice graphics, syntax is a little rough
xyplot(top ~ p.q50 | variable, data=agg, ylab='Depth',
xlab='median bounded by 25th and 75th percentiles',
lower=agg$p.q25, upper=agg$p.q75, ylim=c(42,-2),
panel=panel.depth_function,
alpha=0.25, sync.colors=TRUE,
par.settings=list(superpose.line=list(col='RoyalBlue', lwd=2)),
prepanel=prepanel.depth_function,
cf=agg$contributing_fraction, cf.col='black', cf.interval=5,
layout=c(5,1), strip=strip.custom(bg=grey(0.8)),
scales=list(x=list(tick.number=4, alternating=3, relation='free'))
)
Depth-wise aggregation can be useful for visual evaluation of multivariate similarity among groups of profiles.
# processing the "napa" and tehama profiles
idx <- which(profile_id(sp4) %in% c('napa', 'tehama'))
napa.and.tehama <- slab(sp4[idx, ], fm= ~ Mg + Ca + ex_Ca_to_Mg + CEC_7 + clay)
# combine with the collection-wide aggregate data
g <- make.groups(collection=agg, napa.and.tehama=napa.and.tehama)
# compare graphically:
xyplot(top ~ p.q50 | variable, groups=which, data=g, ylab='Depth',
xlab='median bounded by 25th and 75th percentiles',
lower=g$p.q25, upper=g$p.q75, ylim=c(42,-2),
panel=panel.depth_function,
alpha=0.25, sync.colors=TRUE, cf=g$contributing_fraction, cf.interval=10,
par.settings=list(superpose.line=list(col=c('RoyalBlue', 'Red4'), lwd=2, lty=c(1,2))),
prepanel=prepanel.depth_function,
layout=c(5,1), strip=strip.custom(bg=grey(0.8)),
scales=list(x=list(tick.number=4, alternating=3, relation='free')),
auto.key=list(columns=2, lines=TRUE, points=FALSE)
)
Occasionally there is a need for converting data associated with soil horizons, as described in the field, to a new set of depth intervals. This kind of change of support operation requires some form of aggregation (mean, median, etc.) and possibly interpolation (e.g. splines). The slab
function is the simplest way to implement a change of depth support via aggregation. The following example is based on a set of 9 randomly generated profiles, re-aligned to the Global Soil Map (GSM) standard depths.
library(reshape2)
library(RColorBrewer)
# 9 random profiles
# 1 simulated properties via logistic power peak (LPP) function
# 6, 7, or 8 horizons per profile
# result is a list of single-profile SPC
d <- lapply(
as.character(1:9),
random_profile,
n = c(6, 7, 8),
n_prop = 1,
method = 'LPP',
SPC = TRUE
)
# combine into single SPC
d <- combine(d)
# GSM depths
gsm.depths <- c(0, 5, 15, 30, 60, 100, 200)
# aggregate using mean: wt.mean within slabs
# see ?slab for ideas on how to parameterize slab.fun
d.gsm <- slab(d, fm=id ~ p1, slab.structure = gsm.depths, slab.fun = mean, na.rm=TRUE)
# note: result is in long-format
# note: horizon names are lost due to aggregation
head(d.gsm, 7)
## variable id value top bottom contributing_fraction
## 1 p1 1 8.336717 0 5 1.0000000
## 2 p1 1 8.336717 5 15 1.0000000
## 3 p1 1 8.367195 15 30 1.0000000
## 4 p1 1 12.058945 30 60 1.0000000
## 5 p1 1 16.916313 60 100 1.0000000
## 6 p1 1 15.507304 100 200 0.3157895
## 7 p1 2 29.270351 0 5 1.0000000
A simple graphical comparison of the original and re-aligned soil profile data.
# reshape to wide format
# this scales to > 1 aggregated variables
d.gsm.pedons <- dcast(d.gsm, id + top + bottom ~ variable, value.var = 'value')
depths(d.gsm.pedons) <- id ~ top + bottom
# iterate over aggregated profiles and make new hz names
d.gsm.pedons$hzname <- profileApply(d.gsm.pedons, function(i) {paste0('GSM-', 1:nrow(i))})
# compare original and aggregated
par(mar=c(1,0,3,3), mfrow=c(2,1))
plotSPC(d, color='p1')
plotSPC(d.gsm.pedons, color='p1')
Note that re-aligned data may not represent reality (and should therefore be used with caution) when the original soil depth is shallower than the deepest of the new (re-aligned) horizon depths. The contributing_fraction
metric returned by slab()
can be useful for assessing how much real data were used to generate the new set of re-aligned data.
# reshape to wide format
d.gsm.pedons.2 <- dcast(d.gsm, id + top + bottom ~ variable, value.var = 'contributing_fraction')
depths(d.gsm.pedons.2) <- id ~ top + bottom
# compare original and aggregated
par(mar=c(1,0,3,3), mfrow=c(2,1))
plotSPC(d.gsm.pedons, name='', color='p1')
plotSPC(d.gsm.pedons.2, name='', color='p1', col.label='Contributing Fraction', col.palette=brewer.pal(10, 'Spectral'))
Calculation of between-profile dissimilarity is performed using the profile_compare()
function. Dissimilarity values depend on attributes selection (e.g. clay, CEC, pH , etc.), optional depth-weighting parameter (k
), and a maximum depth of evaluation (max_d
). See the function manual page and this paper for details.
Some additional examples can be found in:
library(cluster)
library(sharpshootR)
# start fresh
data(sp4)
depths(sp4) <- id ~ top + bottom
hzdesgnname(sp4) <- 'name'
# eval dissimilarity:
# using Ex-Ca:Mg and CEC at pH 7
# with no depth-weighting (k=0)
# to a maximum depth of 40 cm
d <- profile_compare(sp4, vars=c('ex_Ca_to_Mg', 'CEC_7'), k=0, max_d=40)
# check distance matrix:
round(d, 1)
## Dissimilarities :
## colusa glenn kings mariposa mendocino napa san benito shasta shasta-trinity
## glenn 13.5
## kings 16.0 12.7
## mariposa 8.4 11.3 16.5
## mendocino 11.5 8.0 16.4 15.0
## napa 30.4 24.1 29.4 29.2 21.6
## san benito 25.7 20.6 26.3 28.2 15.8 18.0
## shasta 17.2 13.3 8.7 17.6 17.1 33.7 22.2
## shasta-trinity 6.4 16.6 22.3 9.6 16.5 29.8 27.2 23.3
## tehama 28.7 22.9 27.9 27.3 20.0 8.8 15.1 31.4 27.9
##
## Metric : mixed ; Types = I, I
## Number of objects : 10
# visualize dissimilarity matrix via divisive hierarchical clustering
d.diana <- diana(d)
# this function is from the sharpshootR package
# requires some manual adjustments
par(mar=c(0, 0, 4, 1))
plotProfileDendrogram(sp4, d.diana, scaling.factor = 0.8, y.offset = 5, cex.names=0.7, width=0.3, color='ex_Ca_to_Mg')
Try again in debug
mode to ensure that profiles are sorted correctly.
plotProfileDendrogram(sp4, d.diana, scaling.factor = 0.8, y.offset = 5, cex.names=0.7, width=0.3, color='ex_Ca_to_Mg', debug = TRUE)
# print metadata:
metadata(sp4)
# alter the depth unit metadata attribute
depth_units(sp4) <- 'inches' # units are really 'cm'
# more generic interface for adjusting metadata
md <- metadata(sp4) # save original metadata
# add columns
md$describer <- 'DGM'
md$date <- as.Date('2009-01-01')
md$citation <- 'McGahan, D.G., Southard, R.J, Claassen, V.P. 2009. Plant-Available Calcium Varies Widely in Soils on Serpentinite Landscapes. Soil Sci. Soc. Am. J. 73: 2087-2095.'
# re-assign user defined metadata to original object
metadata(sp4) <- md
# check:
metadata(sp4)
# fix depth units, set back to 'cm'
depth_units(sp4) <- 'cm'
checkSPC(sp4)
spc_in_sync(sp4)
z <- rebuildSPC(sp4)
checkHzDepthLogic(sp4)
# check our work by viewing the internal structure
str(sp4)
# deconstruct SoilProfileCollection into a data.frame, with horizon+site data
as(sp4, 'data.frame')
# deconstruct SoilProfileCollection into a list containing all slots
as(sp4, 'list')
# extraction of site + spatial data as SpatialPointsDataFrame
as(sp4, 'SpatialPointsDataFrame')
This document is based on aqp
version 1.42.