AQP + DSM = ESD ?

Dylan Beaudette (USDA-NRCS Sonora, CA)
October 7, 2014

This document is based on aqp version 1.7-6, soilDB version 1.3-5, and sharpshootR version 0.6-7.

Caveat emptor

I don't know all that much about ESI / ESD.
Dog and pony show– live demonstration more interesting.
Be sure to follow embedded links to full documentation.
Check out the AQP project website.
You can reproduce much of this document with:

# install stable versions
install.packages('aqp', dep=TRUE) 
install.packages('soilDB', dep=TRUE)
install.packages('sharpshootR', dep=TRUE)

# install latest versions from R-Forge:
install.packages('aqp', repos="http://R-Forge.R-project.org", type='source')
install.packages('soilDB', repos="http://R-Forge.R-project.org", type='source')
install.packages('sharpshootR', repos="http://R-Forge.R-project.org", type='source')

# load libraries
library(aqp)
library(soilDB)
library(sharpshootR)
library(lattice)
library(plyr)
library(Hmisc)

# set options
options(width=100, stringsAsFactors=FALSE)

Preaching to the Choir: why R?

repeatable, self-documenting work:
- uni-variate / multivariate summaries
- graphical representations of complex data
- 2800+ packages on CRAN: 100+ packages on GIS, ecology, and soils!
algorithm development by experts, application by trained users
- testing by eye: simple graphical “grammar” used to plot data
- formalized testing: correlation, regression, classification, ordination, …
- I/O capabilities: file, URL, SOAP, SQL, ODBC, PDF, PNG, SHP, KML, …
- optimizers, matrix operations, custom data structures, …

Outline

AQP family of packages
- aqp
- soilDB
- sharpshootR
Examples
- soil profile simulation
- profile sketches
- “slicing”
- aggregation
Details on numerical classification algorithm (?)
Live demo (?)

aqp package: Algorithms for Quantitative Pedology

a “vocabulary” for soil data analysis

special data structures: avoids annoying book-keeping code
visualization: soil profile sketches
aggregation: depth-slice summaries
classification: pair-wise dissimilarity of profiles

alt text

→ aqp manual page

soilDB package: Soil Database Interface

access to soil data should be simple, often it is not

PedonPC: pedon data
NASIS: pedon, component, map unit, ESD, etc.
KSSL: lab data
SDA: “live” SSURGO tabular data
OSD: official series descriptions
SSURGO: main soil survey product (spatial + tabular)
SoilWeb: snapshot-based window into the above
SCAN: soil climate network

# soilDB one-liners
auburn.lab <- fetchKSSL("auburn")  # KSSL data
auburn.extent <- seriesExtent("auburn")  # SSURGO-based series extent map
auburn.compdata <- SDA_query("<SQL>")  # SSURGO-based tabular data

→ soilDB manual page

sharpshootR package

prototypes, templates, and ideas to be built-upon

like this: example

→ sharpshootR manual page

SoilProfileCollection Objects

custom datatype to store/access hierarchy of soil profile information

# sample dataset, extracted from NASIS
library(soilDB)
data(loafercreek)
str(loafercreek, 2)

Formal class 'SoilProfileCollection' [package "aqp"] with 7 slots
  ..@ idcol     : chr "peiid"
  ..@ depthcols : chr [1:2] "hzdept" "hzdepb"
  ..@ metadata  :'data.frame':  1 obs. of  1 variable:
  ..@ horizons  :'data.frame':  308 obs. of  34 variables:
  ..@ site      :'data.frame':  54 obs. of  59 variables:
  ..@ sp        :Formal class 'SpatialPoints' [package "sp"] with 3 slots
  ..@ diagnostic:'data.frame':  177 obs. of  4 variables:

plot of chunk SPC-2

SoilProfileCollection Objects

Typical pedon/site data:

id, top, bottom, name, group
1,  0,   10,     A,    g1
1,  10,  18,     AB,   g1
          ...
2,  12,   22,    E,    g2
2,  22,   45,    Bhs1, g2

Converting data.frame → SoilProfileCollection:

x <- read.csv(file = ...)
# promote to SoilProfileCollection
depths(x) <- id ~ top + bottom
# move 'site data' into @site
site(x) <- ~group

Functions that return SoilProfileCollection:

x <- fetchOSD()
x <- fetchKSSL()
x <- fetchPedonPC()
x <- fetchNASIS()
x <- fetchNASIS_component_data()

→ SoilProfileCollection tutorial

SoilProfileCollection Objects

object inspection

idname(sp4) # pedon ID name
horizonDepths(sp4) # colum names containing top and bottom depths
depth_units(sp4) # defaults to 'cm'
metadata(sp4) # data.frame with 1 row
profile_id(sp4) # vector of profile IDs

overloads to common functions

length(sp4) # number of profiles in the collection
nrow(sp4) # number of horizons in the collection
names(sp4) # column names from site and horizon data
min(sp4) # shallowest profile depth in collection
max(sp4) # deepest profile depth in collection
sp4[i, j] # get profile "i", horizon "j"

getting / setting of components

horizons(sp4) # get / set horizon data
site(sp4)  # get / set site data
diagnostic_hz(sp4) # get / set diagnostic horizons
proj4string(sp4) # get / set CRS
coordinates(sp4) # get / set coordinates

coercion to SpatialPointsDataFrame or data.frame

as(sp4, 'SpatialPointsDataFrame')
as(sp4, 'data.frame')

Plotting SoilProfileCollection Objects

flexible generation of soil profile sketches using “base graphics”

alt text

→ plotSPC() manual page

Slice-Wise Aggregation

aggregation along regular “depth-slices” and within groups

alt text

→ slab() manual page

Pair-Wise Dissimilarity [see related talk]

alt text

→ profile_compare() manual page

soilDB: OSD Summaries

basic morphologic / taxonomic data from OSD and SC databases [SoilWeb]

# soils of interest
s.list <- c('hornitos', 'argonaut', 'mokelumne', 'dunstone', 'auburn', 'pentz', 'pardee', 'peters', 'amador', 'laniger')

# fetch data from SoilWeb server and return as SoilProfileCollection
s <- fetchOSD(s.list)

# plot
par(mar=c(0,0,0,0))
plot(s, name='hzname', id.style='side', cex.name=0.75, axis.line.offset=-4.5)

plot of chunk soilDB-OSD

soilDB: Soil Series Extent

simplified series extent maps [SoilWeb]

amador <- seriesExtent("amador")  # result is a SpatialPolygonsDataFrame
writeOGR(amador, driver = "ESRI Shapefile", ...)  # save to SHP

alt text

→ series extent tutorial

soilDB: KSSL Data

characterization and current taxonomic data [SoilWeb]

musick <- fetchKSSL("musick")  # result is a SoilProfileCollection

alt text

→ fetchKSSL() tutorial

sharpshootR: Soil Taxonomy Visualization

profile sketches organized by taxonomy

# soils of interest
s.list <- c("hornitos", "perkins", "argonaut", "inks", "mokelumne", "dunstone", "auburn", "pentz", "pardee", 
    "peters", "amador", "laniger")

# fetch data from SoilWeb server and return as SoilProfileCollection
s <- fetchOSD(s.list)

# organize and plot according to subgroup level taxonomic data
SoilTaxonomyDendrogram(s, cex.taxon.labels = 0.8)

plot of chunk sharpshootR-1

→ SoilTaxonomyDendrogram() manual page

sharpshootR: Component Relationships

investigate the relationship between component data from SSURGO / NASIS

# get data from SDA
q <- "SELECT component.mukey, comppct_r, lower(compname) as compname 
FROM legend 
INNER JOIN mapunit ON mapunit.lkey = legend.lkey 
LEFT OUTER JOIN component ON component.mukey = mapunit.mukey 
WHERE legend.areasymbol IN ('CA654') AND compkind IN ('Series', 'Taxadjunct')
ORDER BY mukey, comppct_r DESC"

# run query, process results, and return as data.frame object
res <- SDA_query(q)

# compute adjacency matrix
m <- component.adj.matrix(res)
# plot component relationships as network diagram
par(mar = c(0, 0, 2, 0))
plotSoilRelationGraph(m)
title("CA654 Components")

→ SDA_query() manual page
→ component.adj.matrix() manual page
→ plotSoilRelationGraph() manual page

plot of chunk sharpshootR-comp-relationship-3

sharpshootR: Component Relationships

generate hillslope position probability matrix by series name

top.10 <- names(sort(table(res$compname), decreasing = TRUE)[1:10])
hp <- hillslope.probability(top.10)

      compname Toeslope Footslope Backslope Shoulder Summit
1     ahwahnee       NA        NA      0.80       NA     NA
2      auberry       NA      0.02      0.73     0.01   0.02
3   blasingame       NA      0.02      0.83       NA   0.02
4    fallbrook       NA        NA      1.00       NA     NA
5  grangeville     0.48      0.24      0.04       NA     NA
6      hanford     0.45      0.25      0.13       NA     NA
7     hesperia     0.07      0.45      0.29       NA     NA
8  san joaquin     0.42      0.05      0.07     0.21   0.01
9       sierra       NA        NA      0.64     0.05   0.01
10       vista       NA      0.08      0.73       NA   0.02

Potential uses:

disaggregation clues
stratification / weighting for sampling mission
survey update work / ESD work

→ hillslope.probability() manual page

OK, So What?

how about some examples

simulating data to feed / test models

sim(SPC, n=10, hz.sd=2)
random_profile(id, n=c(3, 4, 5), min_thick=5, max_thick=30, n_prop=5)

profile sketches ordered by meaningful gradient

plot(SPC, plot.order=new.order)

applying functions by profile

profileApply(SPC, <function>)

“slicing”: depth-wise alignment and extraction of data

slice(SPC, 0:50 ~ sand + silt + clay)

aggregating by “slab”: group / depth-wise summaries

# assuming no NA
slab(SPC, ~ sand + silt + clay, slab.fun=mean)
slab(SPC, ~ sand + silt + clay, slab.structure=c(0,10), slab.fun=mean)
slab(SPC, group ~ sand + silt + clay, slab.fun=mean)

Simulated Data: Horizon Depths and Designations

# source data are a single profile description of the Morley series as a data.frame
depths(b) <- id ~ top + bottom
# convert horizon colors into RGB
b$soil_color <- munsell2rgb(b$hue, b$value, b$chroma)
# simulate 15 profiles based on reported horizon thickness standard deviations 
b.sim <- sim(b, n=15, hz.sd=c(2,1,1,2,4,2,2,4))
# set depth units to inches
depth_units(b.sim) <- 'in'

alt text

→ sim() manual page

Simulated Data: Physical Properties

simulation based on a random walk– similar to highly stratified soils

# implicit loop via plyr::ldply, result is a data.frame
d <- ldply(1:10, random_profile, n=c(6, 7, 8), n_prop=1, method='random_walk')
# promote to SoilProfileCollection and plot
depths(d) <- id ~ top + bottom
par(mar=c(0,0,3,0))
plot(d, color='p1', axis.line.offset=-4, max.depth=150)

plot of chunk simulate-profiles-2

→ random_profile() manual page

Simulated Data: Physical Properties

simulation based on the logistic power peak function– more realistic anisotropy

# implicit loop via plyr::ldply, result is a data.frame
d <- ldply(1:10, random_profile, n=c(6, 7, 8), n_prop=1, method='LPP', 
lpp.a=5, lpp.b=10, lpp.d=5, lpp.e=5, lpp.u=25)
# promote to SoilProfileCollection and plot
depths(d) <- id ~ top + bottom
par(mar=c(0,0,3,0))
plot(d, color='p1', axis.line.offset=-4, max.depth=150)

plot of chunk simulate-profiles-3

→ random_profile() manual page

Profile Sketches: Sierra Transects

data and concept from Dahlgren et al. and Rasmussen et al.

x.g <- read.csv('dahlgren-granitics.csv', stringsAsFactors=FALSE)
x.a <- read.csv(file='rasmussen-andisitic-lahar.csv', stringsAsFactors=FALSE)

# convert colors
x.g$soil_color <- with(x.g, munsell2rgb(hue, value, chroma))
x.a$soil_color <- with(x.a, munsell2rgb(hue, value, chroma))

# init Soill Profile Collection objects
depths(x.g) <- id ~ top + bottom
site(x.g) <- ~ elev + MAAT + MAP + geo
# init Soill Profile Collection objects
depths(x.a) <- id ~ top + bottom
site(x.a) <- ~ elev + precip + MAP + MAT + veg + Fe_d_to_Fe_t

# index ordering to elevation in meters
g.new.order <- order(x.g$elev)
a.new.order <- order(x.a$elev)

# plot first figure
par(mfcol=c(1,2), mar=c(3,0,0,0))
plot(x.g, name='name', plot.order=g.new.order, cex.name=0.75, axis.line.offset=-4, id.style='side')
axis(1, at=1:length(x.g), labels=x.g$elev[g.new.order], line=-2)
# plot second figure
plot(x.a, name='name', plot.order=a.new.order, cex.name=0.75,  axis.line.offset=-4, id.style='side')
axis(1, at=1:length(x.a), labels=x.a$elev[a.new.order], line=-2)

Profile Sketches: Sierra Transects

pedogenic “sweet spot” along bio-climatic gradient

alt text

Magnesic Soils of California

data From McGahan et al.

# load sample dataset, comes with aqp package
data(sp4)
# inspect first 4 rows x 12 columns
sp4[1:4, 1:12]

      id name top bottom   K   Mg  Ca CEC_7 ex_Ca_to_Mg sand silt clay
1 colusa    A   0      3 0.3 25.7 9.0  23.0        0.35   46   33   21
2 colusa  ABt   3      8 0.2 23.7 5.6  21.4        0.23   42   31   27
3 colusa  Bt1   8     30 0.1 23.2 1.9  23.7        0.08   40   28   32
4 colusa  Bt2  30     42 0.1 44.3 0.3  43.0        0.01   27   18   55

# upgrade to SoilProfileCollection
depths(sp4) <- id ~ top + bottom

# custom function for computing hz-thick wt. mean, accounting for missing data
wt.mean.ca.mg <- function(i) {
    # use horizon thickness as a weight
    thick <- i$bottom - i$top
    # function is from the Hmisc package
    m <- wtd.mean(i$ex_Ca_to_Mg, weights=thick, na.rm=TRUE)
    return(m)
    }

# apply custom function to each profile, save as "site-level" attribute
sp4$wt.mean.ca.to.mg <- profileApply(sp4, wt.mean.ca.mg)

# generate index ordering from small -> large Ca:Mg
new.order <- order(sp4$wt.mean.ca.to.mg)

→ profileApply() manual page

Magnesic Soils of California

# plot the data using our new order based on Ca:Mg
par(mar=c(4,0,3,0))
plot(sp4, name='name', color='ex_Ca_to_Mg', plot.order=new.order, cex.name=0.75, id.style='side', axis.line.offset=-4,)

# add an axis labeled with the sorting criteria
axis(1, at=1:length(sp4), labels=round(sp4$wt.mean.ca.to.mg, 3), cex.axis=1)
mtext(1, line=2.25, text='Horizon Thickness Weighted Mean Ex. Ca:Mg', cex=1)

plot of chunk magnesic-soils-2

→ sp4 sample data set

Slicing OSD Data = Map of Soil Color

# read in OSD color data as CSV
x <- read.csv('data/osd-colors-original.csv.gz')

Ap,0,20,10YR,4,4,10YR,3,4,CECIL
Bt1,20,66,10R,4,8,10R,3,7,CECIL
Bt2,66,107,10R,4,8,10R,3,7,CECIL
BC,107,127,2.5YR,4,8,2.5YR,3,7,CECIL
C,127,203,2.5YR,4,8,2.5YR,3,7,CECIL

# re-order by series, then depth
x <- x[order(x$series, x$top), ]

# convert Munsell to RGB
x.rgb <- with(x, munsell2rgb(matrix_wet_color_hue, 
                             matrix_wet_color_value, 
                             matrix_wet_color_chroma, 
                             return_triplets=TRUE))

# init SoilProfileCollection object
depths(g) <- series ~ top + bottom

# slice at specific depths, keeping only  r, g, b
# ignore bad horizonation with strict=FALSE
g.slices <- slice(g, c(5, 10, 15, 25) ~ r + g + b, just.the.data=TRUE, strict=FALSE)

# save depth slices as CSV files, import into GIS and make map
write.csv(g.slices[g.slices$top == 5, c('series', 'r', 'g', 'b')], ...)
write.csv(g.slices[g.slices$top == 10, c('series', 'r', 'g', 'b')], ...)
write.csv(g.slices[g.slices$top == 15, c('series', 'r', 'g', 'b')], ...)
write.csv(g.slices[g.slices$top == 25, c('series', 'r', 'g', 'b')], ...)

→ slice() manual page

Slicing OSD Data = Map of Soil Color [5 cm]

alt text

Slicing OSD Data = Map of Soil Color [10 cm]

alt text

Slicing OSD Data = Map of Soil Color [15 cm]

alt text

Slicing OSD Data = Map of Soil Color [25 cm]

alt text

Summarize Clay vs Depth by Geology

pedons <- fetchNASIS()
# ... details on generalization of geologic classes ommitted ...
# aggregate by major geologic type, default slab.fun = hdquantile
a <- slab(pedons, generalized_bedrock ~ clay)

# plot with lattice graphics
xyplot(top ~ p.q50 | generalized_bedrock, upper=a$p.q75, lower=a$p.q25, data=a, ylim=c(180,-5), ylab='Depth (cm)', xlab='Clay Content (%)', strip=strip.custom(bg=grey(0.85)), as.table=TRUE, panel=panel.depth_function, prepanel=prepanel.depth_function, scales=list(y=list(tick.number=7, alternating=3), x=list(alternating=1)), subset=variable == 'clay', layout=c(6,1), cf=a$contributing_fraction, sync.colors=TRUE, alpha=0.25)

alt text

→ slab() manual page

Wrap-Up

getting soils data can be difficult and time-consuming
DSM requires considerable data processing
ESI/ESD work requires detailed analysis of soil profile collections or aggregate soils data

AQP can help:

“fetch” your data from NASIS, KSSL, SDA, SCAN, SoilWeb, etc.
SoilProfileCollection objects are powerful
innovative constructs: slice(), slab(), SPC[i,j]
pair-wise dissimilarity: profile_compare()
visual comparisons and non-parametric summaries vs. hypothesis testing

→ ideas and code welcome, plenty of room for improvement

Thank You

Questions, comments, ideas?

AQP Contributors:

Pierre Roudier (Landcare Research)
Jay Skovlin (USDA-NRCS)
Stephen Roecker (USDA-NRCS)

Join the fun at the AQP r-forge site