AQP + DSM = ESD ?

Dylan Beaudette (USDA-NRCS Sonora, CA)
October 7, 2014











This document is based on aqp version 1.7-6, soilDB version 1.3-5, and sharpshootR version 0.6-7.

Caveat emptor

  • I don't know all that much about ESI / ESD.
  • Dog and pony show– live demonstration more interesting.
  • Be sure to follow embedded links to full documentation.
  • Check out the AQP project website.
  • You can reproduce much of this document with:
# install stable versions
install.packages('aqp', dep=TRUE) 
install.packages('soilDB', dep=TRUE)
install.packages('sharpshootR', dep=TRUE)

# install latest versions from R-Forge:
install.packages('aqp', repos="http://R-Forge.R-project.org", type='source')
install.packages('soilDB', repos="http://R-Forge.R-project.org", type='source')
install.packages('sharpshootR', repos="http://R-Forge.R-project.org", type='source')
# load libraries
library(aqp)
library(soilDB)
library(sharpshootR)
library(lattice)
library(plyr)
library(Hmisc)

# set options
options(width=100, stringsAsFactors=FALSE)

Preaching to the Choir: why R?

  • repeatable, self-documenting work:

    • uni-variate / multivariate summaries
    • graphical representations of complex data
    • 2800+ packages on CRAN: 100+ packages on GIS, ecology, and soils!
  • algorithm development by experts, application by trained users

    • testing by eye: simple graphical “grammar” used to plot data
    • formalized testing: correlation, regression, classification, ordination, …
    • I/O capabilities: file, URL, SOAP, SQL, ODBC, PDF, PNG, SHP, KML, …
    • optimizers, matrix operations, custom data structures, …

Outline

  • AQP family of packages

    • aqp
    • soilDB
    • sharpshootR
  • Examples

    • soil profile simulation
    • profile sketches
    • “slicing”
    • aggregation
  • Details on numerical classification algorithm (?)

  • Live demo (?)

aqp package: Algorithms for Quantitative Pedology

a “vocabulary” for soil data analysis

  • special data structures: avoids annoying book-keeping code
  • visualization: soil profile sketches
  • aggregation: depth-slice summaries
  • classification: pair-wise dissimilarity of profiles

alt text

→ aqp manual page

soilDB package: Soil Database Interface

access to soil data should be simple, often it is not

  • PedonPC: pedon data
  • NASIS: pedon, component, map unit, ESD, etc.
  • KSSL: lab data
  • SDA: “live” SSURGO tabular data
  • OSD: official series descriptions
  • SSURGO: main soil survey product (spatial + tabular)
  • SoilWeb: snapshot-based window into the above
  • SCAN: soil climate network
# soilDB one-liners
auburn.lab <- fetchKSSL("auburn")  # KSSL data
auburn.extent <- seriesExtent("auburn")  # SSURGO-based series extent map
auburn.compdata <- SDA_query("<SQL>")  # SSURGO-based tabular data

→ soilDB manual page

sharpshootR package

prototypes, templates, and ideas to be built-upon

like this: example

→ sharpshootR manual page

SoilProfileCollection Objects

custom datatype to store/access hierarchy of soil profile information

# sample dataset, extracted from NASIS
library(soilDB)
data(loafercreek)
str(loafercreek, 2)
Formal class 'SoilProfileCollection' [package "aqp"] with 7 slots
  ..@ idcol     : chr "peiid"
  ..@ depthcols : chr [1:2] "hzdept" "hzdepb"
  ..@ metadata  :'data.frame':  1 obs. of  1 variable:
  ..@ horizons  :'data.frame':  308 obs. of  34 variables:
  ..@ site      :'data.frame':  54 obs. of  59 variables:
  ..@ sp        :Formal class 'SpatialPoints' [package "sp"] with 3 slots
  ..@ diagnostic:'data.frame':  177 obs. of  4 variables:

plot of chunk SPC-2

SoilProfileCollection Objects

Typical pedon/site data:

id, top, bottom, name, group
1,  0,   10,     A,    g1
1,  10,  18,     AB,   g1
          ...
2,  12,   22,    E,    g2
2,  22,   45,    Bhs1, g2

Converting data.frameSoilProfileCollection:

x <- read.csv(file = ...)
# promote to SoilProfileCollection
depths(x) <- id ~ top + bottom
# move 'site data' into @site
site(x) <- ~group

Functions that return SoilProfileCollection:

x <- fetchOSD()
x <- fetchKSSL()
x <- fetchPedonPC()
x <- fetchNASIS()
x <- fetchNASIS_component_data()

→ SoilProfileCollection tutorial

SoilProfileCollection Objects

  • object inspection
idname(sp4) # pedon ID name
horizonDepths(sp4) # colum names containing top and bottom depths
depth_units(sp4) # defaults to 'cm'
metadata(sp4) # data.frame with 1 row
profile_id(sp4) # vector of profile IDs
  • overloads to common functions
length(sp4) # number of profiles in the collection
nrow(sp4) # number of horizons in the collection
names(sp4) # column names from site and horizon data
min(sp4) # shallowest profile depth in collection
max(sp4) # deepest profile depth in collection
sp4[i, j] # get profile "i", horizon "j"
  • getting / setting of components
horizons(sp4) # get / set horizon data
site(sp4)  # get / set site data
diagnostic_hz(sp4) # get / set diagnostic horizons
proj4string(sp4) # get / set CRS
coordinates(sp4) # get / set coordinates
  • coercion to SpatialPointsDataFrame or data.frame
as(sp4, 'SpatialPointsDataFrame')
as(sp4, 'data.frame')

Plotting SoilProfileCollection Objects

flexible generation of soil profile sketches using “base graphics”

alt text

→ plotSPC() manual page

Slice-Wise Aggregation

aggregation along regular “depth-slices” and within groups

alt text

→ slab() manual page

Pair-Wise Dissimilarity [see related talk]

soilDB: OSD Summaries

basic morphologic / taxonomic data from OSD and SC databases [SoilWeb]

# soils of interest
s.list <- c('hornitos', 'argonaut', 'mokelumne', 'dunstone', 'auburn', 'pentz', 'pardee', 'peters', 'amador', 'laniger')

# fetch data from SoilWeb server and return as SoilProfileCollection
s <- fetchOSD(s.list)

# plot
par(mar=c(0,0,0,0))
plot(s, name='hzname', id.style='side', cex.name=0.75, axis.line.offset=-4.5)

plot of chunk soilDB-OSD

soilDB: Soil Series Extent

simplified series extent maps [SoilWeb]

amador <- seriesExtent("amador")  # result is a SpatialPolygonsDataFrame
writeOGR(amador, driver = "ESRI Shapefile", ...)  # save to SHP

alt text

→ series extent tutorial

soilDB: KSSL Data

characterization and current taxonomic data [SoilWeb]

musick <- fetchKSSL("musick")  # result is a SoilProfileCollection

alt text

→ fetchKSSL() tutorial

sharpshootR: Soil Taxonomy Visualization

profile sketches organized by taxonomy

# soils of interest
s.list <- c("hornitos", "perkins", "argonaut", "inks", "mokelumne", "dunstone", "auburn", "pentz", "pardee", 
    "peters", "amador", "laniger")

# fetch data from SoilWeb server and return as SoilProfileCollection
s <- fetchOSD(s.list)

# organize and plot according to subgroup level taxonomic data
SoilTaxonomyDendrogram(s, cex.taxon.labels = 0.8)

plot of chunk sharpshootR-1

→ SoilTaxonomyDendrogram() manual page

sharpshootR: Component Relationships

investigate the relationship between component data from SSURGO / NASIS

# get data from SDA
q <- "SELECT component.mukey, comppct_r, lower(compname) as compname 
FROM legend 
INNER JOIN mapunit ON mapunit.lkey = legend.lkey 
LEFT OUTER JOIN component ON component.mukey = mapunit.mukey 
WHERE legend.areasymbol IN ('CA654') AND compkind IN ('Series', 'Taxadjunct')
ORDER BY mukey, comppct_r DESC"

# run query, process results, and return as data.frame object
res <- SDA_query(q)
# compute adjacency matrix
m <- component.adj.matrix(res)
# plot component relationships as network diagram
par(mar = c(0, 0, 2, 0))
plotSoilRelationGraph(m)
title("CA654 Components")

→ SDA_query() manual page
→ component.adj.matrix() manual page
→ plotSoilRelationGraph() manual page

plot of chunk sharpshootR-comp-relationship-3

sharpshootR: Component Relationships

generate hillslope position probability matrix by series name

top.10 <- names(sort(table(res$compname), decreasing = TRUE)[1:10])
hp <- hillslope.probability(top.10)
      compname Toeslope Footslope Backslope Shoulder Summit
1     ahwahnee       NA        NA      0.80       NA     NA
2      auberry       NA      0.02      0.73     0.01   0.02
3   blasingame       NA      0.02      0.83       NA   0.02
4    fallbrook       NA        NA      1.00       NA     NA
5  grangeville     0.48      0.24      0.04       NA     NA
6      hanford     0.45      0.25      0.13       NA     NA
7     hesperia     0.07      0.45      0.29       NA     NA
8  san joaquin     0.42      0.05      0.07     0.21   0.01
9       sierra       NA        NA      0.64     0.05   0.01
10       vista       NA      0.08      0.73       NA   0.02

Potential uses:

  • disaggregation clues
  • stratification / weighting for sampling mission
  • survey update work / ESD work

→ hillslope.probability() manual page

OK, So What?

how about some examples

  • simulating data to feed / test models
sim(SPC, n=10, hz.sd=2)
random_profile(id, n=c(3, 4, 5), min_thick=5, max_thick=30, n_prop=5)
  • profile sketches ordered by meaningful gradient
plot(SPC, plot.order=new.order)
  • applying functions by profile
profileApply(SPC, <function>)
  • “slicing”: depth-wise alignment and extraction of data
slice(SPC, 0:50 ~ sand + silt + clay)
  • aggregating by “slab”: group / depth-wise summaries
# assuming no NA
slab(SPC, ~ sand + silt + clay, slab.fun=mean)
slab(SPC, ~ sand + silt + clay, slab.structure=c(0,10), slab.fun=mean)
slab(SPC, group ~ sand + silt + clay, slab.fun=mean)

Simulated Data: Horizon Depths and Designations

# source data are a single profile description of the Morley series as a data.frame
depths(b) <- id ~ top + bottom
# convert horizon colors into RGB
b$soil_color <- munsell2rgb(b$hue, b$value, b$chroma)
# simulate 15 profiles based on reported horizon thickness standard deviations 
b.sim <- sim(b, n=15, hz.sd=c(2,1,1,2,4,2,2,4))
# set depth units to inches
depth_units(b.sim) <- 'in'

alt text

→ sim() manual page

Simulated Data: Physical Properties

simulation based on a random walk– similar to highly stratified soils

# implicit loop via plyr::ldply, result is a data.frame
d <- ldply(1:10, random_profile, n=c(6, 7, 8), n_prop=1, method='random_walk')
# promote to SoilProfileCollection and plot
depths(d) <- id ~ top + bottom
par(mar=c(0,0,3,0))
plot(d, color='p1', axis.line.offset=-4, max.depth=150)

plot of chunk simulate-profiles-2

→ random_profile() manual page

Simulated Data: Physical Properties

simulation based on the logistic power peak function– more realistic anisotropy

# implicit loop via plyr::ldply, result is a data.frame
d <- ldply(1:10, random_profile, n=c(6, 7, 8), n_prop=1, method='LPP', 
lpp.a=5, lpp.b=10, lpp.d=5, lpp.e=5, lpp.u=25)
# promote to SoilProfileCollection and plot
depths(d) <- id ~ top + bottom
par(mar=c(0,0,3,0))
plot(d, color='p1', axis.line.offset=-4, max.depth=150)

plot of chunk simulate-profiles-3

→ random_profile() manual page

Profile Sketches: Sierra Transects

data and concept from Dahlgren et al. and Rasmussen et al.

x.g <- read.csv('dahlgren-granitics.csv', stringsAsFactors=FALSE)
x.a <- read.csv(file='rasmussen-andisitic-lahar.csv', stringsAsFactors=FALSE)

# convert colors
x.g$soil_color <- with(x.g, munsell2rgb(hue, value, chroma))
x.a$soil_color <- with(x.a, munsell2rgb(hue, value, chroma))

# init Soill Profile Collection objects
depths(x.g) <- id ~ top + bottom
site(x.g) <- ~ elev + MAAT + MAP + geo
# init Soill Profile Collection objects
depths(x.a) <- id ~ top + bottom
site(x.a) <- ~ elev + precip + MAP + MAT + veg + Fe_d_to_Fe_t

# index ordering to elevation in meters
g.new.order <- order(x.g$elev)
a.new.order <- order(x.a$elev)

# plot first figure
par(mfcol=c(1,2), mar=c(3,0,0,0))
plot(x.g, name='name', plot.order=g.new.order, cex.name=0.75, axis.line.offset=-4, id.style='side')
axis(1, at=1:length(x.g), labels=x.g$elev[g.new.order], line=-2)
# plot second figure
plot(x.a, name='name', plot.order=a.new.order, cex.name=0.75,  axis.line.offset=-4, id.style='side')
axis(1, at=1:length(x.a), labels=x.a$elev[a.new.order], line=-2)

Profile Sketches: Sierra Transects

pedogenic “sweet spot” along bio-climatic gradient

alt text

Magnesic Soils of California

data From McGahan et al.

# load sample dataset, comes with aqp package
data(sp4)
# inspect first 4 rows x 12 columns
sp4[1:4, 1:12]
      id name top bottom   K   Mg  Ca CEC_7 ex_Ca_to_Mg sand silt clay
1 colusa    A   0      3 0.3 25.7 9.0  23.0        0.35   46   33   21
2 colusa  ABt   3      8 0.2 23.7 5.6  21.4        0.23   42   31   27
3 colusa  Bt1   8     30 0.1 23.2 1.9  23.7        0.08   40   28   32
4 colusa  Bt2  30     42 0.1 44.3 0.3  43.0        0.01   27   18   55
# upgrade to SoilProfileCollection
depths(sp4) <- id ~ top + bottom

# custom function for computing hz-thick wt. mean, accounting for missing data
wt.mean.ca.mg <- function(i) {
    # use horizon thickness as a weight
    thick <- i$bottom - i$top
    # function is from the Hmisc package
    m <- wtd.mean(i$ex_Ca_to_Mg, weights=thick, na.rm=TRUE)
    return(m)
    }

# apply custom function to each profile, save as "site-level" attribute
sp4$wt.mean.ca.to.mg <- profileApply(sp4, wt.mean.ca.mg)

# generate index ordering from small -> large Ca:Mg
new.order <- order(sp4$wt.mean.ca.to.mg)

→ profileApply() manual page

Magnesic Soils of California

# plot the data using our new order based on Ca:Mg
par(mar=c(4,0,3,0))
plot(sp4, name='name', color='ex_Ca_to_Mg', plot.order=new.order, cex.name=0.75, id.style='side', axis.line.offset=-4,)

# add an axis labeled with the sorting criteria
axis(1, at=1:length(sp4), labels=round(sp4$wt.mean.ca.to.mg, 3), cex.axis=1)
mtext(1, line=2.25, text='Horizon Thickness Weighted Mean Ex. Ca:Mg', cex=1)

plot of chunk magnesic-soils-2

→ sp4 sample data set

Slicing OSD Data = Map of Soil Color

# read in OSD color data as CSV
x <- read.csv('data/osd-colors-original.csv.gz')
Ap,0,20,10YR,4,4,10YR,3,4,CECIL
Bt1,20,66,10R,4,8,10R,3,7,CECIL
Bt2,66,107,10R,4,8,10R,3,7,CECIL
BC,107,127,2.5YR,4,8,2.5YR,3,7,CECIL
C,127,203,2.5YR,4,8,2.5YR,3,7,CECIL
# re-order by series, then depth
x <- x[order(x$series, x$top), ]

# convert Munsell to RGB
x.rgb <- with(x, munsell2rgb(matrix_wet_color_hue, 
                             matrix_wet_color_value, 
                             matrix_wet_color_chroma, 
                             return_triplets=TRUE))

# init SoilProfileCollection object
depths(g) <- series ~ top + bottom

# slice at specific depths, keeping only  r, g, b
# ignore bad horizonation with strict=FALSE
g.slices <- slice(g, c(5, 10, 15, 25) ~ r + g + b, just.the.data=TRUE, strict=FALSE)

# save depth slices as CSV files, import into GIS and make map
write.csv(g.slices[g.slices$top == 5, c('series', 'r', 'g', 'b')], ...)
write.csv(g.slices[g.slices$top == 10, c('series', 'r', 'g', 'b')], ...)
write.csv(g.slices[g.slices$top == 15, c('series', 'r', 'g', 'b')], ...)
write.csv(g.slices[g.slices$top == 25, c('series', 'r', 'g', 'b')], ...)

→ slice() manual page

Slicing OSD Data = Map of Soil Color [5 cm]

alt text

Slicing OSD Data = Map of Soil Color [10 cm]

alt text

Slicing OSD Data = Map of Soil Color [15 cm]

alt text

Slicing OSD Data = Map of Soil Color [25 cm]

alt text

Summarize Clay vs Depth by Geology

pedons <- fetchNASIS()
# ... details on generalization of geologic classes ommitted ...
# aggregate by major geologic type, default slab.fun = hdquantile
a <- slab(pedons, generalized_bedrock ~ clay)

# plot with lattice graphics
xyplot(top ~ p.q50 | generalized_bedrock, upper=a$p.q75, lower=a$p.q25, data=a, ylim=c(180,-5), ylab='Depth (cm)', xlab='Clay Content (%)', strip=strip.custom(bg=grey(0.85)), as.table=TRUE, panel=panel.depth_function, prepanel=prepanel.depth_function, scales=list(y=list(tick.number=7, alternating=3), x=list(alternating=1)), subset=variable == 'clay', layout=c(6,1), cf=a$contributing_fraction, sync.colors=TRUE, alpha=0.25)

alt text

→ slab() manual page

Wrap-Up

  • getting soils data can be difficult and time-consuming
  • DSM requires considerable data processing
  • ESI/ESD work requires detailed analysis of soil profile collections or aggregate soils data

AQP can help:

  • “fetch” your data from NASIS, KSSL, SDA, SCAN, SoilWeb, etc.
  • SoilProfileCollection objects are powerful
  • innovative constructs: slice(), slab(), SPC[i,j]
  • pair-wise dissimilarity: profile_compare()
  • visual comparisons and non-parametric summaries vs. hypothesis testing

→  ideas and code welcome, plenty of room for improvement

Thank You

Questions, comments, ideas?





AQP Contributors:

  • Pierre Roudier (Landcare Research)
  • Jay Skovlin (USDA-NRCS)
  • Stephen Roecker (USDA-NRCS)


Join the fun at the AQP r-forge site