Introduction

This document demonstrates how to use the soilDB package to download RaCA via SoilWeb. Comparisons are made via graphical summaries of key soil properties with depth, using data structures and functions from the aqp package.

The SoilWeb RaCA Query API

RaCA data are queried via HTTP request to the SoilWeb server, with query parameters encoded as URL parameters. For example, by appending several parameters to the base URL it is possible to query “site-level” data associated with soils correlated to the Auburn series like this:

http://casoilresource.lawr.ucdavis.edu/soil_web/rca/rca_query.php?what=site&series=auburn

There are 5 URL parameters that can be used to refine RaCA data queries:

  • what=site ——query only the site-level data

  • series=auburn ——return only those data correlated to the Auburn series

  • bbox=-120,37,-122,38 ——return only those data within the specified bounding box, constrained to a 5-degree block

  • state=ca ——return only those data within California

  • plaintext=1 ——return data as plain text instead of gzipped text

The what parameter determines which chunk of data will be returned: * site: site and pedon level data * horizon: horizon level data * veg: vegetation information recorded at each site (relative abundence of 6 most common species) * trees: tree height(feet) and DBH(inches) measured on the first for trees in using RaCA procdures * stock: cumulative soil organic carbon stocks (Mg per ha) to standardized depths * sample: measured/modeled bulk density and carbon concentration at the sample-level * vnir: VNIR spectra [queries by soil series or BBOX only]

Getting RaCA Data in R

The soilDB package abstracts URL-based queries for RaCA data into a single function, fetchRaCA(). Calling this function with user-defined query paramters, e.g. fetchRaCA(series='auburn'), will download all data (site, horizon, veg, trees, stock) associated with the Auburn series and return the results as a list. VNIR spectra can be very large and are therefore not downloaded by default.

Installation

With a recent version of R (>= 2.15), it is possible to get all of the packages that this tutorial depends on via:

# run these commands in the R console
install.packages('soilDB', dep=TRUE)

Example Usage

Data can be queried by two-letter state code, fuzzy-matching of series name, or using a geographic bounding-box in WGS84 referenced coordinates. See ?fetchRaCA() for details.

# load libraries
library(soilDB)
require(aqp)
library(plyr)
library(maps)
library(lattice)
library(reshape2)
library(cluster)

# define a geographic bounding box
bbox <- c(-120, 37, -122, 38)

# fetch data by soil series name
r.series <- fetchRaCA(series='auburn')

# fetch data by bounding box
r.bbox <- fetchRaCA(bbox=bbox)

# fetch data by state
r.state <- fetchRaCA(state='ca')

# map each set of fetched data
par(mar=c(0,0,0,0))
map('county', 'california')
points(y ~ x, data=site(r.state$pedons), col='RoyalBlue', cex=0.5, pch=1)
points(y ~ x, data=site(r.bbox$pedons), col='DarkGreen', cex=0.8, pch=15)
points(y ~ x, data=site(r.series$pedons), col='DarkRed', cex=0.8, pch=16)
# add bounding box
rect(bbox[1], bbox[2], bbox[3], bbox[4])
# add long/lat axes
map.axes()
# add a simple legend
legend('topright', pch=c(1, 15, 16), col=c('RoyalBlue', 'DarkGreen', 'DarkRed'), legend=c('state', 'bbox', 'series'), bty='n', title='query by:')

Exploring the Results

A list of objects and tables (data.frame objects) is returned by fetchRaCA(). Elements can be accessed using the $ operator.

# check structure
str(r.bbox, 2)
## List of 6
##  $ pedons :Formal class 'SoilProfileCollection' [package "aqp"] with 9 slots
##  $ trees  :'data.frame': 28 obs. of  3 variables:
##   ..$ siteiid: int [1:28] 605539 605539 605539 605539 605563 605563 605563 605563 605642 605642 ...
##   ..$ dbh    : num [1:28] 11.5 25 48 50 8 ...
##   ..$ height : num [1:28] 36 39 45 54 30 ...
##  $ veg    :'data.frame': 88 obs. of  6 variables:
##   ..$ siteiid              : int [1:88] 605539 605539 605539 605539 605539 605539 605551 605551 605551 605551 ...
##   ..$ plantsym             : chr [1:88] "AVFA" "ERODI" "LUPIN" "MONTI" ...
##   ..$ plantsciname         : chr [1:88] "Avena fatua" "Erodium" "Lupinus" "Montia" ...
##   ..$ plantnatvernm        : chr [1:88] "wild oat" "storks bill" "lupine" "minerslettuce" ...
##   ..$ vegetationstratalevel: logi [1:88] NA NA NA NA NA NA ...
##   ..$ orderofdominance     : logi [1:88] NA NA NA NA NA NA ...
##  $ stock  :'data.frame': 204 obs. of  13 variables:
##   ..$ rcapid     : chr [1:204] "A0208P941" "A0208P942" "A0208P943" "A0208P944" ...
##   ..$ value_5cm  : num [1:204] 17.8 18.4 14.9 22.8 23.2 ...
##   ..$ sd_5cm     : num [1:204] 11.2 12.2 10.1 14.6 15 ...
##   ..$ value_10cm : num [1:204] 41.5 32.9 40 45.1 40.6 ...
##   ..$ sd_10cm    : num [1:204] 17.7 14.6 18 20.3 19.8 ...
##   ..$ value_20cm : num [1:204] 90.4 58.7 79.2 89.1 77.6 ...
##   ..$ sd_20cm    : num [1:204] 44.6 28.2 39.4 41.2 38.4 ...
##   ..$ value_30cm : num [1:204] 125 85.1 112.2 127.5 114.2 ...
##   ..$ sd_30cm    : num [1:204] 52.4 32.5 44.3 48.9 51.1 ...
##   ..$ value_50cm : num [1:204] 198 139 173 206 196 ...
##   ..$ sd_50cm    : num [1:204] 77.9 55.6 75.6 85 87.1 ...
##   ..$ value_100cm: num [1:204] 248 198 246 255 302 ...
##   ..$ sd_100cm   : num [1:204] 90.9 75.7 91.1 91.4 107.7 ...
##  $ sample :'data.frame': 938 obs. of  16 variables:
##   ..$ sample_id    : int [1:938] 12936 12932 12933 12934 12935 12938 12937 12941 12940 12939 ...
##   ..$ rcapid       : chr [1:938] "A0208P941" "A0208P941" "A0208P941" "A0208P941" ...
##   ..$ sample_top   : int [1:938] 69 0 5 19 54 5 0 72 55 21 ...
##   ..$ sample_bottom: int [1:938] 69 5 19 54 69 21 5 72 72 55 ...
##   ..$ hzname       : chr [1:938] "Btqm" "A1" "A2" "BAt" ...
##   ..$ texture      : chr [1:938] "" "sl" "sl" "sl" ...
##   ..$ frag_vol     : int [1:938] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ bd           : num [1:938] 1.29 1.29 1.71 1.6 1.37 ...
##   ..$ bd_sd        : num [1:938] 0.2085 0.0652 0.0652 0.0652 0.2085 ...
##   ..$ bd_source    : chr [1:938] "modeled" "measured" "measured" "measured" ...
##   ..$ bd_method    : chr [1:938] "" "scoop" "core, constant volume" "core, constant volume" ...
##   ..$ soc          : num [1:938] 1.19 2.82 2.87 2.22 2.02 ...
##   ..$ soc_sd       : num [1:938] 2.17 1.75 1.78 1.38 1.25 ...
##   ..$ soc_measured : chr [1:938] "modeled" "modeled" "modeled" "modeled" ...
##   ..$ soc_scan     : logi [1:938] NA NA NA NA NA NA ...
##   ..$ soc_lab      : num [1:938] NA NA NA NA NA NA NA NA NA NA ...
##  $ spectra: NULL

Site/Pedon Data

Site and pedon data are returned as a SoilProfileCollection object within the pedons list element.

# extract site/pedon data
s <- r.series$pedons

# how many pedons in the collection?
length(s)

# how many pedons / RCA site?
table(s$rcasiteid)

# generate soil colors from moist sRGB
s$soil_color <- rep(NA, times=nrow(horizons(s)))
idx <- complete.cases(s$m_r)
s$soil_color[idx] <- with(horizons(s)[idx, ], rgb(m_r, m_g, m_b)) # moist colors

# plot first 20 pedons
par(mar=c(0,0,0,0))
plot(s[1:20, ], name='hzname', max.depth=100)

Vegetation Data

# tabulate vegetation data across the entire collection of RaCA data queried by BBOX
sort(table(r.bbox$veg$plantnatvernm), decreasing=TRUE)
## 
##                 wild oat              storks bill                 blue oak               soft brome 
##                        7                        6                        5                        5 
## California foothill pine        interior live oak       Pacific poison oak             ripgut brome 
##                        3                        3                        3                        3 
##                    brome     California black oak                   fescue                   lupine 
##                        2                        2                        2                        2 
##            minerslettuce             mouse barley              slender oat                  thistle 
##                        2                        2                        2                        2 
##               wheatgrass             Bermudagrass            bigleaf maple            black mustard 
##                        2                        1                        1                        1 
##                bluegrass   bristly dogstail grass                 brodiaea             bull thistle 
##                        1                        1                        1                        1 
##       California juniper        California laurel      California live oak                   clover 
##                        1                        1                        1                        1 
##   coastal sage scrub oak        coastal sagebrush                     corn               curly dock 
##                        1                        1                        1                        1 
##                     dock            Grass; annual            incense cedar         Italian ryegrass 
##                        1                        1                        1                        1 
##                 knotweed                 live oak               medusahead             monkeyflower 
##                        1                        1                        1                        1 
##                  mullein                      oat           ponderosa pine                  redwood 
##                        1                        1                        1                        1 
##                    sedge               sugar pine                   tanoak                  tarweed 
##                        1                        1                        1                        1 
##                    vetch                   walnut      western brackenfern               woodsorrel 
##                        1                        1                        1                        1
# aggregate tree information, by RaCA site
ddply(r.bbox$trees, 'siteiid', summarize, mean.dbh=mean(dbh), mean.height=mean(height), n=length(siteiid))
##   siteiid mean.dbh mean.height n
## 1  605539   33.625      43.500 4
## 2  605563   14.500      36.750 4
## 3  605642   11.575      17.475 4
## 4  605645   26.200      72.350 4
## 5  682304   20.250      68.250 4
## 6  682305   21.750      34.500 4
## 7  682445   21.875      42.875 4

Soil Organic Carbon Concentrations

# extract sample data horizon-level carbon and bulk density
s <- r.series$sample

# init SoilProfileCollection
depths(s) <- rcapid ~ sample_top + sample_bottom

# slice-wise aggregation over entire collection
a <- slab(s, ~ soc)

# plot
xyplot(
  top ~ p.q50, data=a, main='Auburn', ylab='Depth (cm)', xlab='Soil Organic Carbon Concentration (%)',
  lower=a$p.q5, upper=a$p.q95, sync.colors=TRUE, alpha=0.3,
  ylim=c(60,-5), asp=1.5,
  par.settings=list(superpose.line=list(lwd=2, col=c('RoyalBlue'))),
  panel=panel.depth_function, 
  prepanel=prepanel.depth_function,
  auto.key=list(columns=2, lines=TRUE, points=FALSE)
)

Soil Organic Carbon Stock Data

# join carbon stock estimates to SPC object
# these are site-level data
s <- r.bbox$pedons
site(s) <- r.bbox$stock

# compute mean SOC stock to 100cm by RCA site:
round(tapply(s$value_100cm, s$rcasiteid, mean))
## A0208P94 A0215P99 C0204C01 C0204F12 C0204R04 C0204R13 C0204R18 C0205C05 C0205C06 C0205C07 C0205R01 
##      250      161      248      157      173      177       90      513      153      217      270 
## C0206C05 C0206W02 C0206W04 C0208C02 C0208C05 C0208F11 C0208P02 C0208R10 C0209C01 C0209C04 C0209F04 
##      404      272      152      270      320      411      281      131      179      494      338 
## C0209F09 C0209F17 C0209F24 C0209R16 C0209R23 C0212C02 C0212C05 C0212C06 C0212C08 C0213C06 C0213C14 
##      168      275      511      327      169      266      923      260      315      277      199 
## C0215C07 C0215F06 C0218P02 C0218P05 C0219C05 C0219F04 C0219P04 C0219R01 
##      290      260      460      307      225      253      339      168
# compute mean SOC stock to 100cm by series:
round(tapply(s$value_100cm, s$taxonname, mean))
##         Altamont           Auburn       Ben Lomond          Briones         Campbell 
##              169              129              399              270              404 
##           Cometa          Cropley            Delhi           Felton           Fresno 
##              250              320              294              640              225 
##          Gonzaga          Hanford            Itano Josephine family          Katykat 
##              260              339              199              253              411 
##          Kingile          Lompico        Los Gatos           Maymen       Poterville 
##              260              439              253              157              161 
##           Rindge      San Andreas      San Joaquin          Shinkee       Stanislaus 
##              923              301              266              266              290 
##           Temple         Triangle          Tujunga           Valdez       Vallecitos 
##              384              212              494              277              161 
##           Webile        Zacharias 
##              315              179
# convert taxonname to a factor
s$taxonname <- factor(s$taxonname)

# re-level based on median C stock
stock.asc.order <- order(tapply(s$value_100cm, s$taxonname, median))
new.levels <- levels(s$taxonname)[stock.asc.order]
s$taxonname <- factor(s$taxonname, levels=new.levels)

# visualize
bwplot(taxonname ~ value_100cm, data=site(s), xlab='Carbon Stock to 100cm (Mg per ha)')

Depth-Wise Aggregation of Carbon Stock Data

m <- melt(r.series$stock, id.vars='rcapid', measure.vars=c(2,4,6,8,10,12))

# fix c stock variable name
names(m)[3] <- 'c.stock'

# generate depths from variable names
m$top <- NA
m$bottom <- NA

# sequence of possible top/bottom boundaries
m.top <- c(0,5,10,20,30,50)
m.bottom <- c(5,10,20,30,50,100)

# generate an index to real depths
# this only works because the factor levels are in the correct order
mm <- as.numeric(m$variable)

# assign real depths
m$top <- m.top[mm]
m$bottom <- m.bottom[mm]

# promote to SPC
depths(m) <- rcapid ~ top + bottom

# simple aggregation of the entire collection: note that this doesn't take into account error estimates...
a <- slab(m, ~ c.stock)

# plot aggregate data
xyplot(
  top ~ p.q50, data=a, main='Auburn', ylab='Depth (cm)', xlab='Cumulative Carbon Stock (Mg per ha)',
  lower=a$p.q5, upper=a$p.q95, sync.colors=TRUE, alpha=0.3,
  ylim=c(105,-5), asp=1.5,
  par.settings=list(superpose.line=list(lwd=2, col=c('RoyalBlue'))),
  panel=panel.depth_function, 
  prepanel=prepanel.depth_function,
  auto.key=list(columns=2, lines=TRUE, points=FALSE)
)

VNIR Spectra

VNIR spectra are large, and therefore not downloaded by default. The get.vnir=TRUE argument to fetchRaCA() will download VNIR spectra associated with the query results.

s <- fetchRaCA(series='auburn', get.vnir=TRUE)

# plot the spectra
par(mar=c(0,0,0,0))
matplot(t(s$spectra), type='l', lty=1, col=rgb(0, 0, 0, alpha=0.25), ylab='', xlab='', axes=FALSE)
box()

# Q-mode PCA of spectra
pca <- prcomp(s$spectra, center=TRUE, scale=TRUE)
summary(pca)

# cluster on first 3 principal components into 3 hard groupings via PAM
spectra.pam <- pam(predict(pca)[, 1:3], 3)

# re-plot spectra, colored by 3-class clustering
cols <- rgb(t(col2rgb(spectra.pam$clustering)), maxColorValue=255, alpha=100)
matplot(t(s$spectra), type='l', lty=1, col=cols, ylab='', xlab='', axes=FALSE)
box()

Querying and loading spectra

This example demonstrates how to load RaCA data for a regional area, subset it to specific soil taxa based on pattern matching, and load the spectra for a specific set of taxa.

#load data by state without spectra
r.state <- fetchRaCA(state='id')

# subset and get rcasiteid's for sites with volcanic ash, andic and vitrandic subgroups
idx <- grep('andic', r.state$pedons$tax_subgroup)
length(idx)
print(table(r.state$pedons$tax_subgroup[idx]))
print(table(r.state$pedons$rcasiteid[idx]))

rcaIDs <- r.state$pedons$rcasiteid[idx]
rcIDs <- paste(rcaIDs, sep=', ')
#r.state <- fetchRaCA(rcasiteid = rcIDs)

# load spectra for specific RaCA siteIDs for the subset above
s <- fetchRaCA(rcasiteid = rcIDs, get.vnir=TRUE)

Working with VNIR spectra as SpectralDataFrames

A very useful construct called a SpectralDataFrame and associated operations can be found in the spectacles package. We will promote the downloaded RaCA spectra to a SpectralDataFrame by joining in associated data and attribute variables. Once promoted to a SpectralDataFrame they can be split by horizon designation name and then aggregated to a mean spectra for each set of similarly designated horizons. Applying a logical genetic horizonation order (from soil surface to subsurface) to the aggregated spectra makes it easier to compare how the spectra vary with depth.

library(ggplot2)
library(spectacles)
library(RColorBrewer)

# convert spectra matrix to dataframe
d <- as.data.frame(s$spectra)

# assign rownames to id column
d$id <- rownames(d)

# create Spectra from the dataframe
spectra(d) <- id ~ 350:2500

# subset some data elements from the sample dataframe
s1 <- s$sample[ , c('sample_id','rcapid','hzname','sample_top','sample_bottom')]
rownames(s1) <- s1$sample_id
names(s1) <- c('id', 'rcapid', 'hzname','sample_top','sample_bottom')

# promote Spectra to a SpectralDataFrame object by joining in the associated data
features(d, key='id') <- s1

# split data into lists on horizon names (hzname)
d1 <- split(d, d$hzname)
print(table(d$hzname))
# plot subset of 'A' horizons
plot(d1$A)

# aggregate spectra by taking the mean values across spectra within each hzname group
d_ag <- aggregate_spectra(d, fun = mean, id = "hzname")

# melt format for plotting with ggplot
d2 <- melt_spectra(d_ag, attr = c('hzname','sample_top'))

# set a logical horizon order for plotting
hz_ordered <- c('Oi', 'Oe', 'A', 'Bw1', 'Bw2', '2Bt', '2Bw','2Bw2', '2Bw3', '2Bw4', '2BC', '2C', '2C1', '2C2')
d2$hzname <- factor(d2$hzname, levels = hz_ordered)

# plot
ggplot(d2) + geom_line(aes(x = wl, y = nir, colour = hzname)) 


This document is based on aqp version 1.25 and soilDB version 2.5.9.