Getting RaCA Data in R

The soilDB package abstracts URL-based queries for RaCA data into a single function, fetchRaCA(). Calling this function with user-defined query paramters, e.g. fetchRaCA(series='auburn'), will download all data (site, horizon, veg, trees, stock) associated with the Auburn series and return the results as a list. VNIR spectra can be very large and are therefore not downloaded by default.

Installation

With a recent version of R (>= 2.15), it is possible to get all of the packages that this tutorial depends on via:

# run these commands in the R console
install.packages('soilDB', dep=TRUE)

Example Usage

Data can be queried by two-letter state code, fuzzy-matching of series name, or using a geographic bounding-box in WGS84 referenced coordinates. See ?fetchRaCA() for details.

# load libraries
library(soilDB)
require(aqp)
library(plyr)
library(maps)
library(lattice)
library(reshape2)
library(cluster)

# define a geographic bounding box
bbox <- c(-120, 37, -122, 38)

# fetch data by soil series name
r.series <- fetchRaCA(series='auburn')

# fetch data by bounding box
r.bbox <- fetchRaCA(bbox=bbox)

# fetch data by state
r.state <- fetchRaCA(state='ca')

# map each set of fetched data
par(mar=c(0,0,0,0))
map('county', 'california')
points(y ~ x, data=site(r.state$pedons), col='RoyalBlue', cex=0.5, pch=1)
points(y ~ x, data=site(r.bbox$pedons), col='DarkGreen', cex=0.8, pch=15)
points(y ~ x, data=site(r.series$pedons), col='DarkRed', cex=0.8, pch=16)
# add bounding box
rect(bbox[1], bbox[2], bbox[3], bbox[4])
# add long/lat axes
map.axes()
# add a simple legend
legend('topright', pch=c(1, 15, 16), col=c('RoyalBlue', 'DarkGreen', 'DarkRed'), legend=c('state', 'bbox', 'series'), bty='n', title='query by:')

Exploring the Results

A list of objects and tables (data.frame objects) is returned by fetchRaCA(). Elements can be accessed using the $ operator.

# check structure
str(r.bbox, 2)

## List of 6
##  $ pedons :Formal class 'SoilProfileCollection' [package "aqp"] with 9 slots
##  $ trees  :'data.frame': 28 obs. of  3 variables:
##   ..$ siteiid: int [1:28] 605539 605539 605539 605539 605563 605563 605563 605563 605642 605642 ...
##   ..$ dbh    : num [1:28] 11.5 25 48 50 8 ...
##   ..$ height : num [1:28] 36 39 45 54 30 ...
##  $ veg    :'data.frame': 88 obs. of  6 variables:
##   ..$ siteiid              : int [1:88] 605539 605539 605539 605539 605539 605539 605551 605551 605551 605551 ...
##   ..$ plantsym             : chr [1:88] "AVFA" "ERODI" "LUPIN" "MONTI" ...
##   ..$ plantsciname         : chr [1:88] "Avena fatua" "Erodium" "Lupinus" "Montia" ...
##   ..$ plantnatvernm        : chr [1:88] "wild oat" "storks bill" "lupine" "minerslettuce" ...
##   ..$ vegetationstratalevel: logi [1:88] NA NA NA NA NA NA ...
##   ..$ orderofdominance     : logi [1:88] NA NA NA NA NA NA ...
##  $ stock  :'data.frame': 204 obs. of  13 variables:
##   ..$ rcapid     : chr [1:204] "A0208P941" "A0208P942" "A0208P943" "A0208P944" ...
##   ..$ value_5cm  : num [1:204] 17.8 18.4 14.9 22.8 23.2 ...
##   ..$ sd_5cm     : num [1:204] 11.2 12.2 10.1 14.6 15 ...
##   ..$ value_10cm : num [1:204] 41.5 32.9 40 45.1 40.6 ...
##   ..$ sd_10cm    : num [1:204] 17.7 14.6 18 20.3 19.8 ...
##   ..$ value_20cm : num [1:204] 90.4 58.7 79.2 89.1 77.6 ...
##   ..$ sd_20cm    : num [1:204] 44.6 28.2 39.4 41.2 38.4 ...
##   ..$ value_30cm : num [1:204] 125 85.1 112.2 127.5 114.2 ...
##   ..$ sd_30cm    : num [1:204] 52.4 32.5 44.3 48.9 51.1 ...
##   ..$ value_50cm : num [1:204] 198 139 173 206 196 ...
##   ..$ sd_50cm    : num [1:204] 77.9 55.6 75.6 85 87.1 ...
##   ..$ value_100cm: num [1:204] 248 198 246 255 302 ...
##   ..$ sd_100cm   : num [1:204] 90.9 75.7 91.1 91.4 107.7 ...
##  $ sample :'data.frame': 938 obs. of  16 variables:
##   ..$ sample_id    : int [1:938] 12936 12932 12933 12934 12935 12938 12937 12941 12940 12939 ...
##   ..$ rcapid       : chr [1:938] "A0208P941" "A0208P941" "A0208P941" "A0208P941" ...
##   ..$ sample_top   : int [1:938] 69 0 5 19 54 5 0 72 55 21 ...
##   ..$ sample_bottom: int [1:938] 69 5 19 54 69 21 5 72 72 55 ...
##   ..$ hzname       : chr [1:938] "Btqm" "A1" "A2" "BAt" ...
##   ..$ texture      : chr [1:938] "" "sl" "sl" "sl" ...
##   ..$ frag_vol     : int [1:938] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ bd           : num [1:938] 1.29 1.29 1.71 1.6 1.37 ...
##   ..$ bd_sd        : num [1:938] 0.2085 0.0652 0.0652 0.0652 0.2085 ...
##   ..$ bd_source    : chr [1:938] "modeled" "measured" "measured" "measured" ...
##   ..$ bd_method    : chr [1:938] "" "scoop" "core, constant volume" "core, constant volume" ...
##   ..$ soc          : num [1:938] 1.19 2.82 2.87 2.22 2.02 ...
##   ..$ soc_sd       : num [1:938] 2.17 1.75 1.78 1.38 1.25 ...
##   ..$ soc_measured : chr [1:938] "modeled" "modeled" "modeled" "modeled" ...
##   ..$ soc_scan     : logi [1:938] NA NA NA NA NA NA ...
##   ..$ soc_lab      : num [1:938] NA NA NA NA NA NA NA NA NA NA ...
##  $ spectra: NULL

Site/Pedon Data

Site and pedon data are returned as a SoilProfileCollection object within the pedons list element.

# extract site/pedon data
s <- r.series$pedons

# how many pedons in the collection?
length(s)

# how many pedons / RCA site?
table(s$rcasiteid)

# generate soil colors from moist sRGB
s$soil_color <- rep(NA, times=nrow(horizons(s)))
idx <- complete.cases(s$m_r)
s$soil_color[idx] <- with(horizons(s)[idx, ], rgb(m_r, m_g, m_b)) # moist colors

# plot first 20 pedons
par(mar=c(0,0,0,0))
plot(s[1:20, ], name='hzname', max.depth=100)

Vegetation Data

# tabulate vegetation data across the entire collection of RaCA data queried by BBOX
sort(table(r.bbox$veg$plantnatvernm), decreasing=TRUE)

## 
##                 wild oat              storks bill                 blue oak               soft brome 
##                        7                        6                        5                        5 
## California foothill pine        interior live oak       Pacific poison oak             ripgut brome 
##                        3                        3                        3                        3 
##                    brome     California black oak                   fescue                   lupine 
##                        2                        2                        2                        2 
##            minerslettuce             mouse barley              slender oat                  thistle 
##                        2                        2                        2                        2 
##               wheatgrass             Bermudagrass            bigleaf maple            black mustard 
##                        2                        1                        1                        1 
##                bluegrass   bristly dogstail grass                 brodiaea             bull thistle 
##                        1                        1                        1                        1 
##       California juniper        California laurel      California live oak                   clover 
##                        1                        1                        1                        1 
##   coastal sage scrub oak        coastal sagebrush                     corn               curly dock 
##                        1                        1                        1                        1 
##                     dock            Grass; annual            incense cedar         Italian ryegrass 
##                        1                        1                        1                        1 
##                 knotweed                 live oak               medusahead             monkeyflower 
##                        1                        1                        1                        1 
##                  mullein                      oat           ponderosa pine                  redwood 
##                        1                        1                        1                        1 
##                    sedge               sugar pine                   tanoak                  tarweed 
##                        1                        1                        1                        1 
##                    vetch                   walnut      western brackenfern               woodsorrel 
##                        1                        1                        1                        1

# aggregate tree information, by RaCA site
ddply(r.bbox$trees, 'siteiid', summarize, mean.dbh=mean(dbh), mean.height=mean(height), n=length(siteiid))

##   siteiid mean.dbh mean.height n
## 1  605539   33.625      43.500 4
## 2  605563   14.500      36.750 4
## 3  605642   11.575      17.475 4
## 4  605645   26.200      72.350 4
## 5  682304   20.250      68.250 4
## 6  682305   21.750      34.500 4
## 7  682445   21.875      42.875 4

Soil Organic Carbon Concentrations

# extract sample data horizon-level carbon and bulk density
s <- r.series$sample

# init SoilProfileCollection
depths(s) <- rcapid ~ sample_top + sample_bottom

# slice-wise aggregation over entire collection
a <- slab(s, ~ soc)

# plot
xyplot(
  top ~ p.q50, data=a, main='Auburn', ylab='Depth (cm)', xlab='Soil Organic Carbon Concentration (%)',
  lower=a$p.q5, upper=a$p.q95, sync.colors=TRUE, alpha=0.3,
  ylim=c(60,-5), asp=1.5,
  par.settings=list(superpose.line=list(lwd=2, col=c('RoyalBlue'))),
  panel=panel.depth_function, 
  prepanel=prepanel.depth_function,
  auto.key=list(columns=2, lines=TRUE, points=FALSE)
)

Soil Organic Carbon Stock Data

# join carbon stock estimates to SPC object
# these are site-level data
s <- r.bbox$pedons
site(s) <- r.bbox$stock

# compute mean SOC stock to 100cm by RCA site:
round(tapply(s$value_100cm, s$rcasiteid, mean))

## A0208P94 A0215P99 C0204C01 C0204F12 C0204R04 C0204R13 C0204R18 C0205C05 C0205C06 C0205C07 C0205R01 
##      250      161      248      157      173      177       90      513      153      217      270 
## C0206C05 C0206W02 C0206W04 C0208C02 C0208C05 C0208F11 C0208P02 C0208R10 C0209C01 C0209C04 C0209F04 
##      404      272      152      270      320      411      281      131      179      494      338 
## C0209F09 C0209F17 C0209F24 C0209R16 C0209R23 C0212C02 C0212C05 C0212C06 C0212C08 C0213C06 C0213C14 
##      168      275      511      327      169      266      923      260      315      277      199 
## C0215C07 C0215F06 C0218P02 C0218P05 C0219C05 C0219F04 C0219P04 C0219R01 
##      290      260      460      307      225      253      339      168

# compute mean SOC stock to 100cm by series:
round(tapply(s$value_100cm, s$taxonname, mean))

##         Altamont           Auburn       Ben Lomond          Briones         Campbell 
##              169              129              399              270              404 
##           Cometa          Cropley            Delhi           Felton           Fresno 
##              250              320              294              640              225 
##          Gonzaga          Hanford            Itano Josephine family          Katykat 
##              260              339              199              253              411 
##          Kingile          Lompico        Los Gatos           Maymen       Poterville 
##              260              439              253              157              161 
##           Rindge      San Andreas      San Joaquin          Shinkee       Stanislaus 
##              923              301              266              266              290 
##           Temple         Triangle          Tujunga           Valdez       Vallecitos 
##              384              212              494              277              161 
##           Webile        Zacharias 
##              315              179

# convert taxonname to a factor
s$taxonname <- factor(s$taxonname)

# re-level based on median C stock
stock.asc.order <- order(tapply(s$value_100cm, s$taxonname, median))
new.levels <- levels(s$taxonname)[stock.asc.order]
s$taxonname <- factor(s$taxonname, levels=new.levels)

# visualize
bwplot(taxonname ~ value_100cm, data=site(s), xlab='Carbon Stock to 100cm (Mg per ha)')

Depth-Wise Aggregation of Carbon Stock Data

m <- melt(r.series$stock, id.vars='rcapid', measure.vars=c(2,4,6,8,10,12))

# fix c stock variable name
names(m)[3] <- 'c.stock'

# generate depths from variable names
m$top <- NA
m$bottom <- NA

# sequence of possible top/bottom boundaries
m.top <- c(0,5,10,20,30,50)
m.bottom <- c(5,10,20,30,50,100)

# generate an index to real depths
# this only works because the factor levels are in the correct order
mm <- as.numeric(m$variable)

# assign real depths
m$top <- m.top[mm]
m$bottom <- m.bottom[mm]

# promote to SPC
depths(m) <- rcapid ~ top + bottom

# simple aggregation of the entire collection: note that this doesn't take into account error estimates...
a <- slab(m, ~ c.stock)

# plot aggregate data
xyplot(
  top ~ p.q50, data=a, main='Auburn', ylab='Depth (cm)', xlab='Cumulative Carbon Stock (Mg per ha)',
  lower=a$p.q5, upper=a$p.q95, sync.colors=TRUE, alpha=0.3,
  ylim=c(105,-5), asp=1.5,
  par.settings=list(superpose.line=list(lwd=2, col=c('RoyalBlue'))),
  panel=panel.depth_function, 
  prepanel=prepanel.depth_function,
  auto.key=list(columns=2, lines=TRUE, points=FALSE)
)

VNIR Spectra

VNIR spectra are large, and therefore not downloaded by default. The get.vnir=TRUE argument to fetchRaCA() will download VNIR spectra associated with the query results.

s <- fetchRaCA(series='auburn', get.vnir=TRUE)

# plot the spectra
par(mar=c(0,0,0,0))
matplot(t(s$spectra), type='l', lty=1, col=rgb(0, 0, 0, alpha=0.25), ylab='', xlab='', axes=FALSE)
box()

# Q-mode PCA of spectra
pca <- prcomp(s$spectra, center=TRUE, scale=TRUE)
summary(pca)

# cluster on first 3 principal components into 3 hard groupings via PAM
spectra.pam <- pam(predict(pca)[, 1:3], 3)

# re-plot spectra, colored by 3-class clustering
cols <- rgb(t(col2rgb(spectra.pam$clustering)), maxColorValue=255, alpha=100)
matplot(t(s$spectra), type='l', lty=1, col=cols, ylab='', xlab='', axes=FALSE)
box()

Querying and loading spectra

This example demonstrates how to load RaCA data for a regional area, subset it to specific soil taxa based on pattern matching, and load the spectra for a specific set of taxa.

#load data by state without spectra
r.state <- fetchRaCA(state='id')

# subset and get rcasiteid's for sites with volcanic ash, andic and vitrandic subgroups
idx <- grep('andic', r.state$pedons$tax_subgroup)
length(idx)
print(table(r.state$pedons$tax_subgroup[idx]))
print(table(r.state$pedons$rcasiteid[idx]))

rcaIDs <- r.state$pedons$rcasiteid[idx]
rcIDs <- paste(rcaIDs, sep=', ')
#r.state <- fetchRaCA(rcasiteid = rcIDs)

# load spectra for specific RaCA siteIDs for the subset above
s <- fetchRaCA(rcasiteid = rcIDs, get.vnir=TRUE)

Working with VNIR spectra as SpectralDataFrames

A very useful construct called a SpectralDataFrame and associated operations can be found in the spectacles package. We will promote the downloaded RaCA spectra to a SpectralDataFrame by joining in associated data and attribute variables. Once promoted to a SpectralDataFrame they can be split by horizon designation name and then aggregated to a mean spectra for each set of similarly designated horizons. Applying a logical genetic horizonation order (from soil surface to subsurface) to the aggregated spectra makes it easier to compare how the spectra vary with depth.

library(ggplot2)
library(spectacles)
library(RColorBrewer)

# convert spectra matrix to dataframe
d <- as.data.frame(s$spectra)

# assign rownames to id column
d$id <- rownames(d)

# create Spectra from the dataframe
spectra(d) <- id ~ 350:2500

# subset some data elements from the sample dataframe
s1 <- s$sample[ , c('sample_id','rcapid','hzname','sample_top','sample_bottom')]
rownames(s1) <- s1$sample_id
names(s1) <- c('id', 'rcapid', 'hzname','sample_top','sample_bottom')

# promote Spectra to a SpectralDataFrame object by joining in the associated data
features(d, key='id') <- s1

# split data into lists on horizon names (hzname)
d1 <- split(d, d$hzname)
print(table(d$hzname))
# plot subset of 'A' horizons
plot(d1$A)

# aggregate spectra by taking the mean values across spectra within each hzname group
d_ag <- aggregate_spectra(d, fun = mean, id = "hzname")

# melt format for plotting with ggplot
d2 <- melt_spectra(d_ag, attr = c('hzname','sample_top'))

# set a logical horizon order for plotting
hz_ordered <- c('Oi', 'Oe', 'A', 'Bw1', 'Bw2', '2Bt', '2Bw','2Bw2', '2Bw3', '2Bw4', '2BC', '2C', '2C1', '2C2')
d2$hzname <- factor(d2$hzname, levels = hz_ordered)

# plot
ggplot(d2) + geom_line(aes(x = wl, y = nir, colour = hzname))

This document is based on aqp version 1.25 and soilDB version 2.5.9.

Getting Data from the Rapid Carbon Assessment Project

D.E. Beaudette and S. Wills

2021-01-15

Introduction

The SoilWeb RaCA Query API