This document demonstrates how to use the soilDB package to download RaCA via SoilWeb. Comparisons are made via graphical summaries of key soil properties with depth, using data structures and functions from the aqp package.
RaCA data are queried via HTTP request to the SoilWeb server, with query parameters encoded as URL parameters. For example, by appending several parameters to the base URL it is possible to query “site-level” data associated with soils correlated to the Auburn series like this:
http://casoilresource.lawr.ucdavis.edu/soil_web/rca/rca_query.php?what=site&series=auburn
There are 5 URL parameters that can be used to refine RaCA data queries:
what=site ——query only the site-level data
series=auburn ——return only those data correlated to the Auburn series
bbox=-120,37,-122,38 ——return only those data within the specified bounding box, constrained to a 5-degree block
state=ca ——return only those data within California
plaintext=1 ——return data as plain text instead of gzipped text
The what parameter determines which chunk of data will be returned: * site: site and pedon level data * horizon: horizon level data * veg: vegetation information recorded at each site (relative abundence of 6 most common species) * trees: tree height(feet) and DBH(inches) measured on the first for trees in using RaCA procdures * stock: cumulative soil organic carbon stocks (Mg per ha) to standardized depths * sample: measured/modeled bulk density and carbon concentration at the sample-level * vnir: VNIR spectra [queries by soil series or BBOX only]
The soilDB package abstracts URL-based queries for RaCA data into a single function, fetchRaCA()
. Calling this function with user-defined query paramters, e.g. fetchRaCA(series='auburn')
, will download all data (site, horizon, veg, trees, stock) associated with the Auburn series and return the results as a list
. VNIR spectra can be very large and are therefore not downloaded by default.
With a recent version of R (>= 2.15), it is possible to get all of the packages that this tutorial depends on via:
# run these commands in the R console
install.packages('soilDB', dep=TRUE)
Data can be queried by two-letter state code, fuzzy-matching of series name, or using a geographic bounding-box in WGS84 referenced coordinates. See ?fetchRaCA()
for details.
# load libraries
library(soilDB)
require(aqp)
library(plyr)
library(maps)
library(lattice)
library(reshape2)
library(cluster)
# define a geographic bounding box
bbox <- c(-120, 37, -122, 38)
# fetch data by soil series name
r.series <- fetchRaCA(series='auburn')
# fetch data by bounding box
r.bbox <- fetchRaCA(bbox=bbox)
# fetch data by state
r.state <- fetchRaCA(state='ca')
# map each set of fetched data
par(mar=c(0,0,0,0))
map('county', 'california')
points(y ~ x, data=site(r.state$pedons), col='RoyalBlue', cex=0.5, pch=1)
points(y ~ x, data=site(r.bbox$pedons), col='DarkGreen', cex=0.8, pch=15)
points(y ~ x, data=site(r.series$pedons), col='DarkRed', cex=0.8, pch=16)
# add bounding box
rect(bbox[1], bbox[2], bbox[3], bbox[4])
# add long/lat axes
map.axes()
# add a simple legend
legend('topright', pch=c(1, 15, 16), col=c('RoyalBlue', 'DarkGreen', 'DarkRed'), legend=c('state', 'bbox', 'series'), bty='n', title='query by:')
A list of objects and tables (data.frame
objects) is returned by fetchRaCA()
. Elements can be accessed using the $
operator.
# check structure
str(r.bbox, 2)
## List of 6
## $ pedons :Formal class 'SoilProfileCollection' [package "aqp"] with 9 slots
## $ trees :'data.frame': 28 obs. of 3 variables:
## ..$ siteiid: int [1:28] 605539 605539 605539 605539 605563 605563 605563 605563 605642 605642 ...
## ..$ dbh : num [1:28] 11.5 25 48 50 8 ...
## ..$ height : num [1:28] 36 39 45 54 30 ...
## $ veg :'data.frame': 88 obs. of 6 variables:
## ..$ siteiid : int [1:88] 605539 605539 605539 605539 605539 605539 605551 605551 605551 605551 ...
## ..$ plantsym : chr [1:88] "AVFA" "ERODI" "LUPIN" "MONTI" ...
## ..$ plantsciname : chr [1:88] "Avena fatua" "Erodium" "Lupinus" "Montia" ...
## ..$ plantnatvernm : chr [1:88] "wild oat" "storks bill" "lupine" "minerslettuce" ...
## ..$ vegetationstratalevel: logi [1:88] NA NA NA NA NA NA ...
## ..$ orderofdominance : logi [1:88] NA NA NA NA NA NA ...
## $ stock :'data.frame': 204 obs. of 13 variables:
## ..$ rcapid : chr [1:204] "A0208P941" "A0208P942" "A0208P943" "A0208P944" ...
## ..$ value_5cm : num [1:204] 17.8 18.4 14.9 22.8 23.2 ...
## ..$ sd_5cm : num [1:204] 11.2 12.2 10.1 14.6 15 ...
## ..$ value_10cm : num [1:204] 41.5 32.9 40 45.1 40.6 ...
## ..$ sd_10cm : num [1:204] 17.7 14.6 18 20.3 19.8 ...
## ..$ value_20cm : num [1:204] 90.4 58.7 79.2 89.1 77.6 ...
## ..$ sd_20cm : num [1:204] 44.6 28.2 39.4 41.2 38.4 ...
## ..$ value_30cm : num [1:204] 125 85.1 112.2 127.5 114.2 ...
## ..$ sd_30cm : num [1:204] 52.4 32.5 44.3 48.9 51.1 ...
## ..$ value_50cm : num [1:204] 198 139 173 206 196 ...
## ..$ sd_50cm : num [1:204] 77.9 55.6 75.6 85 87.1 ...
## ..$ value_100cm: num [1:204] 248 198 246 255 302 ...
## ..$ sd_100cm : num [1:204] 90.9 75.7 91.1 91.4 107.7 ...
## $ sample :'data.frame': 938 obs. of 16 variables:
## ..$ sample_id : int [1:938] 12936 12932 12933 12934 12935 12938 12937 12941 12940 12939 ...
## ..$ rcapid : chr [1:938] "A0208P941" "A0208P941" "A0208P941" "A0208P941" ...
## ..$ sample_top : int [1:938] 69 0 5 19 54 5 0 72 55 21 ...
## ..$ sample_bottom: int [1:938] 69 5 19 54 69 21 5 72 72 55 ...
## ..$ hzname : chr [1:938] "Btqm" "A1" "A2" "BAt" ...
## ..$ texture : chr [1:938] "" "sl" "sl" "sl" ...
## ..$ frag_vol : int [1:938] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ bd : num [1:938] 1.29 1.29 1.71 1.6 1.37 ...
## ..$ bd_sd : num [1:938] 0.2085 0.0652 0.0652 0.0652 0.2085 ...
## ..$ bd_source : chr [1:938] "modeled" "measured" "measured" "measured" ...
## ..$ bd_method : chr [1:938] "" "scoop" "core, constant volume" "core, constant volume" ...
## ..$ soc : num [1:938] 1.19 2.82 2.87 2.22 2.02 ...
## ..$ soc_sd : num [1:938] 2.17 1.75 1.78 1.38 1.25 ...
## ..$ soc_measured : chr [1:938] "modeled" "modeled" "modeled" "modeled" ...
## ..$ soc_scan : logi [1:938] NA NA NA NA NA NA ...
## ..$ soc_lab : num [1:938] NA NA NA NA NA NA NA NA NA NA ...
## $ spectra: NULL
Site and pedon data are returned as a SoilProfileCollection object within the pedons
list element.
# extract site/pedon data
s <- r.series$pedons
# how many pedons in the collection?
length(s)
# how many pedons / RCA site?
table(s$rcasiteid)
# generate soil colors from moist sRGB
s$soil_color <- rep(NA, times=nrow(horizons(s)))
idx <- complete.cases(s$m_r)
s$soil_color[idx] <- with(horizons(s)[idx, ], rgb(m_r, m_g, m_b)) # moist colors
# plot first 20 pedons
par(mar=c(0,0,0,0))
plot(s[1:20, ], name='hzname', max.depth=100)
# tabulate vegetation data across the entire collection of RaCA data queried by BBOX
sort(table(r.bbox$veg$plantnatvernm), decreasing=TRUE)
##
## wild oat storks bill blue oak soft brome
## 7 6 5 5
## California foothill pine interior live oak Pacific poison oak ripgut brome
## 3 3 3 3
## brome California black oak fescue lupine
## 2 2 2 2
## minerslettuce mouse barley slender oat thistle
## 2 2 2 2
## wheatgrass Bermudagrass bigleaf maple black mustard
## 2 1 1 1
## bluegrass bristly dogstail grass brodiaea bull thistle
## 1 1 1 1
## California juniper California laurel California live oak clover
## 1 1 1 1
## coastal sage scrub oak coastal sagebrush corn curly dock
## 1 1 1 1
## dock Grass; annual incense cedar Italian ryegrass
## 1 1 1 1
## knotweed live oak medusahead monkeyflower
## 1 1 1 1
## mullein oat ponderosa pine redwood
## 1 1 1 1
## sedge sugar pine tanoak tarweed
## 1 1 1 1
## vetch walnut western brackenfern woodsorrel
## 1 1 1 1
# aggregate tree information, by RaCA site
ddply(r.bbox$trees, 'siteiid', summarize, mean.dbh=mean(dbh), mean.height=mean(height), n=length(siteiid))
## siteiid mean.dbh mean.height n
## 1 605539 33.625 43.500 4
## 2 605563 14.500 36.750 4
## 3 605642 11.575 17.475 4
## 4 605645 26.200 72.350 4
## 5 682304 20.250 68.250 4
## 6 682305 21.750 34.500 4
## 7 682445 21.875 42.875 4
# extract sample data horizon-level carbon and bulk density
s <- r.series$sample
# init SoilProfileCollection
depths(s) <- rcapid ~ sample_top + sample_bottom
# slice-wise aggregation over entire collection
a <- slab(s, ~ soc)
# plot
xyplot(
top ~ p.q50, data=a, main='Auburn', ylab='Depth (cm)', xlab='Soil Organic Carbon Concentration (%)',
lower=a$p.q5, upper=a$p.q95, sync.colors=TRUE, alpha=0.3,
ylim=c(60,-5), asp=1.5,
par.settings=list(superpose.line=list(lwd=2, col=c('RoyalBlue'))),
panel=panel.depth_function,
prepanel=prepanel.depth_function,
auto.key=list(columns=2, lines=TRUE, points=FALSE)
)
# join carbon stock estimates to SPC object
# these are site-level data
s <- r.bbox$pedons
site(s) <- r.bbox$stock
# compute mean SOC stock to 100cm by RCA site:
round(tapply(s$value_100cm, s$rcasiteid, mean))
## A0208P94 A0215P99 C0204C01 C0204F12 C0204R04 C0204R13 C0204R18 C0205C05 C0205C06 C0205C07 C0205R01
## 250 161 248 157 173 177 90 513 153 217 270
## C0206C05 C0206W02 C0206W04 C0208C02 C0208C05 C0208F11 C0208P02 C0208R10 C0209C01 C0209C04 C0209F04
## 404 272 152 270 320 411 281 131 179 494 338
## C0209F09 C0209F17 C0209F24 C0209R16 C0209R23 C0212C02 C0212C05 C0212C06 C0212C08 C0213C06 C0213C14
## 168 275 511 327 169 266 923 260 315 277 199
## C0215C07 C0215F06 C0218P02 C0218P05 C0219C05 C0219F04 C0219P04 C0219R01
## 290 260 460 307 225 253 339 168
# compute mean SOC stock to 100cm by series:
round(tapply(s$value_100cm, s$taxonname, mean))
## Altamont Auburn Ben Lomond Briones Campbell
## 169 129 399 270 404
## Cometa Cropley Delhi Felton Fresno
## 250 320 294 640 225
## Gonzaga Hanford Itano Josephine family Katykat
## 260 339 199 253 411
## Kingile Lompico Los Gatos Maymen Poterville
## 260 439 253 157 161
## Rindge San Andreas San Joaquin Shinkee Stanislaus
## 923 301 266 266 290
## Temple Triangle Tujunga Valdez Vallecitos
## 384 212 494 277 161
## Webile Zacharias
## 315 179
# convert taxonname to a factor
s$taxonname <- factor(s$taxonname)
# re-level based on median C stock
stock.asc.order <- order(tapply(s$value_100cm, s$taxonname, median))
new.levels <- levels(s$taxonname)[stock.asc.order]
s$taxonname <- factor(s$taxonname, levels=new.levels)
# visualize
bwplot(taxonname ~ value_100cm, data=site(s), xlab='Carbon Stock to 100cm (Mg per ha)')
m <- melt(r.series$stock, id.vars='rcapid', measure.vars=c(2,4,6,8,10,12))
# fix c stock variable name
names(m)[3] <- 'c.stock'
# generate depths from variable names
m$top <- NA
m$bottom <- NA
# sequence of possible top/bottom boundaries
m.top <- c(0,5,10,20,30,50)
m.bottom <- c(5,10,20,30,50,100)
# generate an index to real depths
# this only works because the factor levels are in the correct order
mm <- as.numeric(m$variable)
# assign real depths
m$top <- m.top[mm]
m$bottom <- m.bottom[mm]
# promote to SPC
depths(m) <- rcapid ~ top + bottom
# simple aggregation of the entire collection: note that this doesn't take into account error estimates...
a <- slab(m, ~ c.stock)
# plot aggregate data
xyplot(
top ~ p.q50, data=a, main='Auburn', ylab='Depth (cm)', xlab='Cumulative Carbon Stock (Mg per ha)',
lower=a$p.q5, upper=a$p.q95, sync.colors=TRUE, alpha=0.3,
ylim=c(105,-5), asp=1.5,
par.settings=list(superpose.line=list(lwd=2, col=c('RoyalBlue'))),
panel=panel.depth_function,
prepanel=prepanel.depth_function,
auto.key=list(columns=2, lines=TRUE, points=FALSE)
)
VNIR spectra are large, and therefore not downloaded by default. The get.vnir=TRUE
argument to fetchRaCA()
will download VNIR spectra associated with the query results.
s <- fetchRaCA(series='auburn', get.vnir=TRUE)
# plot the spectra
par(mar=c(0,0,0,0))
matplot(t(s$spectra), type='l', lty=1, col=rgb(0, 0, 0, alpha=0.25), ylab='', xlab='', axes=FALSE)
box()
# Q-mode PCA of spectra
pca <- prcomp(s$spectra, center=TRUE, scale=TRUE)
summary(pca)
# cluster on first 3 principal components into 3 hard groupings via PAM
spectra.pam <- pam(predict(pca)[, 1:3], 3)
# re-plot spectra, colored by 3-class clustering
cols <- rgb(t(col2rgb(spectra.pam$clustering)), maxColorValue=255, alpha=100)
matplot(t(s$spectra), type='l', lty=1, col=cols, ylab='', xlab='', axes=FALSE)
box()
This example demonstrates how to load RaCA data for a regional area, subset it to specific soil taxa based on pattern matching, and load the spectra for a specific set of taxa.
#load data by state without spectra
r.state <- fetchRaCA(state='id')
# subset and get rcasiteid's for sites with volcanic ash, andic and vitrandic subgroups
idx <- grep('andic', r.state$pedons$tax_subgroup)
length(idx)
print(table(r.state$pedons$tax_subgroup[idx]))
print(table(r.state$pedons$rcasiteid[idx]))
rcaIDs <- r.state$pedons$rcasiteid[idx]
rcIDs <- paste(rcaIDs, sep=', ')
#r.state <- fetchRaCA(rcasiteid = rcIDs)
# load spectra for specific RaCA siteIDs for the subset above
s <- fetchRaCA(rcasiteid = rcIDs, get.vnir=TRUE)
A very useful construct called a SpectralDataFrame
and associated operations can be found in the spectacles package. We will promote the downloaded RaCA spectra to a SpectralDataFrame by joining in associated data and attribute variables. Once promoted to a SpectralDataFrame they can be split by horizon designation name and then aggregated to a mean spectra for each set of similarly designated horizons. Applying a logical genetic horizonation order (from soil surface to subsurface) to the aggregated spectra makes it easier to compare how the spectra vary with depth.
library(ggplot2)
library(spectacles)
library(RColorBrewer)
# convert spectra matrix to dataframe
d <- as.data.frame(s$spectra)
# assign rownames to id column
d$id <- rownames(d)
# create Spectra from the dataframe
spectra(d) <- id ~ 350:2500
# subset some data elements from the sample dataframe
s1 <- s$sample[ , c('sample_id','rcapid','hzname','sample_top','sample_bottom')]
rownames(s1) <- s1$sample_id
names(s1) <- c('id', 'rcapid', 'hzname','sample_top','sample_bottom')
# promote Spectra to a SpectralDataFrame object by joining in the associated data
features(d, key='id') <- s1
# split data into lists on horizon names (hzname)
d1 <- split(d, d$hzname)
print(table(d$hzname))
# plot subset of 'A' horizons
plot(d1$A)
# aggregate spectra by taking the mean values across spectra within each hzname group
d_ag <- aggregate_spectra(d, fun = mean, id = "hzname")
# melt format for plotting with ggplot
d2 <- melt_spectra(d_ag, attr = c('hzname','sample_top'))
# set a logical horizon order for plotting
hz_ordered <- c('Oi', 'Oe', 'A', 'Bw1', 'Bw2', '2Bt', '2Bw','2Bw2', '2Bw3', '2Bw4', '2BC', '2C', '2C1', '2C2')
d2$hzname <- factor(d2$hzname, levels = hz_ordered)
# plot
ggplot(d2) + geom_line(aes(x = wl, y = nir, colour = hzname))
This document is based on aqp
version 1.25 and soilDB
version 2.5.9.