This document demonstrates how to use the soilDB package to download data from the Henry Mount soil climate database. Soil climate data are routinely collected by SSO staff via buried sensor/data-logger devices (“hobos”) and now above ground weather stations. The Henry Mount Soil Climate database was established to assist with the management and analysis of these data.
With a recent version of R (>= 2.15), it is possible to get all of the packages that this tutorial depends on via:
# run these commands in the R console
install.packages('tactile', dep=TRUE)
install.packages('latticeExtra', dep=TRUE)
install.packages('reshape', dep=TRUE)
install.packages('dismo', dep=TRUE)
install.packages('soilDB', dep=TRUE)
install.packages('sharpshootR', dep=TRUE)
# get latest version from GitHub
install.packages('devtools', dep=TRUE)
devtools::install_github("ncss-tech/soilDB", dependencies=FALSE, upgrade_dependencies=FALSE)
Soil climate data can be queried by:
and optionally filtered by:
and aggregated to the following granularity:
Query daily sensor data associated with the Sequoia / Kings Canyon soil survey.
library(soilDB)
library(sharpshootR)
library(latticeExtra)
library(tactile)
library(plyr)
# get soil temperature, soil moisture, and air temperature data
x <- fetchHenry(project = 'CA792')
# check object structure:
str(x, 2)
Quick listing of essential site-level data. “Functional years” is the number of years of non-missing data, after grouping data by Day of Year. “Complete years” is the number of years that have 365 days of non-missing data. “dslv” is the number of days since the data-logger was last visited.
# convert into data.frame
d <- as.data.frame(x$sensors)
# keep only information on soil temperature sensors at 50cm
d <- subset(d, subset = sensor_type == 'soiltemp' & sensor_depth == 50)
# check top 6 rows and select columns
user_site_id | name | sensor_depth | MAST | Winter | Summer | STR | functional.yrs | complete.yrs | dslv |
---|---|---|---|---|---|---|---|---|---|
2006CA7920001 | Muir Pass | 50 | 1.45 | -1.23 | 4.73 | cryic | 15 | 15 | 463 |
2012CA7921062 | Dusy Basin | 50 | 5.00 | 1.18 | 10.18 | cryic | 9 | 6 | 463 |
2015CA7921071 | Tyndall | 50 | 6.15 | 1.68 | 11.77 | cryic | 7 | 6 | 810 |
S2012CA019001 | Littlepete | 50 | 4.93 | 1.29 | 9.79 | cryic | 6 | 5 | 2275 |
S2012CA019002 | LeConte | 50 | 6.38 | 2.01 | 11.31 | cryic | 10 | 10 | 463 |
S2012CA019003 | McDermand | 50 | 4.02 | 1.04 | 8.14 | cryic | 10 | 10 | 463 |
x$soiltemp
) look like
this:
sid | date_time | sensor_value | year | doy | month | season | water_year | water_day | name | sensor_name | sensor_depth |
---|---|---|---|---|---|---|---|---|---|---|---|
26 | 2006-01-01 | NA | 2006 | 1 | Jan | Winter | 2006 | 93 | Muir Pass-50 | Muir Pass | 50 |
26 | 2006-01-02 | NA | 2006 | 2 | Jan | Winter | 2006 | 94 | Muir Pass-50 | Muir Pass | 50 |
26 | 2006-01-03 | NA | 2006 | 3 | Jan | Winter | 2006 | 95 | Muir Pass-50 | Muir Pass | 50 |
26 | 2006-01-04 | NA | 2006 | 4 | Jan | Winter | 2006 | 96 | Muir Pass-50 | Muir Pass | 50 |
26 | 2006-01-05 | NA | 2006 | 5 | Jan | Winter | 2006 | 97 | Muir Pass-50 | Muir Pass | 50 |
26 | 2006-01-06 | NA | 2006 | 6 | Jan | Winter | 2006 | 98 | Muir Pass-50 | Muir Pass | 50 |
Make a simple graphical timeline of the data by sensor.
HenryTimeLine(x$soiltemp, main = 'Soil Temperature Records', col = 'RoyalBlue')
HenryTimeLine(x$soilVWC, main = 'Soil VWC Records', col = 'RoyalBlue')
Basic quality control, check for impossible values
# note presence of very large negative values
summary(x$soiltemp$sensor_value)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -888.880 2.728 9.387 -2.036 15.543 42.523 20092
summary(x$soilVWC$sensor_value)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -888.880 0.080 0.184 -24.294 0.257 0.718 13821
# set large negative values to NA
idx <- which(x$soiltemp$sensor_value < -20)
x$soiltemp$sensor_value[idx] <- NA
# set out of range values to NA
idx <- which(x$soilVWC$sensor_value < 0 | x$soilVWC$sensor_value > 0.55)
x$soilVWC$sensor_value[idx] <- NA
All of the following examples are based on lattice graphics. The syntax takes a little getting used to, but provides a very flexible framework for layout of grouped data into panels. Try adapting the examples below as a starting point for more complex or customized figures. Critical elements of the syntax include:
sensor_value ~ date_time | sensor_name
: y ~ x |
facet-by-groupsdata=x$soiltemp
: get data from the
soiltemperature sensor recordssubset=sensor_depth == 50
:
keep only records at 50cm# soil temperature at 50cm
xyplot(sensor_value ~ date_time | sensor_name,
data = x$soiltemp, subset = sensor_depth == 50,
main = 'Daily Soil Temperature (Deg. C) at 50cm',
type = c('l', 'g'),
as.table = TRUE, xlab = 'Date', ylab = 'Deg C',
scales = list(alternating=3, cex=0.75, x=list(rot=45)),
strip = strip.custom(bg='grey'),
par.settings = tactile.theme()
)
xyplot(sensor_value ~ date_time | sensor_name,
data=x$soilVWC, subset=sensor_depth == 50,
main='Daily Soil Moisture at 50cm', type=c('l', 'g'),
as.table=TRUE, xlab='Date', ylab='Volumetric Water Content',
scales=list(alternating=3, cex=0.75, x=list(rot=45)),
strip=strip.custom(bg = 'grey'),
par.settings = tactile.theme()
)
One approach for investigating data gaps, blue: data, grey: no data.
levelplot(
factor(!is.na(sensor_value)) ~ doy * factor(year) | sensor_name,
data=x$soiltemp,
subset=sensor_depth == 50,
main='Daily Soil Temperature (Deg. C) at 50cm',
col.regions = c('grey', 'RoyalBlue'), cuts = 1,
colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'),
xlab = 'Day of Year', ylab = 'Year',
par.settings = tactile.theme()
)
Again, this time only include 2013-2017.
levelplot(
factor(!is.na(sensor_value)) ~ doy * factor(year) | sensor_name,
data = x$soiltemp,
subset = sensor_depth == 50 & year %in% 2013:2017,
main = 'Daily Soil Temperature (Deg. C) at 50cm',
col.regions = c('grey', 'RoyalBlue'), cuts = 1,
colorkey = FALSE, as.table = TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'),
xlab='Day of Year', ylab='Year',
par.settings = tactile.theme()
)
Soil moisture data by calendar years and Julian day.
levelplot(
factor(!is.na(sensor_value)) ~ doy * factor(year) | sensor_name, main='Daily Soil Moisture at 50cm',
data=x$soilVWC, subset=sensor_depth == 50,
col.regions=c('grey', 'RoyalBlue'), cuts=1,
colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'),
xlab='Day', ylab='Year',
par.settings = tactile.theme()
)
Soil moisture data by water year and day.
levelplot(
factor(!is.na(sensor_value)) ~ water_day * factor(water_year) | sensor_name, main='Daily Soil Moisture at 50cm\nOctober 1 -- September 30',
data=x$soilVWC, subset=sensor_depth == 50,
col.regions=c('grey', 'RoyalBlue'), cuts=1,
colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'),
xlab='Water Day', ylab='Water Year',
par.settings = tactile.theme()
)
Comparison between years, faceted by sensor name.
# generate some better colors
cols.temp <- colorRampPalette(rev(hcl.colors(11, 'RdYlBu')), space='Lab', interpolate='spline')
levelplot(
sensor_value ~ doy * factor(year) | sensor_name, main='Daily Soil Temperature (Deg. C) at 50cm',
data=x$soiltemp, col.regions=cols.temp,
subset=sensor_depth == 50 & year %in% 2013:2017,
colorkey=list(space='top'), as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='grey'),
xlab='Day of Year', ylab='Year',
par.settings = tactile.theme()
)
Comparison between sensor depths, faceted by sensor name.
# generate some better colors
cols.vwc <- colorRampPalette(hcl.colors(11, 'RdYlBu'), space='Lab', interpolate='spline')
levelplot(
sensor_value ~ doy * factor(sensor_depth) | sensor_name, main='Daily Soil Moisture',
data=x$soilVWC, col.regions=cols.vwc,
subset=year == 2015,
colorkey=list(space='top'), as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='grey'),
xlab='Day of Year', ylab='Sensor Depth (cm)',
par.settings = tactile.theme()
)
Facet data (organize into panels) by a combination of sensor name and depth.
p <- levelplot(
sensor_value ~ doy * factor(year) | sensor_name + factor(sensor_depth), main='Daily Soil Moisture',
data=x$soilVWC, col.regions=cols.vwc,
colorkey=list(space='top'), as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85),
xlab='Day of Year', ylab='Year',
par.settings = tactile.theme()
)
useOuterStrips(p, strip=strip.custom(bg='grey'), strip.left = strip.custom(bg='grey', horizontal=FALSE))
Same idea, this time with soil temperature data.
p <- levelplot(
sensor_value ~ doy * factor(year) | sensor_name + factor(sensor_depth), main='Daily Soil Temperature',
data=x$soiltemp, col.regions=cols.temp,
subset=sensor_name %in% c('Ashmountain', 'CedarGrove', 'LeConte'),
colorkey=list(space='top'), as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85),
xlab='Day of Year', ylab='Year',
par.settings = tactile.theme()
)
useOuterStrips(p, strip=strip.custom(bg='grey'), strip.left = strip.custom(bg='grey', horizontal=FALSE))
Aggregate over years by sensor / Day of Year, 50cm data only.
# compute MAST by sensor
a <- ddply(x$soiltemp[which(x$soiltemp$sensor_depth == 50), ], c('sensor_name', 'doy'), .fun=plyr::summarise, soiltemp=mean(sensor_value, na.rm = TRUE))
# re-order sensor names according to MAST
a.mast <- sort(tapply(a$soiltemp, a$sensor_name, median, na.rm=TRUE))
a$sensor_name <- factor(a$sensor_name, levels=names(a.mast))
levelplot(
soiltemp ~ doy * sensor_name,
main = 'Median Soil Temperature (Deg. C) at 50cm',
data = a,
col.regions = cols.temp,
xlab = 'Day of Year', ylab='',
colorkey = list(space='top'), scales=list(alternating=3, cex=0.75, x=list(tick.number=15)),
par.settings = tactile.theme()
)
Convert data to percent saturation and flag days with pct. saturation >= 50%.
fun <- function(i) {
i$pct.sat <- i$sensor_value / max(i$sensor_value, na.rm = TRUE)
return(i)
}
# flag days with percent saturation >= 50%
z <- ddply(x$soilVWC, c('sid', 'year'), .fun=fun)
z$pct.sat <- factor(z$pct.sat >= 0.5, levels = c('TRUE', 'FALSE'), labels = c('Moist', 'Dry'))
p <- levelplot(
pct.sat ~ doy * factor(year) | sensor_name + factor(sensor_depth), main='Percent Saturation >= 50%',
data=z,
col.regions=c('grey', 'RoyalBlue'), cuts=1,
colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85),
xlab='Day of Year', ylab='Year',
par.settings = tactile.theme()
)
useOuterStrips(p, strip=strip.custom(bg='grey'), strip.left = strip.custom(bg='grey', horizontal=FALSE))
In the presence of missing data, MAST calculations will be biased towards those data that are not missing. For example, a block of missing data in January will result in an estimated MAST that is too high due to the missing data from the middle of winter. It is possible to estimate (mostly) unbiased MAST values in the presence of some missing data by averaging multiple years of data by Day of Year. This approach will generate reasonable summaries in the presence of missing data, as long as data gaps are “covered” by corresponding data from another year. The longer the period of record and shorter the data gaps, the better.
Soil temperature regime assignment for gelic, cryic, and frigid conditions typically require additional information and are marked with an ’*’.
When daily data are queried, unbiased summaries and indices of data “completeness” are calculated.
as.data.frame(x$sensors)[which(!is.na(x$sensors$MAST)), c('user_site_id', 'sensor_depth', 'name', 'MAST', 'Winter', 'Summer', 'STR', 'functional.yrs', 'complete.yrs', 'gap.index')]
user_site_id | sensor_depth | name | MAST | Winter | Summer | STR | functional.yrs | complete.yrs | gap.index |
---|---|---|---|---|---|---|---|---|---|
2006CA7920001 | 50 | Muir Pass | 1.45 | -1.23 | 4.73 | cryic | 15 | 15 | 0.08 |
2012CA7921062 | 50 | Dusy Basin | 5.00 | 1.18 | 10.18 | cryic | 9 | 6 | 0.16 |
2015CA7921071 | 50 | Tyndall | 6.15 | 1.68 | 11.77 | cryic | 7 | 6 | 0.11 |
S2012CA019001 | 50 | Littlepete | 4.93 | 1.29 | 9.79 | cryic | 6 | 5 | 0.14 |
S2012CA019002 | 50 | LeConte | 6.38 | 2.01 | 11.31 | cryic | 10 | 10 | 0.08 |
S2012CA019003 | 50 | McDermand | 4.02 | 1.04 | 8.14 | cryic | 10 | 10 | 0.08 |
S2012CA019004 | 50 | Evolution | 4.68 | 1.38 | 9.15 | cryic | 10 | 10 | 0.09 |
S2013CA019003 | 20 | CedarGrove | 12.17 | 4.65 | 20.38 | mesic | 8 | 5 | 0.24 |
S2013CA019003 | 50 | CedarGrove | 11.59 | 5.82 | 17.64 | mesic | 8 | 5 | 0.24 |
S2013CA019003 | 100 | CedarGrove | 11.22 | 7.27 | 15.15 | mesic | 8 | 5 | 0.24 |
S2013CA019003 | 8 | CedarGrove | 11.76 | 3.62 | 21.21 | mesic | 7 | 3 | 0.30 |
S2013CA107001 | 50 | MineralKingRd | -61.66 | -71.64 | -34.59 | gelic | 6 | 4 | 0.32 |
S2013CA107001 | 20 | MineralKingRd | -67.04 | -82.13 | -27.30 | gelic | 7 | 4 | 0.30 |
S2013CA107001 | 100 | MineralKingRd | -57.56 | -70.90 | -24.47 | gelic | 8 | 4 | 0.27 |
S2013CA107001 | 8 | MineralKingRd | -61.70 | -73.04 | -33.32 | gelic | 6 | 4 | 0.33 |
S2013CA107002 | 5 | ParkRidge | 7.06 | 2.74 | 13.04 | frigid | 1 | 0 | 0.42 |
S2013CA107002 | 20 | ParkRidge | 7.43 | 3.93 | 12.66 | cryic | 1 | 0 | 0.47 |
S2013CA107002 | 50 | ParkRidge | 7.50 | 5.09 | 11.00 | cryic | 1 | 0 | 0.59 |
S2013CA107002 | 100 | ParkRidge | 7.25 | 6.35 | 9.39 | cryic | 0 | 0 | 0.67 |
S2013CA107002 | 8 | ParkRidge | 7.38 | 3.10 | 13.66 | frigid | 1 | 0 | 0.47 |
S2014CA107005 | 8 | Ashmountain | 20.45 | 11.77 | 29.76 | thermic | 8 | 6 | 0.13 |
S2014CA107005 | 20 | Ashmountain | 20.34 | 12.26 | 28.80 | thermic | 8 | 6 | 0.13 |
S2014CA107005 | 50 | Ashmountain | 22.55 | 15.81 | 29.23 | hyperthermic | 8 | 6 | 0.13 |
S2014CA107005 | 100 | Ashmountain | 20.89 | 15.49 | 25.87 | thermic | 8 | 6 | 0.13 |
S2014CA107006 | 20 | Tharpslog-MinKingRd | 9.25 | 3.18 | 16.39 | mesic | 2 | 1 | 0.31 |
S2014CA107006 | 50 | Tharpslog-MinKingRd | 9.16 | 4.78 | 14.25 | mesic | 2 | 1 | 0.40 |
S2014CA107008 | 20 | Tharpslog-GiantForest | 8.61 | 3.51 | 14.77 | mesic | 3 | 3 | 0.24 |
S2014CA107008 | 50 | Tharpslog-GiantForest | 8.35 | 4.51 | 12.77 | mesic | 3 | 3 | 0.32 |
Get water level data associated with the “NorthernNY_watertable”
project. Water level data are found in x$waterlevel
. All of
these sensors have been installed at 100cm depth.
x <- fetchHenry(project='NorthernNY_watertable', gran = 'day', what='waterlevel')
Check data availability.
HenryTimeLine(x$waterlevel, main='Water Level', col='RoyalBlue')
Simple time-series style plot, each panel is a water level sensor.
xyplot(
sensor_value ~ date_time | sensor_name, data=x$waterlevel,
subset=sensor_depth == 100,
type='l', scales=list(alternating=3, cex=0.75),
par.settings = tactile.theme(),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='grey'), as.table=TRUE, xlab='', ylab='Water Level (cm)',
main='Sensor Installed at 100cm',
panel=function(...) {
panel.grid(-1, -1)
panel.abline(h=0, col='red', lty=2)
panel.xyplot(...)
}
)
Apply a threshold and plot TRUE (blue) / FALSE (grey): water level > 25 cm depth.
levelplot(
factor(sensor_value > -25) ~ doy * factor(year) | sensor_name, main='Daily Water Level > 25 cm Depth',
data=x$waterlevel,
subset=sensor_depth == 100,
col.regions=c('grey', 'RoyalBlue'), cuts=1,
colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'),
xlab='Day of Year', ylab='Year',
par.settings = tactile.theme()
)
Aggregate representation of water level data, by sensor / year:
# aggregation of water level data
# i: data.frame, typically subset by ID / year
# level: water table level used for threshold
water.level.threshold <- function(i, level) {
# apply threshold
thresh <- i$sensor_value > level
# compute total records with water level > threshold
total.records <- length(which(thresh))
# fraction of records > threshold
fraction <- total.records / length(na.omit(i$sensor_value))
# get runs of thresholded data
r <- rle(as.integer(na.omit(thresh)))
idx <- which(r$values == 1)
if(length(idx) > 0)
consecutive.records <- max(r$lengths[idx])
else
consecutive.records <- 0
# format and return
d <- data.frame(total=total.records, fraction=fraction, max.consecutive.records=consecutive.records)
return(d)
}
# aggregate water level data
wl.agg <- ddply(x$waterlevel, c('sensor_name', 'year'), .fun=water.level.threshold, level=-25)
sensor_name | year | total | fraction | max.consecutive.records |
---|---|---|---|---|
Beiler_Lyons | 2006 | 81 | 0.4792899 | 76 |
Beiler_Lyons | 2007 | 192 | 0.5378151 | 127 |
Beiler_Lyons | 2008 | 267 | 0.7355372 | 130 |
Beiler_Lyons | 2009 | 273 | 0.7562327 | 155 |
Beiler_Lyons | 2010 | 200 | 0.5494505 | 112 |
Beiler_Lyons | 2011 | 27 | 0.1901408 | 14 |
Graphical representation of annual water level summaries.
cols <- colorRampPalette(hcl.colors(11, 'RdYlBu'), space='Lab', interpolate='spline', bias=1.75)
levelplot(max.consecutive.records ~ factor(year) * factor(sensor_name), data=wl.agg, col.regions = cols, main='Maximum Consecutive Days\n> 25cm Water Level Depth', xlab='', ylab='', par.settings = tactile.theme())
levelplot(fraction ~ factor(year) * factor(sensor_name), data=wl.agg, col.regions = cols, main='Fraction of Days / Year\n> 25cm Water Level Depth', xlab='', ylab='', par.settings = tactile.theme())
Combine water level data from Henry DB and daily precipitation data from nearby SCAN station (suggested by Ben Marshall).
This example requires the latticeExtra
package and a
couple of small adjustments to make the Henry and SCAN data
compatible.
# get specific data from Henry:
# water level data only
# as daily means
# do not fill missing days with NA (adjust as needed)
x <- fetchHenry(project='MD021', what='waterlevel', gran = 'day', pad.missing.days = FALSE)
x <- subset(x$waterlevel, name == "Hatboro-155")
# convert Henry date/time into `Date` class: compatibility with SCAN data
x$date_time <- as.Date(x$date_time)
# get data from nearby SCAN station for 2015--2017
x.scan <- fetchSCAN(site.code=2049, year=c(2015,2016,2017))
When working with Date
class values on the x-axis, it is
often more convenient to derive the tick locations by hand. Here we
generate the start.date
and stop.date
tick
location using the oldest and latest water level records, then pad with
14 days. seq.Date()
is helpful for generating a regular
sequence of dates that span the calcuated “start” and “stop” dates.
# create a new date axis
# using the limits of the water level data and pad days
start.date <- min(x$date_time) - 14
stop.date <- max(x$date_time) + 14
date.axis <- seq.Date(start.date, stop.date, by='2 months')
Lattice plots can be saved to regular R objects for further manipulation.
# plot water level data, save to object
p.1 <- xyplot(sensor_value ~ date_time, data=x, type=c('l', 'g'), cex=0.75, ylab='Water Level (cm)', xlab='', scales=list(x=list(at=date.axis, format="%b\n%Y"), y=list(tick.number=10)), par.settings = tactile.theme())
# plot precip data, save to object
p.2 <- xyplot(value ~ Date, data=x.scan$PRCP, as.table=TRUE, type=c('h'), strip=strip.custom(bg=grey(0.80)), scales=list(x=list(at=date.axis, format="%b\n%Y")), ylab='Precipitation (in)', par.settings = tactile.theme())
# combine plots into panels (latticeExtra feature)
p.3 <- c(p.1, p.2, layout=c(1,2), x.same=TRUE)
Make final adjustments and plot:
start.date
–stop.date
, define by the water
level dataupdate(p.3, scales=list(alternating=3, y=list(rot=0)), ylab=c('Water Level (cm)', 'Precipitation (in)'), main='Daily Values', xlim=c(start.date, stop.date), panel=function(...) {
panel.xyplot(...)
panel.abline(v=date.axis, col='grey', lty=3)
panel.grid(h = -1, v=0, col='grey', lty=3)
})
Save sites as shape file.
Compute frost-free days.
Fit a simple model relating MAST to MAAT (PRISM) using soil temperature data from the several MLRA SSO.
m <- subset(res, subset=sensor_depth == 50)
plot(MAST ~ maat, data=m)
dd <- datadist(m)
options(datadist="dd")
(m.ols <- ols(MAST ~ rcs(maat), data=m))
plot(Predict(m.ols, conf.type = 'simultaneous'), ylab='MAST', xlab='MAAT (PRISM)')
This document is based on aqp
version 2.1.0 and
soilDB
version 2.8.5.