Introduction

This document demonstrates how to use the soilDB package to download data from the Henry Mount soil climate database. Soil climate data are routinely collected by SSO staff via buried sensor/data-logger devices (“hobos”) and now above ground weather stations. The Henry Mount Soil Climate database was established to assist with the management and analysis of these data.

Setup R Environment

With a recent version of R (>= 2.15), it is possible to get all of the packages that this tutorial depends on via:

# run these commands in the R console
install.packages('tactile', dep=TRUE)
install.packages('latticeExtra', dep=TRUE)
install.packages('reshape', dep=TRUE)
install.packages('dismo', dep=TRUE)
install.packages('soilDB', dep=TRUE)
install.packages('sharpshootR', dep=TRUE)
# get latest version from GitHub
install.packages('devtools', dep=TRUE)
devtools::install_github("ncss-tech/soilDB", dependencies=FALSE, upgrade_dependencies=FALSE)

Getting and Viewing Data

Soil climate data can be queried by:

project (typically a soil survey area, “CA630”)
NASIS user site ID (e.g. “2006CA7920001”)
MLRA soil survey office (e.g. “2-SON”)

and optionally filtered by:

start date (“YYYY-MM-DD”)
end date (“YYYY-MM-DD”)
sensor type:
- “all”: all available time series data
- “soiltemp”: soil temperature time series
- “soilVWC”: soil volumetric water content time series
- “airtemp”: air temperature time series
- “waterlevel”: water level time series

and aggregated to the following granularity:

“hour” (houly data are returned if available)
“day” (MAST and mean summer/winter temperatures are automatically computed)
“week”
“month”
“year”

Query daily sensor data associated with the Sequoia / Kings Canyon soil survey.

library(soilDB)
library(sharpshootR)
library(latticeExtra)
library(tactile)
library(plyr)


# get soil temperature, soil moisture, and air temperature data
x <- fetchHenry(project = 'CA792')

# check object structure:
str(x, 2)

Quick listing of essential site-level data. “Functional years” is the number of years of non-missing data, after grouping data by Day of Year. “Complete years” is the number of years that have 365 days of non-missing data. “dslv” is the number of days since the data-logger was last visited.

# convert into data.frame
d <- as.data.frame(x$sensors)
# keep only information on soil temperature sensors at 50cm
d <- subset(d, subset = sensor_type == 'soiltemp' & sensor_depth == 50)
# check top 6 rows and select columns

user_site_id	name	sensor_depth	MAST	Winter	Summer	STR	functional.yrs	complete.yrs	dslv
2006CA7920001	Muir Pass	50	1.45	-1.23	4.73	cryic	15	15	463
2012CA7921062	Dusy Basin	50	5.00	1.18	10.18	cryic	9	6	463
2015CA7921071	Tyndall	50	6.15	1.68	11.77	cryic	7	6	810
S2012CA019001	Littlepete	50	4.93	1.29	9.79	cryic	6	5	2275
S2012CA019002	LeConte	50	6.38	2.01	11.31	cryic	10	10	463
S2012CA019003	McDermand	50	4.02	1.04	8.14	cryic	10	10	463

The tables with sensor data (e.g. x$soiltemp) look like this:

sid	date_time	sensor_value	year	doy	month	season	water_year	water_day	name	sensor_name	sensor_depth
26	2006-01-01	NA	2006	1	Jan	Winter	2006	93	Muir Pass-50	Muir Pass	50
26	2006-01-02	NA	2006	2	Jan	Winter	2006	94	Muir Pass-50	Muir Pass	50
26	2006-01-03	NA	2006	3	Jan	Winter	2006	95	Muir Pass-50	Muir Pass	50
26	2006-01-04	NA	2006	4	Jan	Winter	2006	96	Muir Pass-50	Muir Pass	50
26	2006-01-05	NA	2006	5	Jan	Winter	2006	97	Muir Pass-50	Muir Pass	50
26	2006-01-06	NA	2006	6	Jan	Winter	2006	98	Muir Pass-50	Muir Pass	50

Make a simple graphical timeline of the data by sensor.

HenryTimeLine(x$soiltemp, main = 'Soil Temperature Records', col = 'RoyalBlue')

HenryTimeLine(x$soilVWC, main = 'Soil VWC Records', col = 'RoyalBlue')

Basic quality control, check for impossible values

# note presence of very large negative values
summary(x$soiltemp$sensor_value)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -888.880    2.728    9.387   -2.036   15.543   42.523    20092

summary(x$soilVWC$sensor_value)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -888.880    0.080    0.184  -24.294    0.257    0.718    13821

# set large negative values to NA
idx <- which(x$soiltemp$sensor_value < -20)
x$soiltemp$sensor_value[idx] <- NA

# set out of range values to NA
idx <- which(x$soilVWC$sensor_value < 0 | x$soilVWC$sensor_value > 0.55)
x$soilVWC$sensor_value[idx] <- NA

Plot Data

All of the following examples are based on lattice graphics. The syntax takes a little getting used to, but provides a very flexible framework for layout of grouped data into panels. Try adapting the examples below as a starting point for more complex or customized figures. Critical elements of the syntax include:

formula interface: sensor_value ~ date_time | sensor_name: y ~ x | facet-by-groups
data source: data=x$soiltemp: get data from the soiltemperature sensor records
optional subsetting: subset=sensor_depth == 50: keep only records at 50cm

# soil temperature at 50cm
xyplot(sensor_value ~ date_time | sensor_name, 
       data = x$soiltemp, subset = sensor_depth == 50, 
       main = 'Daily Soil Temperature (Deg. C) at 50cm', 
       type = c('l', 'g'), 
       as.table = TRUE, xlab = 'Date', ylab = 'Deg C', 
       scales = list(alternating=3, cex=0.75, x=list(rot=45)), 
       strip = strip.custom(bg='grey'),
       par.settings = tactile.theme()
       )

xyplot(sensor_value ~ date_time | sensor_name, 
       data=x$soilVWC, subset=sensor_depth == 50, 
       main='Daily Soil Moisture at 50cm', type=c('l', 'g'), 
       as.table=TRUE, xlab='Date', ylab='Volumetric Water Content', 
       scales=list(alternating=3, cex=0.75, x=list(rot=45)), 
       strip=strip.custom(bg = 'grey'),
       par.settings = tactile.theme()
       )

One approach for investigating data gaps, blue: data, grey: no data.

levelplot(
  factor(!is.na(sensor_value)) ~ doy * factor(year) | sensor_name, 
  data=x$soiltemp,
  subset=sensor_depth == 50, 
  main='Daily Soil Temperature (Deg. C) at 50cm',
  col.regions = c('grey', 'RoyalBlue'), cuts = 1, 
  colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75), 
  par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'), 
  xlab = 'Day of Year', ylab = 'Year',
  par.settings = tactile.theme()
)

Again, this time only include 2013-2017.

levelplot(
  factor(!is.na(sensor_value)) ~ doy * factor(year) | sensor_name, 
  data = x$soiltemp,
  subset = sensor_depth == 50 & year %in% 2013:2017, 
  main = 'Daily Soil Temperature (Deg. C) at 50cm',
  col.regions = c('grey', 'RoyalBlue'), cuts = 1, 
  colorkey = FALSE, as.table = TRUE, scales=list(alternating=3, cex=0.75), 
  par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'), 
  xlab='Day of Year', ylab='Year',
  par.settings = tactile.theme()
)

Soil moisture data by calendar years and Julian day.

levelplot(
  factor(!is.na(sensor_value)) ~ doy * factor(year) | sensor_name, main='Daily Soil Moisture at 50cm',
  data=x$soilVWC, subset=sensor_depth == 50, 
  col.regions=c('grey', 'RoyalBlue'), cuts=1, 
  colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75), 
  par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'), 
  xlab='Day', ylab='Year',
  par.settings = tactile.theme()
)

Soil moisture data by water year and day.

levelplot(
  factor(!is.na(sensor_value)) ~ water_day * factor(water_year) | sensor_name, main='Daily Soil Moisture at 50cm\nOctober 1 -- September 30',
  data=x$soilVWC, subset=sensor_depth == 50, 
  col.regions=c('grey', 'RoyalBlue'), cuts=1, 
  colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75), 
  par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'), 
  xlab='Water Day', ylab='Water Year',
  par.settings = tactile.theme()
)

Comparison between years, faceted by sensor name.

# generate some better colors
cols.temp <- colorRampPalette(rev(hcl.colors(11, 'RdYlBu')), space='Lab', interpolate='spline')

levelplot(
  sensor_value ~ doy * factor(year) | sensor_name, main='Daily Soil Temperature (Deg. C) at 50cm',
  data=x$soiltemp, col.regions=cols.temp,
  subset=sensor_depth == 50 & year %in% 2013:2017,
  colorkey=list(space='top'), as.table=TRUE, scales=list(alternating=3, cex=0.75), 
  par.strip.text=list(cex=0.85), strip=strip.custom(bg='grey'), 
  xlab='Day of Year', ylab='Year',
  par.settings = tactile.theme()
)

Comparison between sensor depths, faceted by sensor name.

# generate some better colors
cols.vwc <- colorRampPalette(hcl.colors(11, 'RdYlBu'), space='Lab', interpolate='spline')

levelplot(
  sensor_value ~ doy * factor(sensor_depth) | sensor_name, main='Daily Soil Moisture',
  data=x$soilVWC, col.regions=cols.vwc,
  subset=year == 2015,
  colorkey=list(space='top'), as.table=TRUE, scales=list(alternating=3, cex=0.75), 
  par.strip.text=list(cex=0.85), strip=strip.custom(bg='grey'), 
  xlab='Day of Year', ylab='Sensor Depth (cm)',
  par.settings = tactile.theme()
)

Facet data (organize into panels) by a combination of sensor name and depth.

p <- levelplot(
  sensor_value ~ doy * factor(year) | sensor_name + factor(sensor_depth), main='Daily Soil Moisture',
  data=x$soilVWC, col.regions=cols.vwc,
  colorkey=list(space='top'), as.table=TRUE, scales=list(alternating=3, cex=0.75), 
  par.strip.text=list(cex=0.85), 
  xlab='Day of Year', ylab='Year',
  par.settings = tactile.theme()
)

useOuterStrips(p, strip=strip.custom(bg='grey'), strip.left = strip.custom(bg='grey', horizontal=FALSE))

Same idea, this time with soil temperature data.

p <- levelplot(
  sensor_value ~ doy * factor(year) | sensor_name + factor(sensor_depth), main='Daily Soil Temperature',
  data=x$soiltemp, col.regions=cols.temp,
  subset=sensor_name %in% c('Ashmountain', 'CedarGrove', 'LeConte'), 
  colorkey=list(space='top'), as.table=TRUE, scales=list(alternating=3, cex=0.75), 
  par.strip.text=list(cex=0.85), 
  xlab='Day of Year', ylab='Year',
  par.settings = tactile.theme()
)

useOuterStrips(p, strip=strip.custom(bg='grey'), strip.left = strip.custom(bg='grey', horizontal=FALSE))

Aggregate over years by sensor / Day of Year, 50cm data only.

# compute MAST by sensor
a <- ddply(x$soiltemp[which(x$soiltemp$sensor_depth == 50), ], c('sensor_name', 'doy'), .fun=plyr::summarise, soiltemp=mean(sensor_value, na.rm = TRUE))

# re-order sensor names according to MAST
a.mast <- sort(tapply(a$soiltemp, a$sensor_name, median, na.rm=TRUE))
a$sensor_name <- factor(a$sensor_name, levels=names(a.mast))

levelplot(
  soiltemp ~ doy * sensor_name, 
  main = 'Median Soil Temperature (Deg. C) at 50cm',
  data = a, 
  col.regions = cols.temp, 
  xlab = 'Day of Year', ylab='',
  colorkey = list(space='top'), scales=list(alternating=3, cex=0.75, x=list(tick.number=15)),
  par.settings = tactile.theme()
)

Convert data to percent saturation and flag days with pct. saturation >= 50%.

fun <- function(i) {
  i$pct.sat <- i$sensor_value / max(i$sensor_value, na.rm = TRUE)
  return(i)
}

# flag days with percent saturation >= 50%
z <- ddply(x$soilVWC, c('sid', 'year'), .fun=fun)
z$pct.sat <- factor(z$pct.sat >= 0.5, levels = c('TRUE', 'FALSE'), labels = c('Moist', 'Dry'))

p <- levelplot(
  pct.sat ~ doy * factor(year) | sensor_name + factor(sensor_depth), main='Percent Saturation >= 50%',
  data=z, 
  col.regions=c('grey', 'RoyalBlue'), cuts=1,
  colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75), 
  par.strip.text=list(cex=0.85), 
  xlab='Day of Year', ylab='Year',
  par.settings = tactile.theme()
)

useOuterStrips(p, strip=strip.custom(bg='grey'), strip.left = strip.custom(bg='grey', horizontal=FALSE))

Data Summaries

In the presence of missing data, MAST calculations will be biased towards those data that are not missing. For example, a block of missing data in January will result in an estimated MAST that is too high due to the missing data from the middle of winter. It is possible to estimate (mostly) unbiased MAST values in the presence of some missing data by averaging multiple years of data by Day of Year. This approach will generate reasonable summaries in the presence of missing data, as long as data gaps are “covered” by corresponding data from another year. The longer the period of record and shorter the data gaps, the better.

Soil temperature regime assignment for gelic, cryic, and frigid conditions typically require additional information and are marked with an ’*’.

When daily data are queried, unbiased summaries and indices of data “completeness” are calculated.

as.data.frame(x$sensors)[which(!is.na(x$sensors$MAST)), c('user_site_id', 'sensor_depth', 'name', 'MAST', 'Winter', 'Summer', 'STR', 'functional.yrs', 'complete.yrs', 'gap.index')]

user_site_id	sensor_depth	name	MAST	Winter	Summer	STR	functional.yrs	complete.yrs	gap.index
2006CA7920001	50	Muir Pass	1.45	-1.23	4.73	cryic	15	15	0.08
2012CA7921062	50	Dusy Basin	5.00	1.18	10.18	cryic	9	6	0.16
2015CA7921071	50	Tyndall	6.15	1.68	11.77	cryic	7	6	0.11
S2012CA019001	50	Littlepete	4.93	1.29	9.79	cryic	6	5	0.14
S2012CA019002	50	LeConte	6.38	2.01	11.31	cryic	10	10	0.08
S2012CA019003	50	McDermand	4.02	1.04	8.14	cryic	10	10	0.08
S2012CA019004	50	Evolution	4.68	1.38	9.15	cryic	10	10	0.09
S2013CA019003	20	CedarGrove	12.17	4.65	20.38	mesic	8	5	0.24
S2013CA019003	50	CedarGrove	11.59	5.82	17.64	mesic	8	5	0.24
S2013CA019003	100	CedarGrove	11.22	7.27	15.15	mesic	8	5	0.24
S2013CA019003	8	CedarGrove	11.76	3.62	21.21	mesic	7	3	0.30
S2013CA107001	50	MineralKingRd	-61.66	-71.64	-34.59	gelic	6	4	0.32
S2013CA107001	20	MineralKingRd	-67.04	-82.13	-27.30	gelic	7	4	0.30
S2013CA107001	100	MineralKingRd	-57.56	-70.90	-24.47	gelic	8	4	0.27
S2013CA107001	8	MineralKingRd	-61.70	-73.04	-33.32	gelic	6	4	0.33
S2013CA107002	5	ParkRidge	7.06	2.74	13.04	frigid	1	0	0.42
S2013CA107002	20	ParkRidge	7.43	3.93	12.66	cryic	1	0	0.47
S2013CA107002	50	ParkRidge	7.50	5.09	11.00	cryic	1	0	0.59
S2013CA107002	100	ParkRidge	7.25	6.35	9.39	cryic	0	0	0.67
S2013CA107002	8	ParkRidge	7.38	3.10	13.66	frigid	1	0	0.47
S2014CA107005	8	Ashmountain	20.45	11.77	29.76	thermic	8	6	0.13
S2014CA107005	20	Ashmountain	20.34	12.26	28.80	thermic	8	6	0.13
S2014CA107005	50	Ashmountain	22.55	15.81	29.23	hyperthermic	8	6	0.13
S2014CA107005	100	Ashmountain	20.89	15.49	25.87	thermic	8	6	0.13
S2014CA107006	20	Tharpslog-MinKingRd	9.25	3.18	16.39	mesic	2	1	0.31
S2014CA107006	50	Tharpslog-MinKingRd	9.16	4.78	14.25	mesic	2	1	0.40
S2014CA107008	20	Tharpslog-GiantForest	8.61	3.51	14.77	mesic	3	3	0.24
S2014CA107008	50	Tharpslog-GiantForest	8.35	4.51	12.77	mesic	3	3	0.32

Water Level Data

Get water level data associated with the “NorthernNY_watertable” project. Water level data are found in x$waterlevel. All of these sensors have been installed at 100cm depth.

x <- fetchHenry(project='NorthernNY_watertable', gran = 'day', what='waterlevel')

Check data availability.

HenryTimeLine(x$waterlevel, main='Water Level', col='RoyalBlue')

Simple time-series style plot, each panel is a water level sensor.

xyplot(
  sensor_value ~ date_time | sensor_name, data=x$waterlevel,
  subset=sensor_depth == 100,
  type='l', scales=list(alternating=3, cex=0.75),
  par.settings = tactile.theme(),
  par.strip.text=list(cex=0.85), strip=strip.custom(bg='grey'), as.table=TRUE, xlab='', ylab='Water Level (cm)',
  main='Sensor Installed at 100cm',
  panel=function(...) {
    panel.grid(-1, -1)
    panel.abline(h=0, col='red', lty=2)
    panel.xyplot(...)
  }
)

Apply a threshold and plot TRUE (blue) / FALSE (grey): water level > 25 cm depth.

levelplot(
  factor(sensor_value > -25) ~ doy * factor(year) | sensor_name, main='Daily Water Level > 25 cm Depth',
  data=x$waterlevel, 
  subset=sensor_depth == 100,
  col.regions=c('grey', 'RoyalBlue'), cuts=1, 
  colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75), 
  par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'), 
  xlab='Day of Year', ylab='Year',
  par.settings = tactile.theme()
)

Aggregate representation of water level data, by sensor / year:

total records > threshold water level depth
fraction of annual records > threshold water level depth
maximum consecutive records > threshold water level depth

# aggregation of water level data
# i: data.frame, typically subset by ID / year
# level: water table level used for threshold
water.level.threshold <- function(i, level) {
  # apply threshold
  thresh <- i$sensor_value > level
  # compute total records with water level > threshold
  total.records <- length(which(thresh))
  # fraction of records > threshold
  fraction <- total.records / length(na.omit(i$sensor_value))
  
  # get runs of thresholded data
  r <- rle(as.integer(na.omit(thresh)))
  idx <- which(r$values == 1)
  if(length(idx) > 0)
    consecutive.records <- max(r$lengths[idx])
  else
    consecutive.records <- 0
  
  # format and return
  d <- data.frame(total=total.records, fraction=fraction, max.consecutive.records=consecutive.records)
  return(d)
}

# aggregate water level data
wl.agg <- ddply(x$waterlevel, c('sensor_name', 'year'), .fun=water.level.threshold, level=-25)

sensor_name	year	total	fraction	max.consecutive.records
Beiler_Lyons	2006	81	0.4792899	76
Beiler_Lyons	2007	192	0.5378151	127
Beiler_Lyons	2008	267	0.7355372	130
Beiler_Lyons	2009	273	0.7562327	155
Beiler_Lyons	2010	200	0.5494505	112
Beiler_Lyons	2011	27	0.1901408	14

Graphical representation of annual water level summaries.

cols <- colorRampPalette(hcl.colors(11, 'RdYlBu'), space='Lab', interpolate='spline', bias=1.75)

levelplot(max.consecutive.records ~ factor(year) * factor(sensor_name), data=wl.agg, col.regions = cols, main='Maximum Consecutive Days\n> 25cm Water Level Depth', xlab='', ylab='', par.settings = tactile.theme())

levelplot(fraction ~ factor(year) * factor(sensor_name), data=wl.agg, col.regions = cols, main='Fraction of Days / Year\n> 25cm Water Level Depth', xlab='', ylab='', par.settings = tactile.theme())

Water Level and Precipitation

Combine water level data from Henry DB and daily precipitation data from nearby SCAN station (suggested by Ben Marshall).

This example requires the latticeExtra package and a couple of small adjustments to make the Henry and SCAN data compatible.

# get specific data from Henry:
# water level data only
# as daily means
# do not fill missing days with NA (adjust as needed)
x <- fetchHenry(project='MD021', what='waterlevel', gran = 'day', pad.missing.days = FALSE)
x <- subset(x$waterlevel, name == "Hatboro-155")

# convert Henry date/time into `Date` class: compatibility with SCAN data
x$date_time <- as.Date(x$date_time)

# get data from nearby SCAN station for 2015--2017
x.scan <- fetchSCAN(site.code=2049, year=c(2015,2016,2017))

When working with Date class values on the x-axis, it is often more convenient to derive the tick locations by hand. Here we generate the start.date and stop.date tick location using the oldest and latest water level records, then pad with 14 days. seq.Date() is helpful for generating a regular sequence of dates that span the calcuated “start” and “stop” dates.

# create a new date axis
# using the limits of the water level data and pad days
start.date <- min(x$date_time) - 14
stop.date <- max(x$date_time) + 14
date.axis <- seq.Date(start.date, stop.date, by='2 months')

Lattice plots can be saved to regular R objects for further manipulation.

# plot water level data, save to object
p.1 <- xyplot(sensor_value ~ date_time, data=x, type=c('l', 'g'), cex=0.75, ylab='Water Level (cm)', xlab='', scales=list(x=list(at=date.axis, format="%b\n%Y"), y=list(tick.number=10)), par.settings = tactile.theme())

# plot precip data, save to object
p.2 <- xyplot(value ~ Date, data=x.scan$PRCP, as.table=TRUE, type=c('h'), strip=strip.custom(bg=grey(0.80)), scales=list(x=list(at=date.axis, format="%b\n%Y")), ylab='Precipitation (in)', par.settings = tactile.theme())

# combine plots into panels (latticeExtra feature)
p.3 <- c(p.1, p.2, layout=c(1,2), x.same=TRUE)

Make final adjustments and plot:

fix scale labels
add a y-axis label for each panel
add a title
truncate x-axis to the interval start.date–stop.date, define by the water level data
add a custom set of grid lines to each panel

update(p.3, scales=list(alternating=3, y=list(rot=0)), ylab=c('Water Level (cm)', 'Precipitation (in)'), main='Daily Values', xlim=c(start.date, stop.date), panel=function(...) {
  panel.xyplot(...)
  panel.abline(v=date.axis, col='grey', lty=3)
  panel.grid(h = -1, v=0, col='grey', lty=3)
})

Additional Ideas

Save sites as shape file.

Compute frost-free days.

Fit a simple model relating MAST to MAAT (PRISM) using soil temperature data from the several MLRA SSO.

m <- subset(res, subset=sensor_depth == 50)

plot(MAST ~ maat, data=m)

dd <- datadist(m)
options(datadist="dd")

(m.ols <- ols(MAST ~ rcs(maat), data=m))
plot(Predict(m.ols, conf.type = 'simultaneous'), ylab='MAST', xlab='MAAT (PRISM)')

This document is based on aqp version 2.1.0 and soilDB version 2.8.5.

Henry Mount Soil Climate Database Tutorial

D.E. Beaudette and J. Wood

2024-12-17