This is a short tutorial on how to interact with the Soil Data Access (SDA) web-service using R. Queries are written using a dialect of SQL. On first glance SQL appears similar to the language used to write NASIS queries and reports, however, these are two distinct languages. Soil Data Access is a “window” into the spatial and tabular data associated with the current SSURGO snapshot. Queries can contain spatial and tabular filters. If you are new to SDA or SQL, have a look at this page. Here is a SSURGO-specific library of SQL code, organized by task.
Spatial queries can be included in SQL statements submitted to SDA as
long as the geometry has first been transformed to WGS84 geographic
coordinates and formatted as “well known
text” (WKT). The sf
and terra
packages
provide functionality for converting between coordinate systems.
Coordinate reference system definitions (a “CRS”) are typically provided
using EPSG codes. You can search for various CRS definitions in a
variety of formats using spatialreference.org/.
The soilDB
library for R provides a helper function
(SDA_query()
) for submitting queries to SDA, processing the
result, and reformatting the results into a rectangular table (a
data.frame
). Most of the work required to use the
SDA_query()
function will be writing SQL to describe the
columns you would like returned and how the data should be filtered and
possibly grouped. The SDA_spatialQuery()
function
simplifies the process of using spatial data (sf
,
spatVector
, or Spatial*
objects) to query map
unit polygons.
Follow along with the blocks of code below by copying / pasting into
a new R “script” document. Each block of command can be run by pasting
into the R console, or by “stepping through” lines of code by moving the
cursor to the top of a block (in the R script panel) and repeatedly
pressing ctrl + enter. Note that knittr:kable
is used to
create cleaner output in the tutorial and isn’t required while following
along.
If you are feeling adventurous, have a look at a draft tutorial on queries that return geometry from SDA. Additional tips on advanced spatial queries can be found here.
SSURGO (1:24k soil survey) and STATSGO (1:250k soil survey) records are stored together in SDA. Therefore, it is critical that evey query to SDA include some kind of filter for selecting the appropriate records. Filtering strategies include:
legend.areasymbol != 'US'
in the WHERE clauseSDA_Get_Mukey_from_intersection_with_WktWgs84()
mukey
, cokey
, etc.Explicit exclusion of STATSGO records:
SELECT [...]
FROM legend
INNER JOIN mapunit mu ON mu.lkey = legend.lkey
INNER JOIN component co ON mu.mukey = co.mukey
WHERE legend.areasymbol != 'US' ;
Implicit exclusion of STATSGO records:
SELECT [...]
FROM legend
INNER JOIN mapunit mu ON mu.lkey = legend.lkey
INNER JOIN component co ON mu.mukey = co.mukey
WHERE legend.areasymbol = 'CA113' ;
Be sure to explicitly CAST()
integer columns when using
operators such as multiplication, division, etc.. or when values may be
very large. Queries that work with large sets of muacres
from the mapunit
table are one such case. Implicit casting
is accomplished via operations that include higher precision values
(e.g. include values to the right of the decimal point).
Integer overflow, result is NULL
.
SELECT
SUM(muacres) AS ac
FROM legend
INNER JOIN mapunit mu ON mu.lkey = legend.lkey
WHERE legend.areasymbol != 'US' ;
Casting muacres
to numeric data type, result is
2,280,193,353 acres.
SELECT
SUM(CAST(muacres AS NUMERIC)) AS ac
FROM legend
INNER JOIN mapunit mu ON mu.lkey = legend.lkey
WHERE legend.areasymbol != 'US' ;
Implicit casting to numeric via division by floating point value, result is 2280.193 million acres.
SELECT
SUM(muacres / 1000000.0) AS ac
FROM legend
INNER JOIN mapunit mu ON mu.lkey = legend.lkey
WHERE legend.areasymbol != 'US' ;
Scaling of muacres
by component percentage, as converted
to a fraction. Result is wrong: 500 million acres.
SELECT
SUM((comppct_r / 100) * muacres) / 1000000 AS ac
FROM legend
INNER JOIN mapunit mu ON mu.lkey = legend.lkey
INNER JOIN component AS co ON mu.mukey = co.mukey
WHERE legend.areasymbol != 'US' ;
Casting comppct_r
and muacres
to numeric,
result is correct: 2242.39 million acres. Note that there are still some
components with NULL comppct_r
, will be fixed
eventually.
SELECT
SUM((CAST(comppct_r AS NUMERIC) / 100) * CAST(muacres AS NUMERIC)) / 1000000 AS ac
FROM legend
INNER JOIN mapunit mu ON mu.lkey = legend.lkey
INNER JOIN component AS co ON mu.mukey = co.mukey
WHERE legend.areasymbol != 'US' ;
Use this last pattern whenever converting component percentages to a
fraction, or scaling the muacres
field.
You only need to do this once. If you haven’t installed these packages, then copy the code below and paste into the RStudio “console” pane.
# run these commands in the R console
# stable version from CRAN + dependencies
install.packages("aqp", dep = TRUE)
install.packages("soilDB", dep = TRUE)
install.packages("terra", dep = TRUE)
install.packages("sf", dep = TRUE)
Now that you have the required packages, load them into the current R session.
library(aqp)
library(soilDB)
library(terra)
library(sf)
When was the CA653 survey area last exported to SSURGO?
SDA_query("SELECT areasymbol, saverest FROM sacatalog WHERE areasymbol = 'CA653'")
## areasymbol saverest
## 1 CA653 9/1/2022 2:12:46 PM
Are there any survey areas that haven’t been updated since Jan 1, 2019? Just STATSGO.
SDA_query("SELECT areasymbol, saverest FROM sacatalog WHERE saverest < '01/01/2019'")
## areasymbol saverest
## 1 US 10/13/2016 12:28:22 PM
What is the most recently updated survey in CA?
SDA_query("SELECT areasymbol, saverest FROM sacatalog WHERE areasymbol LIKE 'CA%' ORDER BY saverest DESC")[1, ]
## areasymbol saverest
## 1 CA663 9/15/2022 2:27:44 PM
A simple query from the component table, for a single map unit:
mukey = '461958'
. This is a SSURGO map unit key, therefore
STATSGO records are implicitly removed from the results.
q <- "SELECT
mukey, cokey, comppct_r, compname, taxclname
FROM component
WHERE mukey = '461958'"
# run the query
res <- SDA_query(q)
# check
knitr::kable(head(res))
mukey | cokey | comppct_r | compname | taxclname |
---|---|---|---|---|
461958 | 22732982 | 4 | Galt | NA |
461958 | 22732983 | 3 | Kimball | NA |
461958 | 22732984 | 1 | Unnamed | NA |
461958 | 22732985 | 85 | San Joaquin | Fine, mixed, thermic Abruptic Durixeralfs |
461958 | 22732986 | 4 | Bruella | NA |
461958 | 22732987 | 3 | Hedge | NA |
Get a list of map units that contain “Amador” as minor component. Note that this type of query requires explicit exclusion of STATSGO records.
q <- "SELECT
muname, mapunit.mukey, cokey, compname, comppct_r
FROM legend
INNER JOIN mapunit ON mapunit.lkey = legend.lkey
INNER JOIN component on mapunit.mukey = component.mukey
WHERE
-- exclude STATSGO
legend.areasymbol != 'US'
AND compname LIKE '%amador%'
AND majcompflag = 'No'"
# run the query
res <- SDA_query(q)
# check
knitr::kable(head(res))
muname | mukey | cokey | compname | comppct_r |
---|---|---|---|---|
Pardee-Ranchoseco complex, 3 to 15 percent slopes | 461931 | 22732564 | Amador | 3 |
Pardee cobbly loam, 2 to 15 percent slopes | 2924951 | 23459437 | Amador | 10 |
Pardee-Ranchoseco complex, 3 to 15 percent slopes | 1420721 | 22485048 | Amador | 3 |
Hicksville sandy clay loam, 0 to 2 percent slopes, occasionally flooded | 1612051 | 22521674 | Amador | 3 |
Ultic Haploxeralfs-Mollic Haploxeralfs complex, 3 to 30 percent slopes | 2924833 | 22523521 | Amador | 5 |
Mckeonhills clay, 5 to 15 percent slopes | 1540892 | 23467419 | Amador | 1 |
# optionally save the results to CSV file
# write.csv(res, file='path-to-file.csv', row.names=FALSE)
Get basic map unit and component data for a single survey area, Yolo County (CA113). There is no need to exclude STATSGO records because we are specifying a SSURGO areasymbol in the WHERE clause.
q <- "SELECT
component.mukey, cokey, comppct_r, compname, taxclname,
taxorder, taxsuborder, taxgrtgroup, taxsubgrp
FROM legend
INNER JOIN mapunit ON mapunit.lkey = legend.lkey
INNER JOIN component ON component.mukey = mapunit.mukey
WHERE legend.areasymbol = 'CA113'"
# run the query
res <- SDA_query(q)
# check
knitr::kable(head(res))
mukey | cokey | comppct_r | compname | taxclname | taxorder | taxsuborder | taxgrtgroup | taxsubgrp |
---|---|---|---|---|---|---|---|---|
459154 | 22734238 | 100 | Water | NA | NA | NA | NA | NA |
459204 | 22734512 | 100 | Gravel pits | NA | NA | NA | NA | NA |
459205 | 22734700 | 100 | Water | NA | NA | NA | NA | NA |
459208 | 22734523 | 5 | Sehorn | NA | NA | NA | NA | NA |
459208 | 22734524 | 5 | Corning | NA | NA | NA | NA | NA |
459208 | 22734525 | 85 | Balcom | Fine-loamy, mixed, thermic Calcixerollic Xerochrepts | Inceptisols | Ochrepts | Xerochrepts | Calcixerollic Xerochrepts |
Cross tabulate the occurrence of components named “Auburn” and “Dunstone” with survey areasymbol. Note that this type of query requires explicit exclusion of STATSGO records.
q <- "SELECT areasymbol, component.mukey, cokey, comppct_r, compname, compkind, taxclname
FROM legend
INNER JOIN mapunit ON mapunit.lkey = legend.lkey
INNER JOIN component ON component.mukey = mapunit.mukey
WHERE compname IN ('Auburn', 'Dunstone')
-- exclude STATSGO
AND areasymbol != 'US'
ORDER BY areasymbol, compname"
res <- SDA_query(q)
xtabs(~ areasymbol + compname, data=res)
## compname
## areasymbol Auburn Dunstone
## CA067 9 0
## CA607 16 0
## CA612 8 19
## CA618 29 1
## CA619 24 1
## CA620 14 0
## CA624 24 0
## CA628 20 0
## CA630 2 2
## CA632 2 1
## CA644 15 1
## CA648 6 0
## CA649 11 3
## CA707 11 0
## CA731 0 2
## CA750 0 2
Get the map unit key and name at a single, manually-defined point (-121.77100 37.368402). Spatial queries using SDA helper functions automatically exclude STATSGO records.
q <- "SELECT mukey, muname
FROM mapunit
WHERE mukey IN (
SELECT * from SDA_Get_Mukey_from_intersection_with_WktWgs84('point(-121.77100 37.368402)')
)"
res <- SDA_query(q)
knitr::kable(res)
mukey | muname |
---|---|
1882921 | Diablo clay, 15 to 30 percent slopes, MLRA 15 |
Get the map names and mukey
values for a 1000m buffer
around a manually-defined point (-121.77100 37.368402). The
buffer()
function from the {terra} package can conveniently
derive planar buffers from geographic coordinates.
# the query point is in geographic coordinates / WGS84 datum
p <- vect(cbind(-121.77100, 37.368402), crs = 'epsg:4326')
# 1000m buffer (planar coordinates)
# result is still geographic coordinates
p.buff <- terra::buffer(p, 1000)
# convert to WKT
p.wkt <- as.data.frame(p.buff, geom = 'wkt')$geometry
q <- paste0("SELECT mukey, muname
FROM mapunit
WHERE mukey IN (
SELECT * from SDA_Get_Mukey_from_intersection_with_WktWgs84('", p.wkt, "')
)")
res <- SDA_query(q)
knitr::kable(head(res))
mukey | muname |
---|---|
456983 | Diablo clay, 9 to 15 percent slopes |
456993 | Gaviota loam, 15 to 30 percent slopes |
457017 | Los Gatos-Gaviota complex, 50 to 75 percent slopes |
1882920 | Diablo clay, 30 to 50 percent slopes, MLRA 15 |
1882921 | Diablo clay, 15 to 30 percent slopes, MLRA 15 |
1882923 | Alo-Altamont complex, 15 to 30 percent slopes |
Get some component data for a manually-defined bounding box, defined in WGS84 geographic coordinates.
# define a bounding box: xmin, xmax, ymin, ymax
#
# +-------------------(ymax, xmax)
# | |
# | |
# (ymin, xmin) ----------------+
# bbox extent
bb <- ext(-120.9, -120.8, 37.7, 37.8)
# convert to polygons
bb <- vect(bb, crs = 'epsg:4326')
# convert bounding box to WKT
bb <- as.data.frame(bb, geom = 'wkt')$geometry
# compose query, using WKT BBOX as filtering criteria
q <- paste0("SELECT mukey, cokey, compname, comppct_r
FROM component
WHERE mukey IN (SELECT DISTINCT mukey FROM SDA_Get_Mukey_from_intersection_with_WktWgs84('", bb, "') )
ORDER BY mukey, cokey, comppct_r DESC")
res <- SDA_query(q)
# check
knitr::kable(head(res))
mukey | cokey | compname | comppct_r |
---|---|---|---|
462527 | 23471173 | Madera | 10 |
462527 | 23471174 | Alamo | 85 |
462527 | 23471175 | San Joaquin | 5 |
462554 | 23471263 | Corning | 85 |
462554 | 23471264 | Redding | 5 |
462554 | 23471265 | Montpellier | 10 |
Compute polygon area via SDA and double-check via
terra::expanse()
. Note the complex syntax required to
convert polygons into “geography” type, thus invoking spherical geometry
and resulting in area values with units of square meters. Computing
polygon area server-side requires a more complex SQL statement, but is
much more efficient because the associated polygon data do not need to
be transferred from SDA to the local R session.
# much faster (10x or more), because area is calculated server-side
a <- SDA_query("
SELECT mukey,
GEOGRAPHY::STGeomFromWKB(
mupolygongeo.STUnion(
mupolygongeo.STStartPoint()
).STAsBinary(), 4326
).STArea() * 0.000247105 AS area_ac
FROM mupolygon WHERE mukey = '418254'
")
# slow, polygon geometry must be converted to text and downloaded
x <- fetchSDA_spatial('418254')
x <- vect(x)
# compute area in sq. meters then convert to acres
x$area <-expanse(x) * 0.000247105
# note that the distribution of polygon area is identical
options(scipen = 10)
round(quantile(a$area_ac))
## 0% 25% 50% 75% 100%
## 1 7 15 36 16632
round(quantile(x$area))
## 0% 25% 50% 75% 100%
## 1 7 15 36 16632
fetchSDA()
library(ggplot2)
library(gridExtra)
# query soil components by areasymbol and musym
test <- fetchSDA(WHERE = "areasymbol = 'IN005' AND musym = 'MnpB2'")
# profile plot
par(mar = c(0, 0, 0, 2.5))
plotSPC(test, name.style = 'center-center', cex.names = 0.8)
# convert the data for depth plot
clay_slice <- horizons(dice(test, 0:200 ~ claytotal_l + claytotal_r + claytotal_h))
names(clay_slice) <- gsub("claytotal_", "", names(clay_slice))
om_slice <- horizons(dice(test, 0:200 ~ om_l + om_r + om_h))
names(om_slice) <- gsub("om_", "", names(om_slice))
test2 <- rbind(data.frame(clay_slice, var = "clay"),
data.frame(om_slice, var = "om")
)
h <- merge(test2, site(test)[c("nationalmusym", "cokey", "compname", "comppct_r")],
by = "cokey",
all.x = TRUE
)
# depth plot of clay content by soil component
gg_comp <- function(x) {
ggplot(x) +
geom_line(aes(y = r, x = hzdept_r)) +
geom_line(aes(y = r, x = hzdept_r)) +
geom_ribbon(aes(ymin = l, ymax = h, x = hzdept_r), alpha = 0.2) +
xlim(200, 0) +
xlab("depth (cm)") +
facet_grid(var ~ nationalmusym + paste(compname, comppct_r)) +
coord_flip()
}
g1 <- gg_comp(subset(h, var == "clay"))
g2 <- gg_comp(subset(h, var == "om"))
grid.arrange(g1, g2)
# query all Miami major components
s <- get_component_from_SDA(WHERE = "compname = 'Miami' \n
AND majcompflag = 'Yes' AND areasymbol != 'US'")
# landform vs 3-D morphometry
test <- {
subset(s, ! is.na(landform) | ! is.na(geompos)) ->.;
split(., .$drainagecl, drop = TRUE) ->.;
lapply(., function(x) {
test = data.frame()
test = as.data.frame(table(x$landform, x$geompos))
test$compname = x$compname[1]
test$drainagecl = x$drainagecl[1]
names(test)[1:2] <- c("landform", "geompos")
return(test)
}) ->.;
do.call("rbind", .) ->.;
.[.$Freq > 0, ] ->.;
within(., {
landform = reorder(factor(landform), Freq, max)
geompos = reorder(factor(geompos), Freq, max)
geompos = factor(geompos, levels = rev(levels(geompos)))
}) ->.;
}
test$Freq2 <- cut(test$Freq,
breaks = c(0, 5, 10, 25, 50, 100, 150),
labels = c("<5", "5-10", "10-25", "25-50", "50-100", "100-150")
)
ggplot(test, aes(x = geompos, y = landform, fill = Freq2)) +
geom_tile(alpha = 0.5) + facet_wrap(~ paste0(compname, "\n", drainagecl)) +
discrete_scale("colour", "viridis", function(n) viridisLite::viridis(n)) +
theme(aspect.ratio = 1, axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
ggtitle("Landform vs 3-D Morphometry for Miami Major Components on SDA")
SDA_spatialQuery()
Spatial queries are usually simpler with a two step process via
SDA_spatialQuery()
to get a set of intersecting map unit
keys, then SDA_query()
to return additional tabular data.
Some things to keep in mind when using
SDA_spatial_query()
:
what = 'mupolygon'
to return polygon features and
computed area in acres (slower, higher resource usage)what = 'mukey'
to return only map unit keys
(faster, lower resource usage)Long-running queries and queries that consume too many server
resources will be canceled by SDA. See ?SDA_spatialQuery
for details.
# query map unit keys intersecting with the 1000m buffered point defined above
mu <- SDA_spatialQuery(p.buff, what = 'mukey', geomIntersection = TRUE)
# create a query to get component data for resulting map unit keys
sql <- sprintf("SELECT mukey, cokey, compname, comppct_r
FROM component
WHERE mukey IN %s
ORDER BY cokey, comppct_r DESC", format_SQL_in_statement(mu$mukey)
)
co <- SDA_query(sql)
knitr::kable(head(co))
mukey | cokey | compname | comppct_r |
---|---|---|---|
2435467 | 23469156 | Gaviota | 85 |
2435467 | 23469157 | Vallecitos | 10 |
2435467 | 23469158 | Rock outcrop | 5 |
2435468 | 23469159 | Gaviota | 88 |
2435468 | 23469160 | Vallecitos | 10 |
2435468 | 23469161 | Rock outcrop | 2 |
Try downloading some polygon geometry using the same buffered point.
The geomIntersection
argument to
SDA_spatialQuery()
toggles overlap (FALSE
)
vs. intersection (TRUE
).
# use the buffered point geometry to query SDA
# requesting spatial intersection vs. overlapping features
p.mu.polys <- SDA_spatialQuery(p.buff, what = 'mupolygon', geomIntersection = TRUE)
Graphical description of the previous steps: query point, 1000m buffer, buffer bounding box (BBOX), and intersection of buffered point / map unit polygons.
# plot intersecting map unit polygons from SDA
plot(p.mu.polys, axes = FALSE, mar = c(1, 2, 0, 0))
# add buffered point
lines(p.buff, col = 'red', lwd = 2)
# add bounding box around buffered point
lines(ext(p.buff), col = 'RoyalBlue')
# add original point
points(p, col='orange', pch = 15, cex = 2)
legend('bottomleft', legend=c('query point', '1000m buffer', 'buffer BBOX', 'polygon intersection'), col=c('orange', 'red', 'royalblue', 'black'), lwd = c(NA, 2, 2, 2), pch=c(15, NA, NA, NA), xpd = FALSE, cex = 0.8)
Let’s apply the SDA_spatialQuery()
function to some real
data; KSSL pedons correlated to “Auburn”. Not all of these pedons have
coordinates, so we will have to do some filtering first. See the in-line
comments for details on each line of code.
# get KSSL pedons with taxonname = Auburn
# result is a SoilProfileCollection object
# coordinates will be WGS84 GCS
auburn <- fetchKSSL('auburn')
# keep only those pedons with valid coordinates
auburn <- subset(auburn, subset = !is.na(x) & !is.na(y))
# extract site data with coordinates
s <- site(auburn)
# init spatVector
sp <- vect(s, geom = c('x', 'y'), crs = 'epsg:4326')
# perform SDA query on collection of points
mu.data <- SDA_spatialQuery(sp, what = 'mupolygon')
# use local spatial intersection to link source + SDA data
# row-order is preserved
sp$mukey <- terra::extract(mu.data, sp)$mukey
# join results to original SoilProfileCollection using 'pedlabsampnum'
site(auburn) <- as.data.frame(sp)[, c('pedlabsampnum', 'mukey')]
Check the results and plot the “Auburn” KSSL pedons, grouped by intersecting map unit key and coloring horizons according to clay content.
# plot profiles, grouped by mukey
# color horizons with clay content
par(mar = c(0, 2, 4, 2))
groupedProfilePlot(auburn, groups = 'mukey', name.style = 'center-center', group.name.cex = 0.65, color = 'clay', id.style = 'side', label = 'pedon_id', max.depth = 100)
# describe IDs
mtext('user pedon ID', side = 2, line = -1.5)
mtext('mukey', side = 3, line = -1, at = c(0, 0), adj = 0)
This document is based on soilDB
(2.7.8) and
aqp
(2.0.2).