This is a short tutorial on how to interact with the Soil Data Access (SDA) web-service using R. Queries are written using a dialect of SQL. On first glance SQL appears similar to the language used to write NASIS queries and reports, however, these are two distinct languages. Soil Data Access is a "window" into the spatial and tabular data associated with the current SSURGO snapshot. Queries can contain spatial and tabular filters. If you are new to SDA or SQL, have a look at this page. Here is a SSURGO-specific library of SQL code, organized by task.
Spatial queries can be included in SQL statements submitted to SDA as long as the geometry has first been transformed to WGS84 geographic (or psuedo-Web Mercator) coordinates and formatted as "well known text" (WKT). The sp
and rgdal
packages provide functionality for converting between coordinate systems via spTransform()
. Coordinate reference system definitions (a "CRS") are typically provided using proj4 notation. You can search for various CRS definitions in a variety of formats using spatialreference.org/.
The soilDB
library for R provides a helper function (SDA_query()
) for submitting queries to SDA, processing the result, and reformatting the results into a rectangular table (a data.frame
). Most of the work required to use the SDA_query()
function will be writing SQL to describe the columns you would like returned and how the data should be filtered and possibly grouped.
Follow along with the blocks of code below by copying / pasting into a new R "script" document. Each block of command can be run by pasting into the R console, or by "stepping through" lines of code by moving the cursor to the top of a block (in the R script panel) and repeatedly pressing ctrl + enter. Note that knittr:kable
is used to create cleaner output in the tutorial and isn't required while following along.
If you are feeling adventurous, have a look at a draft tutorial on queries that return geometry from SDA. Additional tips on advanced spatial queries can be found here.
SSURGO (1:24k soil survey) and STATSGO (1:250k soil survey) records are stored together in SDA. Therefore, it is critical that evey query to SDA include some kind of filter for selecting the appropriate records. Filtering strategies include:
legend.areasymbol != 'US'
in the WHERE clauseSDA_Get_Mukey_from_intersection_with_WktWgs84()
mukey
, cokey
, etc.Explicit exclusion of STATSGO records:
SELECT [...]
FROM legend
INNER JOIN mapunit mu ON mu.lkey = legend.lkey
INNER JOIN component co ON mu.mukey = co.mukey
WHERE legend.areasymbol != 'US' ;
Implicit exclusion of STATSGO records:
SELECT [...]
FROM legend
INNER JOIN mapunit mu ON mu.lkey = legend.lkey
INNER JOIN component co ON mu.mukey = co.mukey
WHERE legend.areasymbol = 'CA113' ;
You only need to do this once. If you haven't installed these packages, then copy the code below and paste into the RStudio "console" pane.
# run these commands in the R console
# stable version from CRAN + dependencies
install.packages("httr", dep=TRUE)
install.packages("soilDB", dep=TRUE)
install.packages("rgdal", dep = TRUE)
install.packages("raster", dep = TRUE)
install.packages("rgeos", dep = TRUE)
Now that you have the required packages, load them into the current R session.
library(soilDB)
library(sp)
library(rgdal)
library(plyr)
library(raster)
library(rgeos)
When was the CA653 survey area last exported to SSURGO?
SDA_query("SELECT areasymbol, saverest FROM sacatalog WHERE areasymbol = 'CA653'")
## areasymbol saverest
## 1 CA653 9/12/2018 9:39:03 PM
Are there any survey areas that haven't been updated since Jan 1, 2010?
SDA_query("SELECT areasymbol, saverest FROM sacatalog WHERE saverest < '01/01/2010'")
## NULL
What is the most recently updated survey in CA?
SDA_query("SELECT areasymbol, saverest FROM sacatalog WHERE areasymbol LIKE 'CA%' ORDER BY saverest DESC")[1, ]
## areasymbol saverest
## 1 CA698 9/20/2018 7:55:22 PM
A simple query from the component table, for a single map unit: mukey = '461958'
. This is a SSURGO map unit key, therefore STATSGO records are implicitly removed from the results.
q <- "SELECT
mukey, cokey, comppct_r, compname, taxclname
FROM component
WHERE mukey = '461958'"
# run the query
res <- SDA_query(q)
# check
kable(head(res))
mukey | cokey | comppct_r | compname | taxclname |
---|---|---|---|---|
461958 | 16893928 | 85 | San Joaquin | Fine, mixed, thermic Abruptic Durixeralfs |
461958 | 16893929 | 4 | Galt | NA |
461958 | 16893930 | 4 | Bruella | NA |
461958 | 16893931 | 3 | Hedge | NA |
461958 | 16893932 | 3 | Kimball | NA |
461958 | 16893933 | 1 | Unnamed | NA |
Get a list of map units that contain "Amador" as minor component. Note that this type of query requires explicit exclusion of STATSGO records.
q <- "SELECT
muname, mapunit.mukey, cokey, compname, comppct_r
FROM legend
INNER JOIN mapunit ON mapunit.lkey = legend.lkey
INNER JOIN component on mapunit.mukey = component.mukey
WHERE
-- exclude STATSGO
legend.areasymbol != 'US'
AND compname LIKE '%amador%'
AND majcompflag = 'No'"
# run the query
res <- SDA_query(q)
# check
kable(head(res))
muname | mukey | cokey | compname | comppct_r |
---|---|---|---|---|
Pentz clay loam, 0 to 8 percent slopes | 463106 | 16889616 | Amador | 5 |
Pentz clay loam, 8 to 30 percent slopes | 463107 | 16889620 | Amador | 5 |
Pentz gravelly loam, 0 to 8 percent slopes | 463108 | 16889625 | Amador | 5 |
Pentz gravelly loam, 8 to 30 percent slopes | 463109 | 16889628 | Amador | 5 |
Pentz loam, 0 to 8 percent slopes | 463110 | 16889632 | Amador | 5 |
Pentz loam, 8 to 30 percent slopes | 463111 | 16889636 | Amador | 5 |
# optionally save the results to CSV file
# write.csv(res, file='path-to-file.csv', row.names=FALSE)
Get basic map unit and component data for a single survey area, Yolo County (CA113). There is no need to exclude STATSGO records because we are specifying a SSURGO areasymbol in the WHERE clause.
q <- "SELECT
component.mukey, cokey, comppct_r, compname, taxclname,
taxorder, taxsuborder, taxgrtgroup, taxsubgrp
FROM legend
INNER JOIN mapunit ON mapunit.lkey = legend.lkey
INNER JOIN component ON component.mukey = mapunit.mukey
WHERE legend.areasymbol = 'CA113'"
# run the query
res <- SDA_query(q)
# check
kable(head(res))
mukey | cokey | comppct_r | compname | taxclname | taxorder | taxsuborder | taxgrtgroup | taxsubgrp |
---|---|---|---|---|---|---|---|---|
765508 | 16891279 | 85 | Henneke | Clayey-skeletal, serpentinitic, thermic Lithic Argixerolls | Mollisols | Xerolls | Argixerolls | Lithic Argixerolls |
765509 | 16891280 | 30 | Millsholm | Loamy, mixed, superactive, thermic Lithic Xerochrepts | Inceptisols | Ochrepts | Xerochrepts | Lithic Xerochrepts |
765509 | 16891281 | 20 | Lodo | Loamy, mixed, superactive, thermic Lithic Haploxerolls | Mollisols | Xerolls | Haploxerolls | Lithic Haploxerolls |
765509 | 16891282 | 30 | Maymen | Loamy, mixed, active, mesic Dystric Lithic Xerochrepts | Inceptisols | Ochrepts | Xerochrepts | Dystric Lithic Xerochrepts |
765511 | 16891283 | 85 | Montara | Loamy, serpentinitic, thermic Lithic Haploxerolls | Mollisols | Xerolls | Haploxerolls | Lithic Haploxerolls |
765512 | 16891284 | 100 | Rock outcrop | NA | NA | NA | NA | NA |
Cross tabulate the occurrence of components named "Auburn" and "Dunstone" with survey areasymbol. Note that this type of query requires explicit exclusion of STATSGO records.
q <- "SELECT areasymbol, component.mukey, cokey, comppct_r, compname, compkind, taxclname
FROM legend
INNER JOIN mapunit ON mapunit.lkey = legend.lkey
INNER JOIN component ON component.mukey = mapunit.mukey
WHERE compname IN ('Auburn', 'Dunstone')
-- exclude STATSGO
AND areasymbol != 'US'
ORDER BY areasymbol, compname"
res <- SDA_query(q)
xtabs(~ areasymbol + compname, data=res)
## compname
## areasymbol Auburn Dunstone
## CA067 9 0
## CA607 21 0
## CA612 8 19
## CA618 32 1
## CA619 25 1
## CA620 14 0
## CA624 24 0
## CA628 20 0
## CA630 2 2
## CA632 2 1
## CA644 15 1
## CA648 6 0
## CA649 23 1
## CA707 11 0
## CA731 5 0
## CA750 1 0
Get the map unit key and name at a single, manually-defined point (-121.77100 37.368402). Spatial queries using SDA helper functions automatically exclude STATSGO records.
q <- "SELECT mukey, muname
FROM mapunit
WHERE mukey IN (
SELECT * from SDA_Get_Mukey_from_intersection_with_WktWgs84('point(-121.77100 37.368402)')
)"
res <- SDA_query(q)
kable(res)
mukey | muname |
---|---|
1882921 | Diablo clay, 15 to 30 percent slopes, MLRA 15 |
Get the map names and mukey values for a 1000m buffer around a manually-defined point (-121.77100 37.368402). A 1000m buffer applied to geographic coordinates will require several spatial transformations. First, the query point needs to be initialized in a geographic coordinate system with WGS84 datum. Next, the point is transformed to a planar coordinate system with units in meters; the NLCD coordinate reference system works well for points within the continental US. After computing a buffer in planar coordinates, the resulting polygon is converted back to geographic coordinates--this is what SDA is expecting.
# the query point is in geographic coordinates with WGS84 datum
p <- SpatialPoints(cbind(-121.77100, 37.368402), proj4string = CRS('+proj=longlat +datum=WGS84'))
# transform to planar coordinate system for buffering
p.aea <- spTransform(p, CRS('+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs '))
# create 1000 meter buffer
p.aea <- gBuffer(p.aea, width = 1000)
# transform back to WGS84 GCS
p.buff <- spTransform(p.aea, CRS('+proj=longlat +datum=WGS84'))
# convert to WKT
p.wkt <- writeWKT(p.buff)
q <- paste0("SELECT mukey, muname
FROM mapunit
WHERE mukey IN (
SELECT * from SDA_Get_Mukey_from_intersection_with_WktWgs84('", p.wkt, "')
)")
res <- SDA_query(q)
kable(head(res))
mukey | muname |
---|---|
456983 | Diablo clay, 9 to 15 percent slopes |
456993 | Gaviota loam, 15 to 30 percent slopes |
457017 | Los Gatos-Gaviota complex, 50 to 75 percent slopes |
1882920 | Diablo clay, 30 to 50 percent slopes, MLRA 15 |
1882921 | Diablo clay, 15 to 30 percent slopes, MLRA 15 |
1882923 | Alo-Altamont complex, 15 to 30 percent slopes |
It is possible to download small collections of SSURGO map unit polygons from SDA using a bounding-box in WGS84 geographic coordinates. SDA will return polygons and their map unit keys that overlap with the BBOX query.
# extract bounding-box from out last point
# coordinates are in WGS84 GCS
b <- as.vector(bbox(p.buff))
# download map unit polygons that overlap with bbox
p.mu.polys <- mapunit_geom_by_ll_bbox(b)
Graphical description of the previous steps: query point, 1000m buffer, buffer bounding box (BBOX), intersecting map unit polygons, and overlapping polygons.
# plot
par(mar=c(0,0,0,0))
plot(p.mu.polys)
plot(p.mu.polys[which(p.mu.polys$mukey %in% setdiff(p.mu.polys$mukey, res$mukey)), ], add=TRUE, col='grey')
lines(p.buff, col='red', lwd=2)
plot(extent(bbox(p.buff)), add=TRUE, col='RoyalBlue')
points(p, col='orange', pch=15)
legend('bottomleft', legend=c('query point', '1000m buffer', 'buffer BBOX', 'intersected polygons', 'overlapping polygons'), col=c('orange', 'red', 'royalblue', 'black', 'grey'), lwd=c(NA, 2, 2, 2, 2), pch=c(15, NA, NA, NA, NA))
box()
Get some component data for a manually-defined bounding box, defined in WGS84 geographic coordinates.
# define a bounding box: xmin, xmax, ymin, ymax
#
# +-------------------(ymax, xmax)
# | |
# | |
# (ymin, xmin) ----------------+
b <- c(-120.9, -120.8, 37.7, 37.8)
# convert bounding box to WKT
p <- writeWKT(as(extent(b), 'SpatialPolygons'))
# compose query, using WKT BBOX as filtering criteria
q <- paste0("SELECT mukey, cokey, compname, comppct_r
FROM component
WHERE mukey IN (SELECT DISTINCT mukey FROM SDA_Get_Mukey_from_intersection_with_WktWgs84('", p, "') )
ORDER BY mukey, cokey, comppct_r DESC")
res <- SDA_query(q)
# check
kable(head(res))
mukey | cokey | compname | comppct_r |
---|---|---|---|
462527 | 16997157 | Alamo | 85 |
462527 | 16997158 | Madera | 10 |
462527 | 16997159 | San Joaquin | 5 |
462541 | 16997210 | Chualar | 85 |
462541 | 16997211 | Oakdale | 5 |
462541 | 16997212 | Modesto | 5 |
The examples above illustrate how to perform SDA queries using a single spatial filter. Typically we need to perform these queries over a collection of points, lines or polygons. The soilDB
package provides some helper functions for iterating over a collection of features (usually points).
TODO: document SDA_query_features
vs. SDA_spatialQuery
The function SDA_spatialQuery()
will convert Spatial*
(Points, Lines, Polygons) objects to WKT and submit a GeometryCollection to SDA. Depending on the argument what
, the function will return intersecting map unit polygons and keys (what=geom
) or just the mukey and map unit name (what=mukey
). The latter mode is much faster than the former.
An example based on a SpatialPoints
object with 1 feature, asking for only tabular results.
# the query point is in geographic coordinates with WGS84 datum
p <- SpatialPoints(cbind(-121.77100, 37.368402), proj4string = CRS('+proj=longlat +datum=WGS84'))
# p is a spatial filter applied to a query against the map unit polygons
# results contain intersecting map unit keys and names
res <- SDA_spatialQuery(p, what = 'mukey')
kable(res)
mukey | muname |
---|---|
1882921 | Diablo clay, 15 to 30 percent slopes, MLRA 15 |
Again with the same data, this time asking for spatial results.
# results contain intersecting polygons and their map unit keys
res <- SDA_spatialQuery(p, what = 'geom')
# check graphically
par(mar=c(1,1,1,1))
plot(res)
points(p, bg='royalblue', pch=21, cex=2)
box()
Some things to consider when using SDA_spatialQuery()
:
sp::over()
Let's apply the SDA_spatialQuery()
function to some real data; KSSL pedons correlated to "Auburn". Not all of these pedons have coordinates, so we will have to do some filtering first. See the in-line comments for details on each line of code.
# get KSSL pedons with taxonname = Auburn
# coordinates will be WGS84 GCS
auburn <- fetchKSSL('auburn')
# keep only those pedons with valid coordinates
auburn <- subsetProfiles(auburn, s='!is.na(x) & !is.na(y)')
# init spatial information
coordinates(auburn) <- ~ x + y
# define projection
proj4string(auburn) <- '+proj=longlat +datum=WGS84'
# extract "site data"
auburn.sp <- as(auburn, 'SpatialPointsDataFrame')
# perform SDA query on collection of points
mu.data <- SDA_spatialQuery(auburn.sp, what = 'geom')
# use local spatial intersection to link source + SDA data
auburn.sp$mukey <- over(auburn.sp, mu.data)$mukey
# join results to original SoilProfileCollection using 'pedlabsampnum'
# TODO: this is kind of clunky
site(auburn) <- as.data.frame(auburn.sp)[, c('pedlabsampnum', 'mukey')]
Check the results and plot the "Auburn" KSSL pedons, grouped by intersecting map unit key and coloring horizons according to clay content.
# plot profiles, grouped by mukey
# color horizons with clay content
par(mar=c(0,0,4,0))
groupedProfilePlot(auburn, groups='mukey', group.name.cex=0.65, color='clay', name='hzn_desgn', id.style='side', label='pedon_id', max.depth=100)
# describe IDs
mtext('user pedon ID', side=2, line=-1.5)
mtext('mukey', side=3, line=-1, at = c(0,0), adj = 0)
This document is based on soilDB
version 2.2-6.