Working with gSSURGO and SDA

2014-09-23 Dylan Beaudette

A gridded version of SSURGO is now available from the USDA-NRCS. It is important to point out cell values in this raster dataset are not the familiar map unit key (mukey) typically used to generate thematic maps. Rather, cell values are an integer index that are linked to map unit key via “raster attribute table” (RAT). Making thematic maps with gSSURGO is accomplished by associating aggregate data (awc, drainage class, carbon content, etc.) to the grid via joins to this RAT.

Generating a suitable RAT in R is very simple thanks to the ratify() function from the raster package. This tutorial demonstrates how the gSSURGO grid can be used in conjunction with data from Soil Data Access to generate a thematic map of available water storage. Examples used within this tutorial would also apply to generation of a thematic map using the traditional polygon-based SSURGO geometry.


With a recent version of R, it should be possible to get all of the packages need for this tutorial via:

# run these commands in the R console
install.packages('raster', dep=TRUE)
install.packages('plyr', dep=TRUE)
install.packages('Hmisc', dep=TRUE)
install.packages('soilDB', dep=TRUE) # stable version from CRAN + dependencies
install.packages("soilDB", repos="") # most recent copy from r-forge
install.packages("SSOAP", repos = "", type="source") # SSOAP and XMLSchema

Define custom functions

Before processing the gSSURGO data, we need to define two functions for aggregating horizon-level data to the map unit level. There are many possible approaches, however, a two-step process is typically the simplest. First profile-total water storage values are computed, then averaged (weighted by component percentage) within each map unit. At this stage we are only defining functions, we will use them later in the tutorial.

# function for computing profile-total water storage
co.sum.whc <- function(i) {
    wt <- i$comppct_r[1] # keep the first component pct (they are all the same)
    thick <- with(i, hzdepb_r - hzdept_r) # compute horizon thickness
    whc <- thick * i$awc_r # compute water storage by horizon <- sum(whc, na.rm=TRUE) # sum to get profile water storage
    data.frame(, wt=wt) # return profile water storage and component pct

# function for copmuting weighted-mean whc within a map unit
mu.mean.whc <- function(i) {
    whc <- wtd.mean(i$whc, weights=i$wt) # safely compute wt. mean water storage
    data.frame(whc=whc) # return wt. mean water storage

Load sample gSSURGO

Within this tutorial we will use a small chunk of the gSSURGO data set (gSSURGO.chunk), distributed within the soilDB R package. Note that cell values within this sample of the gSSURGO data have been converted from the gSSURGO integer index to map unit keys. An example of how to accomplish this conversion is given at the end of this tutorial.

# load libraries

# load chunk of gSSURGO
data(gSSURGO.chunk, package='soilDB')

# convert into a raster + RAT
gSSURGO.chunk <- ratify(gSSURGO.chunk, count=TRUE)

# save RAT to new object, will use later
rat <- levels(gSSURGO.chunk)[[1]]

# extract the map unit kets from the RAT, and format for use in an SQL IN-statement
in.statement <- format_SQL_in_statement(rat$ID)

Query SDA

Now that we have a list of map unit keys, formatted into an IN-statement, we can submit a query to SDA requesting horizon-level available water data. Note that we could do some or all of the aggregation from horizon to map unit within SQL. Queries to SDA can sometimes take 20–30 seconds to complete.

# format query in SQL- raw data are returned
q <- paste("SELECT component.mukey, component.cokey, compname, comppct_r, hzdept_r, hzdepb_r, hzname, awc_r
FROM component JOIN chorizon ON component.cokey = chorizon.cokey
AND mukey IN ", in.statement, "ORDER BY mukey, comppct_r DESC, hzdept_r ASC", sep="")

# now get component and horizon-level data for these map unit keys
res <- SDA_query(q)

# check first 6 rows, looks good
head(res, 6)

Aggregate SSURGO tabular data

The ddply() function applies a named function (co.sum.whc) to chunks of data from res, with chunks defined by combinations of map unit and component keys, and returns a rectangular table of data (e.g. a data.frame object). This approach works well for aggregating SSURGO data, or any other case where a split-apply-combine workflow is needed.

# aggregate by component, retaining map unit keys
co.whc <- ddply(res, c('mukey', 'cokey'), co.sum.whc)

# aggregate by map unit
mu.whc <- ddply(co.whc, 'mukey', mu.mean.whc)

# check: there should be a single water storage value per map unit key
##    mukey      whc
## 1 144980 11.86056
## 2 144983 20.69800
## 3 144984 20.96600
## 4 144985 21.23900
## 5 144986 16.17500
## 6 144989 22.41090

Join aggregate data

Aggregate water storage values can now be joined with the raster attribute table (RAT) via map unit key (mukey). After joining, the new RAT is packed-into our original raster object, and converted into a standard raster object with cell values containing water storage estimates. Plotting the final raster object yields a simple thematic map. See manual pages for ratify() and deratify() for details on how RATs are managed in R.

# change first colum name from 'mukey' to 'ID', so that it matches our RAT
names(mu.whc)[1] <- 'ID'

# combine RAT with aggregate data via "left" join <- join(rat, mu.whc, type='left')

# put modified RAT back into our raster object
levels(gSSURGO.chunk) <-

# convert into standard raster based on new column <- deratify(gSSURGO.chunk, att='whc')

# check: OK

plot of chunk join-RAT

Working with real gSSURGO data

A real chunk of gSSURGO data exported from its native format (ESRI file geodatabase) to something like TIFF will result in two files. The TIFF, with gSSURGO integer indices used as cell values, and a DBF file containing the associated RAT. Unfortunately GDAL, R, and other open source tools don't always recognize that these two files are part of a single object. The following code demonstrates how to convert the native gSSURGO grid into a new grid of map unit keys.


# load exported gSSURGO chunk, cell values are NOT map unit keys
r <- raster('export-gSSURGO.tif')

# generate a RAT via raster package functionality
r <- ratify(r)

# extract RAT to a data.frame
rat <- levels(r)[[1]]

# load ESRI-specific RAT, generated when gSSURGO was exported
mu <- read.dbf('export-gSSURGO.tif.vat.dbf',

# re-name the first coulmn to match our new RAT
names(mu)[1] <- 'ID'

# convert map unit keys from character to integer
mu$MUKEY <- as.integer(mu$MUKEY)

# join map unit keys to gSSURGO integer indices <- join(rat, mu, by='ID', type='left')

# over-write original RAT with new one, containing map unit keys
levels(r) <-

# make a new raster, this time with map unit keys used as the cell values <- deratify(r, att='MUKEY')