Chapter 4 Spatial Data in R

Most of us are familiar with spatial data types, sources, and the jargon used to describe interaction with these data.

GIS software provides a convenient framework for most of the spatial analysis that we do; however, the combination of statistical routines, advanced graphics, and data access functionality makes R an ideal environment for soil science.

For example, with a couple of lines of R code, it is possible to quickly integrate soil morphology (NASIS), lab data (KSSL), map unit polygons (SSURGO), and climate data (PRISM raster files).
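As a hedged sketch of that kind of workflow (the series name, SQL query, and PRISM file path below are illustrative; the functions used are demonstrated throughout this chapter):

library(soilDB)
library(raster)

# pedon morphology from a NASIS selected set
pedons <- fetchNASIS(from = 'pedons')

# KSSL lab data for a soil series of interest
kssl <- fetchKSSL(series = 'amador')

# SSURGO map unit polygons containing that series as a component, via SDA
mukeys <- SDA_query("SELECT DISTINCT mukey FROM component WHERE compname = 'Amador';")
polys <- fetchSDA_spatial(mukeys$mukey, by.col = 'mukey')

# PRISM climate grid stored locally (hypothetical file path)
maat <- raster('C:/workspace2/PRISM/MAAT.tif')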

This chapter is a very brief demonstration of several possible ways to process spatial data in R.

4.1 Objectives (Spatial Data Structures)

  • Gain experience with creating, editing, and exporting spatial data objects in R.
  • Learn about tools for making maps with R
  • Learn the basics of sf and sp representation of vector data
  • Learn the basics of raster classes and functions
  • Learn about some interfaces to NCSS spatial data sources
  • Develop a strategy for navigating the many possible spatial data processing methods

There are many packages available for working with spatial data; however, we only have time to cover the libraries below.

The next couple of sections will require loading these libraries into the R session.

# SPC and soil database interface
library(aqp)
library(soilDB)
# modern vector data structures / manipulation
library(sf)
# old-style vector/raster data structure / manipulation
library(sp)
# GDAL library / tools
library(rgdal)
# gridded data management / analysis
library(raster)

4.2 Making Maps with R

R has become a powerful tool for visualization and interaction with spatial data. There are many tools available for making maps with R! It is not all geostatistics and coordinate reference system transformations. There are powerful ways to automate your GIS work flow from beginning to end–from creating terrain derivatives from a source DEM, all the way to high-quality, publication-ready maps and interactive HTML/JavaScript widgets.

All of the details could fill several books, and they do! The resources below provide solid walk-throughs of examples using several packages with different focuses and applications: tmap, ggplot2, ggmap, mapview, mapdeck, and leaflet.

4.3 Spatial Data Sources

Spatial data sources: “raster” and “vector”

  • Raster data sources (elevation, PRISM, etc.): GeoTIFF, ERDAS, BIL, ASCII grid, WMS, …
  • Vector data sources (points/lines/polygons): Shape File, “file” geodatabase, KML, GeoJSON, GML, WFS, …

Conventional data sources that can be upgraded to spatial data:

  • NASIS/LIMS reports: typically point coordinates
  • Web pages: GeoJSON, WKT, or point coordinates
  • Excel file: typically point coordinates
  • CSV files: typically point coordinates
  • Photo EXIF information: typically point coordinates
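For example, a CSV file of point observations can be “upgraded” to a spatial object in a couple of lines (a sketch; the file name and column names are hypothetical):

library(sf)

# hypothetical CSV with WGS84 decimal-degree coordinates in columns 'x' and 'y'
tab <- read.csv('site-locations.csv')

# upgrade the plain data.frame to an sf POINT object
pts <- st_as_sf(tab, coords = c('x', 'y'), crs = 4326)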

Here are some R-based interfaces to NCSS data sources via the soilDB package.

4.4 Viewing Pedon Locations

(Introducing the sf package with mapview)

4.4.1 Plotting Geographic Data

Plotting the data as an R graphic can give you some idea of how data look spatially and whether their distribution is what you expect.

Typos are relatively common when coordinates are manually entered. Viewing the data spatially is a quick way to see if any points plot far outside of the geographic area of interest and therefore clearly have an error.

# plot the locations of the gopheridge pedons with R
# 
# Steps:
# 1) create and inspect an sf data.frame object
# 2) plot the data with mapview

# load libraries
library(aqp)
library(soilDB)
library(sf)
library(mapview)

# this creates sample gopheridge object in your environment
data("gopheridge", package = "soilDB")

# replace gopheridge object with fetchNASIS() (your data)
# gopheridge <- fetchNASIS()

# create simple features POINT geometry data.frame
# st_as_sf(): convert data.frame to spatial simple features, with points in $geometry 
# st_crs(): set EPSG:4326 Coordinate Reference System (CRS) as Well-Known Text (WKT)
gopher.locations <- st_as_sf(
  site(gopheridge), 
  coords = c('x_std','y_std'),
  crs = st_crs(4326)
)

# create interactive map with sfc_POINT object
#  use site_id in sf data.frame as labels
mapview(gopher.locations, label = gopher.locations$site_id)

4.4.2 EXERCISE 1 (Spatial Intro)

In this exercise, you will create an interactive map with the pedons in your selected set. Then you will export them to a shapefile.

4.4.2.1 Interactive mapview

Use the script below to make an R plot and a shapefile of pedon data loaded from your NASIS selected set.

The following script plots the standard WGS84 longitude/latitude decimal degree fields from the Site table of NASIS. In some cases, these data might be incomplete.

library(aqp)
library(soilDB)
library(sf)
library(mapview)

# get pedons from the selected set
pedons <- fetchNASIS(from = 'pedons')

4.4.2.2 Create a subset

Missing values in coordinates are not allowed. Create a subset SoilProfileCollection containing only the pedons that are not missing their standard spatial data.

  • Use the base R subset() function (or dplyr::filter())
  • Create a subset of your pedons using is.na()
  • x_std and y_std variables contain WGS84 standard longitude and latitude
# modify this code to create a subset
pedons.sp <- subset(pedons, ...)
# create sf object (more on this in next section)
pedon.locations <- st_as_sf(
  site(pedons.sp), 
  coords = c('x_std','y_std'),
  crs = st_crs(4326) #WGS84 GCS
)

# plot an interactive map
mapview(pedon.locations, 
        legend = FALSE, 
        map.types = 'OpenStreetMap', 
        label = pedon.locations$site_id)
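One possible way to fill in the subset() step in the template above (a sketch; subset() evaluates the expression against the site data of the SoilProfileCollection):

# keep only pedons with complete standard WGS84 coordinates
pedons.sp <- subset(pedons, !is.na(x_std) & !is.na(y_std))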

4.4.2.3 Saving to Shapefile

pedon.locations <- pedon.locations[, c(
    "pedlabsampnum", "pedon_id",
    "taxonname", "hillslopeprof",
    "elev_field", "slope_field",
    "aspect_field", "plantassocnm",
    "bedrckdepth", "bedrckkind",
    'pmkind', 'pmorigin'
  )]

# write to SHP; output CRS is GCS WGS84
st_write(pedon.locations, "./NASIS-pedons.shp")

For further information on exporting data to shapefile, see this tutorial: Export Pedons to Shapefile with sp.

4.4.2.4 Export for Google Earth (.kml)

Google Earth is a powerful viewer for point data. Geographic data is displayed in Google Earth using the Keyhole Markup Language (KML) format.

Using the plotKML package, you can easily create a KML file to inspect and view in Google Earth.
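As an alternative to plotKML, the GDAL “KML” driver used by sf can also write simple point data (a sketch, assuming the pedon.locations object from the exercise above; note that this driver carries over only limited attribute information):

library(sf)

# write pedon locations to a KML file via the GDAL 'KML' driver
st_write(pedon.locations, 'NASIS-pedons.kml', driver = 'KML')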

See the related material in this tutorial: Export Pedons to Google Earth.

4.5 Many Packages, Many Spatial Representations

4.5.1 The sf package

Simple Features Access is a set of standards that specify a common storage and access model of geographic features. It is used mostly for two-dimensional geometries such as point, line, polygon, multi-point, multi-line, etc.

This is one of many ways of modeling the geometry of shapes in the real world. This model happens to be widely adopted in the R ecosystem via the sf package, and very convenient for typical data encountered by soil survey operations.

The sf package represents the latest and greatest in spatial data processing within the comfort of an R session. It provides a “main” object class, sf, that stores geometric data and associated tabular data in a familiar data.frame format, along with methods that operate on those geometries at a variety of levels of abstraction.
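A minimal sketch of building an sf object from a plain data.frame of hypothetical coordinates:

library(sf)

# a plain data.frame with WGS84 coordinates
d <- data.frame(id = c('a', 'b'), x = c(-120.1, -120.2), y = c(37.5, 37.6))

# promote to an sf POINT object; sf objects are data.frames with a geometry list-column
p <- st_as_sf(d, coords = c('x', 'y'), crs = 4326)

class(p)
st_geometry(p)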

4.5.2 The sp Package

The data structures (“classes”) and functions provided by the sp package have served a foundational role in the handling of spatial data in R for years.

Many of the following examples will reference names such as SpatialPoints, SpatialPointsDataFrame, and SpatialPolygonsDataFrame. These are specialized (S4) classes implemented by the sp package.

Objects of these classes maintain linkages between all of the components of spatial data. For example, a point, line, or polygon feature will typically be associated with:

  • coordinate geometry
  • bounding box
  • coordinate reference system
  • attribute table
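A minimal sketch of the equivalent sp representation, using the same hypothetical coordinates:

library(sp)

d <- data.frame(id = c('a', 'b'), x = c(-120.1, -120.2), y = c(37.5, 37.6))

# promote to a SpatialPointsDataFrame in place
coordinates(d) <- ~ x + y

# assign a coordinate reference system
proj4string(d) <- '+proj=longlat +datum=WGS84'

# S4 slots link the coordinate geometry, bounding box, CRS, and attribute table
slotNames(d)
bbox(d)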

4.5.3 Converting sp and sf

sp provides access to the same compiled code libraries (PROJ, GDAL, GEOS) as sf, but mostly via the interfaces in the separate rgdal package.

For certain applications, such as some packages we demonstrate below, there are no sp “interfaces” to the methods – only sf, or vice-versa.

The two different categories of object types are interchangeable, and you may find yourself having to do this for a variety of reasons. You can convert between objects using sf::as_Spatial or sf::st_as_sf.

Check the documentation (?functionname) to figure out what object types different methods need as input; and check an input object’s class with class() or inherits().
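A minimal round-trip sketch, assuming the small sf POINT object p created in the sf section above:

library(sf)

# sf -> sp (SpatialPointsDataFrame)
p.sp <- as_Spatial(p)
inherits(p.sp, 'Spatial')

# sp -> sf
p.sf <- st_as_sf(p.sp)
class(p.sf)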

4.5.4 Importing / Exporting Vector Data

Import a feature class from an ESRI File Geodatabase or shapefile.

If you have a .shp file, you can specify the whole path, including the file extension in the dsn argument, or just the folder.

For a Geodatabase, you should specify the feature class using the layer argument. Note that a trailing “/” is omitted from the dsn (data source name) and the “.shp” suffix is omitted from the layer.

4.5.4.1 sf

x <- sf::st_read(dsn = 'E:/gis_data/ca630/FG_CA630_OFFICIAL.gdb', layer = 'ca630_a')
sf::write_sf(x, dsn = 'E:/gis_data/ca630/pedon_locations.shp')
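For a stand-alone shapefile (hypothetical path), the full path including the “.shp” extension can be supplied as the dsn:

x <- sf::st_read(dsn = 'E:/gis_data/ca630/pedon_locations.shp')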

4.5.4.2 sp / rgdal

Import a feature class with readOGR(), then export object x to a shapefile with writeOGR().

x <- rgdal::readOGR(dsn = 'E:/gis_data/ca630/FG_CA630_OFFICIAL.gdb', layer = 'ca630_a')
rgdal::writeOGR(x, dsn = 'E:/gis_data/ca630', layer = 'pedon_locations', driver = 'ESRI Shapefile')

The st_read() / read_sf() / write_sf() and readOGR(), writeOGR(), readGDAL(), writeGDAL() functions have many arguments, so it is worth spending some time with the associated manual pages.

4.5.5 Interactive mapping with mapview and leaflet

These packages make it possible to display interactive maps of sf objects in RStudio, or within an HTML document generated via R Markdown (e.g. this document).

The seriesExtent function in soilDB returns an sp object (SpatialPolygonsDataFrame) showing generalized extent polygons for a given soil series.

# load required packages, just in case
library(soilDB)
library(sf)
library(mapview)

# series extents from SoilWeb (sp objects)
pentz <- seriesExtent('pentz')
amador <- seriesExtent('amador')

# convert to sf objects (way of the future)
pentz <- st_as_sf(pentz)
amador <- st_as_sf(amador)

# combine into a single object
s <- rbind(pentz, amador)

# colors used in the map
# add more colors as needed
cols <- c('royalblue', 'firebrick')

# make a simple map, colors set by 'series' column
mapview(s, zcol = 'series', col.regions = cols, legend = TRUE)

4.5.5.1 EXERCISE 2: Map your favorite soil series extents

The following code demonstrates how to automatically fetch / convert / map soil series extents, using a vector of soil series names. Results appear in the RStudio “Viewer” pane. Be sure to try the “Export” and “show in window” (next to the broom icon) buttons.

# load required packages, just in case
library(soilDB)
library(sf)
library(mapview)

# vector of series names, letter case does not matter
# try several (2-9)!
series.names <- c('auberry', 'sierra', 'holland', 'cagwin')

# iterate over series names, get extent
# result is a list
s <- lapply(series.names, seriesExtent)

# iterate over series extents (sp objects)
# convert to sf objects
# result is a list
s <- lapply(s, st_as_sf)

# flatten list -> single sf object
s <- do.call('rbind', s)

# colors used in the map
# note trick used to dynamically set the number of colors
cols <- RColorBrewer::brewer.pal(n = length(series.names), name = 'Set1')

# make a simple map, colors set by 'series' column
# click on polygons for details
# try pop-out / export buttons
mapview(s, zcol = 'series', col.regions = cols, legend = TRUE)

4.5.6 The raster Package

The raster package provides most of the commonly used grid processing functionality that one might find in a conventional GIS:

  • re-sampling / interpolation
  • warping (coordinate system transformations of gridded data)
  • cropping, mosaicing, masking
  • local and focal functions
  • raster algebra
  • contouring
  • raster/vector conversions
  • terrain analysis
  • model-based prediction (more on this in later chapters)

Introduction to the raster package vignette
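A small sketch of a few of these operations, using the sample grid that ships with the raster package:

library(raster)

# sample grid included with the package
r <- raster(system.file("external/test.grd", package = "raster"))

# raster algebra: rescale values
r.scaled <- r / 1000

# re-sampling: aggregate to a coarser grid (2x2 blocks of cells)
r.coarse <- aggregate(r, fact = 2, fun = mean)

# focal function: 3x3 moving-window mean
r.smooth <- focal(r, w = matrix(1, 3, 3), fun = mean, na.rm = TRUE)

# raster -> vector conversion: contour lines as a SpatialLinesDataFrame
ct <- rasterToContour(r)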

4.5.6.1 Importing / Exporting Rasters

# use an example from the raster package
f <- system.file("external/test.grd", package = "raster")

# create a reference to this raster
r <- raster(f)

# print the details
print(r)
## class      : RasterLayer 
## dimensions : 115, 80, 9200  (nrow, ncol, ncell)
## resolution : 40, 40  (x, y)
## extent     : 178400, 181600, 329400, 334000  (xmin, xmax, ymin, ymax)
## crs        : +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +datum=WGS84 +units=m +no_defs 
## source     : C:/Users/Andrew.G.Brown/Documents/R/win-library/4.0/raster/external/test.grd 
## names      : test 
## values     : 138.7071, 1736.058  (min, max)
# default plot method
plot(r)

The disk-based reference can be converted to an in-memory RasterLayer with the readAll() function.

Processing of raster data in memory is always faster than processing on disk, as long as there is sufficient memory.

# check: file is on disk
inMemory(r)

# load into memory, if possible
r <- readAll(r)

# check: file is in memory
inMemory(r)

Exporting data requires consideration of the output format, datatype, encoding of NODATA, and other options such as compression.

See the manual pages for writeRaster(), writeFormats(), and dataType() for details. For example, suppose you had a RasterLayer object that you wanted to save to disk as an internally-compressed GeoTIFF:

# using previous example data set
writeRaster(r, filename = 'r.tif', options = c("COMPRESS=LZW"))

The writeRaster() function interprets the given (and missing) arguments as:

  • ‘.tif’ suffix interpreted as format=GTiff
  • creation options of “LZW compression” passed to GeoTiff driver
  • default datatype
  • default NAflag

4.5.6.2 Object Properties

RasterLayer objects are similar to sf and sp objects in that they keep track of the linkages between data, coordinate reference system, and optional attribute tables. Getting and setting the contents of RasterLayer objects should be performed using functions such as:

  • NAvalue(r): get / set the NODATA value
  • crs(r) or proj4string(r): get / set the coordinate reference system
  • res(r): get / set the resolution
  • extent(r): get / set the extent
  • dataType(r): get / set the data type
  • … many more, see the raster package manual

4.5.6.3 Data Types

It is worth spending a couple of minutes going over some of the commonly used datatypes: “unsigned integer”, “signed integer”, and “floating point” of variable precision.

  • INT1U: integers from 0 to 255
  • INT2U: integers from 0 to 65,534
  • INT2S: integers from -32,767 to 32,767
  • INT4S: integers from -2,147,483,647 to 2,147,483,647
  • FLT4S: floating point from -3.4e+38 to 3.4e+38
  • FLT8S: floating point from -1.7e+308 to 1.7e+308

It is wise to manually specify an output datatype that will “just fit” the required precision. For example, if you have generated a RasterLayer that warrants integer precision and ranges from 0 to 100, then the INT1U data type would provide enough precision to store all possible values and the NODATA value. Raster data stored as integers will always be smaller (sometimes 10-100x) than floating point, especially when internal compression is enabled.

# integer grid with a range of 0-100
# maybe soil texture classes
writeRaster(r, filename = 'r.tif', datatype = 'INT1U')

# floating point grid with very wide range
# maybe DSM model output
writeRaster(r, filename = 'r.tif', datatype = 'FLT4S')

4.5.6.4 Notes on Compression

In general, it is a good idea to create internally-compressed raster data. The GeoTIFF format can accommodate many different compression algorithms, including lossy (JPEG) compression. See this article for some ideas on optimization of file read/write times and associated compressed file sizes. Usually, the default "LZW" or "DEFLATE" compression will result in significant savings, especially for data encoded as integers.

For example, the CONUS gSSURGO map unit key grid at 30m resolution is about 55Gb (GeoTiff, no compression) vs. 2.4Gb after LZW compression.

# reasonable compression
writeRaster(r, filename='r.tif', options=c("COMPRESS=LZW"))

# takes longer to write the file, but better compression
writeRaster(r, filename='r.tif', options=c("COMPRESS=DEFLATE", "PREDICTOR=2", "ZLEVEL=9"))

4.5.6.5 EXERCISE 3: getting and plotting raster data

You will need to install the following packages:

  • leafsync: synchronized mapview panels
  • rasterVis: fancy plotting for raster objects

Use the following as a template to tinker with some gridded soil property data (800m grid, derived from SSURGO/STATSGO). See ?ISSR800.wcs for details.

The soilDB WCS tutorial contains many more examples from which to draw ideas.

# WCS interface
library(soilDB)
# wrangling polygons
library(sp)
library(sf)
# raster data visualization
library(rasterVis)

# make a bounding box and assign a CRS (4326: GCS, WGS84)
bb <- st_bbox(
  c(xmin = -121, xmax = -120, ymin = 37, ymax = 38), 
  crs = st_crs(4326)
)

# convert bbox to sf geometry
bb <- st_as_sfc(bb)

# try some others from the complete list of available grids:
# WCS_details(wcs = 'ISSR800')

# get soil texture class, 0-25cm
r1 <- ISSR800.wcs(aoi = bb, var = 'texture_025cm')

# get pH, 0-25cm
r2 <- ISSR800.wcs(aoi = bb, var = 'ph_025cm')


# grid of category IDs + labels stored in a 
# raster attribute table (RAT)
r1

# grid of continuous values
r2

# plot
levelplot(r1, margin = FALSE)
levelplot(r2, margin = FALSE)

Sync-ed mapview panels. This can be an effective way to explore similarities or differences between two spatial datasets. Try adjusting the alpha.regions argument (value range: 0-1) to mapview() to set transparency (lower values = more transparent).

# interactive mapping
library(mapview)
# sync-ed panels
library(leafsync)

# try 
m1 <- mapview(r1, map.types = "Esri.WorldImagery", legend = TRUE, na.color = NA)
m2 <- mapview(r2, map.types = "Esri.WorldImagery", legend = TRUE, na.color = NA)

# build synced panels.
sync(m1, m2)

Make another pair of synchronized mapview panels, this time using SSURGO and STATSGO map unit delineations via SDA. This example uses the SDA_spatialQuery() function to request vector data from SDA using a template sp object. The geomIntersection = TRUE argument to SDA_spatialQuery() causes SDA to compute the spatial intersection between template (bounding box) and map unit delineations. SDA has fairly tight constraints on how much data can be returned per request (32Mb), so you will have to plan carefully when working with large areas of interest.

# make a smaller bounding box
bb <- st_bbox(c(xmin = -114.16, xmax = -114.08, ymin = 47.65, ymax = 47.68), crs = st_crs(4326))
# convert to sf geometry
bb <- st_as_sfc(bb)

## TODO: SDA_spatialQuery() currently expects an sp object
# convert to sp object
bb <- as_Spatial(bb)

# get SSURGO and STATSGO
# result is a SpatialPolygonsDataFrame
ssurgo <- SDA_spatialQuery(bb, what = 'geom', geomIntersection = TRUE)
statsgo <- SDA_spatialQuery(bb, what = 'geom', geomIntersection = TRUE, db = 'STATSGO')

# make mapview panels
# adjust line styles, disable legends
m1 <- mapview(ssurgo, map.types = "Esri.WorldImagery", legend = FALSE, na.color = NA, color = 'yellow', fill = NA, lwd = 1)
m2 <- mapview(statsgo, map.types = "Esri.WorldImagery", legend = FALSE, na.color = NA, color = 'white', fill = NA, lwd = 1)

# sync
sync(m1, m2)

4.5.7 Converting Vector to Raster
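This topic is not developed in detail here, but a minimal sketch of vector-to-raster conversion with raster::rasterize(), using a hypothetical square polygon and template grid, might look like this:

library(raster)
library(sp)

# hypothetical square polygon with a single attribute
poly <- SpatialPolygonsDataFrame(
  SpatialPolygons(
    list(Polygons(list(Polygon(cbind(c(0, 0, 10, 10, 0), c(0, 10, 10, 0, 0)))), ID = '1')),
    proj4string = CRS('+proj=longlat +datum=WGS84')
  ),
  data = data.frame(value = 5, row.names = '1')
)

# template raster defining the output grid geometry (same CRS)
template <- raster(
  nrows = 20, ncols = 20, xmn = -5, xmx = 15, ymn = -5, ymx = 15,
  crs = '+proj=longlat +datum=WGS84'
)

# burn the polygon attribute into overlapping grid cells
r.poly <- rasterize(poly, template, field = 'value')
plot(r.poly)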

4.6 Coordinate Reference Systems

Spatial data aren’t all that useful without an accurate description of the coordinate reference system (CRS). This type of information is typically stored within the “.prj” component of a shapefile, or in the header of a GeoTIFF.

Without a CRS it is not possible to perform coordinate transformations (e.g. conversion of geographic coordinates to projected coordinates), spatial overlay (e.g. intersection), or geometric calculations (e.g. distance or area).

The “old” way (PROJ.4) of specifying coordinate reference systems uses character strings containing, for example, +proj or +init arguments. In general, this still “works,” so you may encounter it and need to know about it. You may also encounter cases where a CRS is specified using an integer EPSG code, an OGC code, or well-known text (WKT).

Some common examples of coordinate system EPSG codes and their legacy PROJ.4 strings:

  • EPSG: 4326 / PROJ.4: +proj=longlat +datum=WGS84 - geographic, WGS84 datum (NASIS Standard)

  • EPSG: 4269 / PROJ.4: +proj=longlat +datum=NAD83 - geographic, NAD83 datum

  • EPSG: 4267 / PROJ.4: +proj=longlat +datum=NAD27 - geographic, NAD27 datum

  • EPSG: 26910 / PROJ.4: +proj=utm +zone=10 +datum=NAD83 - projected (UTM zone 10), NAD83 datum

  • EPSG: 6350 / PROJ.4: +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23.0 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs - Albers Equal Area CONUS (gSSURGO)

  • More on the EPSG codes and specifics of CRS definitions:

While you may encounter PROJ.4 strings, these are no longer considered the preferred method of referencing Coordinate Reference Systems – and, in general, newer methods are “easier.”

Well-known text (WKT) is a human- and machine-readable standard format for geometry, so storing the coordinate reference system information in a similar format makes sense. This format is returned by the sf::st_crs() function.

For example: the WKT representation of EPSG:4326:

st_crs(4326)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

This output follows the OGC WKT CRS standard. Adoption of this standard caused some significant changes in packages across the R spatial ecosystem.

To help you get familiar, what follows are several examples of doing the same thing: setting the CRS of spatial objects with WGS84 longitude/latitude geographic coordinates. If you have another target coordinate system, it is just a matter of using the correct codes to identify it.

4.6.1 Assigning and Transforming Coordinate Systems

Returning to the example from above, let's assign a CRS to our series extent s using different methods.

s <- seriesExtent('san joaquin')

The following subsections show equivalent syntax using sf versus sp / rgdal.

4.6.1.1 sf

Use st_crs<- to set, or st_crs() to get, the CRS of sf objects. Supply the target EPSG code as an integer.

# s is an sp object, we convert it to sf with st_as_sf
s <- st_as_sf(s) 

# the CRS of s is EPSG:4326
st_crs(s) == st_crs(4326)
## [1] TRUE
# set CRS using st_crs<- (replace with identical value)
st_crs(s) <- st_crs(4326)

Transformation of points, lines, and polygons with sf requires a “source” CRS defined in the object passed as argument x, and a “target” CRS defined in the crs argument, given as an integer or as the output of st_crs().

# transform to UTM zone 10
s.utm <- st_transform(x = s, crs = 26910)

# transform to GCS NAD27
s.nad27 <- st_transform(x = s, crs = st_crs(4267))

4.6.1.2 sp and rgdal

You can do the same thing several different ways with sp objects. For Spatial objects, the CRS is set or retrieved with proj4string<- / proj4string, using either an sp CRS object or a PROJ.4 string; equivalent EPSG, OGC, and PROJ.4 specifications all produce the same result.

# s is an sf object (we converted it), convert back to Spatial* object
s <- sf::as_Spatial(s) 

# these all create the same internal sp::CRS object
proj4string(s) <- sp::CRS('EPSG:4326')          # proj >6; EPSG
proj4string(s) <- sp::CRS('OGC:CRS84')          # proj >6; OGC
proj4string(s) <- '+init=epsg:4326'             # proj4 style +init string
proj4string(s) <- '+proj=longlat +datum=WGS84'  # proj4 style +proj string

Here, we do the same transformations as above, only using sp::spTransform().

# transform to UTM zone 10
s.utm <- spTransform(s, CRS('+proj=utm +zone=10 +datum=NAD83'))

# transform to GCS NAD27
s.nad27 <- spTransform(s, CRS('+proj=longlat +datum=NAD27'))

4.6.1.3 raster

Use crs<- and crs to set and get the CRS of raster (or Spatial) objects; these functions take and return an sp CRS object.

# r is a raster object; set its CRS to the CRS it already has (a no-op, shown for syntax)
crs(r) <- raster::crs(r)

“Transforming” or warping a raster is a different matter than transforming vector data, as it requires interpolation of pixel values onto a new grid with a defined target resolution and CRS.

The method provided by raster to do this is projectRaster(). It works the same as the above transform methods in that you specify an object to transform, and the target reference system or a template for the object.

r.wgs84 <- projectRaster(r, crs = CRS("EPSG:4326"))

Note that the default projectRaster uses bilinear interpolation (method='bilinear'), which is appropriate for continuous variables. You also have the option of using nearest-neighbor (method='ngb') for categorical variables (class maps) where interpolation does not make sense.
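For example, if r contained integer class IDs rather than continuous values (a sketch):

# nearest-neighbor resampling preserves categorical values
r.classes.wgs84 <- projectRaster(r, crs = CRS("EPSG:4326"), method = 'ngb')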

If we want to save this transformed raster to a file, we can use something like this:

writeRaster(r.wgs84, filename = 'r.tif', options = c("COMPRESS=LZW"))

4.7 EXERCISE: Working with real data

4.7.1 Load Required Packages

Load required packages into a fresh RStudio Session.

library(aqp)
library(soilDB)
library(sp)
library(rgdal)
library(raster)
library(rasterVis)

4.7.2 Download Example Data

Run the following to create a path for the example data. Be sure to set a valid path to a local disk.

# store path as a variable, in case you want to keep it somewhere else
ch2b.data.path <- 'C:/workspace2/chapter-2b'

# make a place to store chapter 2b example data
dir.create(ch2b.data.path, recursive = TRUE)

# download polygon example data from github
download.file(
  'https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/chapter_2b-spatial-data/chapter-2b-mu-polygons.zip', 
  file.path(ch2b.data.path, 'chapter-2b-mu-polygons.zip')
)

# download raster example data from github 
download.file(
  'https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/chapter_2b-spatial-data/chapter-2b-PRISM.zip', 
  file.path(ch2b.data.path, 'chapter-2b-PRISM.zip')
)

# unzip
unzip(
  file.path(ch2b.data.path, 'chapter-2b-mu-polygons.zip'), 
  exdir = ch2b.data.path, overwrite = TRUE
)

unzip(
  file.path(ch2b.data.path, 'chapter-2b-PRISM.zip'), 
  exdir = ch2b.data.path, overwrite = TRUE
)

4.7.3 Load the Data

We will be using polygons associated with MLRAs 15 and 18 as part of this demonstration.

Import these data now with readOGR(); recall the somewhat strange syntax. These data, along with the RasterStack object rs created below, will be used in the examples that follow.

# just to be sure, pointer to data path 
ch2b.data.path <- 'C:/workspace2/chapter-2b'

# load MLRA polygons
mlra <- readOGR(dsn = ch2b.data.path, layer = 'mlra-18-15-AEA')

# mean annual air temperature, Deg C
maat <- raster(file.path(ch2b.data.path, 'MAAT.tif'))

# mean annual precipitation, mm
map <- raster(file.path(ch2b.data.path, 'MAP.tif'))

# frost-free days
ffd <- raster(file.path(ch2b.data.path, 'FFD.tif'))

# growing degree days
gdd <- raster(file.path(ch2b.data.path, 'GDD.tif'))

# percent of annual PPT as rain
rain_fraction <- raster(file.path(ch2b.data.path, 'rain_fraction.tif'))

# annual sum of monthly PPT - ET_p
ppt_eff <- raster(file.path(ch2b.data.path, 'effective_preciptitation.tif'))

Sometimes it is convenient to “stack” raster data that share a common grid size, extent, and coordinate reference system into a single RasterStack object.

# create a raster stack (multiple rasters aligned)
rs <- stack(maat, map, ffd, gdd, rain_fraction, ppt_eff)

# reset layer names
names(rs) <- c('MAAT', 'MAP', 'FFD', 'GDD', 'rain.fraction', 'eff.PPT')

Quick inspection of the data.

# object class
class(mlra)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
class(maat)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
class(rs)
## [1] "RasterStack"
## attr(,"package")
## [1] "raster"
# the raster package provides a nice "print" method for raster and sp classes
print(maat)
## class      : RasterLayer 
## dimensions : 762, 616, 469392  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : -123.2708, -118.1375, 34.44583, 40.79583  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=NAD83 +no_defs 
## source     : C:/workspace2/chapter-2b/MAAT.tif 
## names      : MAAT 
## values     : -4.073542, 18.67642  (min, max)
# coordinate reference systems: note that they are not all the same
proj4string(mlra)
## [1] "+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs"
proj4string(maat)
## [1] "+proj=longlat +datum=NAD83 +no_defs"
proj4string(rs)
## [1] "+proj=longlat +datum=NAD83 +no_defs"

Basic plot methods (class-specific functions) for the data. Note that this approach requires that all layers in the “map” share the same coordinate reference system (CRS).

# MLRA polygons in native coordinate system
# recall that mlra is a SpatialPolygonsDataFrame
plot(mlra, main = 'MLRA 15 and 18')
box()

# MAAT raster
# recall that maat is a raster object
plot(maat, main = 'PRISM Mean Annual Air Temperature (deg C)')

# plot MAAT raster with MLRA polygons on top
# this requires transforming to CRS of MAAT
mlra.gcs <- spTransform(mlra, CRS(proj4string(maat)))
plot(maat, main = 'PRISM Mean Annual Air Temperature (deg C)')
plot(mlra.gcs, main = 'MLRA 15 and 18', add = TRUE)

Try the rasterVis package approach to plotting raster and sp objects. Syntax is more expressive, figures are more intricate, but there is a learning curve. See the rasterVis tutorial for a comprehensive set of examples.

levelplot(
  # raster object
  maat, 
  # suppress marginal plots
  margin = FALSE,
  # custom colors
  par.settings = BuRdTheme,
  # combine multiple spatial data layer here
  panel = function(...) {
    panel.levelplot(...)
    sp.lines(mlra.gcs, col = 'black')
  }
  )

4.7.4 Spatial Overlay Operations

Spatial data are a lot more useful when “combined” (overlay, intersect, spatial query, etc.) to generate something new. For simplicity, we will refer to this kind of operation as an “extraction”. The CRS of the two objects being overlaid must match.

4.7.4.1 Vector Data

In sf the functions used to do this are st_intersects() or st_intersection().
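A minimal sketch of the sf approach, assuming the mlra polygons loaded earlier in this exercise:

library(sf)

# convert MLRA polygons to sf
mlra.sf <- st_as_sf(mlra)

# hand-make an sf point (GCS WGS84), then transform to the CRS of the polygons
p.sf <- st_sfc(st_point(c(-120, 37.5)), crs = 4326)
p.sf <- st_transform(p.sf, st_crs(mlra.sf))

# which polygon(s) intersect the point?
st_intersects(p.sf, mlra.sf)

# attributes of intersecting polygons (spatial subsetting uses st_intersects)
mlra.sf[p.sf, ]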

For sp objects, these operations are done with the sp::over() function. Access the associated vignette by running vignette("over") in the console.

# hand make a SpatialPoints object
# note that this is GCS
p <- SpatialPoints(coords = cbind(-120, 37.5), 
                   proj4string = CRS('+proj=longlat +datum=WGS84'))

# spatial extraction of MLRA data requires a CRS transformation
p.aea <- spTransform(p, CRS(proj4string(mlra)))
over(p.aea, mlra)

4.7.4.2 Raster Data

The values stored in a RasterLayer or RasterStack object can be extracted using the extract() function.

As long as the “query” feature has a valid CRS defined, the raster::extract() function will automatically perform any required CRS transformation.

# extract from a single RasterLayer
extract(maat, p)

# extract from a RasterStack
extract(rs, p)

4.7.5 Sampling and Extraction

4.7.5.1 Raster Data Sampling

Typically, spatial queries of raster data by polygon features are performed in two ways:

  1. for each polygon, collect all pixels that overlap (exactextractr approach)

  2. for each polygon, collect a sample of pixels defined by sampling points

The first method ensures that all data are included in the analysis, however, processing can be slow for multiple/detailed rasters, and the results may not fit into memory.

The second method is more efficient (10-100x faster), requires less memory, and can remain statistically sound–as long as a reasonable sampling strategy is applied. Sampling may also help you avoid low-acreage “anomalies” in the raster product. More on sampling methods in the next chapter.

The extract() function can perform several operations in one pass, such as buffering (in projected units) then extracting. See the manual page for an extensive listing of optional arguments and what they do.

# extract using a buffer with radius specified in meters (1000m)
extract(rs, p, buffer = 1000)

Sampling and extraction with raster methods results in a matrix object.

# sampling single RasterLayer
sampleRegular(maat, size = 10)

# sampling RasterStack
sampleRegular(rs, size = 10)

Sampling and extracting with sp = TRUE results in a SpatialPointsDataFrame object.

par(mfcol = c(1, 2), mar = c(1, 1, 3, 1))

# regular sampling + extraction of raster values
x.regular <- sampleRegular(maat, size = 100, sp = TRUE)
plot(maat,
     axes = FALSE,
     legend = FALSE,
     main = 'Regular Sampling')
points(x.regular)

# random sample + extraction of raster values
# note that NULL values are removed
x.random <- sampleRandom(maat,
                         size = 100,
                         sp = TRUE,
                         na.rm = TRUE)
plot(maat,
     axes = FALSE,
     legend = FALSE,
     main = 'Random Sampling with NA Removal')

points(x.random)

Note that the mean can be efficiently estimated, even with a relatively small number of samples.

# all values: slow for large grids
mean(values(maat), na.rm = TRUE)

# regular sampling: efficient, central tendency comparable to above
mean(x.regular$MAAT, na.rm = TRUE)

# this value will be pseudorandom
#  depends on number of samples, pattern of NA
mean(x.random$MAAT, na.rm = TRUE)

Just how much variation can we expect when collecting 100 randomly-located samples over such a large area? This is better covered in the Sampling chapter, but a quick experiment might be fun. Repeat this 30 times: compute the mean MAAT from 100 randomly-located samples.

# takes a couple of seconds
z <- replicate(30, mean(sampleRandom(maat, size = 100, na.rm = TRUE), na.rm = TRUE))

# 90% of the time the mean MAAT values were within:
quantile(z, probs = c(0.05, 0.95))

4.7.5.2 Extracting Raster Data: KSSL Pedon Locations

Extract PRISM data at the coordinates associated with KSSL pedons that have been correlated to the AUBURN series.

We will use the fetchKSSL() function from the soilDB package to get KSSL data from the most recent snapshot. This example can be easily adapted to pedon data extracted from NASIS using fetchNASIS().

Get some KSSL data and upgrade the “site” data to a SpatialPointsDataFrame.

# result is a SoilProfileCollection object
auburn <- fetchKSSL(series = 'auburn')

# extract site data
s <- site(auburn)

## TODO: this is an old-fashioned specification of a CRS...
# these are GCS WGS84 coordinates from NASIS
coordinates(s) <- ~ x + y
proj4string(s) <- '+proj=longlat +datum=WGS84'

Extract PRISM data (the RasterStack object we made earlier) at the Auburn KSSL locations and summarize.

# return the result as a data.frame object
e <- extract(rs, s, df=TRUE)

# summarize: remove first (ID) column using [, -1] j index
summary(e[, -1])
##       MAAT            MAP             FFD             GDD       rain.fraction    eff.PPT      
##  Min.   :15.52   Min.   :448.0   Min.   :278.0   Min.   :2456   Min.   :99    Min.   :-409.0  
##  1st Qu.:16.24   1st Qu.:519.0   1st Qu.:305.5   1st Qu.:2586   1st Qu.:99    1st Qu.:-329.1  
##  Median :16.45   Median :569.5   Median :316.0   Median :2608   Median :99    Median :-282.3  
##  Mean   :16.33   Mean   :633.3   Mean   :314.4   Mean   :2588   Mean   :99    Mean   :-208.6  
##  3rd Qu.:16.60   3rd Qu.:661.0   3rd Qu.:329.5   3rd Qu.:2623   3rd Qu.:99    3rd Qu.:-188.4  
##  Max.   :16.65   Max.   :947.0   Max.   :334.0   Max.   :2651   Max.   :99    Max.   : 128.4

Join the extracted PRISM data with the original SoilProfileCollection object.

More information on SoilProfileCollection objects here.

# don't convert character data into factors
options(stringsAsFactors = FALSE)

# combine site data with extracted raster values, row-order is identical
res <- cbind(as(s, 'data.frame'), e)

# extract unique IDs and PRISM data
res <- res[, c('pedon_key', 'MAAT', 'MAP', 'FFD', 'GDD', 'rain.fraction', 'eff.PPT')]

# join with original SoilProfileCollection object via pedon_key
site(auburn) <- res

The extracted values are now part of the “auburn” SoilProfileCollection object.

Does there appear to be a relationship between soil morphology and “effective precipitation”? Not really.

# create an ordering of pedons based on the extracted effective PPT
new.order <- order(auburn$eff.PPT)

# setup figure margins, 1x1 row*column layout
par(mar = c(4.5, 0, 4, 0), mfcol = c(1, 1))

# plot profile sketches
plotSPC(auburn,
        name = 'hzn_desgn',
        print.id = FALSE,
        color = 'clay',
        plot.order = new.order,
        cex.names = 0.75,
        max.depth = 70,
        width = 0.3,
        name.style = 'center-top',
        scaling.factor = 1,
        plot.depth.axis = FALSE,
        hz.depths = TRUE
)

# add an axis with extracted raster values
axis(side = 1,
     at = 1:length(auburn),
     labels = round(auburn$eff.PPT[new.order]),
     cex.axis = 0.75)

mtext('Annual Sum of Monthly (PPT - ET_p) (mm)',
      side = 1,
      line = 2.5)

Note that negative values are associated with a net deficit in monthly precipitation vs. estimated ET.

4.7.5.3 Raster Summary By Polygon: Series Extent

The seriesExtent() function from the soilDB package provides a simple interface to Series Extent Explorer data files.

Note that these series extents have been generalized for rapid display at regional to continental scales. A more precise representation of “series extent” can be generated from SSURGO polygons and queried from SDA.

Get an approximate extent for the Amador soil series from SEE. See the seriesExtent tutorial and manual page for additional options and related functions.

amador <- seriesExtent(s = 'amador')

Generate 100 sampling points within the extent, using a hexagonal grid. These will be used to extract raster values from our RasterStack of PRISM data.

s <- spsample(amador, n = 100, type = 'hexagonal')

For comparison, extract a single point from each SSURGO map unit delineation that contains Amador as a major component. This will require a query to SDA for the set of matching map unit keys (mukey), followed by a second request to SDA for the geometry.

The SDA_query function is used to send arbitrary queries written in SQL to SDA; the results may be a data.frame or list, depending on the complexity of the query. The fetchSDA_spatial function returns map unit geometry as either polygons, polygon envelopes, or a single point within each polygon, selected by mukey or nationalmusym.

# result is a data.frame
mukeys <- SDA_query("SELECT DISTINCT mukey FROM component
                     WHERE compname = 'Amador' AND majcompflag = 'Yes';")

# result is a SpatialPointsDataFrame
amador.pts <- fetchSDA_spatial(mukeys$mukey,
                               by.col = 'mukey',
                               method = 'point',
                               chunk.size = 2)

Graphically check both methods:

# adjust margins and setup plot device for two columns
par(mar = c(1, 1, 3, 1), mfcol = c(1, 2))

# first figure
plot(maat,
     ext = extent(s),
     main = 'PRISM MAAT\n100 Sampling Points from Extent',
     axes = FALSE)
plot(amador, add = TRUE)
points(s, cex = 0.25)

plot(maat,
     ext = extent(s),
     main = 'PRISM MAAT\nPolygon Centroids',
     axes = FALSE)
points(amador.pts, cex = 0.25)

Extract PRISM data (the RasterStack object we made earlier) at the sampling locations (100 regularly-spaced and from MU polygon centroids) and summarize. Note that CRS transformations are automatic (when possible).

# return the result as a data.frame object
e <- extract(rs, s, df = TRUE)
e.pts <- extract(rs, amador.pts, df = TRUE)

# check out the extracted data
summary(e[,-1])
##       MAAT            MAP             FFD             GDD       rain.fraction       eff.PPT      
##  Min.   :16.42   Min.   :334.0   Min.   :313.0   Min.   :2590   Min.   : 99.00   Min.   :-548.7  
##  1st Qu.:16.60   1st Qu.:439.8   1st Qu.:324.0   1st Qu.:2628   1st Qu.: 99.00   1st Qu.:-419.2  
##  Median :16.65   Median :488.5   Median :329.0   Median :2644   Median : 99.00   Median :-352.8  
##  Mean   :16.65   Mean   :481.2   Mean   :328.7   Mean   :2643   Mean   : 99.03   Mean   :-367.5  
##  3rd Qu.:16.69   3rd Qu.:530.5   3rd Qu.:334.2   3rd Qu.:2667   3rd Qu.: 99.00   3rd Qu.:-298.2  
##  Max.   :16.82   Max.   :584.0   Max.   :343.0   Max.   :2708   Max.   :100.00   Max.   :-236.1
# all pair-wise correlations; note that rain.fraction is nearly constant
knitr::kable(cor(e[,-1]), digits = 2)
              MAAT    MAP   FFD    GDD  rain.fraction  eff.PPT
MAAT          1.00  -0.43  0.14   0.67           0.07    -0.40
MAP          -0.43   1.00  0.63  -0.88           0.06     0.99
FFD           0.14   0.63  1.00  -0.54           0.17     0.65
GDD           0.67  -0.88 -0.54   1.00           0.00    -0.86
rain.fraction 0.07   0.06  0.17   0.00           1.00     0.06
eff.PPT      -0.40   0.99  0.65  -0.86           0.06     1.00
# rain.fraction is nearly constant; with only 1 unique value, correlation would be impossible to compute
unique(e$rain.fraction)
## [1]  99 100

Quickly compare the two sets of samples. More on this in the Sampling module.

# compile results into a list
maat.comparison <- list('regular samples' = e$MAAT,
                        'polygon centroids' = e.pts$MAAT)

# number of samples per method
lapply(maat.comparison, length)
## $`regular samples`
## [1] 88
## 
## $`polygon centroids`
## [1] 395
# summary() applied by group
lapply(maat.comparison, summary)
## $`regular samples`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   16.42   16.60   16.65   16.65   16.69   16.82 
## 
## $`polygon centroids`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   16.18   16.61   16.65   16.66   16.72   17.10
# box-whisker plot
par(mar = c(4.5, 8, 3, 1), mfcol = c(1, 1))
boxplot(
  maat.comparison,
  horizontal = TRUE,
  las = 1,
  xlab = 'MAAT (deg C)',
  varwidth = TRUE,
  boxwex = 0.5,
  main = 'MAAT Comparison'
)

Basic climate summaries from a standardized source (e.g. PRISM) might be a useful addition to an OSD.

Think about how you could adapt this example to compare climate summaries derived from NASIS pedons to similar summaries derived from map unit polygons and generalized soil series extents.

4.7.5.4 Raster Summary By Polygon: MLRA

The following example is a simplified version of what is available in the soilReports package, reports on the ncss-tech GitHub repository, or in the TEUI suite of map unit summary tools.

Example output from the soilReports package:

Efficient summary of large raster data sources can be accomplished using:

  • internally-compressed raster data sources, stored on a local disk, can be in any coordinate system
  • polygons stored in an equal-area or UTM coordinate system, with CRS units of meters
  • fixed-density sampling of polygons
  • estimation of quantiles from collected raster samples

Back to our example data. The first step is to check the MLRA polygons (mlra); how many features per MLRA symbol? Note that some MLRA have more than one polygon.

table(mlra$MLRARSYM)

Convert polygon area from square meters to acres and summarize. Note that this will only make sense when using a projected CRS with units of meters (equal area)!

poly.area <- round(sapply(mlra@polygons, slot, 'area') * 0.000247105)
summary(poly.area)
sum(poly.area)

Sample each polygon at a constant sampling density of 0.001 samples per acre (1 sample for every 1,000 ac.). At this sampling density we should expect approximately 16,700 samples–more than enough for our simple example.

library(sharpshootR)

# the next function requires a polygon ID: 
#  each polygon gets a unique number 1--number of polygons
mlra$pID <- 1:nrow(mlra)
s <- constantDensitySampling(mlra, n.pts.per.ac = 0.001)

Extract MLRA symbol at sample points using the over() function. The result will be a data.frame object with all attributes from our MLRA polygons that intersect sampling points s.

# spatial overlay: sampling points and MLRA polygons
res <- over(spTransform(s, CRS(proj4string(mlra))), mlra)

# row / feature order is preserved, so we can directly copy
s$mlra <- res$MLRARSYM

# tabulate number of samples per MLRA
table(s$mlra)
## 
## 18 
## 75

Extract values from the RasterStack of PRISM data as a data.frame.

# raster stack extraction at sampling points
e <- extract(rs, s, df = TRUE)

# convert sampling points from SpatialPointsDataFrame to data.frame
s.df <- as(s, 'data.frame')

# join columns from extracted values and sampling points
s.df <- cbind(s.df, e)

# check results
head(s.df)
##   mlra         x        y ID     MAAT MAP FFD  GDD rain.fraction   eff.PPT
## 1 <NA> -120.2324 37.33594  1 16.81579 334 313 2708            99 -548.7027
## 2   18 -120.2983 37.40442  2 16.71504 341 315 2682            99 -530.5748
## 3   18 -120.2851 37.42724  3 16.68039 353 315 2675            99 -516.7639
## 4   18 -120.2983 37.45007  4 16.73550 375 318 2685            99 -493.7712
## 5   18 -120.3510 37.49572  5 16.79508 383 325 2694            99 -485.0020
## 6   18 -120.3246 37.49572  6 16.78542 390 325 2690            99 -480.2292

Summarizing multivariate data by group (MLRA) is usually much simpler after reshaping data from “wide” to “long” format.

library(reshape2)

# reshape from wide to long format
m <- melt(s.df,
          id.vars = c('mlra'),
          measure.vars = c('MAAT', 'MAP', 'FFD', 'GDD', 'rain.fraction', 'eff.PPT'))

# check "long" format
head(m)
##   mlra variable    value
## 1 <NA>     MAAT 16.81579
## 2   18     MAAT 16.71504
## 3   18     MAAT 16.68039
## 4   18     MAAT 16.73550
## 5   18     MAAT 16.79508
## 6   18     MAAT 16.78542

A simple tabular summary of means by MLRA and PRISM variable using tapply().

# tabular summary of mean values
tapply(m$value, list(m$mlra, m$variable), mean, na.rm = TRUE)
##        MAAT    MAP      FFD  GDD rain.fraction   eff.PPT
## 18 16.64259 490.04 329.4267 2640      99.02667 -357.4184

4.7.5.5 Faster with exactextractr

This example shows how to determine the distribution of Frost-Free Days across a soil series extent.

The data are extracted from the raster data source very rapidly using the exactextractr package.

library(sf)
library(soilDB)
library(raster)
library(lattice)
library(exactextractr)

# 5-10 seconds to download Series Extent Explorer data 
series <- c('holland', 'san joaquin')

# make SpatialPolygonsDataFrame
s <- do.call('rbind', lapply(series, seriesExtent))

# load pointer to PRISM data
r <- raster('C:/workspace2/chapter-2b/FFD.tif')

# transform extent to CRS of raster with sf
s <- st_transform(st_as_sf(s), crs = st_crs(r))

# inspect
s

# use `st_union(s)` to create a MULTI-POINT/LINE/POLYGON from single features
# use `sf::st_cast(s, 'POLYGON')` to create other types

# <0.4 seconds for sampling, including coverage fractions!
system.time({ ex <- exactextractr::exact_extract(r, s) })

# ex is a list(), with data.frame [value, coverage_fraction]
#  for each polygon in s (we have one MULTIPOLYGON per series)

# combine all list elements `ex` into single data.frame `ex.all`
#  - use do.call('rbind', ...) to stack data.frames row-wise
#  - an anonymous function that iterates along length of `ex`
#  - adding the series name as a new variable, selected using `i`
ex.all <- do.call('rbind', lapply(seq_along(ex), function(i) {
  cbind(data.frame(group = series[i]), ex[[i]])
}))

# simple summary
densityplot(~ value | group, data = ex.all, 
            plot.points = FALSE, bw = 2, lwd = 2,
            strip = strip.custom(bg = grey(0.85)),
            scales = list(alternating = 1),
            col = c('RoyalBlue'), layout = c(1, 2),
            ylab = 'Density', from = 0, to = 400,
            xlab = 'Frost-Free Days (50% chance)\n800m PRISM Data (1981-2010)', 
            main = 'FFD Estimate for Extent of San Joaquin and Holland Series'
)

4.8 Additional Reading (Spatial)