Chapter 4 Spatial Data in R

Most of us are familiar with spatial data types, sources, and the jargon used to describe interaction with these data.

GIS software provides a convenient framework for most of the spatial analysis that we do; however, the combination of statistical routines, advanced graphics, and data access functionality makes R an ideal environment for soil science.

For example, with a couple of lines of R code, it is possible to quickly integrate soil morphology (NASIS), lab data (KSSL), map unit polygons (SSURGO), and climate data (PRISM raster files).
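As a hedged sketch of that kind of workflow (the series name, SQL query, and PRISM file path below are illustrative; the functions used are demonstrated throughout this chapter):

library(soilDB)
library(raster)

# pedon morphology from a NASIS selected set
pedons <- fetchNASIS(from = 'pedons')

# KSSL lab data for a soil series of interest
kssl <- fetchKSSL(series = 'amador')

# SSURGO map unit polygons containing that series as a component, via SDA
mukeys <- SDA_query("SELECT DISTINCT mukey FROM component WHERE compname = 'Amador';")
polys <- fetchSDA_spatial(mukeys$mukey, by.col = 'mukey')

# PRISM climate grid stored locally (hypothetical file path)
maat <- raster('C:/workspace2/PRISM/MAAT.tif')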

This chapter is a very brief demonstration of several possible ways to process spatial data in R.

4.1 Objectives (Spatial Data Structures)

  • Gain experience with creating, editing, and exporting spatial data objects in R.
  • Learn about tools for making maps with R
  • Learn the basics of sf and sp representation of vector data
  • Learn the basics of raster classes and functions
  • Learn about some interfaces to NCSS spatial data sources
  • Develop a strategy for navigating the many possible spatial data processing methods

There are many packages available for working with spatial data; however, we only have time to cover the libraries below.

The next couple of sections will require loading these libraries into the R session.

# SPC and soil database interface
library(aqp)
library(soilDB)
# modern vector data structures / manipulation
library(sf)
# old-style vector/raster data structure / manipulation
library(sp)
# GDAL library / tools
library(rgdal)
# gridded data management / analysis
library(raster)

4.2 Making Maps with R

R has become a powerful tool for visualization and interaction with spatial data. There are many tools available for making maps with R! It is not all geostatistics and coordinate reference system transformations. There are powerful ways to automate your GIS work flow from beginning to end–from creating terrain derivatives from a source DEM, all the way to high-quality, publication-ready maps and interactive HTML/JavaScript widgets.

All of the details could fill several books, and they do! The resources below provide solid walk-throughs of examples using several packages with different focuses and applications: tmap, ggplot2, ggmap, mapview, mapdeck, and leaflet.

4.3 Spatial Data Sources

Spatial data sources: “raster” and “vector”

  • Raster data sources (elevation, PRISM, etc.): GeoTIFF, ERDAS, BIL, ASCII grid, WMS, …
  • Vector data sources (points/lines/polygons): Shape File, “file” geodatabase, KML, GeoJSON, GML, WFS, …

Conventional data sources that can be upgraded to spatial data:

  • NASIS/LIMS reports: typically point coordinates
  • Web pages: GeoJSON, WKT, or point coordinates
  • Excel file: typically point coordinates
  • CSV files: typically point coordinates
  • Photo EXIF information: typically point coordinates
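For example, a CSV file of point observations can be “upgraded” to a spatial object in a couple of lines (a sketch; the file name and column names are hypothetical):

library(sf)

# hypothetical CSV with WGS84 decimal-degree coordinates in columns 'x' and 'y'
tab <- read.csv('site-locations.csv')

# upgrade the plain data.frame to an sf POINT object
pts <- st_as_sf(tab, coords = c('x', 'y'), crs = 4326)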

Here are some R-based interfaces to NCSS data sources via the soilDB package.

4.4 Viewing Pedon Locations

(Introducing the sf package with mapview)

4.4.1 Plotting Geographic Data

Plotting the data as an R graphic can give you some idea of how data look spatially and whether their distribution is what you expect.

Typos are relatively common when coordinates are manually entered. Viewing the data spatially is a quick way to see if any points plot far outside of the geographic area of interest and therefore clearly have an error.

# plot the locations of the gopheridge pedons with R
# 
# Steps:
# 1) create and inspect an sf data.frame object
# 2) plot the data with mapview

# load libraries
library(aqp)
library(soilDB)
library(sf)
library(mapview)

# this creates sample gopheridge object in your environment
data("gopheridge", package = "soilDB")

# replace gopheridge object with fetchNASIS() (your data)
# gopheridge <- fetchNASIS()

# create simple features POINT geometry data.frame
# st_as_sf(): convert data.frame to spatial simple features, with points in $geometry 
# st_crs(): set EPSG:4326 Coordinate Reference System (CRS) as Well-Known Text (WKT)
gopher.locations <- st_as_sf(
  site(gopheridge), 
  coords = c('x_std','y_std'),
  crs = st_crs(4326)
)

# create interactive map with sfc_POINT object
#  use site_id in sf data.frame as labels
mapview(gopher.locations, label = gopher.locations$site_id)

4.4.2 EXERCISE 1 (Spatial Intro)

In this exercise, you will create an interactive map with the pedons in your selected set. Then you will export them to a shapefile.

4.4.2.1 Interactive mapview

Use the script below to make an R plot and a shapefile of pedon data loaded from your NASIS selected set.

The following script plots the standard WGS84 longitude/latitude decimal degree fields from the Site table of NASIS. In some cases, these data might be incomplete.

library(aqp)
library(soilDB)
library(sf)
library(mapview)

# get pedons from the selected set
pedons <- fetchNASIS(from = 'pedons')

4.4.2.2 Create a subset

Missing values in coordinates are not allowed. Create a subset SoilProfileCollection containing only the pedons that are not missing their standard spatial data.

  • Use the base R subset() function (or dplyr::filter())
  • Create a subset of your pedons using is.na()
  • x_std and y_std variables contain WGS84 standard longitude and latitude
# modify this code to create a subset
pedons.sp <- subset(pedons, ...)
# create sf object (more on this in next section)
pedon.locations <- st_as_sf(
  site(pedons.sp), 
  coords = c('x_std','y_std'),
  crs = st_crs(4326) #WGS84 GCS
)

# plot an interactive map
mapview(pedon.locations, 
        legend = FALSE, 
        map.types = 'OpenStreetMap', 
        label = pedon.locations$site_id)
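One possible way to fill in the subset() step in the template above (a sketch; subset() evaluates the expression against the site data of the SoilProfileCollection):

# keep only pedons with complete standard WGS84 coordinates
pedons.sp <- subset(pedons, !is.na(x_std) & !is.na(y_std))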

4.4.2.3 Saving to Shapefile

pedon.locations <- pedon.locations[, c(
    "pedlabsampnum", "pedon_id",
    "taxonname", "hillslopeprof",
    "elev_field", "slope_field",
    "aspect_field", "plantassocnm",
    "bedrckdepth", "bedrckkind",
    'pmkind', 'pmorigin'
  )]

# write to SHP; output CRS is GCS WGS84
st_write(pedon.locations, "./NASIS-pedons.shp")

For further information on exporting data to shapefile, see this tutorial: Export Pedons to Shapefile with sp.

4.4.2.4 Export for Google Earth (.kml)

Google Earth is a powerful viewer for point data. Geographic data is displayed in Google Earth using the Keyhole Markup Language (KML) format.

Using the plotKML package, you can easily create a KML file to inspect and view in Google Earth.
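As an alternative to plotKML, the GDAL “KML” driver used by sf can also write simple point data (a sketch, assuming the pedon.locations object from the exercise above; note that this driver carries over only limited attribute information):

library(sf)

# write pedon locations to a KML file via the GDAL 'KML' driver
st_write(pedon.locations, 'NASIS-pedons.kml', driver = 'KML')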

See the related material in this tutorial: Export Pedons to Google Earth.

4.5 Many Packages, Many Spatial Representations

4.5.1 The sf package

Simple Features Access is a set of standards that specify a common storage and access model of geographic features. It is used mostly for two-dimensional geometries such as point, line, polygon, multi-point, multi-line, etc.

This is one of many ways of modeling the geometry of shapes in the real world. This model happens to be widely adopted in the R ecosystem via the sf package, and very convenient for typical data encountered by soil survey operations.

The sf package represents the latest and greatest in spatial data processing within the comfort of an R session. It provides a “main” object class, sf, that stores geometric data and associated tabular data in a familiar data.frame format, along with methods that operate on those geometries at a variety of levels of abstraction.
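A minimal sketch of building an sf object from a plain data.frame of hypothetical coordinates:

library(sf)

# a plain data.frame with WGS84 coordinates
d <- data.frame(id = c('a', 'b'), x = c(-120.1, -120.2), y = c(37.5, 37.6))

# promote to an sf POINT object; sf objects are data.frames with a geometry list-column
p <- st_as_sf(d, coords = c('x', 'y'), crs = 4326)

class(p)
st_geometry(p)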

4.5.2 The sp Package

The data structures (“classes”) and functions provided by the sp package have served a foundational role in the handling of spatial data in R for years.

Many of the following examples will reference names such as SpatialPoints, SpatialPointsDataFrame, and SpatialPolygonsDataFrame. These are specialized (S4) classes implemented by the sp package.

Objects of these classes maintain linkages between all of the components of spatial data. For example, a point, line, or polygon feature will typically be associated with:

  • coordinate geometry
  • bounding box
  • coordinate reference system
  • attribute table
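A minimal sketch of the equivalent sp representation, using the same hypothetical coordinates:

library(sp)

d <- data.frame(id = c('a', 'b'), x = c(-120.1, -120.2), y = c(37.5, 37.6))

# promote to a SpatialPointsDataFrame in place
coordinates(d) <- ~ x + y

# assign a coordinate reference system
proj4string(d) <- '+proj=longlat +datum=WGS84'

# S4 slots link the coordinate geometry, bounding box, CRS, and attribute table
slotNames(d)
bbox(d)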

4.5.3 Converting sp and sf

sp provides access to the same compiled code libraries (PROJ, GDAL, GEOS) as sf, but mostly via the interfaces in the separate rgdal package.

For certain applications, such as some packages we demonstrate below, there are no sp “interfaces” to the methods – only sf, or vice-versa.

The two different categories of object types are interchangeable, and you may find yourself having to do this for a variety of reasons. You can convert between objects using sf::as_Spatial or sf::st_as_sf.

Check the documentation (?functionname) to figure out what object types different methods need as input; and check an input object’s class with class() or inherits().
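A minimal round-trip sketch, assuming the small sf POINT object p created in the sf section above:

library(sf)

# sf -> sp (SpatialPointsDataFrame)
p.sp <- as_Spatial(p)
inherits(p.sp, 'Spatial')

# sp -> sf
p.sf <- st_as_sf(p.sp)
class(p.sf)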

4.5.4 Importing / Exporting Vector Data

Import a feature class from an ESRI File Geodatabase or shapefile.

If you have a .shp file, you can specify the whole path, including the file extension in the dsn argument, or just the folder.

For a Geodatabase, you should specify the feature class using the layer argument. Note that a trailing “/” is omitted from the dsn (data source name) and the “.shp” suffix is omitted from the layer.

4.5.4.1 sf

x <- sf::st_read(dsn = 'E:/gis_data/ca630/FG_CA630_OFFICIAL.gdb', layer = 'ca630_a')
sf::write_sf(x, dsn = 'E:/gis_data/ca630/pedon_locations.shp')
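For a stand-alone shapefile (hypothetical path), the full path including the “.shp” extension can be supplied as the dsn:

x <- sf::st_read(dsn = 'E:/gis_data/ca630/pedon_locations.shp')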

4.5.4.2 sp / rgdal

Import a feature class with readOGR(), then export object x to a shapefile with writeOGR().

x <- rgdal::readOGR(dsn = 'E:/gis_data/ca630/FG_CA630_OFFICIAL.gdb', layer = 'ca630_a')
rgdal::writeOGR(x, dsn = 'E:/gis_data/ca630', layer = 'pedon_locations', driver = 'ESRI Shapefile')

The st_read() / read_sf() / write_sf() and readOGR(), writeOGR(), readGDAL(), writeGDAL() functions have many arguments, so it is worth spending some time with the associated manual pages.

4.5.5 Interactive mapping with mapview and leaflet

These packages make it possible to display interactive maps of sf objects in RStudio, or within an HTML document generated via R Markdown (e.g. this document).

The seriesExtent function in soilDB returns an sp object (SpatialPolygonsDataFrame) showing generalized extent polygons for a given soil series.

# load required packages, just in case
library(soilDB)
library(sf)
library(mapview)

# series extents from SoilWeb (sp objects)
pentz <- seriesExtent('pentz')
amador <- seriesExtent('amador')

# convert to sf objects (way of the future)
pentz <- st_as_sf(pentz)
amador <- st_as_sf(amador)

# combine into a single object
s <- rbind(pentz, amador)

# colors used in the map
# add more colors as needed
cols <- c('royalblue', 'firebrick')

# make a simple map, colors set by 'series' column
mapview(s, zcol = 'series', col.regions = cols, legend = TRUE)

4.5.5.1 EXERCISE 2: Map your favorite soil series extents

The following code demonstrates how to automatically fetch / convert / map soil series extents, using a vector of soil series names. Results appear in the RStudio “Viewer” pane. Be sure to try the “Export” and “show in window” (next to the broom icon) buttons.

# load required packages, just in case
library(soilDB)
library(sf)
library(mapview)

# vector of series names, letter case does not matter
# try several (2-9)!
series.names <- c('auberry', 'sierra', 'holland', 'cagwin')

# iterate over series names, get extent
# result is a list
s <- lapply(series.names, seriesExtent)

# iterate over series extents (sp objects)
# convert to sf objects
# result is a list
s <- lapply(s, st_as_sf)

# flatten list -> single sf object
s <- do.call('rbind', s)

# colors used in the map
# note trick used to dynamically set the number of colors
cols <- RColorBrewer::brewer.pal(n = length(series.names), name = 'Set1')

# make a simple map, colors set by 'series' column
# click on polygons for details
# try pop-out / export buttons
mapview(s, zcol = 'series', col.regions = cols, legend = TRUE)

4.5.6 The raster Package

The raster package provides most of the commonly used grid processing functionality that one might find in a conventional GIS:

  • re-sampling / interpolation
  • warping (coordinate system transformations of gridded data)
  • cropping, mosaicing, masking
  • local and focal functions
  • raster algebra
  • contouring
  • raster/vector conversions
  • terrain analysis
  • model-based prediction (more on this in later chapters)

Introduction to the raster package vignette
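A small sketch of a few of these operations, using the sample grid that ships with the raster package:

library(raster)

# sample grid included with the package
r <- raster(system.file("external/test.grd", package = "raster"))

# raster algebra: rescale values
r.scaled <- r / 1000

# re-sampling: aggregate to a coarser grid (2x2 blocks of cells)
r.coarse <- aggregate(r, fact = 2, fun = mean)

# focal function: 3x3 moving-window mean
r.smooth <- focal(r, w = matrix(1, 3, 3), fun = mean, na.rm = TRUE)

# raster -> vector conversion: contour lines as a SpatialLinesDataFrame
ct <- rasterToContour(r)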

4.5.6.1 Importing / Exporting Rasters

# use an example from the raster package
f <- system.file("external/test.grd", package = "raster")

# create a reference to this raster
r <- raster(f)

# print the details
print(r)
## class      : RasterLayer 
## dimensions : 115, 80, 9200  (nrow, ncol, ncell)
## resolution : 40, 40  (x, y)
## extent     : 178400, 181600, 329400, 334000  (xmin, xmax, ymin, ymax)
## crs        : +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +datum=WGS84 +units=m +no_defs 
## source     : C:/Users/Andrew.G.Brown/Documents/R/win-library/4.0/raster/external/test.grd 
## names      : test 
## values     : 138.7071, 1736.058  (min, max)
# default plot method
plot(r)

The disk-based reference can be converted to an in-memory RasterLayer with the readAll() function.

Processing of raster data in memory is always faster than processing on disk, as long as there is sufficient memory.

# check: file is on disk
inMemory(r)

# load into memory, if possible
r <- readAll(r)

# check: file is in memory
inMemory(r)

Exporting data requires consideration of the output format, datatype, encoding of NODATA, and other options such as compression.

See the manual pages for writeRaster(), writeFormats(), and dataType() for details. For example, suppose you had a RasterLayer object that you wanted to save to disk as an internally-compressed GeoTIFF:

# using previous example data set
writeRaster(r, filename = 'r.tif', options = c("COMPRESS=LZW"))

The writeRaster() function interprets the given (and missing) arguments as:

  • ‘.tif’ suffix interpreted as format=GTiff
  • creation options of “LZW compression” passed to GeoTiff driver
  • default datatype
  • default NAflag

4.5.6.2 Object Properties

RasterLayer objects are similar to sf and sp objects in that they keep track of the linkages between data, coordinate reference system, and optional attribute tables. Getting and setting the contents of RasterLayer objects should be performed using functions such as:

  • NAvalue(r): get / set the NODATA value
  • crs(r) or proj4string(r): get / set the coordinate reference system
  • res(r): get / set the resolution
  • extent(r): get / set the extent
  • dataType(r): get / set the data type
  • … many more, see the raster package manual

4.5.6.3 Data Types

It is worth spending a couple of minutes going over some of the commonly used datatypes: “unsigned integer”, “signed integer”, and “floating point” of variable precision.

  • INT1U: integers from 0 to 255
  • INT2U: integers from 0 to 65,534
  • INT2S: integers from -32,767 to 32,767
  • INT4S: integers from -2,147,483,647 to 2,147,483,647
  • FLT4S: floating point from -3.4e+38 to 3.4e+38
  • FLT8S: floating point from -1.7e+308 to 1.7e+308

It is wise to manually specify an output datatype that will “just fit” the required precision. For example, if you have generated a RasterLayer that warrants integer precision and ranges from 0 to 100, then the INT1U data type would provide enough precision to store all possible values and the NODATA value. Raster data stored as integers will always be smaller (sometimes 10-100x) than floating point, especially when internal compression is enabled.

# integer grid with a range of 0-100
# maybe soil texture classes
writeRaster(r, filename = 'r.tif', datatype = 'INT1U')

# floating point grid with very wide range
# maybe DSM model output
writeRaster(r, filename = 'r.tif', datatype = 'FLT4S')

4.5.6.4 Notes on Compression

In general, it is a good idea to create internally-compressed raster data. The GeoTIFF format can accommodate many different compression algorithms, including lossy (JPEG) compression. See this article for some ideas on optimization of file read/write times and associated compressed file sizes. Usually, the default "LZW" or "DEFLATE" compression will result in significant savings, especially for data encoded as integers.

For example, the CONUS gSSURGO map unit key grid at 30m resolution is about 55Gb (GeoTiff, no compression) vs. 2.4Gb after LZW compression.

# reasonable compression
writeRaster(r, filename='r.tif', options=c("COMPRESS=LZW"))

# takes longer to write the file, but better compression
writeRaster(r, filename='r.tif', options=c("COMPRESS=DEFLATE", "PREDICTOR=2", "ZLEVEL=9"))

4.5.6.5 EXERCISE 3: getting and plotting raster data

You will need to install the following packages:

  • leafsync: synchronized mapview panels
  • rasterVis: fancy plotting for raster objects

Use the following as a template to tinker with some gridded soil property data (800m grid, derived from SSURGO/STATSGO). See ?ISSR800.wcs for details.

The soilDB WCS tutorial contains many more examples from which to draw ideas.

# WCS interface
library(soilDB)
# wrangling polygons
library(sp)
library(sf)
# raster data visualization
library(rasterVis)

# make a bounding box and assign a CRS (4326: GCS, WGS84)
bb <- st_bbox(
  c(xmin = -121, xmax = -120, ymin = 37, ymax = 38), 
  crs = st_crs(4326)
)

# convert bbox to sf geometry
bb <- st_as_sfc(bb)

# try some others from the complete list of available grids:
# WCS_details(wcs = 'ISSR800')

# get soil texture class, 0-25cm
r1 <- ISSR800.wcs(aoi = bb, var = 'texture_025cm')

# get pH, 0-25cm
r2 <- ISSR800.wcs(aoi = bb, var = 'ph_025cm')


# grid of category IDs + labels stored in a 
# raster attribute table (RAT)
r1

# grid of continuous values
r2

# plot
levelplot(r1, margin = FALSE)
levelplot(r2, margin = FALSE)

Sync-ed mapview panels. This can be an effective way to explore similarities or differences between two spatial datasets. Try adjusting the alpha.regions argument (value range: 0-1) to mapview() to set transparency (lower values = more transparent).

# interactive mapping
library(mapview)
# sync-ed panels
library(leafsync)

# try 
m1 <- mapview(r1, map.types = "Esri.WorldImagery", legend = TRUE, na.color = NA)
m2 <- mapview(r2, map.types = "Esri.WorldImagery", legend = TRUE, na.color = NA)

# build synced panels.
sync(m1, m2)

Make another pair of synchronized mapview panels, this time using SSURGO and STATSGO map unit delineations via SDA. This example uses the SDA_spatialQuery() function to request vector data from SDA using a template sp object. The geomIntersection = TRUE argument to SDA_spatialQuery() causes SDA to compute the spatial intersection between template (bounding box) and map unit delineations. SDA has fairly tight constraints on how much data can be returned per request (32Mb), so you will have to plan carefully when working with large areas of interest.

# make a smaller bounding box
bb <- st_bbox(c(xmin = -114.16, xmax = -114.08, ymin = 47.65, ymax = 47.68), crs = st_crs(4326))
# convert to sf geometry
bb <- st_as_sfc(bb)

## TODO: SDA_spatialQuery() currently expects an sp object
# convert to sp object
bb <- as_Spatial(bb)

# get SSURGO and STATSGO
# result is a SpatialPolygonsDataFrame
ssurgo <- SDA_spatialQuery(bb, what = 'geom', geomIntersection = TRUE)
statsgo <- SDA_spatialQuery(bb, what = 'geom', geomIntersection = TRUE, db = 'STATSGO')

# make mapview panels
# adjust line styles, disable legends
m1 <- mapview(ssurgo, map.types = "Esri.WorldImagery", legend = FALSE, na.color = NA, color = 'yellow', fill = NA, lwd = 1)
m2 <- mapview(statsgo, map.types = "Esri.WorldImagery", legend = FALSE, na.color = NA, color = 'white', fill = NA, lwd = 1)

# sync
sync(m1, m2)

4.5.7 Converting Vector to Raster
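This topic is not developed in detail here, but a minimal sketch of vector-to-raster conversion with raster::rasterize(), using a hypothetical square polygon and template grid, might look like this:

library(raster)
library(sp)

# hypothetical square polygon with a single attribute
poly <- SpatialPolygonsDataFrame(
  SpatialPolygons(
    list(Polygons(list(Polygon(cbind(c(0, 0, 10, 10, 0), c(0, 10, 10, 0, 0)))), ID = '1')),
    proj4string = CRS('+proj=longlat +datum=WGS84')
  ),
  data = data.frame(value = 5, row.names = '1')
)

# template raster defining the output grid geometry (same CRS)
template <- raster(
  nrows = 20, ncols = 20, xmn = -5, xmx = 15, ymn = -5, ymx = 15,
  crs = '+proj=longlat +datum=WGS84'
)

# burn the polygon attribute into overlapping grid cells
r.poly <- rasterize(poly, template, field = 'value')
plot(r.poly)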

4.6 Coordinate Reference Systems

Spatial data aren’t all that useful without an accurate description of the coordinate reference system (CRS). This type of information is typically stored within the “.prj” component of a shapefile, or in the header of a GeoTIFF.

Without a CRS it is not possible to perform coordinate transformations (e.g. conversion of geographic coordinates to projected coordinates), spatial overlay (e.g. intersection), or geometric calculations (e.g. distance or area).

The “old” way (PROJ.4) of specifying coordinate reference systems uses character strings containing, for example, +proj or +init arguments. In general, this still “works,” so you may encounter it and need to know about it. You may also encounter cases where a CRS is specified using an integer EPSG code, an OGC code, or well-known text (WKT).

Some common examples of coordinate system EPSG codes and their legacy PROJ.4 strings:

  • EPSG: 4326 / PROJ.4: +proj=longlat +datum=WGS84 - geographic, WGS84 datum (NASIS Standard)

  • EPSG: 4269 / PROJ.4: +proj=longlat +datum=NAD83 - geographic, NAD83 datum

  • EPSG: 4267 / PROJ.4: +proj=longlat +datum=NAD27 - geographic, NAD27 datum

  • EPSG: 26910 / PROJ.4: +proj=utm +zone=10 +datum=NAD83 - projected (UTM zone 10), NAD83 datum

  • EPSG: 6350 / PROJ.4: +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23.0 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs - Albers Equal Area CONUS (gSSURGO)

  • More on the EPSG codes and specifics of CRS definitions:

While you may encounter PROJ.4 strings, these are no longer considered the preferred method of referencing Coordinate Reference Systems – and, in general, newer methods are “easier.”

Well-known text (WKT) is a human- and machine-readable standard format for geometry, so storing the coordinate reference system information in a similar format makes sense. This format is returned by the sf::st_crs() function.

For example: the WKT representation of EPSG:4326:

st_crs(4326)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

This output follows the OGC WKT CRS standard. Adoption of this standard caused some significant changes in packages across the R spatial ecosystem.

To help you get familiar, what follows are several examples of doing the same thing: setting the CRS of spatial objects with WGS84 longitude/latitude geographic coordinates. If you have another target coordinate system, it is just a matter of using the correct codes to identify it.

4.6.1 Assigning and Transforming Coordinate Systems

Returning to the example from above, let's assign a CRS to our series extent s using different methods.

s <- seriesExtent('san joaquin')

The following subsections show equivalent syntax using sf versus sp / rgdal.

4.6.1.1 sf

Use st_crs<- to set, or st_crs() to get, the CRS of sf objects. Supply the target EPSG code as an integer.

# s is an sp object, we convert it to sf with st_as_sf
s <- st_as_sf(s) 

# the CRS of s is EPSG:4326
st_crs(s) == st_crs(4326)
## [1] TRUE
# set CRS using st_crs<- (replace with identical value)
st_crs(s) <- st_crs(4326)

Transformation of points, lines, and polygons with sf requires a “source” CRS defined in the object passed as argument x, and a “target” CRS defined in the crs argument, given as an integer or as the output of st_crs().

# transform to UTM zone 10
s.utm <- st_transform(x = s, crs = 26910)

# transform to GCS NAD27
s.nad27 <- st_transform(x = s, crs = st_crs(4267))

4.6.1.2 sp and rgdal

You can do the same thing several different ways with sp objects. For Spatial objects, the CRS is set or retrieved with proj4string<- / proj4string, using either an sp CRS object or a PROJ.4 string; equivalent EPSG, OGC, and PROJ.4 specifications all produce the same result.

# s is an sf object (we converted it), convert back to Spatial* object
s <- sf::as_Spatial(s) 

# these all create the same internal sp::CRS object
proj4string(s) <- sp::CRS('EPSG:4326')          # proj >6; EPSG
proj4string(s) <- sp::CRS('OGC:CRS84')          # proj >6; OGC
proj4string(s) <- '+init=epsg:4326'             # proj4 style +init string
proj4string(s) <- '+proj=longlat +datum=WGS84'  # proj4 style +proj string

Here, we do the same transformations as above, only using sp::spTransform().

# transform to UTM zone 10
s.utm <- spTransform(s, CRS('+proj=utm +zone=10 +datum=NAD83'))

# transform to GCS NAD27
s.nad27 <- spTransform(s, CRS('+proj=longlat +datum=NAD27'))

4.6.1.3 raster

Use crs<- and crs to set and get the CRS of raster (or Spatial) objects; these functions take and return an sp CRS object.

# r is a raster object; set its CRS to the CRS it already has (a no-op, shown for syntax)
crs(r) <- raster::crs(r)

“Transforming” or warping a raster is a different matter than transforming vector data, as it requires interpolation of pixel values onto a new grid with a defined target resolution and CRS.

The method provided by raster to do this is projectRaster(). It works the same as the above transform methods in that you specify an object to transform, and the target reference system or a template for the object.

r.wgs84 <- projectRaster(r, crs = CRS("EPSG:4326"))

Note that the default projectRaster uses bilinear interpolation (method='bilinear'), which is appropriate for continuous variables. You also have the option of using nearest-neighbor (method='ngb') for categorical variables (class maps) where interpolation does not make sense.
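For example, if r contained integer class IDs rather than continuous values (a sketch):

# nearest-neighbor resampling preserves categorical values
r.classes.wgs84 <- projectRaster(r, crs = CRS("EPSG:4326"), method = 'ngb')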

If we want to save this transformed raster to a file, we can use something like this:

writeRaster(r.wgs84, filename = 'r.tif', options = c("COMPRESS=LZW"))

4.7 EXERCISE: Working with real data

4.7.1 Load Required Packages

Load required packages into a fresh RStudio Session.

library(aqp)
library(soilDB)
library(sp)
library(rgdal)
library(raster)
library(rasterVis)

4.7.2 Download Example Data

Run the following to create a path for the example data. Be sure to set a valid path to a local disk.

# store path as a variable, in case you want to keep it somewhere else
ch2b.data.path <- 'C:/workspace2/chapter-2b'

# make a place to store chapter 2b example data
dir.create(ch2b.data.path, recursive = TRUE)

# download polygon example data from github
download.file(
  'https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/chapter_2b-spatial-data/chapter-2b-mu-polygons.zip', 
  file.path(ch2b.data.path, 'chapter-2b-mu-polygons.zip')
)

# download raster example data from github 
download.file(
  'https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/chapter_2b-spatial-data/chapter-2b-PRISM.zip', 
  file.path(ch2b.data.path, 'chapter-2b-PRISM.zip')
)

# unzip
unzip(
  file.path(ch2b.data.path, 'chapter-2b-mu-polygons.zip'), 
  exdir = ch2b.data.path, overwrite = TRUE
)

unzip(
  file.path(ch2b.data.path, 'chapter-2b-PRISM.zip'), 
  exdir = ch2b.data.path, overwrite = TRUE
)

4.7.3 Load the Data

We will be using polygons associated with MLRAs 15 and 18 as part of this demonstration.

Import these data now with readOGR(); recall the somewhat strange syntax. These data, along with the RasterStack object rs created below, will be used in the examples that follow.

# just to be sure, pointer to data path 
ch2b.data.path <- 'C:/workspace2/chapter-2b'

# load MLRA polygons
mlra <- readOGR(dsn = ch2b.data.path, layer = 'mlra-18-15-AEA')

# mean annual air temperature, Deg C
maat <- raster(file.path(ch2b.data.path, 'MAAT.tif'))

# mean annual precipitation, mm
map <- raster(file.path(ch2b.data.path, 'MAP.tif'))

# frost-free days
ffd <- raster(file.path(ch2b.data.path, 'FFD.tif'))

# growing degree days
gdd <- raster(file.path(ch2b.data.path, 'GDD.tif'))

# percent of annual PPT as rain
rain_fraction <- raster(file.path(ch2b.data.path, 'rain_fraction.tif'))

# annual sum of monthly PPT - ET_p
ppt_eff <- raster(file.path(ch2b.data.path, 'effective_preciptitation.tif'))

Sometimes it is convenient to “stack” raster data that share a common grid size, extent, and coordinate reference system into a single RasterStack object.

# create a raster stack (multiple rasters aligned)
rs <- stack(maat, map, ffd, gdd, rain_fraction, ppt_eff)

# reset layer names
names(rs) <- c('MAAT', 'MAP', 'FFD', 'GDD', 'rain.fraction', 'eff.PPT')

Quick inspection of the data.

# object class
class(mlra)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
class(maat)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
class(rs)
## [1] "RasterStack"
## attr(,"package")
## [1] "raster"
# the raster package provides a nice "print" method for raster and sp classes
print(maat)
## class      : RasterLayer 
## dimensions : 762, 616, 469392  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : -123.2708, -118.1375, 34.44583, 40.79583  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=NAD83 +no_defs 
## source     : C:/workspace2/chapter-2b/MAAT.tif 
## names      : MAAT 
## values     : -4.073542, 18.67642  (min, max)
# coordinate reference systems: note that they are not all the same
proj4string(mlra)
## [1] "+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs"
proj4string(maat)
## [1] "+proj=longlat +datum=NAD83 +no_defs"
proj4string(rs)
## [1] "+proj=longlat +datum=NAD83 +no_defs"

Basic plot methods (class-specific functions) for the data. Note that this approach requires that all layers in the “map” share the same coordinate reference system (CRS).

# MLRA polygons in native coordinate system
# recall that mlra is a SpatialPolygonsDataFrame
plot(mlra, main = 'MLRA 15 and 18')
box()

# MAAT raster
# recall that maat is a raster object
plot(maat, main = 'PRISM Mean Annual Air Temperature (deg C)')

# plot MAAT raster with MLRA polygons on top
# this requires transforming to CRS of MAAT
mlra.gcs <- spTransform(mlra, CRS(proj4string(maat)))
plot(maat, main = 'PRISM Mean Annual Air Temperature (deg C)')
plot(mlra.gcs, main = 'MLRA 15 and 18', add = TRUE)

Try the rasterVis package approach to plotting raster and sp objects. Syntax is more expressive, figures are more intricate, but there is a learning curve. See the rasterVis tutorial for a comprehensive set of examples.

levelplot(
  # raster object
  maat, 
  # suppress marginal plots
  margin = FALSE,
  # custom colors
  par.settings = BuRdTheme,
  # combine multiple spatial data layer here
  panel = function(...) {
    panel.levelplot(...)
    sp.lines(mlra.gcs, col = 'black')
  }
  )

4.7.4 Spatial Overlay Operations

Spatial data are a lot more useful when “combined” (overlay, intersect, spatial query, etc.) to generate something new. For simplicity, we will refer to this kind of operation as an “extraction”. The CRS of the two objects being overlaid must match.

4.7.4.1 Vector Data

In sf the functions used to do this are st_intersects() or st_intersection().
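A minimal sketch of the sf approach, assuming the mlra polygons loaded earlier in this exercise:

library(sf)

# convert MLRA polygons to sf
mlra.sf <- st_as_sf(mlra)

# hand-make an sf point (GCS WGS84), then transform to the CRS of the polygons
p.sf <- st_sfc(st_point(c(-120, 37.5)), crs = 4326)
p.sf <- st_transform(p.sf, st_crs(mlra.sf))

# which polygon(s) intersect the point?
st_intersects(p.sf, mlra.sf)

# attributes of intersecting polygons (spatial subsetting uses st_intersects)
mlra.sf[p.sf, ]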

For sp objects, these operations are done with the sp::over() function. Access the associated vignette by running vignette("over") in the console.

# hand make a SpatialPoints object
# note that this is GCS
p <- SpatialPoints(coords = cbind(-120, 37.5), 
                   proj4string = CRS('+proj=longlat +datum=WGS84'))

# spatial extraction of MLRA data requires a CRS transformation
p.aea <- spTransform(p, CRS(proj4string(mlra)))
over(p.aea, mlra)

4.7.4.2 Raster Data

The values stored in a RasterLayer or RasterStack object can be extracted using the extract() function.

As long as the “query” feature has a valid CRS defined, the raster::extract() function will automatically perform any required CRS transformation.

# extract from a single RasterLayer
extract(maat, p)

# extract from a RasterStack
extract(rs, p)

4.7.5 Sampling and Extraction

4.7.5.1 Raster Data Sampling

Typically, spatial queries of raster data by polygon features are performed in two ways:

  1. for each polygon, collect all pixels that overlap (exactextractr approach)

  2. for each polygon, collect a sample of pixels defined by sampling points

The first method ensures that all data are included in the analysis, however, processing can be slow for multiple/detailed rasters, and the results may not fit into memory.

The second method is more efficient (10-100x faster), requires less memory, and can remain statistically sound–as long as a reasonable sampling strategy is applied. Sampling may also help you avoid low-acreage “anomalies” in the raster product. More on sampling methods in the next chapter.

The extract() function can perform several operations in one pass, such as buffering (in projected units) then extracting. See the manual page for an extensive listing of optional arguments and what they do.

# extract using a buffer with radius specified in meters (1000m)
extract(rs, p, buffer = 1000)

Sampling and extraction with raster methods results in a matrix object.

# sampling single RasterLayer
sampleRegular(maat, size = 10)

# sampling RasterStack
sampleRegular(rs, size = 10)

Sampling and extracting with sp = TRUE results in a SpatialPointsDataFrame object.

par(mfcol = c(1, 2), mar = c(1, 1, 3, 1))

# regular sampling + extraction of raster values
x.regular <- sampleRegular(maat, size = 100, sp = TRUE)
plot(maat,
     axes = FALSE,
     legend = FALSE,
     main = 'Regular Sampling')
points(x.regular)

# random sample + extraction of raster values
# note that NULL values are removed
x.random <- sampleRandom(maat,
                         size = 100,
                         sp = TRUE,
                         na.rm = TRUE)
plot(maat,
     axes = FALSE,
     legend = FALSE,
     main = 'Random Sampling with NA Removal')

points(x.random)

Note that the mean can be efficiently estimated, even with a relatively small number of samples.

# all values: slow for large grids
mean(values(maat), na.rm = TRUE)

# regular sampling: efficient, central tendency comparable to above
mean(x.regular$MAAT, na.rm = TRUE)

# this value will be pseudorandom
#  depends on number of samples, pattern of NA
mean(x.random$MAAT, na.rm = TRUE)

Just how much variation can we expect when collecting 100 randomly-located samples over such a large area? This is better covered in the Sampling chapter, but a quick experiment might be fun. Repeat this 30 times: compute the mean MAAT from 100 randomly-located samples.

# takes a couple of seconds
z <- replicate(30, mean(sampleRandom(maat, size = 100, na.rm = TRUE), na.rm = TRUE))

# 90% of the time the mean MAAT values were within:
quantile(z, probs = c(0.05, 0.95))

4.7.5.2 Extracting Raster Data: KSSL Pedon Locations

Extract PRISM data at the coordinates associated with KSSL pedons that have been correlated to the AUBURN series.

We will use the fetchKSSL() function from the soilDB package to get KSSL data from the most recent snapshot. This example can be easily adapted to pedon data extracted from NASIS using fetchNASIS().

Get some KSSL data and upgrade the “site” data to a SpatialPointsDataFrame.

# result is a SoilProfileCollection object
auburn <- fetchKSSL(series = 'auburn')

# extract site data
s <- site(auburn)

## TODO: this is an old-fashioned specification of a CRS...
# these are GCS WGS84 coordinates from NASIS
coordinates(s) <- ~ x + y
proj4string(s) <- '+proj=longlat +datum=WGS84'

Extract PRISM data (the RasterStack object we made earlier) at the Auburn KSSL locations and summarize.

# return the result as a data.frame object
e <- extract(rs, s, df=TRUE)

# summarize: remove first (ID) column using [, -1] j index
summary(e[, -1])
##       MAAT            MAP             FFD             GDD       rain.fraction    eff.PPT      
##  Min.   :15.52   Min.   :448.0   Min.   :278.0   Min.   :2456   Min.   :99    Min.   :-409.0  
##  1st Qu.:16.24   1st Qu.:519.0   1st Qu.:305.5   1st Qu.:2586   1st Qu.:99    1st Qu.:-329.1  
##  Median :16.45   Median :569.5   Median :316.0   Median :2608   Median :99    Median :-282.3  
##  Mean   :16.33   Mean   :633.3   Mean   :314.4   Mean   :2588   Mean   :99    Mean   :-208.6  
##  3rd Qu.:16.60   3rd Qu.:661.0   3rd Qu.:329.5   3rd Qu.:2623   3rd Qu.:99    3rd Qu.:-188.4  
##  Max.   :16.65   Max.   :947.0   Max.   :334.0   Max.   :2651   Max.   :99    Max.   : 128.4

Join the extracted PRISM data with the original SoilProfileCollection object.

More information on SoilProfileCollection objects here.

# don't convert character data into factors
options(stringsAsFactors = FALSE)

# combine site data with extracted raster values, row-order is identical
res <- cbind(as(s, 'data.frame'), e)

# extract unique IDs and PRISM data
res <- res[, c('pedon_key', 'MAAT', 'MAP', 'FFD', 'GDD', 'rain.fraction', 'eff.PPT')]

# join with original SoilProfileCollection object via pedon_key
site(auburn) <- res

The extracted values are now part of the “auburn” SoilProfileCollection object.

Does there appear to be a relationship between soil morphology and “effective precipitation”? Not really.

# create an ordering of pedons based on the extracted effective PPT
new.order <- order(auburn$eff.PPT)

# setup figure margins, 1x1 row*column layout
par(mar = c(4.5, 0, 4, 0), mfcol = c(1, 1))

# plot profile sketches
plotSPC(auburn,
        name = 'hzn_desgn',
        print.id = FALSE,
        color = 'clay',
        plot.order = new.order,
        cex.names = 0.75,
        max.depth = 70,
        width = 0.3,
        name.style = 'center-top',
        scaling.factor = 1,
        plot.depth.axis = FALSE,
        hz.depths = TRUE
)

# add an axis with extracted raster values
axis(side = 1,
     at = 1:length(auburn),
     labels = round(auburn$eff.PPT[new.order]),
     cex.axis = 0.75)

mtext('Annual Sum of Monthly (PPT - ET_p) (mm)',
      side = 1,
      line = 2.5)

Note that negative values are associated with a net deficit in monthly precipitation vs. estimated ET.

4.7.5.3 Raster Summary By Polygon: Series Extent

The seriesExtent() function from the soilDB package provides a simple interface to Series Extent Explorer data files.

Note that these series extents have been generalized for rapid display at regional to continental scales. A more precise representation of “series extent” can be generated from SSURGO polygons and queried from SDA.

Get an approximate extent for the Amador soil series from SEE. See the seriesExtent tutorial and manual page for additional options and related functions.

amador <- seriesExtent(s = 'amador')

Generate 100 sampling points within the extent, using a hexagonal grid. These will be used to extract raster values from our RasterStack of PRISM data.

s <- spsample(amador, n = 100, type = 'hexagonal')

For comparison, extract a single point from each SSURGO map unit delineation that contains Amador as a major component. This will require a query to SDA for the set of matching map unit keys (mukey), followed by a second request to SDA for the geometry.

The SDA_query function is used to send arbitrary queries written in SQL to SDA; the results may be a data.frame or list, depending on the complexity of the query. The fetchSDA_spatial function returns map unit geometry as either polygons, polygon envelopes, or a single point within each polygon, selected by mukey or nationalmusym.

# result is a data.frame
mukeys <- SDA_query("SELECT DISTINCT mukey FROM component
                     WHERE compname = 'Amador' AND majcompflag = 'Yes';")

# result is a SpatialPointsDataFrame
amador.pts <- fetchSDA_spatial(mukeys$mukey,
                               by.col = 'mukey',
                               method = 'point',
                               chunk.size = 2)

Graphically check both methods:

# adjust margins and setup plot device for two columns
par(mar = c(1, 1, 3, 1), mfcol = c(1, 2))

# first figure
plot(maat,
     ext = extent(s),
     main = 'PRISM MAAT\n100 Sampling Points from Extent',
     axes = FALSE)
plot(amador, add = TRUE)
points(s, cex = 0.25)

plot(maat,
     ext = extent(s),
     main = 'PRISM MAAT\nPolygon Centroids',
     axes = FALSE)
points(amador.pts, cex = 0.25)

Extract PRISM data (the RasterStack object we made earlier) at the sampling locations (100 regularly-spaced and from MU polygon centroids) and summarize. Note that CRS transformations are automatic (when possible).

# return the result as a data.frame object
e <- extract(rs, s, df = TRUE)
e.pts <- extract(rs, amador.pts, df = TRUE)

# check out the extracted data
summary(e[,-1])
##       MAAT            MAP             FFD             GDD       rain.fraction       eff.PPT      
##  Min.   :16.42   Min.   :334.0   Min.   :313.0   Min.   :2590   Min.   : 99.00   Min.   :-548.7  
##  1st Qu.:16.60   1st Qu.:439.8   1st Qu.:324.0   1st Qu.:2628   1st Qu.: 99.00   1st Qu.:-419.2  
##  Median :16.65   Median :488.5   Median :329.0   Median :2644   Median : 99.00   Median :-352.8  
##  Mean   :16.65   Mean   :481.2   Mean   :328.7   Mean   :2643   Mean   : 99.03   Mean   :-367.5  
##  3rd Qu.:16.69   3rd Qu.:530.5   3rd Qu.:334.2   3rd Qu.:2667   3rd Qu.: 99.00   3rd Qu.:-298.2  
##  Max.   :16.82   Max.   :584.0   Max.   :343.0   Max.   :2708   Max.   :100.00   Max.   :-236.1
# all pair-wise correlations; note that rain.fraction is nearly constant
knitr::kable(cor(e[,-1]), digits = 2)
              MAAT    MAP   FFD    GDD  rain.fraction  eff.PPT
MAAT          1.00  -0.43  0.14   0.67           0.07    -0.40
MAP          -0.43   1.00  0.63  -0.88           0.06     0.99
FFD           0.14   0.63  1.00  -0.54           0.17     0.65
GDD           0.67  -0.88 -0.54   1.00           0.00    -0.86
rain.fraction 0.07   0.06  0.17   0.00           1.00     0.06
eff.PPT      -0.40   0.99  0.65  -0.86           0.06     1.00
# rain.fraction is nearly constant; with only 1 unique value, correlation would be impossible to compute
unique(e$rain.fraction)
## [1]  99 100

Quickly compare the two sets of samples. More on this in the Sampling module.

# compile results into a list
maat.comparison <- list('regular samples' = e$MAAT,
                        'polygon centroids' = e.pts$MAAT)

# number of samples per method
lapply(maat.comparison, length)
## $`regular samples`
## [1] 88
## 
## $`polygon centroids`
## [1] 395
# summary() applied by group
lapply(maat.comparison, summary)
## $`regular samples`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   16.42   16.60   16.65   16.65   16.69   16.82 
## 
## $`polygon centroids`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   16.18   16.61   16.65   16.66   16.72   17.10
# box-whisker plot
par(mar = c(4.5, 8, 3, 1), mfcol = c(1, 1))
boxplot(
  maat.comparison,
  horizontal = TRUE,
  las = 1,
  xlab = 'MAAT (deg C)',
  varwidth = TRUE,
  boxwex = 0.5,
  main = 'MAAT Comparison'
)

Basic climate summaries from a standardized source (e.g. PRISM) might be a useful addition to an OSD.

Think about how you could adapt this example to compare climate summaries derived from NASIS pedons to similar summaries derived from map unit polygons and generalized soil series extents.

4.7.5.4 Raster Summary By Polygon: MLRA

The following example is a simplified version of what is available in the soilReports package, reports on the ncss-tech GitHub repository, or in the TEUI suite of map unit summary tools.

Example output from the soilReports package:

Efficient summary of large raster data sources can be accomplished using:

  • internally-compressed raster data sources, stored on a local disk, can be in any coordinate system
  • polygons stored in an equal-area or UTM coordinate system, with CRS units of meters
  • fixed-density sampling of polygons
  • estimation of quantiles from collected raster samples

Back to our example data. The first step is to check the MLRA polygons (mlra); how many features per MLRA symbol? Note that some MLRA have more than one polygon.

table(mlra$MLRARSYM)

Convert polygon area from square meters to acres and summarize. Note that this will only make sense when using a projected CRS with units of meters (equal area)!

poly.area <- round(sapply(mlra@polygons, slot, 'area') * 0.000247105)
summary(poly.area)
sum(poly.area)

Sample each polygon at a constant sampling density of 0.001 samples per acre (1 sample for every 1,000 ac.). At this sampling density we should expect approximately 16,700 samples–more than enough for our simple example.

library(sharpshootR)

# the next function requires a polygon ID: 
#  each polygon gets a unique number 1--number of polygons
mlra$pID <- 1:nrow(mlra)
s <- constantDensitySampling(mlra, n.pts.per.ac = 0.001)

Extract MLRA symbol at sample points using the over() function. The result will be a data.frame object with all attributes from our MLRA polygons that intersect sampling points s.

# spatial overlay: sampling points and MLRA polygons
res <- over(spTransform(s, CRS(proj4string(mlra))), mlra)

# row / feature order is preserved, so we can directly copy
s$mlra <- res$MLRARSYM

# tabulate number of samples per MLRA
table(s$mlra)
## 
## 18 
## 75

Extract values from the RasterStack of PRISM data as a data.frame.

# raster stack extraction at sampling points
e <- extract(rs, s, df = TRUE)

# convert sampling points from SpatialPointsDataFrame to data.frame
s.df <- as(s, 'data.frame')

# join columns from extracted values and sampling points
s.df <- cbind(s.df, e)

# check results
head(s.df)
##   mlra         x        y ID     MAAT MAP FFD  GDD rain.fraction   eff.PPT
## 1 <NA> -120.2324 37.33594  1 16.81579 334 313 2708            99 -548.7027
## 2   18 -120.2983 37.40442  2 16.71504 341 315 2682            99 -530.5748
## 3   18 -120.2851 37.42724  3 16.68039 353 315 2675            99 -516.7639
## 4   18 -120.2983 37.45007  4 16.73550 375 318 2685            99 -493.7712
## 5   18 -120.3510 37.49572  5 16.79508 383 325 2694            99 -485.0020
## 6   18 -120.3246 37.49572  6 16.78542 390 325 2690            99 -480.2292

Summarizing multivariate data by group (MLRA) is usually much simpler after reshaping data from “wide” to “long” format.

library(reshape2)

# reshape from wide to long format
m <- melt(s.df,
          id.vars = c('mlra'),
          measure.vars = c('MAAT', 'MAP', 'FFD', 'GDD', 'rain.fraction', 'eff.PPT'))

# check "long" format
head(m)
##   mlra variable    value
## 1 <NA>     MAAT 16.81579
## 2   18     MAAT 16.71504
## 3   18     MAAT 16.68039
## 4   18     MAAT 16.73550
## 5   18     MAAT 16.79508
## 6   18     MAAT 16.78542

A simple tabular summary of means by MLRA and PRISM variable using tapply().

# tabular summary of mean values
tapply(m$value, list(m$mlra, m$variable), mean, na.rm = TRUE)
##        MAAT    MAP      FFD  GDD rain.fraction   eff.PPT
## 18 16.64259 490.04 329.4267 2640      99.02667 -357.4184

4.7.5.5 Faster with exactextractr

This example shows how to determine the distribution of Frost-Free Days across a soil series extent.

The data are extracted from the raster data source very rapidly using the exactextractr package.

library(sf)
library(soilDB)
library(raster)
library(lattice)
library(exactextractr)

# 5-10 seconds to download Series Extent Explorer data 
series <- c('holland', 'san joaquin')

# make SpatialPolygonsDataFrame
s <- do.call('rbind', lapply(series, seriesExtent))

# load pointer to PRISM data
r <- raster('C:/workspace2/chapter-2b/FFD.tif')

# transform extent to CRS of raster with sf
s <- st_transform(st_as_sf(s), crs = st_crs(r))

# inspect
s

# use `st_union(s)` to create a MULTI-POINT/LINE/POLYGON from single features
# use `sf::st_cast(s, 'POLYGON')` to create other types

# <0.4 seconds for sampling, including coverage fractions!
system.time({ ex <- exactextractr::exact_extract(r, s) })

# ex is a list(), with data.frame [value, coverage_fraction]
#  for each polygon in s (we have one MULTIPOLYGON per series)

# combine all list elements `ex` into single data.frame `ex.all`
#  - use do.call('rbind', ...) to stack data.frames row-wise
#  - an anonymous function that iterates along length of `ex`
#  - adding the series name as a new variable, selected using `i`
ex.all <- do.call('rbind', lapply(seq_along(ex), function(i) {
  cbind(data.frame(group = series[i]), ex[[i]])
}))

# simple summary
densityplot(~ value | group, data = ex.all, 
            plot.points = FALSE, bw = 2, lwd = 2,
            strip = strip.custom(bg = grey(0.85)),
            scales = list(alternating = 1),
            col = c('RoyalBlue'), layout = c(1, 2),
            ylab = 'Density', from = 0, to = 400,
            xlab = 'Frost-Free Days (50% chance)\n800m PRISM Data (1981-2010)', 
            main = 'FFD Estimate for Extent of San Joaquin and Holland Series'
)

4.8 Additional Reading (Spatial)