Chapter 4 Spatial Data in R
This chapter is a brief demonstration of possible ways to process spatial data in R.
4.1 Objectives (Spatial Data)
- Gain experience with creating, editing, and exporting spatial data objects in R.
- Learn about making maps with R
- Learn the basics of sf and sp representation of vector data
- Learn the basics of terra classes and functions
- Learn about some interfaces to NCSS spatial data sources
- Develop a strategy for navigating the many possible spatial data processing methods
The next sections will require loading these libraries into the R session.
# SPC and soil database interface
library(aqp)
library(soilDB)
# "Simple Feature" (vector) data structures
library(sf)
# superseded by sf -- spatial object classes e.g. SpatialPoints/SpatialPolygons
library(sp)
# gridded data management / analysis
library(terra)
# superseded by terra
library(raster)
# interactive maps with leaflet
library(mapview)
There are many packages available for working with spatial data; we only have time to introduce a few common libraries here. Several other packages provide different ways of displaying spatial data graphically.
4.2 Making Maps with R
R has become a powerful tool for visualization and interaction with spatial data. There are many tools available for making maps with R! It is not all geostatistics and coordinate reference system transformations. There are powerful ways to automate your GIS workflow from beginning to end–from creating terrain derivatives from a source DEM, to high-quality, publication-ready maps and interactive HTML/JavaScript widgets.
4.3 Spatial Data Sources
Spatial data sources: “raster” and “vector”
- Raster data sources (grids/images): GeoTIFF, ERDAS, BIL, ASCII grid, WMS, …
- Vector data sources (points/lines/polygons): Shape File, ESRI File Geodatabase, KML, GeoJSON, GML, WFS, …
Conventional data sources that can be upgraded to spatial data:
- NASIS/LIMS reports: typically point coordinates
- Web pages: GeoJSON, WKT, or point coordinates
- Excel file: typically point coordinates
- CSV files: typically point coordinates
- Photo EXIF information: typically point coordinates
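For example, a table of point coordinates stored in a CSV file can be upgraded to an sf object in a couple of lines. This is a minimal sketch; the file name and the "lon"/"lat" column names are hypothetical.
library(sf)
# hypothetical CSV with WGS84 decimal-degree coordinates in "lon" and "lat" columns
d <- read.csv("pedon-locations.csv")
d.sf <- st_as_sf(d, coords = c("lon", "lat"), crs = 4326)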
Here are some R-based interfaces to NCSS data sources via the soilDB package.
Functions that return tabular data which can be upgraded to spatial data:
- fetchNASIS(): NASIS "site" data contain x,y coordinates
- fetchLDM(): KSSL "site" data from Lab Data Mart contain x,y coordinates
- fetchKSSL(): KSSL "site" data from SoilWeb contain x,y coordinates
Functions that return spatial data:
- fetchSDA_spatial(): polygon, bounding box, and centroid data from SSURGO, STATSGO, and the sapolygon (Soil Survey Area Polygon) from Soil Data Access (SDA)
- fetchHenry(): sensor / weather station locations as points
- SDA_query(): SSURGO data as points, lines, polygons (via SDA)
- SDA_spatialQuery(): use points or polygons as a "query" to SDA
- seriesExtent() and taxaExtent(): extent of series and taxonomic classes derived from SSURGO (SoilWeb) in vector and raster format (800m resolution). The vector output is identical to series extents reported by Series Extent Explorer
- mukey.wcs() and ISSR800.wcs(): provide an interface to gSSURGO (mukey), gNATSGO (mukey), and the ISSR-800 (gridded soil property) data
4.4 Viewing Pedon Locations
(Introducing the sf package with mapview)
4.4.1 Plotting Geographic Data
Plotting the data as an R graphic can give you some idea of how data look spatially and whether their distribution is what you expect.
Typos are relatively common when coordinates are manually entered. Viewing the data spatially is a quick way to see if any points plot far outside of the geographic area of interest and therefore clearly have an error.
# plot the locations of the gopheridge pedons with R
#
# Steps:
# 1) create and inspect an sf data.frame object
# 2) plot the data with mapview
# load libraries
library(aqp)
library(soilDB)
library(sf)
library(mapview)
# this creates sample gopheridge object in your environment
data("gopheridge", package = "soilDB")
# replace gopheridge object with fetchNASIS() (your data)
# gopheridge <- fetchNASIS()
# create simple features POINT geometry data.frame
# st_as_sf(): convert data.frame to spatial simple features, with points in $geometry
# st_crs(): set EPSG:4326 Coordinate Reference System (CRS) as Well-Known Text (WKT)
gopher.locations <- st_as_sf(
  site(gopheridge),
  coords = c('x_std', 'y_std'),
  crs = st_crs(4326)
)
# create interactive map with sfc_POINT object
# use site_id in sf data.frame as labels
mapview(gopher.locations, label = gopher.locations$site_id)
4.4.2 Exercise 1: Spatial Intro
In this exercise, you will create an interactive map with the pedons in your selected set. Then you will export them to a shapefile. Modify the code snippets below to make an R plot and a shapefile of pedon data loaded from your NASIS selected set. You will plot pedon locations using the standard WGS84 longitude/latitude decimal degree fields from the Site table of NASIS. In some cases, these data might be incomplete, so you need to handle this possibility.
In this exercise you will create a subset SoilProfileCollection for the pedons that are not missing spatial data (x_std and y_std).
- Make a new R script, load the aqp, soilDB, sf, and mapview packages, and load some pedons via fetchNASIS() (or a similar source).
library(aqp)
library(soilDB)
library(sf)
library(mapview)
# get pedons from the selected set
pedons <- fetchNASIS(from = 'pedons')
- Use the base R subset() function to create a subset of your SoilProfileCollection, keeping only pedons where is.na() shows that x_std and y_std are not missing. The x_std and y_std variables contain WGS84 longitude and latitude in decimal degrees. This is the standard format for location information used in NASIS.
# modify this code (replace ...) to create a subset
pedons.sp <- aqp::subset(pedons, ...)
- Create an sf data.frame from the site data in the SoilProfileCollection object pedons.sp using aqp::site(). Replace the ... in the following code. Promoting a data.frame to sf POINT geometry requires that the X and Y columns be specified.
pedon.locations <- sf::st_as_sf(
  ...,
  coords = c('x_std', 'y_std'),
  crs = sf::st_crs(4326) # WGS84 GCS
)
- View your sf object pedon.locations interactively with mapview::mapview(), and change the map.types argument to 'Esri.WorldImagery'. Use the pedon.locations column named site_id for the label argument.
# plot an interactive map
mapview(pedon.locations,
legend = FALSE,
map.types = 'OpenStreetMap',
...)
- Create a subset sf data.frame with only the following "site data" columns: pedlabsampnum, pedon_id, taxonname, hillslopeprof, elev_field, slope_field, aspect_field, plantassocnm, bedrckdepth, bedrckkind, pmkind, pmorigin. Select the target columns with dplyr::select() (or another method) by replacing the ... in the following code.
pedon.locations_sub <- dplyr::select(pedon.locations, ...)
# see also base::subset(x, select=...)
- Export the spatial information in pedon.locations_sub to a shape file (.shp) with sf::st_write().
# write to SHP; output CRS is geographic coordinate system WGS84
sf::st_write(pedon.locations_sub, "./NASIS-pedons.shp")
For an example of exporting data to shapefile with the sp package, see this tutorial: Export Pedons to Shapefile with sp.
4.5 Many Packages, Many Spatial Representations
4.5.1 The sf package
Simple Features Access is a set of standards that specify a common storage and access model of geographic features. It is used mostly for two-dimensional geometries such as point, line, polygon, multi-point, multi-line, etc.
This is one of many ways of modeling the geometry of shapes in the real world. This model happens to be widely adopted in the R ecosystem via the sf package, and it is very convenient for typical data encountered by soil survey operations.
The sf package represents the latest and greatest in spatial data processing within the comfort of an R session. It provides a "main" object class, sf, to contain geometric data and associated tabular data in a familiar data.frame format. sf methods work at a variety of different levels of abstraction and manipulation of those geometries.
Most of the sf package functions start with the prefix st_, such as: st_crs() (get/set coordinate reference system), st_transform() (project feature class to different coordinate reference system), st_bbox() (bounding box), st_buffer() (buffer). Many of these are "verbs" that are common GIS operations.
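A minimal sketch of these verbs, using a single hypothetical point (not data from this chapter):
library(sf)
# promote a data.frame with coordinates to sf POINT geometry
p <- st_as_sf(data.frame(x = -120, y = 37.5), coords = c("x", "y"), crs = 4326)
# get the coordinate reference system
st_crs(p)
# transform to a projected CRS (CONUS Albers), then buffer by 1000 m
p.aea <- st_transform(p, 5070)
b <- st_buffer(p.aea, dist = 1000)
# bounding box of the buffered geometry
st_bbox(b)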
4.5.2 The sp Package
The data structures ("classes") and functions provided by the sp package have served a foundational role in the handling of spatial data in R for years.
Many of the following examples will reference names such as SpatialPoints, SpatialPointsDataFrame, and SpatialPolygonsDataFrame. These are specialized (S4) classes implemented by the sp package.
Objects of these classes maintain linkages between all of the components of spatial data. For example, a point, line, or polygon feature will typically be associated with:
- coordinate geometry
- bounding box
- coordinate reference system
- attribute table
4.5.3 Converting sp and sf
sp provides access to the same compiled code libraries (PROJ, GDAL, GEOS) through the sf package.
For now the different package object types are interchangeable, and you may find yourself having to convert between them for a variety of reasons. You can convert between object types as needed using sf::as_Spatial() or sf::st_as_sf().
Check the documentation (?functionname) to figure out what object types different methods need as input, and check an input object's class with class() or inherits().
4.5.4 Importing / Exporting Vector Data
Import a feature class from an ESRI File Geodatabase or shape file.
If you have a .shp file, you can specify the whole path, including the file extension, in the dsn argument, or just the folder.
For a Geodatabase, you should specify the feature class using the layer argument. Note that a trailing "/" is omitted from the dsn (data source name) and the ".shp" suffix is omitted from the layer.
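A sketch of both cases with sf::st_read(); the shapefile path matches the chapter example data downloaded later, and the Geodatabase path and layer name are hypothetical.
library(sf)
# shapefile: dsn can be the full path to the .shp file ...
x <- st_read(dsn = 'C:/workspace2/chapter-4/mlra-18-15-AEA.shp')
# ... or the containing folder plus a layer name (no ".shp" suffix)
x <- st_read(dsn = 'C:/workspace2/chapter-4', layer = 'mlra-18-15-AEA')
# ESRI File Geodatabase: dsn is the .gdb folder (no trailing "/"), layer is the feature class name
# x <- st_read(dsn = 'C:/workspace2/geodata.gdb', layer = 'mupolygon')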
4.5.5 Interactive mapping with mapview and leaflet
The mapview and leaflet packages make it possible to display interactive maps of sf objects in the RStudio Viewer pane, or within an HTML document generated via R Markdown (e.g. this document).
mapview package:
- Basics
- Advanced Features
- See other "Articles" in this series; you can make complex, interactive maps using the mapview package.
leaflet package
leafem: 'leaflet' Extensions for 'mapview'
4.5.6 Exercise 2: Map your favorite soil series extents
The seriesExtent function in soilDB returns an sf object showing generalized extent polygons for a given soil series.
# load required packages, just in case
library(soilDB)
library(sf)
library(mapview)
# series extents from SoilWeb (sf objects)
pentz <- seriesExtent('pentz')
## Reading layer `file30a036f03c9d' from data source
## `C:\Users\stephen.roecker\AppData\Local\Temp\RtmpiiOLdE\file30a036f03c9d' using driver `GeoJSON'
## Simple feature collection with 1 feature and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -122.26 ymin: 37.19 xmax: -120.1 ymax: 40.68
## Geodetic CRS: WGS 84
amador <- seriesExtent('amador')
## Reading layer `file30a0191d1476' from data source
## `C:\Users\stephen.roecker\AppData\Local\Temp\RtmpiiOLdE\file30a0191d1476' using driver `GeoJSON'
## Simple feature collection with 1 feature and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -121.193 ymin: 37.183 xmax: -120.078 ymax: 38.588
## Geodetic CRS: WGS 84
# combine into a single object
s <- rbind(pentz, amador)
# colors used in the map
# add more colors as needed
cols <- c('royalblue', 'firebrick')
# make a simple map, colors set by 'series' column
mapview(s, zcol = 'series', col.regions = cols, legend = TRUE)
The following code demonstrates how to fetch / convert / map soil series extents, using a vector of soil series names.
Results appear in the RStudio “Viewer” pane. Be sure to try the “Export” and “show in window” (next to the broom icon) buttons.
# load required packages, just in case
library(soilDB)
library(sf)
library(mapview)
# vector of series names, letter case does not matter
# try several (2-9)!
series.names <- c('auberry', 'sierra', 'holland', 'cagwin')
# iterate over series names, get extent
# result is a list of sf objects
s <- lapply(series.names, soilDB::seriesExtent)
# flatten list -> single sf object
s <- do.call('rbind', s)
# colors used in the map
# note trick used to dynamically set the number of colors
cols <- RColorBrewer::brewer.pal(n = length(series.names), name = 'Set1')
# make a simple map, colors set by 'series' column
# click on polygons for details
# try pop-out / export buttons
mapview(s, zcol = 'series', col.regions = cols, legend = TRUE)
Question: What do you notice about the areas where the extent polygons occur? Share your thoughts with your peers or mentor.
4.5.7 The terra Package
The terra package provides most of the commonly used grid and vector processing functionality that one might find in a conventional GIS. It provides high-level data structures and functions built on GDAL (the Geospatial Data Abstraction Library).
- re-sampling / interpolation
- projection and warping (coordinate system transformations of gridded data)
- cropping, mosaicing, masking
- local and focal functions
- raster algebra
- contouring
- raster/vector conversions
- terrain analysis
- model-based prediction (more on this in Part 2)
Importing / Exporting Rasters
# use an example from the terra package
f <- system.file("ex", "elev.tif", package = "terra")

# corresponding Luxembourg vector (polygon) data
g <- system.file("ex", "lux.shp", package = "terra")

r <- terra::rast(f)
r
## class : SpatRaster
## dimensions : 90, 95, 1 (nrow, ncol, nlyr)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 5.741667, 6.533333, 49.44167, 50.19167 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : elev.tif
## name : elevation
## min value : 141
## max value : 547
v <- terra::vect(g)
v
## class : SpatVector
## geometry : polygons
## dimensions : 12, 6 (geometries, attributes)
## extent : 5.74414, 6.528252, 49.44781, 50.18162 (xmin, xmax, ymin, ymax)
## source : lux.shp
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## names : ID_1 NAME_1 ID_2 NAME_2 AREA POP
## type : <num> <chr> <num> <chr> <num> <int>
## values : 1 Diekirch 1 Clervaux 312 18081
## 1 Diekirch 2 Diekirch 218 32543
## 1 Diekirch 3 Redange 259 18664
# convert r to a RasterLayer object
r2 <- raster::raster(f)
# show SpatRaster details
print(r)
## class : SpatRaster
## dimensions : 90, 95, 1 (nrow, ncol, nlyr)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 5.741667, 6.533333, 49.44167, 50.19167 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : elev.tif
## name : elevation
## min value : 141
## max value : 547
# show RasterLayer details
print(r2)
## class : RasterLayer
## dimensions : 90, 95, 8550 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 5.741667, 6.533333, 49.44167, 50.19167 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : elev.tif
## names : elevation
## values : 141, 547 (min, max)
# default plot method
plot(r)
lines(v)
# interactive (leaflet) plot method
p <- plet(r, tiles = "OpenTopoMap")
lines(p, v)
The R object only stores a reference to the data until the values need to be loaded into memory. This allows internal raster manipulation algorithms to intelligently deal with very large grids that may not fit in memory.
4.5.7.1 Other approaches to raster data
4.5.7.1.1 raster
A more complete background on the capabilities of the raster package, and its replacement terra, is described in the Spatial Data Science with R online book.
Introduction to the raster package vignette
4.5.7.1.2 stars
There is also a package called stars (Spatiotemporal Arrays: Raster and Vector Datacubes) that is the sf-centric way of dealing with higher dimensional raster and vector "datacubes." Data cubes have dimensions related to time, spectral band, and sensor. The stars data structures are often used for processing satellite data sources.
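A minimal sketch, using the example GeoTIFF that ships with the stars package (assumes stars is installed):
library(stars)
# read a multi-band Landsat 7 example image as a stars object
x <- read_stars(system.file("tif/L7_ETMs.tif", package = "stars"))
x
plot(x)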
4.5.8 Converting Vector to Raster
4.5.8.1 terra::rasterize()
4.5.8.2 raster::rasterize()
4.5.8.3 fasterize::fasterize()
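These functions convert vector features into a grid defined by an existing raster "template". A minimal sketch with terra::rasterize(), reusing the Luxembourg example data from the terra package introduced above:
library(terra)
# template grid and polygons from the terra package examples
r <- rast(system.file("ex", "elev.tif", package = "terra"))
v <- vect(system.file("ex", "lux.shp", package = "terra"))
# burn the polygon attribute "POP" into the grid defined by r
pop.grid <- rasterize(v, r, field = "POP")
plot(pop.grid)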
4.6 Coordinate Reference Systems (CRS)
Spatial data aren’t all that useful without an accurate description of the Coordinate Reference System (CRS). This type of information is typically stored within the “.prj” component of a shapefile, or in the header of a GeoTIFF.
Without a CRS it is not possible to perform coordinate transformations (e.g. conversion of geographic coordinates to projected coordinates), spatial overlay (e.g. intersection), or geometric calculations (e.g. distance or area).
The "old" way (PROJ.4) of specifying coordinate reference systems uses character strings containing, for example, +proj or +init arguments. In general, this still "works," so you may encounter it and need to know about it. But you may also encounter cases where CRS are specified using integers, strings of the form authority:code, or well-known text (WKT).
Some common examples of coordinate system "EPSG" codes and their legacy "PROJ.4" strings are listed below.
“EPSG” stands for European Petroleum Survey Group. The “EPSG Geodetic Parameter Dataset” is a public registry of geodetic datums, spatial reference systems, Earth ellipsoids, coordinate transformations and related units of measurement.
"OGC" refers to the Open Geospatial Consortium, another important authority used in authority:code identifiers. "ESRI" (the company that develops ArcGIS) also defines many CRS codes. "PROJ" is the software responsible for transforming coordinates from one CRS to another. The current version of PROJ is 9; in PROJ > 6, major changes were made to the way that coordinate reference systems are defined and transformed, which led to the "PROJ.4" syntax falling out of favor.
- EPSG:4326 / PROJ.4: +proj=longlat +datum=WGS84 - geographic, WGS84 datum (NASIS Standard)
- OGC:CRS84 - geographic, WGS84 datum (same as above but explicit longitude, latitude XY order)
- EPSG:4269 / PROJ.4: +proj=longlat +datum=NAD83 - geographic, NAD83 datum
- EPSG:4267 / PROJ.4: +proj=longlat +datum=NAD27 - geographic, NAD27 datum
- EPSG:26910 / PROJ.4: +proj=utm +zone=10 +datum=NAD83 - projected (UTM zone 10), NAD83 datum
- EPSG:5070 / PROJ.4: +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23.0 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs - Albers Equal Area CONUS (gSSURGO)
More information on EPSG codes and the specifics of CRS definitions can be found in the EPSG Geodetic Parameter Dataset registry.
While you may encounter PROJ.4 strings, these are no longer considered the preferred method of referencing Coordinate Reference Systems – and, in general, newer methods are “easier.”
Well-known text (WKT) is a human- and machine-readable standard format for geometry, so storing the Coordinate Reference System information in a similar format makes sense. This format is returned by the sf::st_crs() method.
For example, here is the WKT representation of EPSG:4326:
st_crs(4326)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
This is using the OGC WKT CRS standard. Adoption of this standard caused some significant changes in packages in the R ecosystem.
To help you get familiar with the options, what follows are several examples of doing the same thing: setting the CRS of spatial objects with WGS84 longitude/latitude geographic coordinates. If you have another target coordinate system, it is just a matter of using the correct codes to identify it.
4.6.1 Assigning and Transforming Coordinate Systems
Returning to the example from above, let's assign a CRS to our series extent s using different methods.
s <- seriesExtent('san joaquin')
## Reading layer `file30a0594977ff' from data source
## `C:\Users\stephen.roecker\AppData\Local\Temp\RtmpiiOLdE\file30a0594977ff' using driver `GeoJSON'
## Simple feature collection with 1 feature and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -122.82 ymin: 35.88 xmax: -118.81 ymax: 39.31
## Geodetic CRS: WGS 84
The following sections give equivalent sf versus sp syntax.
4.6.1.1 sf
Use st_crs<- to set, or st_crs() to get, the CRS of sf objects. Supply the target EPSG code as an integer.
# the CRS of s is EPSG:4326
st_crs(s) == st_crs(4326)
## [1] TRUE
# set CRS using st_crs<- (replace with identical value)
st_crs(s) <- st_crs(4326)
Transformation of points, lines, and polygons with sf requires that an "origin" CRS be defined for the object in the argument x. The "target" CRS is defined as an integer (EPSG code) in the crs argument, or as the output of st_crs().
# transform to UTM zone 10
s.utm <- st_transform(x = s, crs = 26910)

# transform to GCS NAD27
s.nad27 <- st_transform(x = s, crs = st_crs(4267))
4.6.1.2 sp
You can do the same thing several different ways with sp objects. An equivalent EPSG, OGC, or PROJ.4 CRS can be set or retrieved using proj4string<-/proj4string() and either an sp CRS object or a PROJ.4 string for Spatial objects.
# s is an sf object (we converted it), convert back to Spatial* object
s.sp <- sf::as_Spatial(s)
# these all create the same internal sp::CRS object
proj4string(s.sp) <- sp::CRS('EPSG:4326') # proj >6; EPSG
proj4string(s.sp) <- sp::CRS('OGC:CRS84') # proj >6; OGC
proj4string(s.sp) <- '+init=epsg:4326' # proj4 style +init string
proj4string(s.sp) <- '+proj=longlat +datum=WGS84' # proj4 style +proj string
Here, we do the same transformations as above, only using sp::spTransform().
# transform to UTM zone 10
s.utm <- spTransform(s.sp, CRS('+proj=utm +zone=10 +datum=NAD83'))

# transform to GCS NAD27
s.nad27 <- spTransform(s.sp, CRS('+proj=longlat +datum=NAD27'))
4.6.1.3 terra and raster
To assign or get the coordinate reference system for raster, terra, or sp CRS objects, use the crs() functions.
r <- terra::rast(system.file("ex", "elev.tif", package = "terra"))

# inspect CRS
terra::crs(r)

# r is a SpatRaster object; set CRS to current CRS
terra::crs(r) <- terra::crs("OGC:CRS84")
"Transforming" or "warping" a raster is different from transforming a vector, as it requires interpolation of pixels to a target resolution and CRS.
The method provided by terra is project(), and in raster it is projectRaster().
It works the same as the above “transform” methods in that you specify an object to transform, and the target reference system or a template for the object.
t.wgs84 <- terra::project(r, terra::crs("+proj=igh"))
r.wgs84 <- raster::projectRaster(raster::raster(r), crs = CRS("+proj=igh"))
Note that the default warping uses bilinear interpolation (method = 'bilinear'), which is appropriate for continuous variables. You also have the option of using nearest-neighbor interpolation (method = 'ngb' in raster, method = 'near' in terra) for categorical variables (class maps), where interpolation would not make sense.
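A sketch of setting the method explicitly, assuming the r (SpatRaster) and r2 (RasterLayer) objects created in the examples above:
# terra: nearest-neighbor, appropriate for categorical grids
t.cat <- terra::project(r, terra::crs("+proj=igh"), method = "near")
# raster: the equivalent nearest-neighbor option is method = 'ngb'
r.cat <- raster::projectRaster(r2, crs = CRS("+proj=igh"), method = 'ngb')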
If we want to save this transformed raster to file, we can use something like this for terra:
terra::writeRaster(t.wgs84, filename = 's_wgs84.tif', gdal = c("COMPRESS=LZW"))
Similarly for raster:
raster::writeRaster(r.wgs84, filename = 's_wgs84.tif', options = c("COMPRESS=LZW"))
4.7 Load Required Packages
Load required packages into a fresh RStudio Session (Ctrl + Shift + F10)
library(aqp)
library(soilDB)
library(sf)
library(terra)
4.8 Download Example Data
Run the following to create a path for the example data. Be sure to set a valid path to a local disk.
# store path as a variable, in case you want to keep it somewhere else
ch4.data.path <- 'C:/workspace2/chapter-4'
# make a place to store chapter 4 example data
dir.create(ch4.data.path, recursive = TRUE)
# download polygon example data from github
download.file(
'https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/chapter_4-spatial-data/chapter-4-mu-polygons.zip',
file.path(ch4.data.path, 'chapter-4-mu-polygons.zip')
)
# download raster example data from github
download.file(
'https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/chapter_4-spatial-data/chapter-4-PRISM.zip',
file.path(ch4.data.path, 'chapter-4-PRISM.zip')
)
# unzip
unzip(
file.path(ch4.data.path, 'chapter-4-mu-polygons.zip'),
exdir = ch4.data.path, overwrite = TRUE
)
unzip(
file.path(ch4.data.path, 'chapter-4-PRISM.zip'),
exdir = ch4.data.path, overwrite = TRUE
)
4.9 Load Example MLRA Data
We will be using polygons associated with MLRA 15 and 18 as part of this demonstration.
Import these data with sf::st_read().
# load MLRA polygons
mlra <- sf::st_read(file.path(ch4.data.path, 'mlra-18-15-AEA.shp'))
## alternately, use your own MLRA
# mlra <- soilDB::fetchSDA_spatial(c("15", "18"), by.col="MLRARSYM", geom.src = "MLRAPOLYGON") |> sf::st_transform("EPSG:5070")
We will load the sample MLRA 15 and 18 (California) raster data (PRISM derived) using terra::rast(). If using your own MLRA, you will need to update the file paths to use your own rasters.
# mean annual air temperature, Deg C
maat <- terra::rast(file.path(ch4.data.path, 'MAAT.tif'))

# mean annual precipitation, mm
map <- terra::rast(file.path(ch4.data.path, 'MAP.tif'))

# frost-free days
ffd <- terra::rast(file.path(ch4.data.path, 'FFD.tif'))

# growing degree days
gdd <- terra::rast(file.path(ch4.data.path, 'GDD.tif'))

# percent of annual PPT as rain
rain_fraction <- terra::rast(file.path(ch4.data.path, 'rain_fraction.tif'))

# annual sum of monthly PPT - ET_p
ppt_eff <- terra::rast(file.path(ch4.data.path, 'effective_precipitation.tif'))
Sometimes it is convenient to "stack" raster data that share a common grid size, extent, and coordinate reference system into a multilayer terra SpatRaster object. Calling terra::rast() on a list of SpatRaster objects is equivalent to making a RasterStack from several RasterLayer objects with raster::stack().
# create a raster stack (multiple rasters aligned)
rs <- terra::rast(list(maat, map, ffd, gdd, rain_fraction, ppt_eff))

# inspect
rs
## class : SpatRaster
## dimensions : 762, 616, 6 (nrow, ncol, nlyr)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : -123.2708, -118.1375, 34.44583, 40.79583 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat NAD83 (EPSG:4269)
## sources : MAAT.tif
## MAP.tif
## FFD.tif
## ... and 3 more source(s)
## names : MAAT, MAP, FFD, GDD, rain_~ction, effec~ation
## min values : -4.073542, 114, 35, 76, 12, -825.5897
## max values : 18.676420, 2958, 365, 3173, 100, 2782.3914
plot(rs)
4.10 Raster data
4.10.1 Object Properties
SpatRaster and RasterLayer objects are similar to sf, sp, and other R spatial objects in that they keep track of the linkages between data, coordinate reference system, and optional attribute tables. Getting and setting the contents of raster objects should be performed using functions such as those listed below (a short demonstration follows the list):
- terra::NAflag(r) / raster::NAvalue(r): get / set the NODATA value
- terra::crs(r) / raster::wkt(r): get / set the coordinate reference system
- terra::res(r) / raster::res(r): get / set the resolution
- terra::ext(r) / raster::extent(r): get / set the extent
- terra::datatype(r) / raster::dataType(r): get / set the data type
- ... many more, see the raster and terra package manuals
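For example, a few of these getters applied to the elevation SpatRaster r from the terra examples above:
terra::NAflag(r)    # NODATA value
terra::crs(r)       # coordinate reference system (WKT)
terra::res(r)       # grid resolution (x, y)
terra::ext(r)       # extent
terra::datatype(r)  # data type of the file-based source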
4.10.2 Rasters "In Memory" vs. "File-Based"
Processing of raster data in memory is always faster than processing on disk, as long as there is sufficient memory. The terra package handles essentially all of the logic for delegating in- vs. out-of-memory processing internally, so it is rare that any adjustments to the defaults are required.
With the raster package, the initial file/disk-based reference can be converted to an in-memory RasterLayer with the readAll() function. You can achieve a similar effect in terra by calling set.values(object).
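A minimal sketch, assuming the SpatRaster r and the file path f from the terra examples above:
# terra: check whether cell values are currently in memory, then force them in
terra::inMemory(r)
terra::set.values(r)
terra::inMemory(r)
# raster: convert a file-based RasterLayer to an in-memory object
r.mem <- raster::readAll(raster::raster(f))
raster::inMemory(r.mem)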
4.10.3 Writing Rasters to File
Exporting data requires consideration of the output format, datatype, encoding of NODATA, and other options such as compression.
With terra, "LZW" compression is used by default when writing GeoTIFF files. Using the gdal argument, e.g. terra::writeRaster(..., gdal = ...), is equivalent to specifying the options argument to raster::writeRaster().
# using previous example data set
terra::writeRaster(t.wgs84, filename = 't.wgs84.tif')
For example, a RasterLayer object that you wanted to save to disk as an internally-compressed GeoTIFF:
# using previous example data set
raster::writeRaster(r.wgs84, filename = 'r.tif', options = c("COMPRESS=LZW"))
4.10.4 Data Types
Commonly used raster datatype values include "unsigned integer", "signed integer", and "floating point" of variable precision:
- INT1U: integers from 0 to 255
- INT2U: integers from 0 to 65,534
- INT2S: integers from -32,767 to 32,767
- INT4S: integers from -2,147,483,647 to 2,147,483,647
- FLT4S: floating point from -3.4e+38 to 3.4e+38
- FLT8S: floating point from -1.7e+308 to 1.7e+308
It is wise to manually specify an output datatype that will "just fit" the required precision.
For example, if you have generated a RasterLayer that warrants integer precision and ranges from 0 to 100, then the INT1U data type would provide enough precision to store all possible values and the NODATA value. Raster data stored as integers will always be smaller (sometimes 10-100x) than those stored as floating point, especially when internal compression is enabled.
# integer grid with a range of 0-100
# maybe soil texture classes
raster::writeRaster(r2, filename = 'r.tif', datatype = 'INT1U')

# floating point grid with very wide range
# maybe DSM soil property model output
terra::writeRaster(t.wgs84, filename = 'r.tif', datatype = 'FLT4S')
4.10.4.1 Notes on Compression
It is often a good idea to create internally-compressed raster data.
The GeoTiff format can accommodate many different compression algorithms, including lossy (JPEG) compression. Usually, the default “LZW” or “DEFLATE” compression will result in significant savings, especially for data encoded as integers.
For example, the CONUS gSSURGO map unit key grid at 30m resolution is about 55Gb (GeoTiff, no compression) vs. 2.4Gb after LZW compression.
# reasonable compression using LZW is the default, compare to
raster::writeRaster(r2, filename = 'r.tif', options = c("COMPRESS=NONE"))

# takes longer to write the file, but better compression
terra::writeRaster(t.wgs84, filename = 'r.tif', gdal = c("COMPRESS=DEFLATE", "PREDICTOR=2", "ZLEVEL=9"))
See this article for some ideas on optimization of file read/write times and associated compressed file sizes.
4.11 Vector Data
4.11.1 sf
p <- sf::st_as_sf(data.frame(x = -120, y = 37.5),
                  coords = c("x", "y"),
                  crs = 4326)
p.aea <- st_transform(p, "EPSG:5070")
In sf, the functions used to perform this intersection are st_intersects() and st_intersection().
st_intersects(p.aea, mlra)
## Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
## 1: 2
st_intersection(p.aea, mlra)
## Simple feature collection with 1 feature and 5 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -2079434 ymin: 1870764 xmax: -2079434 ymax: 1870764
## Projected CRS: NAD83 / Conus Albers
## MLRARSYM MLRA_ID MLRA_NAME LRRSYM
## 1 18 23 Sierra Nevada Foothills C
## LRR_NAME geometry
## 1 California Subtropical Fruit, Truck, and Specialty Crop Region POINT (-2079434 1870764)
4.11.2 terra
p <- terra::vect(data.frame(x = -120, y = 37.5),
                 geom = c("x", "y"),
                 crs = "EPSG:4326")
p.aea <- project(p, "EPSG:5070")
In terra, the function used to determine the intersection is relate().
mlra[relate(vect(mlra), p.aea, relation = "intersects"), ]
## Simple feature collection with 1 feature and 5 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -2181926 ymin: 1548989 xmax: -1970476 ymax: 2264711
## Projected CRS: Albers
## MLRARSYM MLRA_ID MLRA_NAME LRRSYM
## 2 18 23 Sierra Nevada Foothills C
## LRR_NAME geometry
## 2 California Subtropical Fruit, Truck, and Specialty Crop Region POLYGON ((-2160599 2264711,...
4.11.3 sp
In sp objects, you do these operations with the sp::over() function. Access the associated vignette by pasting vignette("over") in the console when the sp package is loaded.
# hand make a SpatialPoints object
# note that this is GCS
p <- SpatialPoints(coords = cbind(-120, 37.5),
                   proj4string = CRS('+proj=longlat +datum=WGS84'))
mlra.sp <- sf::as_Spatial(mlra)

# spatial extraction of MLRA data requires a CRS transformation
p.aea <- spTransform(p, proj4string(mlra.sp))
over(p.aea, mlra.sp)
4.12 Spatial Operations
Spatial data are a lot more useful when "related" (overlay, intersect, spatial query, etc.) to generate something new. The CRS of the two objects being overlaid must match.
4.12.1 Working with Vector and Raster Data
Typically, spatial queries of raster data by geometry features (point, line, polygon) are performed in two ways:
- For each geometry, collect all pixels that overlap (exactextractr approach)
- For each geometry, collect a sample of pixels defined by sampling points
The first method ensures that all data are included in the analysis, however, processing can be slow for multiple/detailed rasters, and the results may not fit into memory.
The second method is more efficient (10-100x faster), requires less memory, and can remain statistically sound–as long as a reasonable sampling strategy is applied. Sampling may also help you avoid low-acreage “anomalies” in the raster product. More on sampling methods in the next chapter.
The extract() function can perform several operations in one call, such as buffering (in projected units) with the buffer argument. See the manual page for an extensive listing of optional arguments and what they do.
Sampling and extraction with terra methods results in a SpatVector object. Sampling and extraction with raster methods results in a matrix object.
# sampling single layer SpatRaster
terra::spatSample(maat, size = 10)
## MAAT
## 1 NA
## 2 9.491992
## 3 17.133940
## 4 10.949153
## 5 11.780624
## 6 15.016464
## 7 NA
## 8 NA
## 9 9.910385
## 10 NA
# sampling SpatRaster
terra::spatSample(rs, size = 10)
## MAAT MAP FFD GDD rain_fraction effective_precipitation
## 1 18.506054 181 335 3134 100 -781.39319
## 2 14.970438 480 339 2282 100 -259.15225
## 3 11.394429 144 159 1965 95 -535.72083
## 4 NA NA NA NA NA NA
## 5 17.689205 197 316 2952 99 -737.10382
## 6 16.204180 514 294 2590 99 -315.05930
## 7 8.346684 572 106 1532 88 -12.34948
## 8 14.082307 605 338 1956 100 -80.17986
## 9 17.980888 188 316 3026 100 -749.85864
## 10 15.065522 706 246 2415 97 -91.40856
par(mfcol = c(1, 2), mar = c(1, 1, 3, 1))
# regular sampling + extraction of raster values
x.regular <- terra::spatSample(
  maat,
  method = "regular",
  size = 100,
  as.points = TRUE
)
x.regular
## class : SpatVector
## geometry : points
## dimensions : 96, 1 (geometries, attributes)
## extent : -123.2667, -118.1417, 34.64167, 40.6 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat NAD83 (EPSG:4269)
## names : MAAT
## type : <num>
## values : NA
## 16.41
## 11
# see also raster::sampleRegular()
plot(maat,
axes = FALSE,
legend = FALSE,
main = 'Regular Sampling')
points(x.regular)
# random sample + extraction of raster values
# note that NA values are removed
x.random <- terra::spatSample(
  maat,
  size = 100,
  as.points = TRUE,
  na.rm = TRUE
)
# see also raster::sampleRandom()
plot(maat,
axes = FALSE,
legend = FALSE,
main = 'Random Sampling with NA Removal')
points(x.random)
Note that the mean can be efficiently estimated, even with a relatively small number of samples.
# all values: slow for large grids
mean(terra::values(maat), na.rm = TRUE)
# regular sampling: efficient, central tendency comparable to above
mean(x.regular$MAAT, na.rm = TRUE)
# this value will be pseudorandom
# depends on number of samples, pattern of NA
mean(x.random$MAAT, na.rm = TRUE)
Just how much variation can we expect when collecting 100 randomly-located samples over such a large area?
# 10 replications of samples of n=100
z <- replicate(10, {
  mean(terra::spatSample(maat,
                         size = 100,
                         na.rm = TRUE)$MAAT,
       na.rm = TRUE)
})
# 90% of the time the mean MAAT values were within:
quantile(z, probs = c(0.05, 0.95))
Do the above routine 100 times: compute the mean MAAT from 100 randomly-located samples. Does it make a difference in your estimates?
# MLRA polygons in native coordinate system
plot(sf::st_geometry(mlra), main = 'MLRA 15 and 18')
box()
# MAAT raster
plot(maat, main = 'PRISM Mean Annual Air Temperature (deg C)')
# plot MAAT raster with MLRA polygons on top
# this requires transforming to CRS of MAAT
mlra.gcs <- sf::st_transform(mlra, sf::st_crs(maat))
plot(maat, main = 'PRISM Mean Annual Air Temperature (deg C)')
plot(sf::st_geometry(mlra.gcs), main = 'MLRA 15 and 18', add = TRUE)
4.12.2 Exercise 3: Extracting Raster Data
4.12.2.1 Raster Summary By Point: NASIS Pedon Locations
Extract PRISM data at the coordinates associated with NASIS pedons that have been correlated to the Loafercreek series.
We will use the sample dataset loafercreek from the soilDB package to get NASIS data. This example can be easily adapted to your own pedon data extracted from NASIS using fetchNASIS(), but if your points are not in California, you will need to supply your own raster data.
Get some NASIS data and upgrade the "site" data to an sf object.
data("loafercreek", package="soilDB")
# result is a SoilProfileCollection object
pedons <- loafercreek

## alternately, use fetchNASIS()
# pedons <- fetchNASIS()

# extract site data
s <- sf::st_as_sf(aqp::site(pedons),
                  coords = c("x_std", "y_std"),
                  crs = 4326,
                  na.fail = FALSE)
Extract PRISM data (the SpatRaster object we made earlier) at the Loafercreek pedon locations and summarize.
# convert sf object s to terra SpatVector
# and project to CRS of the raster
s2 <- project(terra::vect(s), rs)

# pass to terra::extract()
e <- terra::extract(rs, s2, df = TRUE)
# summarize: remove first (ID) column using [, -1] j index
summary(e[, -1])
## MAAT MAP FFD GDD rain_fraction effective_precipitation
## Min. :13.15 Min. : 432.0 Min. :189.0 Min. :2085 Min. :96.00 Min. :-433.14
## 1st Qu.:15.59 1st Qu.: 576.0 1st Qu.:261.2 1st Qu.:2479 1st Qu.:99.00 1st Qu.:-263.46
## Median :15.99 Median : 682.5 Median :285.0 Median :2540 Median :99.00 Median :-152.00
## Mean :15.82 Mean : 680.4 Mean :281.0 Mean :2515 Mean :98.81 Mean :-146.05
## 3rd Qu.:16.24 3rd Qu.: 771.0 3rd Qu.:307.8 3rd Qu.:2592 3rd Qu.:99.00 3rd Qu.: -36.87
## Max. :16.58 Max. :1049.0 Max. :330.0 Max. :2654 Max. :99.00 Max. : 201.61
Join the extracted PRISM data with the original SoilProfileCollection object.
# combine site data (sf) with extracted raster values (data.frame), row-order is identical, result is sf
res <- cbind(s, e)
# extract unique IDs and PRISM data
# dplyr verbs work with sf data.frames
res2 <- dplyr::select(res, pedon_id, MAAT, MAP, FFD, GDD, rain_fraction, effective_precipitation)
# join with original SoilProfileCollection object via pedon_key
site(pedons) <- res2
The extracted values are now part of the "pedons" SoilProfileCollection object via the site(<SoilProfileCollection>) <- data.frame LEFT JOIN method.
Let’s summarize the data we extracted using quantiles.
# define some custom functions for calculating range observed in site data
my_low_function <- function(x) quantile(x, probs = 0.05, na.rm = TRUE)
my_rv_function <- function(x) median(x, na.rm = TRUE)
my_high_function <- function(x) quantile(x, probs = 0.95, na.rm = TRUE)

site(pedons) |>
  dplyr::select(pedon_id, MAAT, MAP, FFD, GDD,
                rain_fraction, effective_precipitation) |>
  dplyr::summarize(dplyr::across(
    MAAT:effective_precipitation,
    list(low = my_low_function,
         rv = my_rv_function,
         high = my_high_function)
  ))
## MAAT_low MAAT_rv MAAT_high MAP_low MAP_rv MAP_high FFD_low FFD_rv FFD_high GDD_low GDD_rv GDD_high
## 1 14.33665 15.98908 16.51595 479.5 682.5 904 220 285 320 2274.75 2540.5 2638.75
## rain_fraction_low rain_fraction_rv rain_fraction_high effective_precipitation_low
## 1 97.25 99 99 -369.3428
## effective_precipitation_rv effective_precipitation_high
## 1 -151.9985 94.25339
4.12.2.2 Raster Summary By Polygon: Series Extent
The seriesExtent() function from the soilDB package provides a simple interface to Series Extent Explorer data files.
Note that these series extents have been generalized for rapid display at regional to continental scales. A more precise representation of “series extent” can be generated from SSURGO polygons and queried from SDA.
Get an approximate extent for the Loafercreek soil series from SEE. See the seriesExtent tutorial and manual page for additional options and related functions.
# get (generalized) Loafercreek soil series extent from SoilWeb
x <- soilDB::seriesExtent(s = 'loafercreek')
## Reading layer `file30a02ec42cdb' from data source
## `C:\Users\stephen.roecker\AppData\Local\Temp\RtmpiiOLdE\file30a02ec42cdb' using driver `GeoJSON'
## Simple feature collection with 1 feature and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -121.55 ymin: 37.18 xmax: -119.9 ymax: 39.66
## Geodetic CRS: WGS 84
# convert to EPSG:5070 Albers Equal Area
x <- sf::st_transform(x, 5070)
Generate 100 sampling points within the extent using a hexagonal grid. These point locations will be used to extract raster values from our SpatRaster of PRISM data. Note that using a "hexagonal" grid is not supported on geographic coordinates.
samples <- sf::st_sample(x, size = 100, type = 'hexagonal')
For comparison, extract a single point from each SSURGO map unit delineation that contains Loafercreek as a major component. This will require a query to SDA for the set of matching map unit keys (mukey), followed by a second request to SDA for the geometry.
The SDA_query function is used to send arbitrary queries written in SQL to SDA; the results may be a data.frame or list, depending on the complexity of the query. The fetchSDA_spatial function returns map unit geometry as either polygons, polygon envelopes, or a single point within each polygon, as selected by mukey or nationalmusym.
# result is a data.frame
mukeys <- soilDB::SDA_query("SELECT DISTINCT mukey FROM component
                             WHERE compname = 'Loafercreek' AND majcompflag = 'Yes';")

# result is an sf data.frame
loafercreek.pts <- soilDB::fetchSDA_spatial(
  mukeys$mukey,
  by.col = 'mukey',
  method = 'point',
  chunk.size = 35
)
Graphically check both methods:
# prepare samples and mapunit points for viewing on PRISM data
hexagonal <- sf::st_transform(samples, sf::st_crs(maat))
x_gcs <- sf::st_transform(x, sf::st_crs(maat))
maatcrop <- terra::crop(maat, x_gcs)
# adjust margins and setup plot device for two columns
par(mar = c(1, 1, 3, 1), mfcol = c(1, 2))
# first figure
plot(maatcrop,
main = 'PRISM MAAT\n100 Sampling Points from Extent',
axes = FALSE)
plot(sf::st_geometry(x_gcs), add = TRUE)
plot(hexagonal, cex = 0.25, add = T)
plot(maatcrop,
main = 'PRISM MAAT\n"Loafercreek" Polygon Centroids',
axes = FALSE)
plot(loafercreek.pts, cex = 0.25, add = TRUE)
Extract PRISM data (the SpatRaster object we made earlier) at the sampling locations (100 regularly-spaced and from MU polygon centroids) and summarize. Note that CRS transformations are automatic (when possible), with a warning.
# return the result as a data.frame object
e <- terra::extract(rs, terra::vect(hexagonal), df = TRUE)
e.pts <- terra::extract(rs, terra::vect(loafercreek.pts), df = TRUE)
# check out the extracted data
summary(e[,-1])
## MAAT MAP FFD GDD rain_fraction effective_precipitation
## Min. :13.48 Min. : 336.0 Min. :207.0 Min. :2121 Min. :94.00 Min. :-560.16
## 1st Qu.:15.87 1st Qu.: 527.0 1st Qu.:278.5 1st Qu.:2523 1st Qu.:99.00 1st Qu.:-326.77
## Median :16.22 Median : 648.0 Median :303.0 Median :2575 Median :99.00 Median :-176.97
## Mean :16.03 Mean : 668.1 Mean :293.0 Mean :2549 Mean :98.72 Mean :-169.22
## 3rd Qu.:16.53 3rd Qu.: 783.5 3rd Qu.:318.0 3rd Qu.:2630 3rd Qu.:99.00 3rd Qu.: -62.05
## Max. :16.98 Max. :1223.0 Max. :340.0 Max. :2740 Max. :99.00 Max. : 354.08
# all pair-wise correlations
knitr::kable(cor(e[,-1]), digits = 2)
| | MAAT | MAP | FFD | GDD | rain_fraction | effective_precipitation |
|---|---|---|---|---|---|---|
| MAAT | 1.00 | -0.48 | 0.96 | 0.99 | 0.83 | -0.61 |
| MAP | -0.48 | 1.00 | -0.43 | -0.53 | -0.36 | 0.99 |
| FFD | 0.96 | -0.43 | 1.00 | 0.94 | 0.71 | -0.55 |
| GDD | 0.99 | -0.53 | 0.94 | 1.00 | 0.83 | -0.66 |
| rain_fraction | 0.83 | -0.36 | 0.71 | 0.83 | 1.00 | -0.47 |
| effective_precipitation | -0.61 | 0.99 | -0.55 | -0.66 | -0.47 | 1.00 |
Quickly compare the two sets of samples.
# compile results into a list
maat.comparison <- list('regular samples' = e$MAAT,
                        'polygon centroids' = e.pts$MAAT)
# number of samples per method
lapply(maat.comparison, length)
## $`regular samples`
## [1] 103
##
## $`polygon centroids`
## [1] 2336
# summary() applied by group
lapply(maat.comparison, summary)
## $`regular samples`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.48 15.87 16.22 16.03 16.53 16.98
##
## $`polygon centroids`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.70 15.82 16.19 16.02 16.44 17.41
# box-whisker plot
par(mar = c(4.5, 8, 3, 1), mfcol = c(1, 1))
boxplot(
  maat.comparison,
  horizontal = TRUE,
las = 1,
xlab = 'MAAT (deg C)',
varwidth = TRUE,
boxwex = 0.5,
main = 'MAAT Comparison'
)
Basic climate summaries from a standardized source (e.g. PRISM) might be a useful addition to an OSD, or useful for checking the ranges reported in map units.
4.12.2.3 Raster Summary By Polygon: MLRA
The following example is a simplified version of what is available in the soilReports package, hosted on the ncss-tech GitHub repository.
Efficient summary of large raster data sources can be accomplished using:
- internally-compressed raster data sources, stored on a local disk, can be in any coordinate system
- polygons stored in an equal-area or UTM coordinate system, with CRS units of meters
- fixed-density sampling of polygons
- estimation of quantiles from collected raster samples
Back to our example data. The first step is to check the MLRA polygons (mlra); how many features per MLRA symbol? Note that some MLRA have more than one polygon.
table(mlra$MLRARSYM)
Convert polygon area from square meters to acres and summarize. Note that this will only make sense when using a projected CRS with units of meters (equal area)!
poly.area <- terra::expanse(terra::vect(mlra)) / 4046.86

sf::sf_use_s2(TRUE)
poly.area.s2 <- units::set_units(x = sf::st_area(mlra), value = "acre")

sf::sf_use_s2(FALSE)
poly.area.sf <- units::set_units(x = sf::st_area(mlra), value = "acre")
summary(poly.area)
sum(poly.area)
sum(poly.area.s2)
sum(poly.area.sf)
Sample each polygon at a constant sampling density of 0.001 samples per acre (1 sample for every 1,000 ac.). At this sampling density we should expect approximately 16,700 samples, more than enough for our simple example.
library(sharpshootR)
# the next function requires a polygon ID:
# each polygon gets a unique number 1--number of polygons
mlra$pID <- 1:nrow(mlra)
cds <- constantDensitySampling(mlra, n.pts.per.ac = 0.001)
Extract the MLRA symbol at the sample points using the sf::st_intersection() function. The result will be an sf object with attributes from our MLRA polygons that intersect the sampling points (cds).
# spatial overlay: sampling points and MLRA polygons
res <- sf::st_intersection(sf::st_transform(sf::st_as_sf(cds), sf::st_crs(mlra)), mlra)

# row / feature order is preserved, so we can directly copy
cds$mlra <- res$MLRARSYM
# tabulate number of samples per MLRA
table(cds$mlra)
##
## 15 18
## 11620 5137
Extract values from the SpatRaster of PRISM data as a data.frame.
e <- terra::extract(rs, terra::project(cds, terra::crs(rs)))

# join columns from extracted values and sampling points
s.df <- cbind(as(cds, 'data.frame'), e)
# check results
head(s.df)
## MLRARSYM MLRA_ID MLRA_NAME LRRSYM
## 1 15 20 Central California Coast Range C
## 2 15 20 Central California Coast Range C
## 3 15 20 Central California Coast Range C
## 4 15 20 Central California Coast Range C
## 5 15 20 Central California Coast Range C
## 6 15 20 Central California Coast Range C
## LRR_NAME pID mlra ID MAAT MAP FFD GDD
## 1 California Subtropical Fruit, Truck, and Specialty Crop Region 1 15 1 15.19286 1149 306 2303
## 2 California Subtropical Fruit, Truck, and Specialty Crop Region 1 15 2 15.33926 1049 307 2369
## 3 California Subtropical Fruit, Truck, and Specialty Crop Region 1 15 3 15.42254 1041 313 2381
## 4 California Subtropical Fruit, Truck, and Specialty Crop Region 1 15 4 15.44636 1087 308 2382
## 5 California Subtropical Fruit, Truck, and Specialty Crop Region 1 15 5 15.39205 1116 316 2349
## 6 California Subtropical Fruit, Truck, and Specialty Crop Region 1 15 6 15.43280 1058 313 2387
## rain_fraction effective_precipitation
## 1 99 385.6023
## 2 99 252.4252
## 3 99 242.8284
## 4 99 283.1933
## 5 99 314.3419
## 6 99 258.3234
Summarizing multivariate data by group (MLRA) is usually much simpler after reshaping data from “wide” to “long” format.
# reshape from wide to long format
m <- tidyr::pivot_longer(s.df, cols = c(MAAT, MAP, FFD, GDD, rain_fraction, effective_precipitation))
# check "wide" format
head(m)
## # A tibble: 6 × 10
## MLRARSYM MLRA_ID MLRA_NAME LRRSYM LRR_NAME pID mlra ID name value
## <chr> <int> <chr> <chr> <chr> <int> <chr> <dbl> <chr> <dbl>
## 1 15 20 Central California Coast Range C California Subtropica… 1 15 1 MAAT 15.2
## 2 15 20 Central California Coast Range C California Subtropica… 1 15 1 MAP 1149
## 3 15 20 Central California Coast Range C California Subtropica… 1 15 1 FFD 306
## 4 15 20 Central California Coast Range C California Subtropica… 1 15 1 GDD 2303
## 5 15 20 Central California Coast Range C California Subtropica… 1 15 1 rain… 99
## 6 15 20 Central California Coast Range C California Subtropica… 1 15 1 effe… 386.
A tabular summary of means by MLRA and PRISM variable using dplyr vs. base tapply().
# tabular summary of mean values
dplyr::group_by(m, mlra, name) %>%
  dplyr::summarize(mean(value)) %>%
  dplyr::arrange(name)
## # A tibble: 12 × 3
## # Groups: mlra [2]
## mlra name `mean(value)`
## <chr> <chr> <dbl>
## 1 15 FFD 284.
## 2 18 FFD 273.
## 3 15 GDD 2387.
## 4 18 GDD 2496.
## 5 15 MAAT 15.2
## 6 18 MAAT 15.7
## 7 15 MAP 588.
## 8 18 MAP 631.
## 9 15 effective_precipitation -197.
## 10 18 effective_precipitation -193.
## 11 15 rain_fraction 98.6
## 12 18 rain_fraction 97.2
# base R
tapply(m$value, list(m$mlra, m$name), mean, na.rm = TRUE)
## effective_precipitation FFD GDD MAAT MAP rain_fraction
## 15 -196.8961 284.3748 2386.711 15.24977 587.8348 98.60990
## 18 -192.9192 273.1376 2496.125 15.66251 631.3798 97.21803
4.12.3 Example: Faster with exactextractr
This example shows how to determine the distribution of Frost-Free Days across a soil series extent.
The data are extracted from the raster data source very rapidly using the exactextractr
package.
library(sf)
library(soilDB)
library(terra)
library(lattice)
library(exactextractr)
# 5-10 seconds to download Series Extent Explorer data
series <- c('holland', 'san joaquin')

# make SpatialPolygonsDataFrame
s <- do.call('rbind', lapply(series, seriesExtent))

# load pointer to PRISM data
r <- rast('C:/workspace2/chapter-4/FFD.tif')

# transform extent to CRS of raster with sf
s <- st_transform(st_as_sf(s), crs = st_crs(r))
# inspect
s
# use `st_union(s)` to create a MULTI- POINT/LINE/POLYGON from single
# use `sf::st_cast(s, 'POLYGON')` to create other types
system.time({ ex <- exactextractr::exact_extract(r, s) })
# ex is a list(), with data.frame [value, coverage_fraction]
# for each polygon in s (we have one MULTIPOLYGON per series)
# combine all list elements `ex` into single data.frame `ex.all`
# - use do.call('rbind', ...) to stack data.frames row-wise
# - an anonymous function that iterates along length of `ex`
# - adding the series name as a new variable, calculated using `i`
ex.all <- do.call('rbind', lapply(seq_along(ex), function(i) {
  cbind(data.frame(group = series[i]), ex[[i]])
}))
# simple summary
densityplot(~ value | group, data = ex.all,
plot.points = FALSE, bw = 2, lwd = 2,
strip = strip.custom(bg = grey(0.85)),
scales = list(alternating = 1),
col = c('RoyalBlue'), layout = c(1, 2),
ylab = 'Density', from = 0, to = 400,
xlab = 'Frost-Free Days (50% chance)\n800m PRISM Data (1981-2010)',
main = 'FFD Estimate for Extent of San Joaquin and Holland Series'
)
4.12.4 Example: Summarizing MLRA Raster Data with lattice graphics
Lattice graphics are useful for summarizing grouped comparisons.
The syntax is difficult to learn and remember, but there is a lot of documentation online.
library(lattice)
tps <- list(
  box.rectangle = list(col = 'black'),
  box.umbrella = list(col = 'black', lty = 1),
  box.dot = list(cex = 0.75),
  plot.symbol = list(
    col = rgb(0.1, 0.1, 0.1, alpha = 0.25, maxColorValue = 1),
    cex = 0.25
  )
)
bwplot(mlra ~ value | name, data = m,                       # setup plot and data source
       as.table = TRUE,                                     # start panels in top/left corner
       varwidth = TRUE,                                     # scale width of box by number of obs
       scales = list(alternating = 3, relation = 'free'),   # setup scales
       strip = strip.custom(bg = grey(0.9)),                # styling for strips
       par.settings = tps,                                  # apply box/line/point styling
       panel = function(...) {                              # within each panel, do the following
         panel.grid(-1, -1)                                 # make grid lines at all tick marks
         panel.bwplot(...)                                  # make box-whisker plot
       })
4.13 Additional Reading (Spatial)
Ahmed, Zia. 2020. Geospatial Data Science with R.
Gimond, M., 2019. Intro to GIS and Spatial Analysis https://mgimond.github.io/Spatial/
Hijmans, R.J. 2019. Spatial Data Science with R. https://rspatial.org/
Lovelace, R., J. Nowosad, and J. Muenchow, 2019. Geocomputation with R. CRC Press. https://bookdown.org/robinlovelace/geocompr/
Pebesma, E., and R.S. Bivand. 2005. Classes and methods for spatial data: The sp package. https://cran.r-project.org/web/packages/sp/vignettes/intro_sp.pdf.
Pebesma, E. and R. Bivand, 2019. Spatial Data Science. https://keen-swartz-3146c4.netlify.com/