Using the concepts (and code) from the Numerical Taxonomy and Ordination chapter, compute pair-wise distances between soil series concepts from a suite of 8 physical and climatic variables. These 8 variables (characteristics) will be used to develop a climate signature for each soil series (individual). Present and interpret the results using both a dendrogram and ordination.
Key steps:
fetchOSD
sammon or metaMDS
Annual and monthly climate summaries have been estimated from the SSR2 standard stack of 1981–2010 PRISM data. Weighted percentiles were estimated from a single sample taken within each map unit delineation, grouped by component name.
We will be using the annual climate summaries. See this document for more information on how summaries like these have been developed and how to access them via fetchOSD.
First, you will need to load some packages. If you have completed the pre-class assignment then all of these packages should be in place.
library(soilDB)
library(sharpshootR)
library(latticeExtra)
library(reshape2)
library(RColorBrewer)
library(cluster)
library(ape)
library(vegan)
library(MASS)
The fetchOSD function from the soilDB package is a simple interface to data that have been extracted from the text OSDs and summarized from the current SSURGO snapshot. The extended=TRUE argument is used to access the climate data we will be using in this assignment.
soils <- c('Ava', 'Drummer', 'Cisne', 'Pierre', 'Cecil', 'Appling', 'San Joaquin', 'Redding', 'Corning')
s <- fetchOSD(soils, extended = TRUE)
Select percentiles are provided for 7 annual climate summaries and
elevation. Note that “q50” is the median value and “n” is the number of
samples (i.e. number of delineations) used to estimate the percentiles.
series | climate_var | minimum | q01 | q05 | q25 | q50 | q75 | q95 | q99 | maximum | n |
---|---|---|---|---|---|---|---|---|---|---|---|
APPLING | Elevation (m) | 1.00 | 55.00 | 74.00 | 115.00 | 181.00 | 246.00 | 312.00 | 354.0 | 486.00 | 91524 |
APPLING | Effective Precipitation (mm) | 156.63 | 237.49 | 269.97 | 332.87 | 358.30 | 409.63 | 491.73 | 552.4 | 903.67 | 91524 |
APPLING | Frost-Free Days | 177.00 | 188.00 | 193.00 | 200.00 | 213.00 | 225.00 | 233.00 | 236.0 | 251.00 | 91524 |
APPLING | Mean Annual Air Temperature (degrees C) | 11.52 | 12.98 | 13.33 | 13.90 | 15.44 | 16.24 | 16.76 | 17.3 | 18.01 | 91524 |
APPLING | Mean Annual Precipitation (mm) | 1031.00 | 1068.00 | 1082.00 | 1114.00 | 1147.00 | 1242.00 | 1315.00 | 1348.0 | 1695.00 | 91524 |
APPLING | Growing Degree Days (degrees C) | 1976.00 | 2290.00 | 2367.00 | 2478.00 | 2747.00 | 2930.00 | 3057.00 | 3177.0 | 3333.00 | 91524 |
APPLING | Fraction of Annual PPT as Rain | 95.00 | 95.00 | 96.00 | 96.00 | 98.00 | 99.00 | 99.00 | 99.0 | 100.00 | 91524 |
APPLING | Design Freeze Index (degrees C) | 25.00 | 30.00 | 33.00 | 49.00 | 74.00 | 135.00 | 166.00 | 181.0 | 259.00 | 91524 |
The data look something like this. Your assignment is to compute pair-wise distances between series concepts, using median values (filled circles in the figure) from these data.
The results might look something like this.
We will help with the data preparation but it will be up to you to finish the process.
Edit the soil series names assigned to the soils character vector below. 5 to 10 should be enough, but feel free to use all 50 state soils.
# define series names and get data
## you will need to edit this accordingly: replace the placeholder names and remove the trailing ...
soils <- c('your favorite here', 'another favorite', 'maybe the state soil', ...)
s <- fetchOSD(soils, extended = TRUE)
# extract annual climate + elevation data
x <- s$climate.annual
# check structure of the data
str(x)
head(x)
Develop the data matrix. The data are in “long format” (multiple rows per series, one for each variable) but we need the data in “wide format” (single row per series, columns containing medians).
# re-shape into wide format
x.wide <- dcast(x, series ~ climate_var, value.var = 'q50')
Check the structure of x.wide one more time: note that the first column contains the soil series name. This is something we want to “keep track of” but not include in the pair-wise distance calculation.
head(x.wide)
# save the series names into row names for later
# remember code examples from chapter 5?
row.names(x.wide) <- x.wide$series
# consider two possible ways of excluding the first column
# [ ] subsetting via negative column index
# this doesn't change x.wide but is handy for an on-the-fly modification
head(x.wide[, -1])
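The comment above mentions two ways of excluding the first column, but only the negative index is shown. Here is a minimal sketch of both, using a made-up stand-in for x.wide (the object x.toy, its column names, and its values are hypothetical, not output from fetchOSD).

```r
# toy stand-in for x.wide; series names and values are made up
x.toy <- data.frame(
  series = c('A', 'B', 'C'),
  elev = c(100, 250, 900),
  map = c(1100, 950, 400),
  stringsAsFactors = FALSE
)

# keep track of the series names via row names
row.names(x.toy) <- x.toy$series

# option 1: negative column index, x.toy itself is left unchanged
x.num1 <- x.toy[, -1]

# option 2: work on a copy and remove the column by name
x.num2 <- x.toy
x.num2$series <- NULL

# compare the two results: same columns, same row names
all.equal(x.num1, x.num2)
```

Option 1 is convenient inside a function call, while option 2 leaves behind a named object that can be reused. Note that a negative index which leaves a single column returns a plain vector unless drop = FALSE is added.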
Don’t forget to standardize characteristics: dist() doesn’t know how to standardize but daisy() does. See ?daisy for clues.
# dist() or daisy()
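To illustrate why standardization matters, here is a small sketch on made-up data (the data.frame m, its column names, and its values are hypothetical and are not the assignment data). It compares distances on raw values with two standardized versions. Note that scale() divides by the standard deviation, while daisy(stand = TRUE) divides by the mean absolute deviation, so the two standardized results are similar in spirit but not numerically identical.

```r
library(cluster)

# made-up data: rows are individuals, columns are characteristics
# measured on very different scales
m <- data.frame(
  elev = c(100, 250, 900, 1200),
  maat = c(15.5, 14.0, 8.0, 6.5),
  map = c(1150, 1000, 450, 380),
  row.names = c('A', 'B', 'C', 'D')
)

# raw values: characteristics with large ranges dominate the distances
d.raw <- dist(m)

# dist() does not standardize, so do it first with scale()
d.scaled <- dist(scale(m))

# daisy() can standardize internally
d.daisy <- daisy(m, metric = 'euclidean', stand = TRUE)

# compare the pair-wise distance matrices
round(as.matrix(d.raw), 1)
round(as.matrix(d.scaled), 2)
round(as.matrix(d.daisy), 2)
```

The same pattern should apply to the numeric columns of x.wide, but check the result with str() before moving on.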
There are several methods and linkage criteria. How do you choose?
# hclust(), agnes(), diana()
# possibly convert to ape class for better figures, via hclust class
# as.hclust()
# as.phylo()
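One possible way to compare methods and linkage criteria is sketched below, again on made-up data (the object m and its values are hypothetical). The agglomerative and divisive coefficients reported by agnes() and diana(), and the cophenetic correlation, are common aids for choosing, not definitive answers. The conversion to the ape phylo class is guarded in case ape is not installed.

```r
library(cluster)

# made-up data, standardized distances
m <- data.frame(
  elev = c(100, 250, 900, 1200),
  maat = c(15.5, 14.0, 8.0, 6.5),
  map = c(1150, 1000, 450, 380),
  row.names = c('A', 'B', 'C', 'D')
)
d <- daisy(m, stand = TRUE)

# agglomerative clustering, two implementations and linkage criteria
h.avg <- hclust(d, method = 'average')
a.ward <- agnes(d, method = 'ward')

# divisive clustering
dv <- diana(d)

# agglomerative and divisive coefficients, values near 1 suggest stronger structure
a.ward$ac
dv$dc

# cophenetic correlation: how well the dendrogram preserves the distances
cor(as.vector(d), as.vector(cophenetic(h.avg)))

# convert to the hclust class for a common plotting interface
h.ward <- as.hclust(a.ward)
plot(h.ward, main = 'agnes, Ward linkage', sub = '', xlab = '')

# optional: convert to the ape phylo class for more plotting options
if (requireNamespace('ape', quietly = TRUE)) {
  p <- ape::as.phylo(h.ward)
  plot(p, label.offset = 0.05)
}
```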
Note that the ordination is based on the distance matrix. Be sure to inspect the structure of the results with str().
# sammon() or metaMDS()
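A minimal ordination sketch with sammon() from MASS follows, using made-up data (the object m and its values are hypothetical). metaMDS() from vegan also accepts a distance matrix and could be used in its place, although it returns a differently structured object.

```r
library(MASS)

# made-up data, standardized before computing distances
m <- data.frame(
  elev = c(100, 250, 900, 1200),
  maat = c(15.5, 14.0, 8.0, 6.5),
  map = c(1150, 1000, 450, 380),
  row.names = c('A', 'B', 'C', 'D')
)
d <- dist(scale(m))

# Sammon's non-linear mapping to 2 dimensions
s.map <- sammon(d, k = 2, trace = FALSE)

# inspect the result: a list with point coordinates and the final stress
str(s.map)

# plot the ordination, labeling each individual
plot(s.map$points, type = 'n', asp = 1, xlab = 'axis 1', ylab = 'axis 2')
text(s.map$points, labels = row.names(m))
```

Individuals that plot close together have similar standardized characteristics. The axes themselves have no units.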
# most hierarchical clustering objects have a plot() method