Objectives

Using the concepts (and code) from the Numerical Taxonomy and Ordination chapter, compute pair-wise distances between soil series concepts from a suite of 8 physical and climatic variables. These 8 variables (characteristics) will be used to develop a climate signature for each soil series (individual). Present and interpret the results using both a dendrogram and ordination.

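Key steps:

  • Get and prepare the data
  • Develop a distance matrix
  • Perform hierarchical clustering
  • Perform ordination
  • Visualize pair-wise distances via dendrogram and ordination
  • Interpret the results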

Brief Description of the Data Source

Annual and monthly climate summaries have been estimated from the SSR2 standard stack of 1981–2010 PRISM data. Weighted percentiles were estimated from a single sample taken within each map unit delineation, grouped by component name.

We will be using the annual climate summaries. See this document for more information on how summaries like these have been developed and how to access them via fetchOSD.

Setup

First, you will need to load some packages. If you have completed the pre-class assignment then all of these packages should be in place.

library(soilDB)
library(sharpshootR)
library(latticeExtra)
library(reshape2)
library(RColorBrewer)
library(cluster)
library(ape)
library(vegan)
library(MASS)

An Example

The fetchOSD function from the soilDB package is a simple interface to data that have been extracted from the text OSDs and summarized from the current SSURGO snapshot. The extended=TRUE argument is used to access the climate data we will be using in this assignment.

soils <- c('Ava', 'Drummer', 'Cisne', 'Pierre', 'Cecil', 'Appling', 'San Joaquin', 'Redding', 'Corning')
s <- fetchOSD(soils, extended = TRUE)

Select percentiles are provided for 7 annual climate summaries and elevation. Note that “q50” is the median value and “n” is the number of samples (i.e. number of delineations) used to estimate the percentiles.

| series | climate_var | minimum | q01 | q05 | q25 | q50 | q75 | q95 | q99 | maximum | n |
|---|---|---|---|---|---|---|---|---|---|---|---|
| APPLING | Elevation (m) | 1.00 | 55.00 | 74.00 | 115.00 | 181.00 | 246.00 | 312.00 | 354.0 | 486.00 | 91524 |
| APPLING | Effective Precipitation (mm) | 156.63 | 237.49 | 269.97 | 332.87 | 358.30 | 409.63 | 491.73 | 552.4 | 903.67 | 91524 |
| APPLING | Frost-Free Days | 177.00 | 188.00 | 193.00 | 200.00 | 213.00 | 225.00 | 233.00 | 236.0 | 251.00 | 91524 |
| APPLING | Mean Annual Air Temperature (degrees C) | 11.52 | 12.98 | 13.33 | 13.90 | 15.44 | 16.24 | 16.76 | 17.3 | 18.01 | 91524 |
| APPLING | Mean Annual Precipitation (mm) | 1031.00 | 1068.00 | 1082.00 | 1114.00 | 1147.00 | 1242.00 | 1315.00 | 1348.0 | 1695.00 | 91524 |
| APPLING | Growing Degree Days (degrees C) | 1976.00 | 2290.00 | 2367.00 | 2478.00 | 2747.00 | 2930.00 | 3057.00 | 3177.0 | 3333.00 | 91524 |
| APPLING | Fraction of Annual PPT as Rain | 95.00 | 95.00 | 96.00 | 96.00 | 98.00 | 99.00 | 99.00 | 99.0 | 100.00 | 91524 |
| APPLING | Design Freeze Index (degrees C) | 25.00 | 30.00 | 33.00 | 49.00 | 74.00 | 135.00 | 166.00 | 181.0 | 259.00 | 91524 |

The data look something like this. Your assignment is to compute pair-wise distances between series concepts, using median values (filled circles in the figure) from these data.

The results might look something like this.


Your Turn

We will help with the data preparation but it will be up to you to finish the process.

Get and Prepare Data

Adjust the soil series names in the soils character vector below. Five to ten series should be enough, but feel free to use all 50 state soils.

# define series names and get data
## you will need to edit this accordingly:
## replace the placeholder names with real series names and remove the trailing `...`
soils <- c('your favorite here', 'another favorite', 'maybe the state soil', ...)
s <- fetchOSD(soils, extended = TRUE)

# extract annual climate + elevation data
x <- s$climate.annual

# check structure of the data
str(x)
head(x)

Develop the data matrix. The data are in “long format” (multiple rows per series, one for each variable) but we need the data in “wide format” (single row per series, columns containing medians).

# re-shape into wide format
x.wide <- dcast(x, series ~ climate_var, value.var = 'q50')

Check the structure of x.wide one more time: note that the first column contains the soil series name. This is something we want to “keep track of” but not include in the pair-wise distance calculation.

head(x.wide)

# save the series names into row names for later
# remember code examples from chapter 5?
row.names(x.wide) <- x.wide$series


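# consider two possible ways of excluding the first column

# 1. [ ] subsetting via negative column index
# this doesn't change x.wide but is handy for an on-the-fly modification
head(x.wide[, -1])

# 2. permanently remove the column by setting it to NULL
# (left commented out because it modifies x.wide)
# x.wide$series <- NULL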

Develop Distance Matrix

Don’t forget to standardize characteristics: dist() doesn’t know how to standardize but daisy() does. See ?daisy for clues.

# dist() or daisy() 
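
As a hint, the effect of standardization can be seen on a small made-up table. The series names (A to D) and values below are hypothetical, not real climate summaries, and the column names are placeholders. This sketch assumes the stand = TRUE argument of daisy(), which centers and scales each column before distances are computed; it is one possible approach, not the required one.

```r
library(cluster)

# hypothetical medians for four made-up series: NOT real data
toy <- data.frame(
  MAAT = c(15.4, 14.9, 8.2, 16.5),
  MAP  = c(1147, 1210, 450, 380),
  FFD  = c(213, 205, 140, 270),
  row.names = c('A', 'B', 'C', 'D')
)

# raw Euclidean distances: dominated by MAP, the column with the largest range
d.raw <- dist(toy)

# standardized Euclidean distances: each column is centered and scaled first
d.std <- daisy(toy, metric = 'euclidean', stand = TRUE)

# compare the two distance matrices
print(d.raw, digits = 3)
print(d.std, digits = 3)
```

Comparing the two printed matrices shows how much the ranking of "nearest neighbors" can depend on the units of the input variables.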

Hierarchical Clustering

There are several methods and linkage criteria. How do you choose?

# hclust(), agnes(), diana()

# possibly convert to ape class for better figures, via hclust class
# as.hclust()
# as.phylo()
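
The functions listed above can be combined roughly as follows. This is a minimal sketch on hypothetical data (made-up series A to D, placeholder column names); the choice of average linkage for agnes() is only an example, not a recommendation.

```r
library(cluster)
library(ape)

# hypothetical medians for four made-up series: NOT real data
toy <- data.frame(
  MAAT = c(15.4, 14.9, 8.2, 16.5),
  MAP  = c(1147, 1210, 450, 380),
  FFD  = c(213, 205, 140, 270),
  row.names = c('A', 'B', 'C', 'D')
)

# standardized pair-wise distances
d <- daisy(toy, metric = 'euclidean', stand = TRUE)

# agglomerative (bottom-up) clustering with average linkage
h.agnes <- agnes(d, method = 'average')

# divisive (top-down) clustering
h.diana <- diana(d)

# agglomerative / divisive coefficients: closer to 1 suggests stronger structure
h.agnes$ac
h.diana$dc

# convert to the ape "phylo" class, via the hclust class
p <- as.phylo(as.hclust(h.agnes))
plot(p, label.offset = 0.1)
```

Comparing the agglomerative and divisive coefficients, and the resulting trees, is one way to start answering the "how do you choose?" question.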

Ordination

Note that this is based on the distance matrix. Be sure to inspect the structure of the results with str().

# sammon() or metaMDS()
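
For orientation, here is a minimal sketch of ordination with sammon() from the MASS package, again on hypothetical data (made-up series A to D, placeholder column names). The input is the distance matrix, not the original data, and the result is a list whose structure is worth inspecting before plotting.

```r
library(cluster)
library(MASS)

# hypothetical medians for four made-up series: NOT real data
toy <- data.frame(
  MAAT = c(15.4, 14.9, 8.2, 16.5),
  MAP  = c(1147, 1210, 450, 380),
  FFD  = c(213, 205, 140, 270),
  row.names = c('A', 'B', 'C', 'D')
)

# standardized pair-wise distances
d <- daisy(toy, metric = 'euclidean', stand = TRUE)

# Sammon's non-linear mapping into 2 dimensions
s.mds <- sammon(d, k = 2, trace = FALSE)

# inspect the result: $points holds the 2D coordinates, $stress the final stress
str(s.mds)

# plot the series in the reduced space, labeled by name
plot(s.mds$points, type = 'n', asp = 1, xlab = 'Axis 1', ylab = 'Axis 2')
text(s.mds$points, labels = row.names(s.mds$points))
```

Setting asp = 1 keeps distances on the two axes comparable, which matters when the plot is read as a map of pair-wise distances.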

Visualize Pair-Wise Distances via Dendrogram and Ordination

# most hierarchical clustering objects have a plot() method
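
A dendrogram can be drawn and then cut into groups as sketched below. The data are hypothetical (made-up series A to D, placeholder column names), and the choice of diana() and of k = 2 groups is only for illustration.

```r
library(cluster)

# hypothetical medians for four made-up series: NOT real data
toy <- data.frame(
  MAAT = c(15.4, 14.9, 8.2, 16.5),
  MAP  = c(1147, 1210, 450, 380),
  FFD  = c(213, 205, 140, 270),
  row.names = c('A', 'B', 'C', 'D')
)

# standardized pair-wise distances, then divisive clustering
d <- daisy(toy, metric = 'euclidean', stand = TRUE)
h <- as.hclust(diana(d))

# dendrogram via the plot() method for hclust objects
plot(h, main = '', sub = '', xlab = '', ylab = 'Distance')

# cut the dendrogram into 2 groups: a named vector of group membership
g <- cutree(h, k = 2)
g
```

The group labels returned by cutree() can be used to color points in an ordination plot, which makes it easier to compare the two representations.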

Interpret

  • Do these figures tell you anything that you didn’t already know?
  • Does the hierarchy suggested by the dendrogram mean anything?
  • Does cutting the dendrogram into n clusters result in groups that follow intuition?
  • Which representation (dendrogram vs. ordination) is more useful?
  • Do the axes of the 2D representation (e.g. ordination) of the original 8D data space map to meaningful climatic gradients?
  • How might we use the pair-wise distances computed from select characteristics of all soil series concepts?