Digital summaries of legacy pedon descriptions

Stephen Roecker, Dylan Beaudette, Jay Skovlin, Skye Wills
6/1/2015

NRCS soil databases

  1. National Soil Information System (NASIS) (SQL Server)
    • SSURGO and Soil Data Access
    • STATSGO2
  2. Soil Characterization Database (Access)
  3. Ecological Site Descriptions (Text)
  4. Official Series Descriptions (Text)

* sorted by database sophistication

Legacy pedon data within the US

library(aqp)
library(soilDB)
library(plyr)
library(ggplot2)
library(reshape2)
library(stringr)
library(knitr)


# number of legacy pedon descriptions, by decade
pedons <- c(577, 6152, 9517, 19058, 42587, 112182, 231609, 184913)
year <- c("<1950s", "1950s", "1960s", "1970s", "1980s", "1990s", "2000s", "2010s")
# keep the decades in chronological order on the x-axis (ggplot2 would otherwise sort the labels)
year <- factor(year, levels = year)

cat("# pedons = ", formatC(sum(pedons), big.mark = ",", format = "fg"), "\n", "# lab pedons = ~64,000", sep = "")
# pedons = 606,595
# lab pedons = ~64,000
ggplot(data.frame(pedons, year), aes(x = year, y = pedons)) + geom_bar(stat = "identity")

[Figure: number of legacy pedon descriptions by decade]

# There has been a lot of discussion about the number of soil series, components, map units, etc., but little focus on the point data resource.
# There is a lot of discussion about collecting new data, but little appreciation for existing data.

NASIS data structure


  • Released in 1994; custom Microsoft SQL Server database
  • Tables for: field pedons, lab pedons, component data, map units, legends, and projects
  • Functions for: queries, tables, reports, interpretations, calculations/validations, and exports

NASIS data structure

Horizon data

Site (Covariate) data

  • slope
  • landform
  • precipitation
  • etc…

Tools for interacting with soil data

Tabular analysis

1. Pencil and paper
2. Excel
3. PedonPC and AnalysisPC (Microsoft Access template) 
4. NASIS
5. R

Spatial analysis

1. SoilWeb
2. Web Soil Survey
3. Soil Data Viewer
4. SSURGO file geodatabases
5. R

* sorted by user sophistication

Objective

Problems

1. Data is underutilized
2. Inefficient tools
3. Fluid series concepts
4. Vaguely defined uncertainty metrics
5. Data isn't digitized
6. Tools are difficult to use (especially R?)

Solution?

1. Standardized R reports

Why hasn't this been done already?


  • varying description styles
  • legacy horizon nomenclature
  • varying horizon depths

Methods

  1. Set up an ODBC connection and install additional packages
  2. Develop and assign a generic horizonation
  3. Generate the report and evaluate it

→ extended tutorial for horizon generalization
→ extended tutorial for R reports (Region 11 SharePoint)

Assumptions

  • horizonation exists and is accurate
  • a subset of pedons (sample) should represent a component (population or aggregate)
  • some semi-automated process is necessary to efficiently summarize soil data
  • null hypothesis: pedons are assumed similar unless significantly(?) different
  • low-rv-high values should approximate the bulk of the distribution
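A minimal base-R sketch of the low-RV-high assumption, assuming low, RV, and high are taken as the 5th, 50th, and 95th percentiles of the values within each generic horizon (the slide does not specify which percentiles). The clay values and the `low_rv_high()` helper are hypothetical.

```r
# Hypothetical clay content (%) for horizons already assigned a generic horizon label
genhz <- c("A", "A", "A", "A", "A", "Bt1", "Bt1", "Bt1", "Bt1", "Bt1")
clay  <- c(12, 15, 18, 20, 25, 28, 30, 33, 35, 40)

# Hypothetical helper: low / RV / high as the 5th, 50th, and 95th percentiles
# (an assumed convention; change probs to match local practice)
low_rv_high <- function(x, probs = c(0.05, 0.5, 0.95)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  c(low = q[1], rv = q[2], high = q[3])
}

# One row per generic horizon label, with columns low / rv / high
lrh <- t(sapply(split(clay, genhz), low_rv_high))
print(lrh)
```

Percentiles, rather than the minimum and maximum, keep a few unusual pedons from stretching the range; `probs = c(0.1, 0.5, 0.9)` would be an equally plausible choice.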

Develop a typical horizonation

  • Look up the series range in characteristics (RIC), if available


  • sort by frequency

    Horizon   A   Bt1  Bt2  Cr  Bt3  R   Oi  Crt  BA  2Bt3
    Count     62  59   58   35  27   22  21  18   11  5

  • graphically examine
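Sorting horizon names by frequency can be sketched in base R with `table()` and `sort()`. The horizon names below are a small hypothetical sample standing in for the horizon data pulled from NASIS.

```r
# Hypothetical field horizon names pooled from a set of pedons
hzname <- c("A", "A", "A", "Bt1", "Bt1", "Bt2", "Cr", "Oi", "A", "Bt1", "R")

# Tabulate each horizon name, then sort from most to least frequent
hz_freq <- sort(table(hzname), decreasing = TRUE)
print(hz_freq)
```

`barplot(hz_freq)` is one quick way to examine the same counts graphically.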

Assign generic horizonation

  • pattern matching via regular expression (REGEX)

    • this is where most micro-correlation decisions are defined
  • GHL (generic horizon labels) and REGEX rules for our sample dataset:

    • A: ^A$|Ad|Ap
    • Bt1: Bt1$
    • Bt2: ^Bt2$
    • Bt3: ^Bt3|^Bt4|CBt$|BCt$|2Bt|2CB$|^C$
    • Cr: Cr
    • R: R
  • special characters in REGEX rules:

    • | = “or”
    • ^ = anchor to left-side
    • $ = anchor to right-side
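The rules above can be applied with base R's `grepl()`. A minimal sketch, assuming the rules are applied in the listed order and that a later matching rule overwrites an earlier one; `assign_ghl()` is a hypothetical helper written for illustration (the aqp package provides `generalize.hz()` for the same task).

```r
# Generic horizon labels (GHL) and their REGEX rules, copied from the slide
ghl <- c("A", "Bt1", "Bt2", "Bt3", "Cr", "R")
pat <- c("^A$|Ad|Ap", "Bt1$", "^Bt2$", "^Bt3|^Bt4|CBt$|BCt$|2Bt|2CB$|^C$", "Cr", "R")

# Hypothetical helper: label each field horizon name with a GHL.
# Rules are applied in order, so a later match overwrites an earlier one;
# names that match no rule are labeled "not-used".
assign_ghl <- function(hzname, ghl, pat, non.matching = "not-used") {
  out <- rep(non.matching, length(hzname))
  for (i in seq_along(pat)) {
    out[grepl(pat[i], hzname)] <- ghl[i]
  }
  out
}

hzname <- c("A", "Ap", "Bt1", "Bt2", "2Bt3", "2BCt", "2Cr", "2R", "AB", "Oi")
genhz <- assign_ghl(hzname, ghl, pat)
print(data.frame(hzname, genhz))
```

Because matching in this sketch is order dependent, a name such as `2Bt1` would end up as Bt3 (it matches `Bt1$` first, then `2Bt`); cross-tabulating the original and generic labels is how such cases get caught.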

Evaluate typical horizonation

Rows: generic horizon labels (GHL); columns: field horizon names (subset shown)

GHL        2BCt  2Bt2  2Bt3  2Bt4  2CB  2Cr  2Crt  2R  A   AB
A          0     0     0     0     0    0    0     0   62  0
Bt1        0     0     0     0     0    0    0     0   0   0
Bt2        0     0     0     0     0    0    0     0   0   0
Bt3        1     1     5     4     1    0    0     0   0   0
Cr         0     0     0     0     0    3    1     0   0   0
R          0     0     0     0     0    0    0     2   0   0
not-used   0     0     0     0     0    0    0     0   0   1
Sum        1     1     5     4     1    3    1     2   62  1
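A table like the one above can be produced with base R's `table()` and `addmargins()`. A minimal sketch, using a small hypothetical set of field horizon names and the GHL already assigned to each; fixing the factor levels keeps GHLs with no matches (Bt2 here) as rows of zeros.

```r
# Hypothetical field horizon names and the GHL assigned to each
hzname <- c("A", "A", "Ap", "Bt1", "2Bt3", "2BCt", "2Cr", "2R", "AB")
genhz  <- c("A", "A", "A", "Bt1", "Bt3", "Bt3", "Cr", "R", "not-used")

# Fix the row order to the GHL sequence, keeping empty GHLs as rows of zeros
genhz <- factor(genhz, levels = c("A", "Bt1", "Bt2", "Bt3", "Cr", "R", "not-used"))

# Rows: GHL; columns: field horizon names; margin = 1 appends a "Sum" row
tab <- addmargins(table(genhz = genhz, hzname = hzname), margin = 1)
print(tab)

# Field horizon names that were not assigned to any GHL
unused <- colnames(tab)[tab["not-used", ] > 0]
print(unused)
```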

Evaluate typical horizonation

[Figure: plot of the generic horizon labels (GHL)]

Demonstrate Reports

  • open existing reports

→ examples of R reports

Closing thoughts

  • we have a wealth of existing data
  • data on soil series should be viewed in aggregate
  • “we shouldn't let the perfect be the enemy of the good”
  • reproducible research is good
  • Soil scientists are great at collecting data, but we have to be just as good at analyzing it.

Thank you. Any questions?

Links to Reports and supporting material

Additional AQP Contributors:

  • Pierre Roudier (Landcare Research)

Acknowledgements

  • Alena Stephens, John Hammerly, Jennifer Outcalt, Henry Ferguson, Paul Finnell, and others…