Soil Data Aggregation using R










This document is based on:

  • aqp version 1.8-3
  • soilDB version 1.5-3
  • sharpshootR version 0.7-2

What is R?

  • an analysis platform: calculator, statistics program, GIS, etc…
  • R syntax: vocabulary to explore, summarize, and model data alt text

What is R?

  • an integrator: analysis + GIS + database connectivity
  • ODBC and GDAL link R to nearly all possible formats/interfaces alt text

What can R do?

  • Base R (functionality is extended through packages)

    • basic summaries of quantitative or qualitative data
    • data exploration via graphics
    • GIS data processing and analysis
  • Integrated Workflow (self-documenting, repeatable)

    • report writing: code + summaries in a single document
    • digital soil mapping: modeling properties and classes
    • evaluation of OSD, KSSL, NASIS, … for SDJR or related
  • Soil Science Specific Packages

What can R do?



alt text

estimate a most-likely horizonation, details to follow

What can R do?



alt text

estimate reasonable RIC, details to follow

Setup your local machine

install.packages('aqp', dep=TRUE) 
install.packages("RODBC", dep=TRUE)
install.packages('soilDB', dep=TRUE)
  • Every subsequent time, load required packages:
library(aqp)
library(soilDB)
library(latticeExtra)
# ... other libraries you might be using

Connect to NASIS via ODBC

  • connection allows R to pull data from a NASIS selected set via the soilDB package
  • connection sets up a small file that stores the NASIS login and password
  • this is a one-time operation → ODBC setup instructions

  • getting data from NASIS is as simple as:

# get pedons from the selected set
pedons <- fetchNASIS()

# plot the first 15
plot(pedons[1:15, ])

# get component data from the selected set
components <- get_component_data_from_NASIS_db()

Aggregation of Pedon Data: Common Problems

alt text

  • description styles

    • the age-old splitters vs. lumpers
    • use of transitional horizons - I say tomato, you say tomato!
  • legacy horizon nomenclature

    • old school meets new school and they ain't speakin' the same language…..
  • soils described to varying depths - it's the pits!

Generalized Horizon Labels: Micro-correlation

alt text

  • generalized horizon labels = GHL
  • determine the core concept
  • which way do transition horizons go?
    • GHL process allows for flexibility
  • horizon redundancy - how many Bt horizons are necessary?
  • outlier horizons - what happens to them?

Summary of GHL workflow

Most of these steps are implemented in the reports presented last

  1. Select a set of GHL that best represents a group of pedons to be aggregated.

  2. Assign GHL to each horizon using whatever information is available for grouping horizons.

  3. Evaluate GHL assignments and manually refine as needed.

  4. Keep track of final GHL assignments in NASIS or text file.

  5. Estimate a most likely horizonation e.g., top and bottom depths for each generalized horizon label.

  6. Compute range in characteristics, aka low-rv-high values, for clay, sand, pH, etc. over GHL. (next document in this series)

→ extended tutorial

Assumptions

  • a collection of pedons for a component should represent an aggregated component
  • a GHL process of some kind is required to efficiently summarize and aggregate horizon data from these collections
  • low-rv-high values must be tied to some metric:
    • 5th-50th-95th percentile is one option
  • linked pedons represent examples of the aggregate data, but aggregate data should not be limited to a very small set of linked pedons

  • IMPORTANT these reports and data summaries are a useful starting point for the aggregation process

Summarizing Data with Quantiles (Percentiles)

  • suggested basis for RIC:
    • 5th, 50th, and 95th percentiles are low-rv-high values
    • no assumptions of distribution, simple interpretation

plot of chunk quantiles-normal-dist

Summarizing Data with Quantiles (Percentiles)

  • suggested basis for RIC:
    • 5th, 50th, and 95th percentiles are low-rv-high values
    • no assumptions of distribution, simple interpretation

plot of chunk quantiles-exp-dist

A Sample Dataset

  • 15 pedons correlated to the Loafercreek soil series
  • Fine-loamy, mixed, superactive, thermic Ultic Haploxeralfs
  • Common soil formed on meta-volcanic rocks of the Sierra Nevada Foothills, MLRA 18
  • Included in the soilDB package for testing purposes

plot of chunk load-data

Consult the OSD for Ideas

  • Look up the series RIC if available

alt text

  • Horizons from the OSD may be a good starting point for GHL template

Tabulate Horizon Designations

  • sort by frequency
A Bt1 Bt2 Bt3 Oi R Cr Crt BA BCt Bt4 Bw CBt 2BCt 2Bt3 2CB 2Cr 2R
15 15 15 10 9 9 8 5 3 3 2 2 2 1 1 1 1 1
  • sort alphabetically
2BCt 2Bt3 2CB 2Cr 2R A BA BCt Bt1 Bt2 Bt3 Bt4 Bw CBt Cr Crt Oi R
1 1 1 1 1 15 3 3 15 15 10 2 2 2 8 5 9 9
  • plot ranges in horizon depths plot of chunk horizonation-mid-point

Summarize Available Soil Properties

plot of chunk univariate-eval-clay

plot of chunk univariate-eval-rf

Determination of a GHL Template and Rules

  • pattern matching via regular expression (REGEX)

    • this is where most micro-correlation decisions are defined
  • GHL and rules for our sample dataset:

    • A: ^A$|Ad|Ap
    • Bt1: Bt1$
    • Bt2: ^Bt2$
    • Bt3: ^Bt3|^Bt4|CBt$|BCt$|2Bt|2CB$|^C$
    • Cr: Cr
    • R: R
  • special characters in REGEX rules:

    • | = “or”
    • ^ = anchor to left-side
    • $ = anchor to right-side

Assignment of GHL

  • cross-tabulation of original names (columns) vs. GHL (rows)
  • be sure to check the not-used row
2BCt 2Bt3 2CB 2Cr 2R A BA BCt Bt1 Bt2 Bt3 Bt4 Bw CBt Cr Crt Oi R Sum
A 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 15
Bt1 0 0 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 15
Bt2 0 0 0 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 15
Bt3 1 1 1 0 0 0 0 3 0 0 10 2 0 2 0 0 0 0 20
Cr 0 0 0 1 0 0 0 0 0 0 0 0 0 0 8 5 0 0 14
R 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 9 10
not-used 0 0 0 0 0 0 3 0 0 0 0 0 2 0 0 0 9 0 14
Sum 1 1 1 1 1 15 3 3 15 15 10 2 2 2 8 5 9 9 103

Evaluation of GHL

  • plot profiles with horizons colored according to GHL
  • does it make sense?
  • what about horizons in the not-used group?

plot of chunk plot-ghl-1

Evaluation of GHL

  • plot range in GHL depths, check for overlap

plot of chunk eval-ghl-2

Evaluation of GHL

  • multivariate summary of depth, clay content, and RF volume

plot of chunk mds-plot

Importing GHL to NASIS

  • generalized horizon labels (so far) are stored in R session
  • Use an R script to create a text file, called horizon_agg.txt, of horizon ID's and corresponding GHL assignments
  • run the NASIS calculation to update the comp layer id field with the GHL for each pedon horizon
  • Calculations/Validations –> Pedon Horizon
    • 'Update horizon group aggregations using a text file'
    • Calculation will look for the horizon_agg.txt file in: C:/data/horizon_agg.txt
    • Calculation will load the file and update the comp layer id field
  • manual adjustments in NASIS: outliers, special cases, etc.

→ full instructions here

Demonstration of Several Report Styles

Thank You

Questions, comments, ideas

R resources for pedologists

Additional AQP Contributors:

  • Pierre Roudier (Landcare Research)