Common Data Sources

Jay Skovlin, Dylan Beaudette, Stephen Roecker








This document is based on:

  • aqp (1.19)
  • soilDB (2.5.1)
  • sharpshootR (1.6)

Chapter 2: Common Data Sources


You need data before you can analyze it

  • loading data from various sources
  • visualizing pedon / component data via “sketches”
  • filtering pedon / component data via pattern matching
  • exporting pedon / component data to text files or GIS data files
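
The filtering and export steps listed above can be sketched in base R. This is a minimal illustration, not the workflow used in the chapter: the `hz` table, its column names, and the output file name are all hypothetical.

```r
# Hypothetical horizon records; names and values are for illustration only
hz <- data.frame(
  pedon_id = c("P1", "P1", "P1", "P2", "P2"),
  hzname   = c("A", "Bt1", "Bt2", "A", "Cr"),
  top      = c(0, 12, 40, 0, 25),
  bottom   = c(12, 40, 85, 25, 60),
  stringsAsFactors = FALSE
)

# Pattern matching: keep horizons whose designation starts with "Bt"
is_bt <- grepl("^Bt", hz$hzname)
bt <- hz[is_bt, ]

# IDs of pedons that contain at least one matching horizon
bt_pedons <- unique(bt$pedon_id)

# Export the filtered records to a text (CSV) file in a temporary directory
out_file <- file.path(tempdir(), "bt-horizons.csv")
write.csv(bt, file = out_file, row.names = FALSE)
```

The same regular expression approach works on any character column, such as taxon names or component names.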

Most of our data aren't in the form of CSV files

  • R packages to assist with loading soil survey data
  • R packages for modeling the complexities of soil data
  • R packages for routine analysis of pedon / component / ESC objects



Chapter 2 reference material

Why do all of this?

[Figure: plot, chunk unnamed-chunk-2]

That is a lot of (perhaps underutilized) data!

[Figure: plot, chunk pedons_a1]

Importance of Pedon Data

  • We've got a lot of data to work with and likely much more to bring online
  • Archiving quality soil observations, whether made in the past, present, or future, is difficult work; we will need many different tools to tackle analysis tasks that range from simple to complex
  • QC of pedon data is worth spending some time on!
  • These data are valuable

Common Issues with Pedon Data

  • Consistency
    • Missing data
  • Confidence in the observations
    • Uncertainty with depth
  • Description style differences
    • Depth of description, styles of horizonation
  • Legacy data vintage
    • Decadal span of data
    • Taxonomy updates, horizon nomenclature
  • Location confidence
    • Origin of the location information
    • Datum used for data collection
    • Accuracy for GPS values at the time of data collection
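
Two of the issues above, missing data and inconsistent horizon depths, lend themselves to simple automated checks. The following is a base R sketch under stated assumptions: the `hz` table and the `check_depths()` helper are hypothetical and written only for this illustration.

```r
# Hypothetical horizon records with deliberate problems
hz <- data.frame(
  pedon_id = c("P1", "P1", "P1", "P2", "P2", "P3"),
  hzname   = c("A", "Bw", "C", "A", "Bt", NA),
  top      = c(0, 10, 45, 0, 20, 0),
  bottom   = c(10, 40, 90, 20, NA, 15),
  stringsAsFactors = FALSE
)

# Missing data: number of NA values in each column
n_missing <- colSums(is.na(hz))

# Depth logic for a single pedon (illustrative helper, not a package function)
check_depths <- function(d) {
  d <- d[order(d$top), ]
  n <- nrow(d)
  data.frame(
    pedon_id       = d$pedon_id[1],
    # any horizon lacking a top or bottom depth
    missing_depth  = anyNA(d$top) || anyNA(d$bottom),
    # a top depth must be shallower than its bottom depth
    bad_thickness  = any(d$top >= d$bottom, na.rm = TRUE),
    # the bottom of one horizon should equal the top of the next
    gap_or_overlap = n > 1 && any(d$bottom[-n] != d$top[-1], na.rm = TRUE)
  )
}

# Apply the checks pedon by pedon and stack the results
qc <- do.call(rbind, lapply(split(hz, hz$pedon_id), check_depths))
```

Here `qc` holds one row per pedon, so flagged pedons can be reviewed and corrected at the source.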

Suite of R packages specific to Soil Survey work

  • soil data are complex, inherently visual
  • reproducibility is increasingly important
  • focus on the interesting questions, not boilerplate
  • a common vocabulary for soil data analysis would be nice

aqp: Algorithms for Quantitative Pedology

  • special data structures: avoids annoying book-keeping code
  • visualization: soil profile sketches, transect diagrams, Munsell → RGB
  • re-sampling: regular depth-slicing or equal-area (EA) spline (coming soon)
  • aggregation: summary by depth-slice or arbitrary “slabs”
  • classification: pair-wise dissimilarity of profiles
  • utility functions: soil depth, missing data evaluation, simulation, …
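
To make the re-sampling and aggregation ideas concrete, here is a base R sketch of 1-cm depth-slicing followed by a median over two depth "slabs". It illustrates the concept only and is not aqp's implementation; the data, the `clay` values, and the slab boundaries are hypothetical.

```r
# Hypothetical horizons for two pedons; depths in cm, clay in percent
hz <- data.frame(
  pedon_id = c("P1", "P1", "P2", "P2"),
  top      = c(0, 10, 0, 20),
  bottom   = c(10, 30, 20, 30),
  clay     = c(12, 28, 16, 30)
)

# Regular depth-slicing: repeat each horizon once per 1-cm slice it spans
thickness <- hz$bottom - hz$top
slices <- hz[rep(seq_len(nrow(hz)), times = thickness), c("pedon_id", "clay")]

# Top depth of each 1-cm slice, in the same order as the repeated rows
slices$slice_top <- unlist(Map(function(t, b) seq(t, b - 1), hz$top, hz$bottom))

# Assign slices to arbitrary slabs: [0, 10) and [10, 30) cm
slices$slab <- cut(slices$slice_top, breaks = c(0, 10, 30),
                   right = FALSE, labels = c("0-10", "10-30"))

# Aggregation: median clay content within each slab, across all pedons
slab_median <- aggregate(clay ~ slab, data = slices, FUN = median)
```

Slicing first means thick horizons contribute more to the summary than thin ones, which is the reason for working on a regular depth basis rather than on raw horizon records.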

SoilProfileCollection Objects

Formal class 'SoilProfileCollection' [package "aqp"] with 11 slots
  ..@ idcol       : chr "peiid"
  ..@ hzidcol     : chr "phiid"
  ..@ hzdesgncol  : chr "hzname"
  ..@ hztexclcol  : chr "texcl"
  ..@ depthcols   : chr [1:2] "hzdept" "hzdepb"
  ..@ metadata    :'data.frame':    1 obs. of  2 variables:
  ..@ horizons    :'data.frame':    626 obs. of  69 variables:
  ..@ site        :'data.frame':    106 obs. of  87 variables:
  ..@ sp          :Formal class 'SpatialPoints' [package "sp"] with 3 slots
  ..@ diagnostic  :'data.frame':    330 obs. of  4 variables:
  ..@ restrictions:'data.frame':    112 obs. of  8 variables:
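
An object with this structure can be built from an ordinary table of horizons. The sketch below assumes the aqp package is installed; the data and the column names (`id`, `top`, `bottom`, `name`, `clay`, `slope`) are hypothetical and unrelated to the NASIS-style columns shown above.

```r
library(aqp)

# Hypothetical horizon data: one row per horizon, depths in cm
spc <- data.frame(
  id     = c("P1", "P1", "P1", "P2", "P2"),
  top    = c(0, 12, 40, 0, 25),
  bottom = c(12, 40, 85, 25, 60),
  name   = c("A", "Bt1", "Bt2", "A", "Cr"),
  clay   = c(14, 28, 32, 10, NA),
  stringsAsFactors = FALSE
)

# Promote the data.frame to a SoilProfileCollection:
# profile ID on the left, top and bottom depth columns on the right
depths(spc) <- id ~ top + bottom

# Add site-level data; rows are matched to profiles by the ID column
site(spc) <- data.frame(
  id    = c("P1", "P2"),
  slope = c(5, 22),
  stringsAsFactors = FALSE
)

# Accessors hide the book-keeping of the underlying slots
n_profiles <- length(spc)          # number of profiles
n_horizons <- nrow(horizons(spc))  # number of horizon records
id_column  <- idname(spc)          # name of the profile ID column
depth_cols <- horizonDepths(spc)   # names of the depth columns
```

Calling `plot(spc)` on the result should draw profile sketches of the two pedons.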

[Figure: plot, chunk SPC-2]

aqp: Algorithms for Quantitative Pedology

soilDB: Soil Database Interface

sharpshootR: Prototypes / Specialized Stuff

soilReports: Summarize / Compare Map Unit Concepts

Let's Do This




  • Live coding examples / discussion
  • Self study and tinkering
  • Ask us questions!