Common Data Sources

Jay Skovlin, Dylan Beaudette, Stephen Roecker

This document is based on:

aqp (1.19)
soilDB (2.5.1)
sharpshootR (1.6)

Chapter 2: Common Data Sources

You need data before you can analyze it

loading data from various sources
visualizing pedon / component data via “sketches”
filtering pedon / component data via pattern matching
exporting pedon / component data to text files or GIS data files

Most of our data aren't in the form of CSV files

R packages to assist with loading soil survey data
R packages for modeling the complexities of soil data
R packages for routine analysis of pedon / component / ESC objects

Chapter 2 reference material

Why do all of this?

That is a lot of (perhaps underutilized) data!

Importance of Pedon Data

We've got a lot of data to work with, and likely much more to bring online
Archiving quality observations of soils made in the past, present, and future is difficult work; we will need many different tools to tackle analysis tasks ranging from simple to complex
Quality control (QC) of pedon data is worth the time!
These data are valuable

Common Issues with Pedon Data

Consistency
- Missing data
Confidence in the observations
- Uncertainty with depth
Description style differences
- Depth described, differing styles of horizonation
Legacy data vintage
- Data spanning several decades
- Soil Taxonomy updates, changes in horizon nomenclature
Location confidence
- Origin of the location information
- Datum used for data collection
- Accuracy of GPS values at the time of data collection
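
Two of these issues, missing data and inconsistent depths, can be screened for with a few lines of code. The following is a minimal base-R sketch, not an aqp or soilDB function: it assumes a made-up horizon table whose column names (peiid, hzname, hzdept, hzdepb) follow the NASIS-style names shown in the SoilProfileCollection printout, while the clay column, all values, and the helper check_depths() are hypothetical.

```r
# Hypothetical horizon table (made-up values for illustration only)
h <- data.frame(
  peiid  = c(1, 1, 1, 2, 2, 2),
  hzname = c("A", "Bt1", "Bt2", "A", "Bw", "C"),
  hzdept = c(0, 10, 35, 0, 15, 30),
  hzdepb = c(10, 35, 60, 12, 30, 55),
  clay   = c(12, 25, NA, 8, 10, 9)
)

# Fraction of missing values in each column
missing_frac <- sapply(h, function(x) mean(is.na(x)))

# Hypothetical helper: flag depth-logic problems within one profile
check_depths <- function(d) {
  d <- d[order(d$hzdept), ]
  # a horizon whose bottom depth is not below its top depth
  bad_thickness <- any(d$hzdepb <= d$hzdept)
  # the top of each horizon should equal the bottom of the one above it;
  # any mismatch is a gap or an overlap
  gap_or_overlap <- any(d$hzdept[-1] != d$hzdepb[-nrow(d)])
  data.frame(peiid = d$peiid[1],
             bad_thickness = bad_thickness,
             gap_or_overlap = gap_or_overlap)
}

# Apply the check profile by profile and stack the results
depth_report <- do.call(rbind, lapply(split(h, h$peiid), check_depths))

print(missing_frac)
print(depth_report)
```

In this toy example profile 2 is flagged because its A horizon ends at 12 cm while the Bw begins at 15 cm.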

Suite of R packages specific to Soil Survey work

soil data are complex and inherently visual
reproducibility is increasingly important
focus on the interesting questions, not boilerplate
a common vocabulary for soil data analysis would be nice

aqp: Algorithms for Quantitative Pedology

special data structures: avoids annoying book-keeping code
visualization: soil profile sketches, transect diagrams, Munsell → RGB
re-sampling: regular depth-slicing or EA spline (coming soon)
aggregation: summary by depth-slice or arbitrary “slabs”
classification: pair-wise dissimilarity of profiles
utility functions: soil depth, missing data eval., simulation, …
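
To make the re-sampling and aggregation ideas more concrete, here is a base-R sketch of what regular depth-slicing and slab summaries do. In aqp these tasks are handled by functions such as slice() and slab(); this code does not reproduce them and is only meant to show the underlying logic. The data and the helper slice_profile() are hypothetical.

```r
# Hypothetical horizon data for two profiles (made-up values)
h <- data.frame(
  peiid  = c(1, 1, 1, 2, 2),
  hzdept = c(0, 10, 30, 0, 20),
  hzdepb = c(10, 30, 50, 20, 50),
  clay   = c(10, 20, 30, 15, 25)
)

# Hypothetical helper: cut one profile into 1 cm slices [z, z + 1);
# each slice inherits the value of the horizon that contains it
slice_profile <- function(d, var) {
  z <- seq(min(d$hzdept), max(d$hzdepb) - 1)
  val <- sapply(z, function(zi) {
    i <- which(d$hzdept <= zi & d$hzdepb > zi)
    if (length(i) == 1) d[[var]][i] else NA
  })
  data.frame(peiid = d$peiid[1], top = z, bottom = z + 1, value = val)
}

# Slice every profile and stack the results
slices <- do.call(rbind, lapply(split(h, h$peiid), slice_profile, var = "clay"))

# Aggregate the slices into arbitrary "slabs" (here 0-25 and 25-50 cm)
# and summarize across all profiles
slices$slab <- cut(slices$top, breaks = c(0, 25, 50), right = FALSE,
                   labels = c("0-25", "25-50"))
slab_mean <- tapply(slices$value, slices$slab, mean, na.rm = TRUE)
print(slab_mean)
```

Because every profile is reduced to the same 1 cm grid, profiles with different horizonation can be compared or summarized depth by depth; the slab step is then just a grouped summary over those slices.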

SoilProfileCollection Objects

Formal class 'SoilProfileCollection' [package "aqp"] with 11 slots
  ..@ idcol       : chr "peiid"
  ..@ hzidcol     : chr "phiid"
  ..@ hzdesgncol  : chr "hzname"
  ..@ hztexclcol  : chr "texcl"
  ..@ depthcols   : chr [1:2] "hzdept" "hzdepb"
  ..@ metadata    :'data.frame':    1 obs. of  2 variables:
  ..@ horizons    :'data.frame':    626 obs. of  69 variables:
  ..@ site        :'data.frame':    106 obs. of  87 variables:
  ..@ sp          :Formal class 'SpatialPoints' [package "sp"] with 3 slots
  ..@ diagnostic  :'data.frame':    330 obs. of  4 variables:
  ..@ restrictions:'data.frame':    112 obs. of  8 variables:
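
The printout shows that a SoilProfileCollection keeps site-level data (one row per profile, keyed by idcol "peiid") separate from horizon-level data (keyed by hzidcol "phiid", with depths in "hzdept" and "hzdepb"). The base-R sketch below illustrates the book-keeping that this split implies; promote_to_site() and the elev_m column are hypothetical. In aqp itself, depths(x) <- peiid ~ hzdept + hzdepb builds the object and site(x) <- ~ elev_m moves a site-level column, so the sketch only shows what those calls save you from writing by hand.

```r
# Hypothetical horizon-level data; 'elev_m' is a site-level attribute
# that is (redundantly) repeated on every horizon row
hz <- data.frame(
  peiid  = c(101, 101, 102, 102, 102),
  phiid  = c(1, 2, 3, 4, 5),
  hzname = c("A", "Bw", "A", "Bt", "C"),
  hzdept = c(0, 18, 0, 15, 60),
  hzdepb = c(18, 50, 15, 60, 90),
  texcl  = c("l", "cl", "sil", "sicl", "sil"),
  elev_m = c(350, 350, 410, 410, 410)
)

# Hypothetical helper: move a column that is constant within each profile
# from the horizon table to a one-row-per-profile site table
promote_to_site <- function(hz, idcol, var) {
  # verify the attribute really is constant within each profile
  n_distinct <- tapply(hz[[var]], hz[[idcol]], function(x) length(unique(x)))
  if (any(n_distinct > 1)) stop(var, " varies within a profile")
  site <- unique(hz[, c(idcol, var)])
  hz[[var]] <- NULL
  list(site = site, horizons = hz)
}

spc <- promote_to_site(hz, idcol = "peiid", var = "elev_m")
str(spc, max.level = 1)

# Joining the two tables back together by profile ID recovers the original rows
merged <- merge(spc$horizons, spc$site, by = "peiid")
```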

soilDB: Soil Database Interface

sharpshootR: Prototypes / Specialized Stuff

soilReports: Summarize / Compare Map Unit Concepts

Let's Do This