A quick tutorial on how to search for and accommodate messy data. Inconsistent horizon depths, missing horizons, overlapping horizons, and other such mayhem can lead to unexpected results. It is best to search for and filter (or fix) these kind of errors before proceeding to analyze soil profile data. While classes and methods within the aqp
package are fairly tolerant of messy data, it is recommended that you apply these tests to your data before feeding into aqp
functions.
Copy and paste this code into an R session to familiarize yourself with the sample data set used in this tutorial.
library(aqp)
# load sample data set, a simple data.frame object with horizon-level data from 10 profiles
data(sp1)
str(sp1)
# optionally read about it...
# ?sp1
# upgrade to SoilProfileCollection
# 'id' is the name of the column containing the profile ID
# 'top' is the name of the column containing horizon upper boundaries
# 'bottom' is the name of the column containing horizon lower boundaries
depths(sp1) <- id ~ top + bottom
# check it out:
class(sp1)
print(sp1)
plot(sp1)
Lets break some good data to make a point. Checking the horizon logic of a SoilProfileCollection
object is performed by the checkHzDepthLogic
function. Note that the kable
function from the knitr
package is used for nice-looking output in the tutorial, you wouldn’t normally use this in an interactive session. It is handy for use in RMarkdown reports.
# load required libraries
library(aqp)
library(knitr)
# load sample data and promote to SoilProfileCollection
data(sp1)
depths(sp1) <- id ~ top + bottom
# make a copy of the example data set, we will insert errors later
bad <- sp1
# insert a missing horizon boundary in profile P001
bad$top[4] <- NA
# create an overlapping horizon in profile P002
bad$top[9] <- 15
# create a depth logic error in profile P003
bad$bottom[13] <- 12
# perform battery of checks
sp1.chk <- checkHzDepthLogic(sp1)
bad.chk <- checkHzDepthLogic(bad)
# graphical reference
par(mar=c(0, 0, 0, 0))
plot(sp1)
Check the original data; looks like there is a problem with top/bottom depths sharing the save value! It is fairly common for Cr and R horizons to be described with only a top depth or equal top / bottom depths.
Note that we are using [
(square bracket) style indexing of data.frame
and SoilProfileCollection
objects. Be sure to review the SoilProfileCollection
Object Reference for details.
kable(sp1.chk)
id | valid | depthLogic | sameDepth | missingDepth | overlapOrGap |
---|---|---|---|---|---|
P001 | FALSE | FALSE | TRUE | FALSE | FALSE |
P002 | TRUE | FALSE | FALSE | FALSE | FALSE |
P003 | TRUE | FALSE | FALSE | FALSE | FALSE |
P004 | TRUE | FALSE | FALSE | FALSE | FALSE |
P005 | TRUE | FALSE | FALSE | FALSE | FALSE |
P006 | TRUE | FALSE | FALSE | FALSE | FALSE |
P007 | TRUE | FALSE | FALSE | FALSE | FALSE |
P008 | TRUE | FALSE | FALSE | FALSE | FALSE |
P009 | TRUE | FALSE | FALSE | FALSE | FALSE |
# highlight the problem
idx <- which(sp1.chk$sameDepth)
kable(sp1.chk[idx, ])
id | valid | depthLogic | sameDepth | missingDepth | overlapOrGap |
---|---|---|---|---|---|
P001 | FALSE | FALSE | TRUE | FALSE | FALSE |
# check horizon data for this profile
# using profile-indexing to select the profile
# using column-indexing to select the first 8 columns from horizon attributes
kable(horizons(sp1[idx, ])[, 1:8])
group | id | top | bottom | bound_distinct | bound_topography | name | texture |
---|---|---|---|---|---|---|---|
2 | P001 | 0 | 2 | A | S | A1 | GRVLS |
2 | P001 | 2 | 14 | G | S | A2 | GRVLS |
2 | P001 | 14 | 49 | G | S | AB | GRVLS |
2 | P001 | 49 | 57 | C | S | BA | GRVLS |
2 | P001 | 57 | 89 | C | S | Bt | GRVSCL |
2 | P001 | 89 | 89 | NA | NA | Rt | NA |
Check the “bad” version of sp1
that we intentionally broke.
kable(bad.chk)
id | valid | depthLogic | sameDepth | missingDepth | overlapOrGap |
---|---|---|---|---|---|
P001 | FALSE | TRUE | TRUE | TRUE | FALSE |
P002 | FALSE | FALSE | FALSE | FALSE | TRUE |
P003 | FALSE | TRUE | FALSE | FALSE | TRUE |
P004 | TRUE | FALSE | FALSE | FALSE | FALSE |
P005 | TRUE | FALSE | FALSE | FALSE | FALSE |
P006 | TRUE | FALSE | FALSE | FALSE | FALSE |
P007 | TRUE | FALSE | FALSE | FALSE | FALSE |
P008 | TRUE | FALSE | FALSE | FALSE | FALSE |
P009 | TRUE | FALSE | FALSE | FALSE | FALSE |
# merge test results into site-level attributes of SoilProfileCollection
site(bad) <- bad.chk
# convert validity test to a factor
bad$valid <- factor(bad$valid)
# plot valid vs. invalid hz depth test results
# par() is used to tighten default margins
par(mar=c(0,0,3,0))
groupedProfilePlot(bad, groups='valid', group.name.offset = -20)
title('Valid Horizonation Check')
What happens if you attempt to use profiles with horizon depth errors? Functions like slice
can tolerate some errors, but not overlaps.
try(slice(bad, 1:100 ~ .))
## Error : bad horizonation in IDs: P002
You can ignore the error by setting strict=FALSE
, but there are still warnings printed to the console.
slice(bad, 1:100 ~ ., strict = FALSE)
idx <- which(bad.chk$valid)
# keep the good ones
good <- bad[idx, ]
# check
par(mar=c(0,0,0,0))
plot(good)
Soil data often contain records where the lower depth of the deepest horizon is missing (NA
) or the same as the top depth (R, Cd, Cr, etc.). These data can either be filtered out or “cleaned” by replacing the missing lower depth with the corresponding upper depth.
Note: this functionality will eventually be put into a proper aqp
function.
# replace missing lower boundaries with top + 10
idx <- which(!is.na(sp1$top) & is.na(sp1$bottom))
if(length(idx) > 0) {
sp1$bottom[idx] <- sp1$top[idx] + 10
}
# replace same top/bottom with top + 10
idx <- which(sp1$top == sp1$bottom)
if(length(idx) > 0) {
sp1$bottom[idx] <- sp1$bottom[idx] + 10
}
# check data-cleaning:
kable(checkHzDepthLogic(sp1))
Older soils data with may have O horizons that start above the soil surface (e.g. "Oe 3 to 0 cm). Data containing these type of horizons must either be filtered or fixed before use with classes or methods defined in the aqp
package. Functions in the aqp
and soilDB
packages always nag about horizon depth errors. It is in your best interest to fix these errors if you can, as output from plotSPC
, slice
, slab
and related functions may not be reliable.
This document is based on aqp
version 1.27.