Chapter 2 The Data We Use

2.1 Objectives (Tabular Data Structures)

  • Learn more about R and how to inspect objects and data types
  • Use the soilDB package to load NASIS pedon data into R
  • Understand the structure of data stored in a SoilProfileCollection (SPC)
  • Learn about the checks run on data loaded by the fetchNASIS() function
  • Learn ways to logically filter and subset data in R
  • Learn how functions can be used to bundle operations
  • Review additional data that is accessible via extended data functions
  • Introduce soilReports R package

2.2 The Structure of Soil Data

What if you could extract, organize, and visualize data from NASIS and many other commonly used soil database sources with a couple of lines of code?

The aqp (Algorithms for Quantitative Pedology) and soilDB packages enable data to be fetched from various sources and cast into a SoilProfileCollection (SPC) object. Tabular and spatial data objects fetched via soilDB and aqp methods simplify the process of working with commonly used soil data.

2.2.1 Package References

The manual pages for soilDB and aqp are accessible by entering the following into the R console (then click Index at the bottom of the Help tab in RStudio):

# load the libraries
library(soilDB)
library(aqp)

# links to lots of great examples
help(soilDB)
help(aqp)

2.2.2 soilDB functions for tabular data

soilDB functions are the quickest way to get up and running.

The following are still functional but not actively maintained:

  • fetchPedonPC()
    • Fetches commonly used site and horizon data from a PedonPC v.5 database.
  • fetchRaCA()
    • Gets Rapid Carbon Assessment (RaCA) data by State, geographic bounding-box, RaCA site ID, or series query from the SoilWeb system.

2.2.3 Importance of Pedon Data

The importance of pedon data for present and future work cannot be overstated. These data represent decades of on-the-ground observations of the soil resource for a given area.

As difficult as it may be to take the time to enter legacy pedon data, it is vitally important that we capture this resource and get these data into NASIS as an archive of point observations.

2.2.4 Some Issues With Pedon Data

  • Making and documenting observations of soil requires hard work. Digging is difficult, and writing soil descriptions is time consuming!

  • Our confidence in observations typically weakens with the depth of the material described.

    • If we acknowledge this, which we must, then how do we deal with it in pedon data?
      • A cutoff depth, for example 100 cm, can be used to truncate observations to a zone of greater confidence.
      • Show the relative confidence of the data with depth.
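One way to apply such a cutoff depth is the trunc() method from the aqp package (covered in section 2.4), sketched here with the bundled sp4 sample data. Because the sp4 profiles are shallow, a 25 cm cutoff stands in for the 100 cm example above.

```r
library(aqp)

# sample serpentine soil data, promoted to a SoilProfileCollection
data(sp4, package = "aqp")
depths(sp4) <- id ~ top + bottom

# truncate all profiles to the upper, higher-confidence zone
sp4.upper <- trunc(sp4, 0, 25)

# the deepest bottom depth is now no more than 25 cm
max(sp4.upper)
```

For deeper pedon data, trunc(x, 0, 100) applies the 100 cm cutoff directly.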

2.3 Challenges with Pedon Data

  • Consistency
    • Missing data
  • Confidence in the observations
    • Uncertainty with depth
  • Description style differences
    • Depth described, horizonation usage styles
  • Legacy data vintage
    • Decadal span of data
    • Taxonomy updates, horizon nomenclature changes
  • Location confidence
    • Origin of the location information
    • Datum used for data collection
    • Accuracy for GPS values at the time of data collection

2.3.1 Meeting the Challenges

For more information regarding difficult pedon data, see the following tutorial in the “aqp” package:
Dealing with Troublesome data.

2.4 The SoilProfileCollection

The SoilProfileCollection class (SPC) provided by the aqp package is a specialized data structure designed to support analysis. Before continuing, consider working through the SoilProfileCollection Reference.

It is a type of object that simplifies the process of working with collections of data associated with soil profiles, e.g., site-level data, horizon-level data, spatial data, diagnostic horizon data, metadata, etc.

A SoilProfileCollection is similar to the NASIS Site/Pedon “object” in that it provides generalizations, specific routines and rules about the fundamental tables and their relationships that are relevant to soil observations.

In many ways, the SPC is more adaptable than the NASIS Pedon concept. It can “contain” aggregations of any relevant parts of any soil data schema (table and column structure). Through this mechanism, it can act as a mediator between formats and algorithms. However, the SPC is not as expressive as the complex hierarchy of objects in NASIS, which is more aligned with data archival than analysis.

2.4.1 SoilProfileCollection methods

Many “familiar” methods are defined for the SoilProfileCollection object. Some are unique, and others operate like more common functions of vector and data.frame objects, such as nrow() (“how many horizons?”) or length() (“how many sites/pedons?”).

Perhaps most importantly, when you access the site() data or the horizons() data of a SoilProfileCollection, you get a data.frame object that you can use like any other you might use or make in R.

2.4.1.1 Promoting a data.frame to SoilProfileCollection

The SoilProfileCollection object is a collection of 1-D profile descriptions, of the type conventionally recorded on a Form 232 or in tabular data returned from a laboratory.

The object is “horizon data forward”: you start with horizon data, and can create site-level attributes by normalization, joins, calculations, and more.

Most of the time, if you are using your NASIS data or an official database, there are defined ways of getting the data “into” an SPC. For example, fetchOSD returns a SoilProfileCollection that has been assembled from horizon- and site-level attributes gleaned from OSD text, the Soil Classification database, and other sources.

In the pre-course, we had you set up a process so you could connect to your local NASIS instance to “fetch” data and have methods like fetchNASIS put things together for you.

This input to make a SoilProfileCollection can be represented most simply as a data.frame with unique site or profile ID and depth combinations for each horizon or layer; for example, a subset of the phorizon or chorizon table in NASIS.

A simple demonstration of “tabular horizon data” is the sp4 data set bundled with aqp: some serpentine soil profiles stored in a data.frame in the aqp package (after McGahan et al., 2009).

library(aqp)

# Load sample serpentine soil data (McGahan et al., 2009)
data(sp4, package = "aqp")

# this is a data.frame
# same as if loaded from CSV file etc.
class(sp4)
## [1] "data.frame"
# inspect the first couple of rows
head(sp4)
##       id name top bottom   K   Mg  Ca CEC_7 ex_Ca_to_Mg sand silt clay   CF
## 1 colusa    A   0      3 0.3 25.7 9.0  23.0        0.35   46   33   21 0.12
## 2 colusa  ABt   3      8 0.2 23.7 5.6  21.4        0.23   42   31   27 0.27
## 3 colusa  Bt1   8     30 0.1 23.2 1.9  23.7        0.08   40   28   32 0.27
## 4 colusa  Bt2  30     42 0.1 44.3 0.3  43.0        0.01   27   18   55 0.16
## 5  glenn    A   0      9 0.2 21.9 4.4  18.8        0.20   54   20   25 0.55
## 6  glenn   Bt   9     34 0.3 18.9 4.5  27.5        0.20   49   18   34 0.84

To convert this horizon data into a SoilProfileCollection, we need to identify three parameters: idname, top, and bottom. These parameters refer to the columns of unique profile IDs, top depths and bottom depths, respectively.

There are a couple of important constraints and considerations:

  • records (rows) represent horizons
  • profiles are uniquely identified by a column (user pedon ID, pedon record ID, etc.)
  • profile IDs cannot contain missing values (NA)
  • horizon top and bottom depths are identified by columns
  • ideally there are no gaps, overlap, or missing top/bottom depths (more on that later)

Use a formula to specify column names in the data.frame, in this case "id", "top" and "bottom".

#      profile ID ~ top depth + bottom depth
depths(sp4) <- id ~ top + bottom

# note new class
class(sp4)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
# compact summary
sp4
## SoilProfileCollection with 10 profiles and 30 horizons
## profile ID: id  |  horizon ID: hzID 
## Depth range: 16 - 49 cm
## 
## ----- Horizons (6 / 30 rows  |  10 / 14 columns) -----
##      id hzID top bottom name   K   Mg  Ca CEC_7 ex_Ca_to_Mg
##  colusa    1   0      3    A 0.3 25.7 9.0  23.0        0.35
##  colusa    2   3      8  ABt 0.2 23.7 5.6  21.4        0.23
##  colusa    3   8     30  Bt1 0.1 23.2 1.9  23.7        0.08
##  colusa    4  30     42  Bt2 0.1 44.3 0.3  43.0        0.01
##   glenn    5   0      9    A 0.2 21.9 4.4  18.8        0.20
##   glenn    6   9     34   Bt 0.3 18.9 4.5  27.5        0.20
## [... more horizons ...]
## 
## ----- Sites (6 / 10 rows  |  1 / 1 columns) -----
##         id
##     colusa
##      glenn
##      kings
##   mariposa
##  mendocino
##       napa
## [... more sites ...]
## 
## Spatial Data: [EMPTY]
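With sp4 promoted, the “familiar” methods mentioned earlier can be confirmed, and the depth constraints listed above can be screened with aqp's checkHzDepthLogic() (a sketch; the setup is repeated so the chunk stands alone):

```r
library(aqp)

# repeat the promotion from above so this chunk stands alone
data(sp4, package = "aqp")
depths(sp4) <- id ~ top + bottom

length(sp4)  # number of profiles: 10
nrow(sp4)    # number of horizons: 30

# one row per profile, flagging gaps, overlaps, and missing depths
checkHzDepthLogic(sp4)
```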

2.4.1.2 Extracting Site and Horizon Data

The SoilProfileCollection is an S4 R object. S4 objects have slots. Of primary importance are the slots for site-level and horizon-level data.

You can extract values from these slots using the site() and horizons() functions. These create data.frame objects that are separate from the SoilProfileCollection.

# extract site data from SPC into new data.frame 's'
# note that it only contains profile IDs
s <- site(sp4)
str(s)
## 'data.frame':	10 obs. of  1 variable:
##  $ id: chr  "colusa" "glenn" "kings" "mariposa" ...
# extract horizon data from SPC into new data.frame 'h'
h <- horizons(sp4)
str(h)
## 'data.frame':	30 obs. of  14 variables:
##  $ id         : chr  "colusa" "colusa" "colusa" "colusa" ...
##  $ name       : chr  "A" "ABt" "Bt1" "Bt2" ...
##  $ top        : int  0 3 8 30 0 9 0 4 13 0 ...
##  $ bottom     : int  3 8 30 42 9 34 4 13 40 3 ...
##  $ K          : num  0.3 0.2 0.1 0.1 0.2 0.3 0.2 0.6 0.8 0.6 ...
##  $ Mg         : num  25.7 23.7 23.2 44.3 21.9 18.9 12.1 12.1 17.7 28.3 ...
##  $ Ca         : num  9 5.6 1.9 0.3 4.4 4.5 1.4 7 4.4 5.8 ...
##  $ CEC_7      : num  23 21.4 23.7 43 18.8 27.5 23.7 18 20 29.3 ...
##  $ ex_Ca_to_Mg: num  0.35 0.23 0.08 0.01 0.2 0.2 0.58 0.51 0.25 0.2 ...
##  $ sand       : int  46 42 40 27 54 49 43 36 27 42 ...
##  $ silt       : int  33 31 28 18 20 18 55 49 45 26 ...
##  $ clay       : int  21 27 32 55 25 34 3 15 27 32 ...
##  $ CF         : num  0.12 0.27 0.27 0.16 0.55 0.84 0.5 0.75 0.67 0.25 ...
##  $ hzID       : chr  "1" "2" "3" "4" ...

2.4.1.3 Methods like data.frame

The base R functions for accessing and setting data.frame columns by name such as $ and [[ work for SoilProfileCollection objects, too. Review data.frame methods:

  • [[ and $: single columns in data.frame, by name
    • x[['variable']]
    • x$variable
  • [: combinations of rows and columns, by name or index
    • x[i, ]: specified rows, all columns
    • x[, j]: all rows, specified columns
    • x[i, j]: specified rows, specified columns

See Chapter 1 and the Chapter 2 Appendix for additional details and examples.

2.4.1.3.1 Column Access by Name: $ and [[
# sp4 is a SoilProfileCollection
sp4$clay
##  [1] 21 27 32 55 25 34  3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
sp4[['clay']]
##  [1] 21 27 32 55 25 34  3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
# horizon data.frame
h$clay
##  [1] 21 27 32 55 25 34  3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
h[['clay']]
##  [1] 21 27 32 55 25 34  3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
# use $<- or [[<- to set proportional clay content
sp4$clay <- sp4[['clay']] / 100

# undo what we did above; back to percentage
sp4[['clay']] <- sp4$clay * 100

# create new site variable ("numberone" recycled for all sites)
site(sp4)$newvar1 <- "numberone"

# create new horizon variable ("numbertwo" recycled for all horizons)
horizons(sp4)$newvar2 <- "numbertwo"
2.4.1.3.2 Row Access: [

The SoilProfileCollection also has [ (“single bracket”), but with a different interpretation from the [i, j] indexing of data.frame objects.

  • In a data.frame you have object[row, column, drop=TRUE]; the result is a data.frame (or, when a single column is selected with the default drop = TRUE, a vector).

  • In a SoilProfileCollection you have object[site, horizon]; the result is a SoilProfileCollection.

# i-index: first 2 profiles, all horizons
sp4[1:2, ]
## SoilProfileCollection with 2 profiles and 6 horizons
## profile ID: id  |  horizon ID: hzID 
## Depth range: 34 - 42 cm
## 
## ----- Horizons (6 / 6 rows  |  10 / 15 columns) -----
##      id hzID top bottom name   K   Mg  Ca CEC_7 ex_Ca_to_Mg
##  colusa    1   0      3    A 0.3 25.7 9.0  23.0        0.35
##  colusa    2   3      8  ABt 0.2 23.7 5.6  21.4        0.23
##  colusa    3   8     30  Bt1 0.1 23.2 1.9  23.7        0.08
##  colusa    4  30     42  Bt2 0.1 44.3 0.3  43.0        0.01
##   glenn    5   0      9    A 0.2 21.9 4.4  18.8        0.20
##   glenn    6   9     34   Bt 0.3 18.9 4.5  27.5        0.20
## 
## ----- Sites (2 / 2 rows  |  2 / 2 columns) -----
##      id   newvar1
##  colusa numberone
##   glenn numberone
## 
## Spatial Data: [EMPTY]
# j-index: all profiles; first 2 horizons of each profile
sp4[, 1:2]
## SoilProfileCollection with 10 profiles and 20 horizons
## profile ID: id  |  horizon ID: hzID 
## Depth range: 5 - 40 cm
## 
## ----- Horizons (6 / 20 rows  |  10 / 15 columns) -----
##      id hzID top bottom name   K   Mg  Ca CEC_7 ex_Ca_to_Mg
##  colusa    1   0      3    A 0.3 25.7 9.0  23.0        0.35
##  colusa    2   3      8  ABt 0.2 23.7 5.6  21.4        0.23
##   glenn    5   0      9    A 0.2 21.9 4.4  18.8        0.20
##   glenn    6   9     34   Bt 0.3 18.9 4.5  27.5        0.20
##   kings    7   0      4    A 0.2 12.1 1.4  23.7        0.58
##   kings    8   4     13  Bt1 0.6 12.1 7.0  18.0        0.51
## [... more horizons ...]
## 
## ----- Sites (6 / 10 rows  |  2 / 2 columns) -----
##         id   newvar1
##     colusa numberone
##      glenn numberone
##      kings numberone
##   mariposa numberone
##  mendocino numberone
##       napa numberone
## [... more sites ...]
## 
## Spatial Data: [EMPTY]

When you use the [ function, everything in the SoilProfileCollection is subset simultaneously depending on the constraints specified by the indices.

# First profile, first 2 horizons
horizons(sp4[1, 1:2])
##       id name top bottom   K   Mg  Ca CEC_7 ex_Ca_to_Mg sand silt clay   CF hzID   newvar2
## 1 colusa    A   0      3 0.3 25.7 9.0  23.0        0.35   46   33   21 0.12    1 numbertwo
## 2 colusa  ABt   3      8 0.2 23.7 5.6  21.4        0.23   42   31   27 0.27    2 numbertwo

All slots in the collection have a relationship to the site or i-index. When you remove sites (profiles), all associated records (e.g. spatial, diagnostics, horizons, etc.) in the object are removed.

Similarly, when all horizons are removed (say, you request the 6th j-index from a profile that has only 5 layers), the site index and all associated data are removed from the collection.
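Profile-wise subsetting also supports logical filtering via the subset() method defined by aqp. A sketch, keeping whole profiles in which at least one horizon matches the expression:

```r
library(aqp)

data(sp4, package = "aqp")
depths(sp4) <- id ~ top + bottom

# keep whole profiles containing at least one horizon with > 40% clay
high.clay <- subset(sp4, clay > 40)

# fewer profiles than the original 10
length(high.clay)
```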

2.4.2 Exercise: Assemble a SoilProfileCollection from several CSV files

https://github.com/ncss-tech/stats_for_soil_survey/blob/master/exercise_ideas/SPC-from-CSV-files.R

2.5 Using the soilDB Package

The soilDB package for R provides functions for accessing data stored in NASIS, KSSL, SDA, SoilWeb, SoilGrids and other sources.

These high-level ‘fetch’ functions bundle or wrap lower-level get functions which access internal database interfaces to NASIS and other data sources. The ODBC connection to NASIS that you set up during the pre-course is an example of this internal database interface.

Basic data checks are run within fetch functions. These checks ensure the basic integrity of the data as it is queried and moved from its existing structure into an SPC. There are times when it is useful to use the lower-level get functions individually; they generally return a single data.frame or a list of data.frame objects.

You can set up scripts to make custom queries against these or other sources on your own – there is an example at the end of this section.

For now, we will start with the fetch functions and others that will get you a large variety of data you can use for soil and ecological site analyses.

2.5.1 Open Database Connectivity (ODBC) Connection to NASIS

After setting up an ODBC connection, you can use R to access data from a selected set defined in your local NASIS database.

How to Create an ODBC Connection to local NASIS database for R.

Does NASIS need to be open and running to query data using soilDB?

No, fetchNASIS() works whether the NASIS application is running or not. You just need to make sure that the data you want has been loaded into your selected set.

2.5.2 fetchNASIS()

The fetchNASIS() convenience function extracts data from a NASIS selected set via Structured Query Language (SQL).

Note that the import process in fetchNASIS(), and the other methods, is not comprehensive. It does not pull every column for every table related to pedon data out of NASIS.

Instead, it pulls essential / commonly used pedon and horizon data. Higher level functions like fetchNASIS() bundle a series of lower-level queries to get specific parts of the Pedon or Component data structures. Much of the nested complexity of NASIS is simplified in the resulting object. You may need to make more detailed queries and joins to resolve specific questions.

Many-to-one relationships are “flattened” where possible by fetchNASIS(). This aggregates the data from various tables into one “site” record with related horizon records, per profile.

You can see the child tables that are aggregated using the get_extended_data_from_NASIS_db() function, which returns a named list of child-table results that can be joined to the SoilProfileCollection made with fetchNASIS() using the internal record IDs.

2.5.2.1 fetchNASIS arguments

fetchNASIS() has a number of different arguments:

  • from = ‘pedons’ or ‘components’ or ‘pedon_report’

    • This option allows you to select which data you want to load from NASIS. Choosing either ‘pedons’ or ‘components’ will load data from your local database. If ‘pedon_report’ is specified, then it will load data from the text file generated by the NASIS report ‘fetchNASIS’ (run offline). This is useful for loading more than 20,000 pedons at one time, such as for an entire Soil Survey Region.
  • url = string specifying the (temporary) URL for the NASIS pedon_report output generated by the fetchNASIS NASIS report, which can be run “Offline Against National Database” (example output: MT663)

  • SS = TRUE/FALSE

    • The Selected Set (SS) option allows you to choose whether data load from your current selected set in NASIS or from the local database tables. The default is TRUE, so if unspecified, fetchNASIS() will load from the selected set.
  • stringsAsFactors = TRUE/FALSE

    • This option allows you to select whether to convert strings into factors or not. The default is set to FALSE, which will handle strings as character formats. Manually set this option to TRUE if you wish to handle character strings as factors.
  • rmHzErrors = TRUE/FALSE

    • Setting this value to TRUE (the default) enables checks for horizon depth consistency. Consider setting this argument to FALSE if you aren’t concerned about horizon-depth errors or if you know that your selected set contains many combination horizons (e.g., consisting of E/Bt horizons or similar two-part horizons described individually for the same depth range). Note that any pedons flagged as having horizon-depth errors (rmHzErrors = TRUE) are omitted from the data returned by fetchNASIS().
  • nullFragsAreZero = TRUE/FALSE

    • Setting this value to TRUE (the default) converts null entries for rock fragment volumes to 0. This is typically the right assumption, because rock fragment data are typically populated only when observed. If you know that null entries in your data reflect missing information rather than true zeros (e.g., rock fragment volumes were never populated), consider setting this argument to FALSE.
  • soilColorState = ‘moist’ or ‘dry’

    • Select dry or moist colors to be converted and placed into a horizon-level attribute called soil_color. The default is set to ‘moist’ unless specified. Moist and dry colors are also stored in moist_soil_color and dry_soil_color.
  • lab = TRUE/FALSE

    • This option allows for loading the data associated with horizons that may be in the phlabresults table. The default is set to FALSE, which will not load records from the phlabresults table.
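Several of these arguments can be combined in a single call. A sketch, assuming the ODBC connection from section 2.5.1 and data loaded into your selected set:

```r
library(soilDB)

# load pedons from the current selected set; keep pedons flagged with
# horizon-depth errors so they can be reviewed manually
pedons <- fetchNASIS(
  from = 'pedons',
  SS = TRUE,
  rmHzErrors = FALSE,
  nullFragsAreZero = TRUE,
  soilColorState = 'moist'
)
```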

For more information on the data checks and on adjusting the default options of the fetchNASIS() function, see the following resource: Tips on Getting Data from NASIS into R.

2.5.3 The gopheridge Dataset

The gopheridge sample data set is a SoilProfileCollection, as returned by fetchNASIS(), stored as a self-contained .rda file in soilDB.

Open RStudio, and set up the environment by loading packages and the gopheridge sample dataset.

library(aqp)
library(soilDB)

# load example dataset
data(gopheridge, package = "soilDB")

# what kind of object is this?
class(gopheridge)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
# what does the internal structure look like?
str(gopheridge, 2)
## Formal class 'SoilProfileCollection' [package "aqp"] with 9 slots
##   ..@ idcol       : chr "peiid"
##   ..@ hzidcol     : chr "phiid"
##   ..@ depthcols   : chr [1:2] "hzdept" "hzdepb"
##   ..@ metadata    :List of 8
##   ..@ horizons    :'data.frame':	317 obs. of  73 variables:
##   ..@ site        :'data.frame':	52 obs. of  114 variables:
##   ..@ sp          :Formal class 'SpatialPoints' [package "sp"] with 3 slots
##   ..@ diagnostic  :'data.frame':	164 obs. of  4 variables:
##   ..@ restrictions:'data.frame':	56 obs. of  8 variables:
# the fields at the site and horizon levels within the SPC
siteNames(gopheridge)
##   [1] "peiid"                                    "siteiid"                                 
##   [3] "pedon_id"                                 "site_id"                                 
##   [5] "obs_date"                                 "utmzone"                                 
##   [7] "utmeasting"                               "utmnorthing"                             
##   [9] "x"                                        "y"                                       
##  [11] "horizdatnm"                               "x_std"                                   
##  [13] "y_std"                                    "gpspositionalerror"                      
##  [15] "describer"                                "pedonpurpose"                            
##  [17] "pedontype"                                "pedlabsampnum"                           
##  [19] "labdatadescflag"                          "tsectstopnum"                            
##  [21] "tsectinterval"                            "utransectid"                             
##  [23] "tsectkind"                                "tsectselmeth"                            
##  [25] "elev_field"                               "slope_field"                             
##  [27] "aspect_field"                             "plantassocnm"                            
##  [29] "earthcovkind1"                            "earthcovkind2"                           
##  [31] "erocl"                                    "bedrckdepth"                             
##  [33] "bedrckkind"                               "bedrckhardness"                          
##  [35] "hillslopeprof"                            "geomslopeseg"                            
##  [37] "shapeacross"                              "shapedown"                               
##  [39] "slopecomplex"                             "drainagecl"                              
##  [41] "flodfreqcl"                               "floddurcl"                               
##  [43] "flodmonthbeg"                             "pondfreqcl"                              
##  [45] "ponddurcl"                                "pondmonthbeg"                            
##  [47] "geomposhill"                              "geomposmntn"                             
##  [49] "geomposflats"                             "swaterdepth"                             
##  [51] "slope_shape"                              "classdate"                               
##  [53] "classifier"                               "classtype"                               
##  [55] "taxonname"                                "localphase"                              
##  [57] "taxonkind"                                "seriesstatus"                            
##  [59] "taxpartsize"                              "taxorder"                                
##  [61] "taxsuborder"                              "taxgrtgroup"                             
##  [63] "taxsubgrp"                                "soiltaxedition"                          
##  [65] "osdtypelocflag"                           "taxmoistcl"                              
##  [67] "taxtempregime"                            "taxfamother"                             
##  [69] "psctopdepth"                              "pscbotdepth"                             
##  [71] "selection_method"                         "ecositeid"                               
##  [73] "ecositenm"                                "ecositecorrdate"                         
##  [75] "es_classifier"                            "es_selection_method"                     
##  [77] "ochric.epipedon"                          "argillic.horizon"                        
##  [79] "paralithic.contact"                       "lithic.contact"                          
##  [81] "umbric.epipedon"                          "cambic.horizon"                          
##  [83] "mollic.epipedon"                          "paralithic.materials"                    
##  [85] "lithologic.discontinuity"                 "andic.soil.properties"                   
##  [87] "densic.contact"                           "abrupt.textural.change"                  
##  [89] "aquic.conditions"                         "duripan"                                 
##  [91] "slickensides"                             "redox.depletions.with.chroma.2.or.less"  
##  [93] "redox.concentrations"                     "reduced.matrix"                          
##  [95] "densic.materials"                         "volcanic.glass"                          
##  [97] "folistic.epipedon"                        "albic.materials"                         
##  [99] "human.transported.material"               "strongly.contrasting.particle.size.class"
## [101] "secondary.carbonates"                     "anthropic.epipedon"                      
## [103] "surface_fgravel"                          "surface_gravel"                          
## [105] "surface_cobbles"                          "surface_stones"                          
## [107] "surface_boulders"                         "surface_channers"                        
## [109] "surface_flagstones"                       "surface_paragravel"                      
## [111] "surface_paracobbles"                      "landform_string"                         
## [113] "pmkind"                                   "pmorigin"
horizonNames(gopheridge)
##  [1] "phiid"                "peiid"                "hzname"               "genhz"               
##  [5] "hzdept"               "hzdepb"               "bounddistinct"        "boundtopo"           
##  [9] "clay"                 "silt"                 "sand"                 "fragvoltot"          
## [13] "texture"              "texcl"                "lieutex"              "phfield"             
## [17] "effclass"             "labsampnum"           "rupresblkdry"         "rupresblkmst"        
## [21] "rupresblkcem"         "stickiness"           "plasticity"           "ksatpedon"           
## [25] "texture_class"        "d_r"                  "d_g"                  "d_b"                 
## [29] "d_hue"                "d_value"              "d_chroma"             "d_sigma"             
## [33] "m_r"                  "m_g"                  "m_b"                  "m_hue"               
## [37] "m_value"              "m_chroma"             "m_sigma"              "moist_soil_color"    
## [41] "dry_soil_color"       "fine_gravel"          "gravel"               "cobbles"             
## [45] "stones"               "boulders"             "channers"             "flagstones"          
## [49] "parafine_gravel"      "paragravel"           "paracobbles"          "parastones"          
## [53] "paraboulders"         "parachanners"         "paraflagstones"       "unspecified"         
## [57] "total_frags_pct_nopf" "total_frags_pct"      "art_fgr"              "art_gr"              
## [61] "art_cb"               "art_st"               "art_by"               "art_ch"              
## [65] "art_fl"               "art_unspecified"      "total_art_pct"        "huartvol_cohesive"   
## [69] "huartvol_penetrable"  "huartvol_innocuous"   "huartvol_persistent"  "soil_color"          
## [73] "hzID"

2.5.3.1 Make profile sketches

The plotSPC() or plot() function applied to a SoilProfileCollection object generates sketches based on horizon depths, designations, and colors. The SoilProfileCollection Reference contains many examples demonstrating ways in which these sketches can be customized.

The fetchNASIS() function automatically converts moist Munsell colors into R-style colors.

When multiple colors are recorded per horizon, they are mixed to produce the default colors shown in the calculated soil_color field. See ?plotSPC for a detailed list of arguments and examples.

par(mar = c(1, 1, 1, 1))

# omitting pedon IDs and horizon designations
plotSPC(gopheridge,
     print.id = FALSE,
     name = NA,
     width = 0.3)
title('Pedons from the `gopheridge` sample dataset', line = -0.5)

2.5.3.2 Pedon Data Checks

When you load pedons using the fetchNASIS() function, the following data checks are performed:

  • Presence of multiple map datums. Results reported to the user and the data are not modified.

  • Inconsistent horizon boundaries. Pedons with inconsistent horizon boundaries are not loaded unless rmHzErrors = FALSE. In most cases, this occurs when the bottom depth of a horizon is not the same as the upper depth of the next lower horizon.

##   hzname top bot
## 1      A   0  30
## 2    Bt1  38  56
## 3    Bt2  56 121
## 4     Bk 121 135
## 5      R 135  NA

Note the issue above. The bottom depth of the A horizon and the upper depth of the Bt1 horizon should be the same: either 30 or 38 cm. The correct depth needs to be determined and fixed in the database.

  • Missing lower horizon depths. Offending horizons are fixed by replacing the missing bottom depth with the top depth plus 2 cm. In the case of the profile shown above, a bottom depth of 137 cm would be inserted where the depth is missing.

  • Sites missing pedon records. Data without corresponding horizons are not loaded.

2.5.3.3 Find Pedons with Errors

If errors in the pedon data are detected when loading data using fetchNASIS(), the following “get” commands can trace them back to the corresponding records in NASIS.

These access variables stored in a special object called an environment, associated with the soilDB package; they generally contain vectors of IDs.

  • get('sites.missing.pedons', envir = soilDB.env)
    • Returns user site ID for sites missing pedons
  • get('dup.pedon.ids', envir = soilDB.env)
    • Returns user pedon ID for sites with duplicate pedon ID
  • get('bad.pedon.ids', envir = soilDB.env)
    • Returns user pedon ID for pedons with inconsistent horizon depths
  • get('bad.horizons', envir = soilDB.env)
    • Returns a data.frame of horizon-level information for pedons with inconsistent horizon depths
get('sites.missing.pedons', envir = soilDB.env)
get('dup.pedon.ids', envir = soilDB.env)
get('bad.pedon.ids', envir = soilDB.env)
get('bad.horizons', envir = soilDB.env)

These get() calls access variables stored in the package environment soilDB.env. The variables exist only when the data checks find problems; once you fix the errors in the NASIS database, the checks find nothing and these variables are no longer created.
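Because get() signals an error when a variable does not exist, a defensive pattern (a sketch) is to check with exists() first:

```r
library(soilDB)

# the check variables exist only when fetchNASIS() found problems
if (exists('bad.pedon.ids', envir = soilDB.env, inherits = FALSE)) {
  get('bad.pedon.ids', envir = soilDB.env)
} else {
  message("no horizon-depth errors were flagged by fetchNASIS()")
}
```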

2.5.4 Further Reading / Reference

  • The SoilProfileCollection Reference contains examples based on the sp4 sample dataset. This is a good exercise for the class and reference for future work.

  • The Querying Soil Series Data document contains a more detailed explanation of how the data behind fetchOSD were created and how to interpret its results.

  • The Competing Soil Series document contains a number of examples related to getting OSD morphology using fetchOSD, laboratory data using fetchKSSL, and summary via profile sketches and slice-wise aggregation.

  • Several examples pertaining to SDA, soil texture, and soil profile sketches are explored in this Random Walk 001.

  • The “SPC Plotting Ideas” document outlines some advanced techniques for arranging and annotating soil profile sketches.

2.6 Working with Data in R

2.6.1 Summaries

Now that you’ve loaded some data, you can look at additional ways to summarize and interact with data elements.

2.6.1.1 table()

The base R table() function is very useful for quick summary operations. It returns a named vector of counts, one for each unique value (or level) of a given vector.

The numeric vector of “counts” is commonly combined with other functions such as sort(), order(), prop.table(), is.na() or !is.na() (is not NA) to identify abundance, proportions, or missing data (NA).

# load required packages
library(aqp)
library(soilDB)

data("gopheridge", package = "soilDB")

# for these examples, we use the gopheridge dataset as our "selected set"
pedons <- gopheridge 
# NOTE: you can use fetchNASIS to load your own data, like this:
# pedons <- fetchNASIS()

# summarize which soil taxa we have loaded
table(pedons$taxonname)
## 
## Gopheridge 
##         52
# sort results in descending order
sort(table(pedons$taxonname), decreasing = TRUE)
## Gopheridge 
##         52
# could do the same thing for taxonomic subgroups 
# or any column of the SPC at the site or horizon levels
table(pedons$taxsubgrp)
## 
## mollic haploxeralfs  typic haploxerepts  ultic haploxeralfs  ultic haploxerolls 
##                   1                   6                  44                   1
sort(table(pedons$taxsubgrp), decreasing = TRUE)
## 
##  ultic haploxeralfs  typic haploxerepts mollic haploxeralfs  ultic haploxerolls 
##                  44                   6                   1                   1

2.6.1.2 dput()

Another very useful function is dput(), which prints an R expression that will re-create a given object, so you can re-use the output as code.

It is also a convenient short-hand for creating a comma-delimited list.

Here, we select the first four pedon_id values from the pedons site table and print a c() expression containing those values as static R code.

dput(site(pedons)$pedon_id[1:4])
## c("08DWB028", "07RJV098", "07RJV099", "S2007CA009002")

The result can be copied and pasted as a comma-delimited list, for example as user pedon IDs to enter into NASIS list queries.

The dput() function is helpful when sending questions or example data to colleagues. Here, we see that the results of dput() are equivalent to the input after evaluation by R.

all(c("08DWB028", "07RJV098", "07RJV099", "S2007CA009002") == site(pedons)$pedon_id[1:4])
## [1] TRUE
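If you need the IDs as a single comma-delimited string (for example, to paste into a NASIS list query) rather than as R code, paste() with the collapse argument does the job:

```r
# the same four IDs shown in the dput() example
ids <- c("08DWB028", "07RJV098", "07RJV099", "S2007CA009002")

# collapse the vector into one comma-delimited string
paste(ids, collapse = ",")
```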

2.6.2 Missing Values

By default, table() drops NA values from the tally; the useNA argument controls whether they are counted.

table(pedons$taxsubgrp, useNA = "ifany")
## 
## mollic haploxeralfs  typic haploxerepts  ultic haploxeralfs  ultic haploxerolls 
##                   1                   6                  44                   1
#  is.na(...) 
table(is.na(pedons$taxsubgrp))
## 
## FALSE 
##    52
#  is NOT NA !is.na(...)
table(!is.na(pedons$taxsubgrp))
## 
## TRUE 
##   52
# it can also be applied to horizon level columns in the SPC
sort(table(pedons$texture), decreasing=TRUE)
## 
##        BR         L      GR-L     GRV-L     CBV-L       SPM     GRX-L       SIL    GRV-CL    CBV-CL    GR-SIL 
##        58        36        33        24        18        14        12        12         9         8         7 
##     CBX-L    GRX-CL    CBX-CL   GRV-SIL        CL   GRV-SCL     GRX-C       MPM        SL      CB-L     GR-CL 
##         6         5         4         4         3         3         3         3         3         2         2 
##   GRX-SCL     PGR-C    PGRX-L      SICL    STV-CL     STV-L     STX-C     STX-L         C      CB-C     CB-CL 
##         2         2         2         2         2         2         2         2         1         1         1 
##    CB-SCL    CB-SIL   CBV-SIL   CBX-SCL      CN-L   CN-SICL     CNX-L  CNX-SICL     FLV-L      GR-C    GR-SIC 
##         1         1         1         1         1         1         1         1         1         1         1 
##  GRV-SICL   GRX-SIC   GRX-SIL  PCB-SICL PCBV-SICL     PCN-C   PCNX-CL    PGRV-C   PGRV-CL  PGRX-SCL  PGRX-SIL 
##         1         1         1         1         1         1         1         1         1         1         1 
##      ST-L     STV-C    STX-CL  STX-SICL 
##         1         1         1         1
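The gopheridge pedons happen to have no missing taxsubgrp values, so useNA = "ifany" shows no <NA> column above. A small vector with an NA makes the difference visible:

```r
# a short character vector containing one missing value
x <- c("xeralfs", "xerepts", NA, "xeralfs")

# default: the NA is silently dropped from the tally
table(x)

# count the NA as its own category
table(x, useNA = "ifany")
```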

2.6.3 Logical Operators

Logical operators act on vectors for the purposes of comparison.

  • == “EQUAL TO”

  • != “NOT EQUAL TO”

  • < LESS than

    • <= LESS than or equal to
  • > GREATER than

    • >= GREATER than or equal to
  • %in% Equivalent to IN () in SQL; same logic as match() but returns a logical vector, not integer positions

    • Example: pedons$taxpartsize %in% c('loamy-skeletal', 'sandy-skeletal')

    • Returns a vector of TRUE/FALSE equal in length to left-hand side

  • & logical AND

  • | logical OR
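A quick base R illustration of these operators on a small character vector of particle-size classes:

```r
psc <- c("loamy-skeletal", "fine-loamy", "sandy-skeletal", "clayey")

# equality tests a single value; %in% tests membership in a set
psc == "fine-loamy"
psc %in% c("loamy-skeletal", "sandy-skeletal")

# combine logical vectors with AND (&) and OR (|)
is.skeletal <- psc %in% c("loamy-skeletal", "sandy-skeletal")
is.loamy <- psc %in% c("loamy-skeletal", "fine-loamy")
is.skeletal & is.loamy
is.skeletal | is.loamy
```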

2.6.4 Pattern Matching

The following examples use the grep() function to pattern match within the data, build an index of the SoilProfileCollection records whose values in a given column match the specified pattern, and then use that index to filter to specific sites and their corresponding profiles.

Patterns are specified using regular expression (REGEX) syntax.

This process can be applied to many different columns in the SPC, depending on how you need to filter the data. This example pattern matches on the taxsubgrp column, but another useful application might be to pattern match on geomorphology or parent material.

Say we want to see the variation in particle-size classes within a specific subgroup. We can use grep() to create a row index, then apply that index to the SoilProfileCollection.

# create a numeric index for pedons with taxsubgroup containing 'typic'
idx <- grep('typic', pedons$taxsubgrp)
idx
## [1] 11 12 13 14 26 50
# use square bracket notation to subset 'typic' soils in `subset1` object
subset1 <- pedons[idx,]

# or use the index directly to summarize taxpartsize for 'typic' soils
sort(table(pedons$taxpartsize[idx]), decreasing = TRUE)
## 
##  loamy-skeletal clayey-skeletal 
##               5               1

Note: grep() has an invert argument (default FALSE). This option is very useful for excluding, rather than keeping, the results of the pattern matching process. grepl() is the logical version of grep(), so you can invert its result using the logical NOT operator: !.

Another method is to create an index using the which() function. which() takes any logical vector (or expression) and returns the indices (positions) where that expression is TRUE. The use of which() becomes more important when there are missing values (NA) in an expression.
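The NA behavior is easy to see with a small vector: a logical index carries NA through into the result, while which() drops it:

```r
x <- c(10, NA, 30)

# logical index: the NA propagates into the result
x[x > 20]

# which() returns only the positions where the test is TRUE
which(x > 20)
x[which(x > 20)]
```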

Do a graphical check to confirm that the “typic” profiles were selected.

Plot them in R using the SoilProfileCollection “plot” method (a specialized version of the generic plot() function).

# adjust margins
par(mar=c(1,0,0,1))

# plot the first 10 profiles of subset1
plot(subset1[1:10, ], label = 'taxsubgrp', max.depth = 60)
title('Pedons with the word "typic" at subgroup-level of Soil Taxonomy', line=-2)

For more information on using regular expressions in grep() for pattern matching operations, see: Regular-expression-syntax.

Quick check: Run these commands, review the documentation, and answer the questions.

  • True or False: grepl() returns a numeric vector
  • True or False: which(grepl('typic', pedons$taxsubgrp)) is the same as grep('typic', pedons$taxsubgrp).

2.6.4.1 REGEX Quick Start

  • . One character, any character

  • * Zero-or-more Quantifier (of previous token)

  • + One-or-more Quantifier (of previous token)

  • {n} Quantifier: the previous token must repeat exactly n times

  • [A-Z!] ONE capital letter, or an exclamation mark

  • [0-9]{2} TWO numbers (using the {n} quantifier)

  • | is equivalent to OR:

    • Example: grep('loamy|sandy', c("loamy-skeletal","sandy","sandy-skeletal"))
      • “loamy OR sandy”
  • ^ Anchor to beginning of string / line:

    • Example: grep('^sandy', c("loamy-skeletal","sandy","sandy-skeletal"))
      • “STARTS WITH sandy”
  • $ Anchor to end of string / line:

    • Example: grep('skeletal$', c("loamy-skeletal","sandy","sandy-skeletal"))
      • “ENDS WITH skeletal”
  • \\b Anchor to word boundary:

    • Example: grep('\\bmesic', c("mesic","thermic","isomesic"))
      • “WORD STARTS WITH mesic” (e.g. not “isomesic”)
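These tokens can be tested directly with grepl() on a few texture-class strings:

```r
tex <- c("GR-L", "GRV-L", "SIL", "GRX-CL", "L")

# starts with "GR"
grepl("^GR", tex)

# ends with "-L" (a literal hyphen, then "L" anchored at the end)
grepl("-L$", tex)

# exactly "L" (anchored at both ends)
grepl("^L$", tex)

# invert a grepl() match with the logical NOT operator
!grepl("^GR", tex)
```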

2.6.4.2 Resources for Regular Expressions

2.6.5 Filtering

A variety of methods are available to subset or “filter” R data objects, from a simple data.frame or vector, to something more complex like a Spatial object or a SoilProfileCollection.

You can index many R objects using numeric or logical expressions as above. There are also methods that make this process a little easier.

The base R method for this is subset(), which works on data.frame objects. It is convenient because you can specify column names without explicitly referencing the data set, since subset() uses non-standard evaluation of the expressions passed as arguments.
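A minimal example of subset() on a plain data.frame, referencing columns by bare name (the site IDs and values below are invented for illustration):

```r
# a toy data.frame of sites (hypothetical values)
d <- data.frame(
  site_id   = c("S1", "S2", "S3", "S4"),
  slope     = c(5, 35, 12, 60),
  taxsubgrp = c("ultic haploxeralfs", "typic haploxerepts",
                "ultic haploxeralfs", "typic haploxerepts")
)

# columns are referenced directly, without the d$ prefix
steep.alfs <- subset(d, slope > 10 & grepl("alfs", taxsubgrp))
steep.alfs
```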

2.6.5.1 Filtering with aqp::subset

We use the SoilProfileCollection subset method: first specify the data object (pedons), then write expressions in terms of the columns that exist in that object.

Here, we combine two logical expressions to find taxsubgrp containing "alfs" (Alfisols) with obsdate before January 1st, 2010.

subset2 <- subset(
  pedons, 
  grepl("alfs", taxsubgrp) & obs_date < as.POSIXlt("2010-01-01")
)

# check taxonomic range of particle size classes in the data
# overwhelmingly these are described as loamy-skeletal ultic haploxeralfs
sort(table(subset2$taxsubgrp), decreasing = TRUE)
## 
##  ultic haploxeralfs mollic haploxeralfs 
##                  28                   1
sort(table(subset2$taxpartsize), decreasing = TRUE)
## 
##  loamy-skeletal clayey-skeletal            fine      fine-loamy 
##              25               2               1               1
# check year described and taxpartsize
table(subset2$taxpartsize, substr(subset2$obs_date, 1, 4))
##                  
##                   2007 2008 2009
##   clayey-skeletal    1    0    1
##   fine               1    0    0
##   fine-loamy         1    0    0
##   loamy-skeletal    19    1    5
# a double equal sign '==' is used for exact character or numeric criteria
subset3 <- subset(subset2, taxpartsize == 'loamy-skeletal')

table(subset3$taxpartsize)
## 
## loamy-skeletal 
##             25
par(mar = c(0, 0, 2, 1))
plotSPC(subset3[1:12, ], print.id = FALSE)
title('Loamy-skeletal Ultic Haploxeralfs')

2.6.6 Dates and Times

It is important to recognize that dates and times are treated as special formats in R. Unix time is a system for describing a point in time: the number of seconds that have elapsed since the Unix epoch (00:00:00 UTC on 1 January 1970), minus leap seconds.

Above, we used the fact that you can logically compare dates and times once their string representations are converted to a common base R Unix-time representation known as POSIXlt.

This conversion accounts for important details such as time zone, which defaults to your current locale, so keep that in mind.
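A quick demonstration of date comparison after conversion; the time zone is fixed to UTC here to make the comparison unambiguous:

```r
d1 <- as.POSIXlt("2009-06-15", tz = "UTC")
d2 <- as.POSIXlt("2010-01-01", tz = "UTC")

# converted values compare chronologically, not alphabetically
d1 < d2

# unclass() reveals the list structure behind POSIXlt
# (year is stored as years since 1900)
unclass(d2)$year + 1900
```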

2.7 fetchNASIS data checks

fetchNASIS() does a lot of the hard work for you by pulling a variety of child tables out of NASIS and joining them to the site and horizon tables.

This results in a SoilProfileCollection object containing many of the NASIS data columns. Along the way, a variety of checks are run, and output is generated that you should review. We will walk through the most common messages.

Analysis of Site Coordinate Data for pedons from MT647

multiple horizontal datums present, consider using WGS84 coordinates (x_std, y_std)

Mixing of multiple colors (to produce “default” moist and dry colors for the many:1 case)

mixing dry colors ... [33 of 4108 horizons]
mixing moist colors ... [60 of 4532 horizons]

Quality control on fragments, horizon data, site data, tables

-> QC: some fragsize_h values == 76mm, may be mis-classified as cobbles [52 / 10372 records]
replacing missing lower horizon depths with top depth + 1cm ... [152 horizons]
-> QC: duplicate pedons: use `get('dup.pedon.ids', envir=soilDB.env)` for related peiid values
-> QC: horizon errors detected, use `get('bad.pedon.ids', envir=soilDB.env)` for related userpedonid values or `get('bad.horizons', envir=soilDB.env)` for related horizon designations
-> QC: pedons missing bottom hz depths: use `get('missing.bottom.depths', envir=soilDB.env)` for related pedon IDs

Warnings about default settings and handling of NULL (missing data elements or records)

Warning messages:
1: some records are missing rock fragment volume, these have been removed 
2: some records are missing artifact volume, these have been removed 
3: all records are missing artifact volume (NULL). buffering result with NA. will be converted to zero if nullFragsAreZero = TRUE.

2.7.1 Inspecting Results

Here we inspect occurrence of andic soil properties in MT647.

We will download this “selected set” from the course website as an .rda file to save you the effort of crafting your selected set just for this example.

example.data.dir <- "C:/workspace2"
example.data.path <- file.path(example.data.dir, "mt647.rda")

if(!dir.exists(example.data.dir))
  dir.create(example.data.dir, recursive = TRUE)

download.file("https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/book/02/mt647.rda", 
              destfile = example.data.path)

2.7.1.1 Load Example Data

To load the sample object data into R, just use load() and the path to the .rda file (example.data.path or "C:/workspace2/mt647.rda")

# load the sample data
example.data.dir <- "C:/workspace2"
load(file.path(example.data.dir, "mt647.rda"))
length(mt647)
## [1] 530
table(site(mt647)$andic.soil.properties, useNA = "ifany")
## 
## FALSE  TRUE  <NA> 
##     2    83   445
# get just the profiles with andic.soil.properties == TRUE
subset(mt647, andic.soil.properties)
## SoilProfileCollection with 83 profiles and 446 horizons
## profile ID: peiid  |  horizon ID: phiid 
## Depth range: 20 - 305 cm
## 
## ----- Horizons (6 / 446 rows  |  10 / 73 columns) -----
##  hzname  peiid   phiid hzdept hzdepb genhz bounddistinct boundtopo clay silt
##      Oe 828140 4005861      0      2    NA          <NA>      <NA>   NA   NA
##       E 828140 4005863      2     13    NA         clear    smooth    5   NA
##     Bs1 828140 4005859     13     23    NA         clear    smooth    8   NA
##     Bs2 828140 4005862     23     33    NA       gradual    smooth    8   NA
##      BC 828140 4005864     33     69    NA       gradual    smooth   10   NA
##      2C 828140 4005860     69    152    NA          <NA>      <NA>   10   NA
## [... more horizons ...]
## 
## ----- Sites (6 / 83 rows  |  10 / 86 columns) -----
##  siteiid  peiid pedon_id     site_id   obs_date utmzone utmeasting utmnorthing         x        y
##   845415 828140  P23-134 93MT6470134 1993-08-12      12   294091.0     5092141 -113.6569 45.95195
##   845429 828152  P92-052 92MT6470052 1992-07-30      12   274420.9     5123423 -113.9253 46.22694
##   845430 828153  P92-046 92MT6470046 1992-07-23      12   282680.7     5161752 -113.8361 46.57417
##   845432 828155  P93-053 93MT6470053 1993-08-10      12   259526.3     5079142 -114.0958 45.82389
##   845434 828157  P91-103 91MT6470103 1991-09-26      12   272221.3     5127708 -113.9558 46.26472
##   845451 828176  P89-011 89MT6470011 1989-08-02      12   276988.7     5110875 -113.8861 46.11500
## [... more sites ...]
## 
## Spatial Data: [EMPTY]

We can compare this to what we see in the NASIS Pedon Diagnostic Features table.

Any profiles with detected logic errors are recorded in the bad.pedon.ids variable in soilDB.env after you run fetchNASIS():

# these are the troublesome pedon IDs
get('bad.pedon.ids', envir = soilDB.env)
##  [1] "P93-037" "P90-008" "P90-004" "P90-009" "P93-058" "P93-059" "P90-025" "P90-002" "P90-012" "P92-001"
## [11] "P92-085" "P91-105" "P90-019" "P75-006" "P90-010" "P90-015" "P91-094" "P92-029" "P92-076" "P93-026"
## [21] "P93-035" "P93-063" "P93-064" "P93-041" "P93-043" "P93-044" "P93-083" "P93-112" "P93-113" "P93-124"
## [31] "P93-001" "P96-007" "P91-025" "P93-078" "P92-044" "P91-112" "P92-038" "P90-018" "P93-057" "P93-084"
## [41] "P91-059" "P90-016" "P92-063" "P92-048" "P93-052" "F01-230" "F95-420" "F95-114" "F96-205"
# the bad pedons are in mt647 with rmHzErrors = FALSE
mt647$pedon_id %in% get('bad.pedon.ids', envir = soilDB.env)
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
##  [18] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [35]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [52]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE
##  [69] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [86]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [103] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [120] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [137] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
## [154] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## [171] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
## [188] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [205] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [222] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [239]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [256] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [273] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
## [290]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
## [307] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [324] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [341] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [358] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [375] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [392] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [409] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [426] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [443]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [460] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [477] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [494] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [511] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [528]  TRUE FALSE  TRUE

2.7.1.2 Logic Checks for the SoilProfileCollection

The aqp package has several functions that do logic checks on SoilProfileCollection objects.

The main method in aqp is checkHzDepthLogic(), which returns a data.frame with the results of four logic tests run on the horizon data of each profile.

It checks for:

  1. bottom depths less than top depth / bad top depth order ("depthLogic")

  2. bottom depths equal to top depth ("sameDepth")

  3. overlaps and gaps ("overlapOrGap")

  4. missing depths ("missingDepth")
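The idea behind the overlapOrGap test can be sketched in base R: within a profile sorted by top depth, each bottom depth should equal the next horizon's top depth. (checkHzDepthLogic() itself handles all four tests; this is only the core comparison.)

```r
# horizon depths from the inconsistent example profile shown earlier
top <- c(0, 38, 56, 121)
bot <- c(30, 56, 121, 135)

# bottom of each horizon should equal the top of the next
ok <- bot[-length(bot)] == top[-1]
ok

# any FALSE indicates an overlap or gap (here: 30 vs 38)
any(!ok)
```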

logic_tests <- checkHzDepthLogic(mt647)

# look at first few (look OK; valid == TRUE)
head(logic_tests)
##    peiid valid depthLogic sameDepth missingDepth overlapOrGap
## 1 828138  TRUE      FALSE     FALSE        FALSE        FALSE
## 2 828139  TRUE      FALSE     FALSE        FALSE        FALSE
## 3 828140  TRUE      FALSE     FALSE        FALSE        FALSE
## 4 828141  TRUE      FALSE     FALSE        FALSE        FALSE
## 5 828142  TRUE      FALSE     FALSE        FALSE        FALSE
## 6 828143  TRUE      FALSE     FALSE        FALSE        FALSE
# these all have overlapOrGap errors
head(logic_tests[!logic_tests$valid, ])
##     peiid valid depthLogic sameDepth missingDepth overlapOrGap
## 11 828148 FALSE      FALSE     FALSE        FALSE         TRUE
## 23 828160 FALSE      FALSE     FALSE        FALSE         TRUE
## 25 828162 FALSE      FALSE     FALSE        FALSE         TRUE
## 34 828171 FALSE      FALSE     FALSE        FALSE         TRUE
## 35 828172 FALSE      FALSE     FALSE        FALSE         TRUE
## 36 828173 FALSE      FALSE     FALSE        FALSE         TRUE
# join the logic test data into the site table
site(mt647) <- logic_tests

Use the $valid vector with subset() to find the “bad” profiles (logic_tests$valid == FALSE):

bad.profiles <- subset(mt647, !valid)
bad.profiles
## SoilProfileCollection with 49 profiles and 338 horizons
## profile ID: peiid  |  horizon ID: phiid 
## Depth range: 15 - 152 cm
## 
## ----- Horizons (6 / 338 rows  |  10 / 73 columns) -----
##  hzname  peiid   phiid hzdept hzdepb genhz bounddistinct boundtopo clay silt
##      Oe 828148 4005908      0      3    NA          <NA>      <NA>   NA   NA
##      Oi 828148 4005907      3      5    NA          <NA>      <NA>   NA   NA
##       E 828148 4005905      5     15    NA         clear    smooth    5   NA
##      Bs 828148 4005911     15     23    NA        abrupt    smooth    5   NA
##      2E 828148 4005909     23     25    NA         clear    smooth    8   NA
##      2E 828148 4005906     25     58    NA       gradual    smooth   10   NA
## [... more horizons ...]
## 
## ----- Sites (6 / 49 rows  |  10 / 91 columns) -----
##   peiid siteiid pedon_id     site_id   obs_date utmzone utmeasting utmnorthing         x        y
##  828148  845423  P93-037 93MT6470037 1993-07-28      12   272869.5     5126695 -113.9469 46.25583
##  828160  845435  P90-008 90MT6470008 1990-07-12      12   274656.5     5129224 -113.9250 46.27917
##  828162  845437  P90-004 90MT6470004 1990-07-16      12   273885.5     5128079 -113.9344 46.26861
##  828171  845446  P90-009 90MT6470009 1990-07-24      12   274813.1     5129404 -113.9230 46.28083
##  828172  845447  P93-058 93MT6470058 1993-08-19      12   265401.3     5110256 -114.0356 46.10555
##  828173  845448  P93-059 93MT6470059 1993-08-19      12   265446.7     5110316 -114.0350 46.10611
## [... more sites ...]
## 
## Spatial Data: [EMPTY]

And the “good” profiles (logic_tests$valid == TRUE):

good.profiles <- subset(mt647, logic_tests$valid)
good.profiles
## SoilProfileCollection with 481 profiles and 2536 horizons
## profile ID: peiid  |  horizon ID: phiid 
## Depth range: 14 - 1552 cm
## 
## ----- Horizons (6 / 2536 rows  |  10 / 73 columns) -----
##  hzname  peiid   phiid hzdept hzdepb genhz bounddistinct boundtopo clay silt
##       O 828138 4005848      0      5    NA          <NA>      <NA>   NA   NA
##       E 828138 4005852      5     18    NA         clear    smooth   12   NA
##     Bw1 828138 4005850     18     38    NA       gradual      wavy   14   NA
##     Bw2 828138 4005849     38     51    NA       gradual      wavy   12   NA
##      BC 828138 4005853     51     71    NA         clear      wavy   12   NA
##       R 828138 4005851     71     81    NA          <NA>      <NA>   NA   NA
## [... more horizons ...]
## 
## ----- Sites (6 / 481 rows  |  10 / 91 columns) -----
##   peiid siteiid pedon_id     site_id   obs_date utmzone utmeasting utmnorthing         x        y
##  828138  845413  P91-043 91MT6470043 1991-07-11      12   268288.0     5068425 -113.9781 45.73056
##  828139  845414  P91-029 91MT6470029 1991-06-27      12   280623.2     5147302 -113.8561 46.44361
##  828140  845415  P23-134 93MT6470134 1993-08-12      12   294091.0     5092141 -113.6569 45.95195
##  828141  845416  P93-025 93MT6470025 1993-07-08      12   279148.0     5151960 -113.8775 46.48500
##  828142  845417  P93-075 93MT6470075 1993-09-14      12   261086.6     5080380 -114.0764 45.83556
##  828143  845418  P93-074 93MT6470074 1993-09-14      12   261036.4     5080197 -114.0769 45.83389
## [... more sites ...]
## 
## Spatial Data: [EMPTY]

2.8 Extended Data Functions

Additional data related to both site and horizon information can be fetched using the get_extended_data_from_NASIS_db() function. This function returns a named list() with several of the tables that fetchNASIS() draws from, in addition to the typical Site/Pedon/Component/Horizon tables.

A variety of calculated fields based on the extended data are included in the default fetchNASIS() result. You can use these to simplify queries and aggregations for a variety of analyses, and then follow up with more detailed data as needed.
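Extended-data tables keyed on peiid or siteiid can be joined to site-level data with merge(). Here is a base R sketch using mock data; the column names follow the NASIS convention, but the values are invented:

```r
# mock site table and mock restriction table, both keyed on peiid
sites <- data.frame(
  peiid = c(1, 2, 3),
  taxonname = c("Gopheridge", "Gopheridge", "Loafercreek")
)
restr <- data.frame(
  peiid = c(1, 3),
  resdept = c(50, 30),
  reskind = c("lithic bedrock", "paralithic bedrock")
)

# left join: keep all sites, NA where no restriction was described
merged <- merge(sites, restr, by = "peiid", all.x = TRUE)
merged
```

For a SoilProfileCollection, the site(spc) <- df replacement method (shown later with the logic-test results) performs a similar left join into the site table.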

2.8.1 Elements of get_extended_data_from_NASIS_db()

  • Ecological Site History ("ecositehistory")

  • Diagnostic Features ("diagnostic")

  • Diagnostic Feature TRUE/FALSE Summary ("diagHzBoolean")

  • Restrictions ("restriction")

  • Fragment and Texture Summaries

    • Horizon Fragments ("frag_summary")

    • Horizon Artifacts ("art_summary")

    • Surface Fragments ("surf_frag_summary")

    • Texture Class Modifiers ("texmodifier")

  • Geomorphic Table Summaries ("geomorph")

  • Parent Material Summaries ("pm")

  • Taxonomic History ("taxhistory")

  • Site Text Notes w/ Photo links ("photo")

  • Horizon Structure ("struct")

  • Horizon Designation ("hzdesgn")

2.8.2 Load Example Data

Below is a summary of additional information that can be readily brought into R from your NASIS selected set via the get_extended_data_from_NASIS_db() function.

To download the sample 2015MT663% data from the course page with R:

example.data.dir <- "C:/workspace2"
example.data.path <- file.path(example.data.dir, "mt663.rda")

if(!dir.exists(example.data.dir))
  dir.create(example.data.dir, recursive = TRUE)

download.file("https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/book/02/mt663.rda", 
              destfile = example.data.path)

Before continuing, imagine opening the NASIS client and populating your selected set with 2015MT663% using a query like “NSSC Pangaea – POINT-Pedon/Site by User Pedon ID”.

Load the data like we did above.

# load the sample data
example.data.dir <- "C:/workspace2"
load(file.path(example.data.dir, "mt663.rda"))
## fetch extended site and horizon data from local NASIS
# mt663ext <- get_extended_data_from_NASIS_db()

We could use the get_extended_data_from_NASIS_db() function if 2015MT663% or other data were in the selected set, but we will use the mt663ext data we loaded from the .rda file.

The column names are the names of variables that you can join to your site or horizon data by various means. With a few exceptions, these variable names mirror the NASIS 7 data model names.

# site and pedon related extended data

# list all dataframes in the extended data
str(mt663ext, 1)
## List of 15
##  $ ecositehistory   :'data.frame':	0 obs. of  5 variables:
##  $ diagnostic       :'data.frame':	292 obs. of  4 variables:
##  $ diagHzBoolean    :'data.frame':	115 obs. of  20 variables:
##  $ restriction      :'data.frame':	11 obs. of  8 variables:
##  $ frag_summary     :'data.frame':	561 obs. of  18 variables:
##  $ frag_summary_v2  :'data.frame':	580 obs. of  18 variables:
##  $ art_summary      :'data.frame':	561 obs. of  14 variables:
##  $ surf_frag_summary:'data.frame':	134 obs. of  10 variables:
##  $ texmodifier      :'data.frame':	622 obs. of  5 variables:
##  $ geomorph         :'data.frame':	241 obs. of  7 variables:
##  $ taxhistory       :'data.frame':	115 obs. of  20 variables:
##  $ photo            :'data.frame':	793 obs. of  5 variables:
##  $ pm               :'data.frame':	179 obs. of  10 variables:
##  $ struct           :'data.frame':	444 obs. of  6 variables:
##  $ hzdesgn          :'data.frame':	561 obs. of  19 variables:
# vegetation data summary
colnames(mt663ext$ecositehistory) 
## [1] "siteiid"         "ecositeid"       "ecositenm"       "ecositecorrdate" "es_classifier"
# diagnostic features
colnames(mt663ext$diagnostic) 
## [1] "peiid"    "featkind" "featdept" "featdepb"
# surface rock fragments
colnames(mt663ext$surf_frag_summary)
##  [1] "peiid"               "surface_fgravel"     "surface_gravel"      "surface_cobbles"    
##  [5] "surface_stones"      "surface_boulders"    "surface_channers"    "surface_flagstones" 
##  [9] "surface_paragravel"  "surface_paracobbles"
# geomorphic description
colnames(mt663ext$geomorph)
## [1] "peiid"        "geomfmod"     "geomfname"    "geomfeatid"   "existsonfeat" "geomfiidref"  "geomftname"
# taxonomic history data
colnames(mt663ext$taxhistory)
##  [1] "peiid"          "classdate"      "classifier"     "classtype"      "taxonname"      "localphase"    
##  [7] "taxonkind"      "seriesstatus"   "taxpartsize"    "taxorder"       "taxsuborder"    "taxgrtgroup"   
## [13] "taxsubgrp"      "soiltaxedition" "osdtypelocflag" "taxmoistcl"     "taxtempregime"  "taxfamother"   
## [19] "psctopdepth"    "pscbotdepth"
# linked photo stored in site textnotes
colnames(mt663ext$photo) 
## [1] "siteiid"   "recdate"   "textcat"   "imagepath" "imagename"
# site parent materials
colnames(mt663ext$pm)
##  [1] "siteiid"      "seqnum"       "pmorder"      "pmdept"       "pmdepb"       "pmmodifier"   "pmgenmod"    
##  [8] "pmkind"       "pmorigin"     "pmweathering"
###
### horizon related extended data
### 

# rock fragments 
colnames(mt663ext$frag_summary) 
##  [1] "phiid"                "fine_gravel"          "gravel"               "cobbles"             
##  [5] "stones"               "boulders"             "channers"             "flagstones"          
##  [9] "parafine_gravel"      "paragravel"           "paracobbles"          "parastones"          
## [13] "paraboulders"         "parachanners"         "paraflagstones"       "unspecified"         
## [17] "total_frags_pct_nopf" "total_frags_pct"
# soil texture modifiers
colnames(mt663ext$texmodifier)
## [1] "peiid"  "phiid"  "phtiid" "seqnum" "texmod"
# soil structure data
colnames(mt663ext$struct) 
## [1] "phiid"         "structgrade"   "structsize"    "structtype"    "structid"      "structpartsto"

2.8.3 Visualizing Common Landforms

The following code generates a simple graphical summary of the 10 most commonly occurring "landform_string" values (a calculated field in fetchNASIS()).

# load data from a NASIS selected set (or sample object)
pedons <- mt663

# create 'lf' object of landform factors sorted in descending order
lf <- sort(table(pedons$landform_string), decreasing = TRUE)

# plot top 10 or length, whichever is shorter
Hmisc::dotchart2(lf[1:pmin(10, length(lf))],
                 col = 'black',
                 xlim = c(0, max(lf)),
                 cex.labels = 0.75)

2.8.4 Diagnostic Features

2.8.4.1 Boolean Diagnostic Features in fetchNASIS

If diagnostic features are populated in the Pedon Diagnostic Features table in NASIS, then Boolean (TRUE or FALSE) fields are created for each diagnostic feature type found in the data brought in by fetchNASIS.

These fields can be used to model presence / absence of a diagnostic soil feature by extracting the site data from the SoilProfileCollection with site().
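For instance, assuming `f` is a SoilProfileCollection returned by fetchNASIS(), these Boolean columns live in the site table and can be tabulated directly. The sketch below simulates a tiny site table with hypothetical data so it runs without a NASIS connection; with real data you would start from `s <- site(f)`:

```r
# hypothetical site table mimicking the Boolean diagnostic feature
# columns created by fetchNASIS(); with real data: s <- site(f)
s <- data.frame(
  peiid = 1:6,
  argillic.horizon = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  mollic.epipedon  = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# presence / absence tabulation for one feature
table(s$argillic.horizon)

# proportion of pedons with an argillic horizon
mean(s$argillic.horizon)
```

Columns like these can be used directly as a presence/absence response variable in a model.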

2.8.4.2 Thickness from Diagnostic Features Table

The following is an example of how you could use the diagnostic features (if populated!) from the extended data to determine the thickness of a diagnostic feature of interest.

# get diagnostic features associated with pedons loaded from selected set
d <- diagnostic_hz(mt663)

# summary of the diagnostic features in your data!
unique(d$featkind)
##  [1] "ochric epipedon"          "cambic horizon"           "lithic contact"          
##  [4] "mollic epipedon"          "argillic horizon"         "redox concentrations"    
##  [7] "andic soil properties"    "secondary carbonates"     "sapric soil materials"   
## [10] "aquic conditions"         "reduced matrix"           "albic horizon"           
## [13] "spodic horizon"           "glossic horizon"          "spodic materials"        
## [16] "lithologic discontinuity" "densic materials"         "umbric epipedon"         
## [19] "albic materials"          NA
# tabulate
sort(table(droplevels(factor(d$featkind))), decreasing = TRUE)
## 
##          ochric epipedon           cambic horizon         argillic horizon          mollic epipedon 
##                       61                       54                       43                       42 
##    andic soil properties           lithic contact     secondary carbonates          umbric epipedon 
##                       30                       20                        7                        7 
##            albic horizon           spodic horizon          glossic horizon           reduced matrix 
##                        6                        4                        3                        3 
##    sapric soil materials lithologic discontinuity          albic materials         aquic conditions 
##                        3                        2                        1                        1 
##         densic materials     redox concentrations         spodic materials 
##                        1                        1                        1
# subset argillic horizons
d <- d[d$featkind == 'argillic horizon', ]

# create a new column and subtract the upper from the lower depth
d$argillic_thickness_cm <- d$featdepb - d$featdept

# create another new column with the upper depth to the diagnostic feature
d$depth_to_argillic_cm <- d$featdept

# omit NA values
d <- na.omit(d)

# subset to pedon records IDs and calculated thickness
d <- d[, c('peiid', 'argillic_thickness_cm', 'depth_to_argillic_cm')]
head(d)
##      peiid argillic_thickness_cm depth_to_argillic_cm
## 7  1092610                    56                   30
## 24 1092617                    38                   34
## 26 1092618                    29                   23
## 28 1092619                    38                   32
## 30 1092620                    29                   24
## 33 1092621                    23                   19
# left-join with existing site data
site(mt663) <- d

# plot as histogram
par(mar = c(4.5, 4.5, 1, 1))

# note additional arguments to adjust figure labels
hist(
  mt663$argillic_thickness_cm, 
  xlab = 'Thickness of argillic (cm)', 
  main = '',
  las = 1
)

hist(
  mt663$depth_to_argillic_cm, 
  xlab = 'Depth to argillic top depth (cm)', 
  main = '',
  las = 1
)

Quick check: What can you do with the Boolean diagnostic feature data stored in the site table of a fetchNASIS SoilProfileCollection?

2.8.4.3 Diagnostic Feature Diagrams

# work up diagnostic plot based on the mt663 dataset loaded above
library(aqp)
library(soilDB)
library(sharpshootR)

# can limit which diagnostic features to show by setting 'v' manually
v <- c('ochric.epipedon', 'mollic.epipedon', 'andic.soil.properties', 
       'argillic.horizon', 'cambic.horizon', 
       'lithic.contact')

# the default concatenated landform_string may have multiple levels
#  depending on how the geomorphic tables were populated
#  these are concatenated using the ampersand (&) character
#  so take the first string split using ampersand as a delimiter
mt663$first_landform <- sapply(strsplit(mt663$landform_string, "&"), 
                                  function(x) x[[1]])

# plot with diagnostic features ordered according to co-occurrence
# v: site-level attributes to consider
# k: number of clusters to identify
diagnosticPropertyPlot(
  mt663[1:30, ], v = v, k = 5,
  grid.label = 'site_id', 
  dend.label = 'first_landform', 
  sort.vars = TRUE
)

2.8.5 Comparing Horizon Data with Diagnostic “Intervals”

Soil Data Access and the soilDB function fetchSDA() can be used to inspect horizon data and properties in the portions of a profile corresponding to diagnostic features.
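The core idea can be sketched in base R: select the horizon records whose depths overlap a diagnostic feature interval, then summarize properties within it. The data below are hypothetical:

```r
# hypothetical horizon records for one pedon
h <- data.frame(
  peiid  = 1,
  hzname = c('A', 'Bt1', 'Bt2', 'C'),
  top    = c(0, 20, 45, 80),
  bottom = c(20, 45, 80, 120),
  clay   = c(15, 28, 32, 20)
)

# a diagnostic feature interval, e.g. an argillic horizon at 20-80 cm
featdept <- 20
featdepb <- 80

# horizons that overlap the diagnostic interval
idx <- h$top < featdepb & h$bottom > featdept
h[idx, ]

# summarize a property within the interval
mean(h$clay[idx])
```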

2.8.5.1 Follow along with your own data

Use the following script to generate a diagnostic-feature diagram for the pedon data you’ve loaded from your NASIS selected set.

Select a series of diagnostic properties or automatically pull diagnostic feature columns.

library(aqp)
library(soilDB)
library(sharpshootR)

# Load data
f <- fetchNASIS(from = 'pedons')

# ... May need to use subset() to reduce the number of pedons!

# get all diagnostic feature columns from site data 
# by pattern matching on '[.]' in the site attribute names
# this is not a generic rule, but works here
idx <- grep('[.]', siteNames(f))
v <- siteNames(f)[idx]

# inspect v
v

# insert diagnostics of interest from the possible list in 'v'
v <- c('ochric.epipedon', 'cambic.horizon', 
       'argillic.horizon', 'paralithic.contact',
       'lithic.contact')

# generate diagnostic property diagram
diagnosticPropertyPlot(
  f, v = v, k = 5, 
  grid.label = 'site_id', 
  dend.label = 'taxonname'
)

For more information on generating diagnostic feature diagrams, see the related aqp and sharpshootR tutorials.

2.9 Customized Queries to Local NASIS Database

Queries against the NASIS local database can be written in T-SQL, the dialect of SQL used to communicate with Microsoft SQL Server. This uses the ODBC connection that you set up during the pre-course.

The fetchNASIS and related convenience functions are essentially wrappers around commonly used chunks of SQL.

The following example will return all records from the sitesoiltemp table, along with a couple of fields from the site, siteobs, and pedon tables. This is a convenient way to collect all of the field-based soil temperature data associated with the pedons in your selected set for further analysis.

2.9.1 RODBC package

This example uses RODBC.

library(soilDB)
library(RODBC) 

# write query as a character string
q <- "SELECT siteiid as siteiid, peiid, usiteid as site_id, 
             upedonid as pedon_id, obsdate as obs_date,
             soitemp, soitempdep FROM site_View_1
       INNER JOIN siteobs_View_1 ON site_View_1.siteiid = siteobs_View_1.siteiidref
       LEFT OUTER JOIN sitesoiltemp_View_1 ON siteobs_View_1.siteobsiid = sitesoiltemp_View_1.siteobsiidref
       LEFT OUTER JOIN pedon_View_1 ON siteobs_View_1.siteobsiid = pedon_View_1.siteobsiidref
      ORDER BY obs_date, siteiid;"

# set up connection to local NASIS
channel <- odbcDriverConnect(connection = getOption("soilDB.NASIS.credentials"))

# exec query
d <- sqlQuery(channel, q, stringsAsFactors = FALSE)

# close connection
odbcClose(channel)

# check results
str(d)

# remove records with missing values
d <- na.omit(d)

# tabulate unique soil depths
table(d$soitempdep)

# extract day of year
d$doy <- as.integer(format(d$obs_date, "%j"))

# when were measurements collected?
hist(
  d$doy,
  xlim = c(1, 366),
  breaks = 30,
  las = 1,
  main = 'Soil Temperature Measurements',
  xlab = 'Day of Year'
)

# soil temperature by day of year
plot(
  soitemp ~ doy,
  data = d,
  type = 'p',
  xlim = c(1, 366),
  ylim = c(-1, 25),
  xlab = 'Day of Year',
  ylab = 'Soil Temperature at 50cm (deg C)',
  las = 1
)

2.9.2 DBI

The DBI and odbc packages provide a modern alternative to RODBC via a more generic interface.

There are many other drivers available for DBI, for example RSQLite for SQLite databases.

DBI will soon be the database framework used in soilDB for NASIS access, enabling new capabilities that bridge more of the gaps between the NASIS data model and R.
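As a small, self-contained illustration of the generic DBI interface, the same connect / query / disconnect pattern works against an in-memory SQLite database via the RSQLite driver. This stands in for a NASIS connection; the table and column names below are invented for the example:

```r
library(DBI)

# connect to a throwaway in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# write a small example table and query it with SQL
dbWriteTable(con, "sitesoiltemp", data.frame(
  siteiid    = c(1, 1, 2),
  soitempdep = c(50, 50, 50),
  soitemp    = c(8.5, 12.0, 10.2)
))

d <- dbGetQuery(con, "SELECT siteiid, AVG(soitemp) AS mean_temp
                      FROM sitesoiltemp
                      GROUP BY siteiid
                      ORDER BY siteiid;")
d

# always close the connection when finished
dbDisconnect(con)
```

Swapping the driver (e.g. an odbc connection to SQL Server) changes only the dbConnect() call; the rest of the code is unchanged.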

2.10 Soil Reports

One of the strengths of NASIS is that it has many queries and reports for accessing its complex data. This makes it easy for the average user to load their data, process it, and run numerous reports.

The soilReports R package is essentially just a collection of R Markdown (.Rmd) documents.

R Markdown is a plain-text markup format for creating reproducible, dynamic reports with R. These .Rmd files can be used to generate HTML, PDF, Word, or Markdown documents for a variety of forms, templates, and applications.

2.10.1 Extending soilReports

Each report in soilReports has a “manifest” that describes any required dependencies, configuration files or inputs. That means you can convert any of your own R-based analyses to the soilReports format.

The .Rmd format allows R code and text with Markdown markup to be mingled in the same document and then “executed” like an R script.

Running or expanding on one of these reports would be a fine class project – as long as you include relevant interpretation.

2.10.2 Exercise: Run Shiny Pedon Summary

The region2/shiny-pedon-summary report is an interactive Shiny-based report that uses flexdashboard to help the user subset and summarize NASIS pedons from a graphical interface.

  • You can try a ShinyApps.io demo here

The “Shiny Pedon Summary” allows one to rapidly generate reports from a large set of pedons in their NASIS selected set.

The left INPUT sidebar has numerous options for subsetting pedon data. Specifically, you can change REGEX patterns for mapunit symbol, taxon name, local phase, and User Pedon ID. You can also use the drop-down boxes to filter on taxon kind or to compare different "modal"/RV pedons.

Example: Analyzing the Loafercreek Taxadjuncts

  1. Create an instance of the region2/shiny-pedon-summary report with soilReports:
# create new instance of reports 
library(soilReports)

# set path for shiny-pedon-summary report instance
ch4.shinyped.path <- "C:/workspace2/chapter4/shiny-pedon"

# create directories (if needed)
if(!dir.exists(ch4.shinyped.path))
  dir.create(ch4.shinyped.path, recursive = TRUE)

# get report dependencies
reportSetup('region2/shiny-pedon-summary')

# copy report contents to target path
copyReport('region2/shiny-pedon-summary', outputDir = ch4.shinyped.path, overwrite = TRUE)
  2. Update the config.R file

You can update the config.R file in “C:/workspace2/chapter4/shiny-pedon” (or wherever you installed the report) to use the soilDB datasets loafercreek and gopheridge by setting demo_mode <- TRUE. This is the simplest way to demonstrate how this report works. Alternatively, when demo_mode <- FALSE, pedons will be loaded from your NASIS selected set.

config.R also allows you to specify a shapefile for overlaying the points on – to determine mapunit symbol – as well as several raster data sources whose values will be extracted at point locations and summarized. The demo dataset does not use either of these by default, due to large file sizes.

Furthermore, a default (very general) set of REGEX generalized horizon patterns is provided to assign generalized horizon labels for provisional grouping. These provided patterns are unlikely to cover ALL cases, and always need to be modified for final correlation. That said, they do a decent job of making a first-pass correlation for diverse types of soils.

The default config.R setting uses the general patterns (use_regex_ghz <- TRUE). You are welcome to modify the defaults. If you want to use the values you have populated in NASIS Pedon Horizon Component Layer ID, set use_regex_ghz <- FALSE.
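Putting those two settings together, the relevant lines of config.R look roughly like this (an excerpt-style sketch; see the comments in the file itself for the full set of options):

```r
# config.R (excerpt)

# TRUE: use the built-in loafercreek / gopheridge demo datasets
# FALSE: load pedons from your NASIS selected set
demo_mode <- TRUE

# TRUE: use the provided general REGEX generalized horizon patterns
# FALSE: use the Pedon Horizon Component Layer ID populated in NASIS
use_regex_ghz <- TRUE
```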

  3. Run the report via shiny.Rmd

This report uses the Shiny flexdashboard interface. Open up shiny.Rmd and click the “Run Document” button to start the report. This will load the pedon and spatial data specified in config.R.

NOTE: If a Viewer Window does not pop-up right away, click the gear icon to the right of the “Run Document” button. Be sure the option “Preview in Window” is checked, then click “Run Document” again.

  4. All of the subset parameters are in the left-hand sidebar. Play around with all of these options – the graphs and plots in the tabs to the right will automatically update as you make changes.

  5. When you like what you have, you can export a non-interactive HTML file for your records. To do this, first, set the “Report name:” box to something informative – this will be your report output file name. Then, scroll down to the bottom of the INPUT sidebar and click “Export Report” button. Check the “output” folder (subdirectory of where you installed the report) for your results.

2.10.3 Exercise: Run Mapunit Comparison

Another popular report in soilReports is the region2/mu-comparison report.

This report uses constant density sampling (sharpshootR::constantDensitySampling()) to extract numeric and categorical values from multiple raster data sources that overlap a set of user-supplied polygons.
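The intuition behind constant density sampling is that the number of sample points scales with polygon area, so every polygon is sampled at the same intensity. A toy illustration of the arithmetic (hypothetical areas; not the actual sharpshootR implementation, which also places the points within each polygon):

```r
# hypothetical polygon areas (ha) and a fixed sampling density
areas   <- c(2.5, 10, 40)  # three polygons of very different size
density <- 2               # points per hectare

# points allocated to each polygon: larger polygons get more points,
# but the points-per-hectare intensity is identical
n_points <- ceiling(areas * density)
n_points
```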

In this example, we clip a small portion of SSURGO polygons from the CA630 soil survey area extent. We then select a small set of mapunit symbols (5012, 3046, 7083, 7085, 7088) that occur within the clipping extent. These mapunits have soil forming factors we expect to contrast with one another in several ways. You can inspect other mapunit symbols by changing mu.set in config.R.

  1. Download the demo data:
# set up ch4 path and path for report
ch4.data.path <- "C:/workspace2/chapter4"
ch4.mucomp.path <- paste0(ch4.data.path,"/mucomp")

# create any directories that may be missing
if(!dir.exists(ch4.mucomp.path)) {
  dir.create(ch4.mucomp.path, recursive = TRUE)
}

# download raster data, SSURGO clip from CA630, and sample script for clipping your own raster data 
download.file('https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/chapter_4-mucomp-data/ch4_mucomp-data.zip', paste0(ch4.mucomp.path, '/chapter_4-mucomp-data.zip'))

unzip(paste0(ch4.mucomp.path, '/chapter_4-mucomp-data.zip'), exdir = ch4.mucomp.path, overwrite = TRUE)
  2. Create an instance of the region2/mu-comparison report with soilReports:
# create new instance of reports 
library(soilReports)

# get report dependencies
reportSetup('region2/mu-comparison')

# create report instance
copyReport('region2/mu-comparison', outputDir = ch4.mucomp.path, overwrite = TRUE)

If you want, you can now set up the default config.R that is created by copyReport() to work with your own data, or you can use the “sample” config.R file (called new_config.R) from the ZIP file downloaded above.

  3. Run the code below to replace the default config.R with the sample config.R:
# copy config file containing relative paths to rasters downloaded above
file.copy(paste0(ch4.mucomp.path, "/new_config.R"), paste0(ch4.mucomp.path,"/config.R"), overwrite = TRUE)
  4. Open report.Rmd in the C:/workspace2/chapter4/mucomp folder and click the “Knit” button at the top of the RStudio source pane to run the report.

  5. Inspect the report output HTML file, as well as the spatial and tabular data output in the output folder.

Question: What are the major differences that you can see, based on the report, between the five different mapunit symbols that were analyzed?

2.10.4 Exercise: Run Lab Summary By Taxon Name Soil Report

For another exercise, you can use the region11/lab_summary_by_taxonname report to summarize laboratory data for a soil series. This report requires you to get some data from the Pedon “NCSS Lab” tables in NASIS.

2.10.4.1 Requirements

  • Data are properly populated; otherwise the report may fail. Common examples include:
      - Horizon depths don’t line up
      - Either the Pedon or Site table isn’t loaded
  • ODBC connection to NASIS is set up
  • Be aware that each report has a unique configuration file that may need to be edited.

2.10.4.2 Instructions

  1. Load your NASIS selected set. Run a query such as “POINT - Pedon/Site/NCSSlabdata by upedonid and Taxon Name” from the Region 11 report folder to load your selected set. Be sure to target the site, pedon, and lab layer tables. Remove from your selected set the pedons and sites you wish to exclude from the report.

  2. Install/re-install the soilReports package. This package is updated often and should be reinstalled from GitHub regularly.

# Install the soilReports package from GitHub
remotes::install_github("ncss-tech/soilReports", dependencies=FALSE, build=FALSE)
  3. View the list of available reports.
# Load the soilReports 
library(soilReports)

# List reports
listReports()
##                                     name version
## 1  region11/component_summary_by_project     0.1
## 2      region11/lab_summary_by_taxonname     1.0
## 3  region11/mupolygon_summary_by_project     0.1
## 4    region11/pedon_summary_by_taxonname       1
## 5                       region2/dmu-diff     0.7
## 6                    region2/dmu-summary    beta
## 7        region2/mlra-comparison-dynamic     0.1
## 8                region2/mlra-comparison     1.0
## 9        region2/mu-comparison-dashboard   0.0.0
## 10                 region2/mu-comparison   3.4.0
## 11                    region2/mu-summary       1
## 12                 region2/pedon-summary     0.9
## 13                    region2/QA-summary     0.6
## 14           region2/shiny-pedon-summary     1.0
## 15                     templates/minimal     1.0
##                                                                                   description
## 1                                                summarize component data for an MLRA project
## 2                                               summarize lab data from NASIS Lab Layer table
## 3                                                summarize mupolygon layer from a geodatabase
## 4                                               summarize field pedons from NASIS pedon table
## 5                                                              Differences between select DMU
## 6                                                                          DMU Summary Report
## 7                          compare MLRA/LRU-scale delineations, based on mu-comparison report
## 8                                        compare MLRA using pre-made, raster sample databases
## 9  interactively subset and summarize SSURGO data for input to `region2/mu-comparison` report
## 10          compare stack of raster data, sampled from polygons associated with 1-8 map units
## 11                          summarize raster data for a large collection of map unit polygons
## 12                            Generate summaries from pedons (NASIS) and associated GIS data.
## 13                                                                          QA Summary Report
## 14             Interactively subset and summarize NASIS pedon data from one or more map units
## 15                                                             A minimal soilReports template
  4. Copy the lab summary to your working directory.
copyReport(reportName = "region11/lab_summary_by_taxonname", outputDir = "C:/workspace2/lab_sum")
  5. Examine the report folder contents.

The main report file is report.Rmd. Notice there are several other support files. The parameters for the report are contained in the config.R file.

  6. Check or create a genhz_rules file for a soil series. In order to aggregate the pedons by horizon designation, a genhz_rules file (e.g., Miami_rules.R) is needed. See above.

If none exists, see the job aid on how to create one: Assigning Generalized Horizon Labels.

Pay special attention to how caret ^ and dollar $ symbols are used in REGEX. They function as anchors for the beginning and end of the string, respectively.

  • A ^ placed before an A horizon, ^A, will match any horizon designation that starts with A, such as Ap, Ap1, but not something merely containing A, such as BA.

  • Placing a $ after a Bt horizon, Bt$, will match any horizon designation that ends with Bt, such as 2Bt or 3Bt, but not something with a vertical subdivision, such as Bt2.

  • Wrapping a pattern with both ^ and $ symbols results only in exact matches – i.e., strings that start and end with the contents between ^ and $. For example, ^[AC]$ will only match A or C, not Ap, Ap2, or Cg.
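These anchoring rules are easy to verify in R with grepl() on a vector of hypothetical horizon designations:

```r
# hypothetical horizon designations
hz <- c('A', 'Ap', 'Ap1', 'BA', 'Bt', '2Bt', 'Bt2', 'C', 'Cg')

# starts with A: matches A, Ap, Ap1 -- but not BA
hz[grepl('^A', hz)]

# ends with Bt: matches Bt, 2Bt -- but not Bt2
hz[grepl('Bt$', hz)]

# exact match: only A or C
hz[grepl('^[AC]$', hz)]
```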

  7. Execute the report.

Command-line approach

# Set report parameters
series <- "Miami"
genhz_rules <- "C:/workspace2/lab_sum/Miami_rules.R"

# report file path
report_path <- "C:/workspace2/lab_sum/report.Rmd"

# Run the report (render() is provided by the rmarkdown package)
rmarkdown::render(
  input = report_path, 
  output_dir = "C:/workspace2", 
  output_file = "C:/workspace2/lab_sum.html", 
  envir = new.env(), 
  params = list(series = series,
                genhz_rules = genhz_rules)
)

Manual approach

Open the report.Rmd, hit the Knit drop down arrow, and select Knit with Parameters.

  8. Save the report. The report is automatically saved upon creation in the same folder as the R report. However, it is given the same generic name as the report file (i.e., “C:/workspace2/lab_sum/report.html”) and will be overwritten the next time the report is run. Therefore, if you wish to save the report, rename the .html file to a name of your choosing and/or convert it to a PDF.

Also, beware when opening the .html file with Internet Explorer – be sure to click on “Allow blocked content” if prompted. Otherwise, Internet Explorer will alter the formatting of tables etc. within the document.

Sample pedon report

Brief summary of steps:

  • Load your selected set with the pedon and site table for an existing GHL file, or make your own (highly encouraged)
  • Run the lab_summary_by_taxonname.Rmd report on a soil series of your choice.
  • Show your work and submit the results to your mentor.