Chapter 2 The Data We Use

2.1 Objectives (The Data We Use)

  • Expand on basic R skills from Chapter 1
    • Inspect and work with different data types
    • Perform operations on data such as filtering and aggregation
    • Begin to explore regular expression (regex) patterns for text data
    • Learn how functions can be used to bundle operations
  • Work with Soil Data Sources and Structures
    • Use the soilDB package to load data into R
    • Understand the SoilProfileCollection (SPC) object
    • Learn about the data checks in the fetchNASIS() function

2.2 The Structure of Soil Data

What if you could extract, organize, and visualize data from NASIS and many other commonly used soil database sources with a couple of lines of code?

The aqp (Algorithms for Quantitative Pedology) and soilDB packages enable data to be fetched from various sources and cast into a SoilProfileCollection (SPC) object. Tabular and spatial data objects fetched via soilDB and processed using aqp methods can simplify the process of working with commonly used soil data.

2.2.1 Package References

The manual pages for soilDB and aqp are accessible (click index at the bottom of the Help tab in RStudio) by entering the following into the R console:

# load the libraries
library(aqp)
library(soilDB)

2.2.2 Importance of Pedon Data

The importance of pedon data for present and future work cannot be overstated. These data represent decades of on-the-ground observations of the soil resource for a given area.

As difficult as it may be to take the time to enter legacy pedon data, it is vitally important that we capture this resource and get these data into NASIS as an archive of point observations.

2.2.3 Some Issues With Pedon Data

  • Making and documenting observations of soil requires hard work. Digging is difficult, and writing soil descriptions is time consuming!

  • Our confidence in observations typically weakens with the depth of the material described.

    • If we acknowledge this, which we must, then how do we deal with it in pedon data?
      • Use a cutoff depth, for example 100 cm, can be used to truncate observations to a zone of greater confidence.
      • Show the relative confidence of the data with depth.

2.3 Challenges with Pedon Data

  • Consistency
    • Missing data
  • Confidence in the observations
    • Uncertainty with depth
  • Description style differences
    • Depth described, horizonation usage styles
  • Legacy data vintage
    • Decadal span of data
    • Taxonomy updates, horizon nomenclature changes
  • Location confidence
    • Origin of the location information
    • Datum used for data collection
    • Accuracy for GPS values at the time of data collection

2.4 The SoilProfileCollection

The SoilProfileCollection class (SPC) provided by the aqp package is a specialized structure for soil data analysis. It simplifies the process of working with collections of data associated with soil profiles, e.g., site-level, horizon-level, spatial, diagnostic horizons, and other metadata.

A SoilProfileCollection is similar to the NASIS Site/Pedon “object” in that it provides generalizations, specific routines and rules about data tables and their relationships.

The SoilProfileCollection is an S4 R object. S4 objects have slots. Of primary importance, are the slots for site-level and horizon-level data.

In many ways the SPC is more adaptable than the NASIS “Pedon” concept because it is more general. However, the SPC is not as expressive as the complex hierarchy of objects in NASIS, which are more aligned with data archival vs. analysis.

2.4.1 SoilProfileCollection methods

Many “familiar” methods are defined for the SoilProfileCollection object. Some are unique, and others operate like more common functions of vector and data.frame objects, such as nrow() (“how many horizons?”) or length() (“how many sites/pedons?”).

Perhaps most importantly, when you access the site data (with site(<object>)) or the horizon data (with horizons(<object>)) of a SoilProfileCollection, you get a data.frame object that you can use like any other you might use or make in R.

2.4.1.1 Promoting a data.frame to SoilProfileCollection

The SoilProfileCollection object is a collection of 1-dimensional profile descriptions, of the type conventionally described on a Form 232, or of tabular data returned from laboratory.

The object is “horizon data forward” in that you start with the layers, and can add or create site-level attributes by normalization, joins, and calculation.

Most of the time if you are using your NASIS data, or an official database, there are defined ways of getting the data “into” an SPC. For example, fetchOSD returns a SoilProfileCollection that has been assembled from horizon and site level attributes gleaned from the OSDs text, Soil Classification database, and other sources.

In the pre-course, we had you set up a process so you could connect to your local NASIS instance to “fetch” data and have methods like fetchNASIS put things together for you.

This input to make a SoilProfileCollection can be represented most simply as a data.frame with unique site or profile ID and depth combinations for each horizon or layer–for example, a subset of the phorizon or chorizon table in NASIS.

A simple demonstration of “tabular horizon data” is the sp4 data set bundled with aqp: some serpentine soil profiles stored in a data.frame in the aqp package (after McGahan et al., 2009).

library(aqp)

# Load sample serpentine soil data (McGahan et al., 2009)
data(sp4, package = "aqp")

# this is a data.frame
# same as if loaded from CSV file etc.
class(sp4)
## [1] "data.frame"
# inspect the first couple of rows
head(sp4)
##       id name top bottom   K   Mg  Ca CEC_7 ex_Ca_to_Mg sand silt clay   CF
## 1 colusa    A   0      3 0.3 25.7 9.0  23.0        0.35   46   33   21 0.12
## 2 colusa  ABt   3      8 0.2 23.7 5.6  21.4        0.23   42   31   27 0.27
## 3 colusa  Bt1   8     30 0.1 23.2 1.9  23.7        0.08   40   28   32 0.27
## 4 colusa  Bt2  30     42 0.1 44.3 0.3  43.0        0.01   27   18   55 0.16
## 5  glenn    A   0      9 0.2 21.9 4.4  18.8        0.20   54   20   25 0.55
## 6  glenn   Bt   9     34 0.3 18.9 4.5  27.5        0.20   49   18   34 0.84

To convert this horizon data into a SoilProfileCollection, we need to identify three parameters: idname, top, and bottom. These parameters refer to the columns of unique profile IDs, top depths and bottom depths, respectively.

There are a couple of important constraints and considerations:

  • records (rows) represent horizons
  • profiles are uniquely identified by a column (user pedon ID, pedon record ID, etc.)
  • profiles IDs cannot contain missing values (NA)
  • horizon top and bottom depths are identified by columns
  • ideally there are no gaps, overlap, or missing top/bottom depths (more on that later)

Use a formula to specify column names in the data.frame, in this case "id", "top" and "bottom".

#      profile ID ~ top depth + bottom depth
depths(sp4) <- id ~ top + bottom

# note new class
class(sp4)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
# compact summary
sp4
## SoilProfileCollection with 10 profiles and 30 horizons
## profile ID: id  |  horizon ID: hzID 
## Depth range: 16 - 49 cm
## 
## ----- Horizons (6 / 30 rows  |  10 / 14 columns) -----
##      id hzID top bottom name   K   Mg  Ca CEC_7 ex_Ca_to_Mg
##  colusa    1   0      3    A 0.3 25.7 9.0  23.0        0.35
##  colusa    2   3      8  ABt 0.2 23.7 5.6  21.4        0.23
##  colusa    3   8     30  Bt1 0.1 23.2 1.9  23.7        0.08
##  colusa    4  30     42  Bt2 0.1 44.3 0.3  43.0        0.01
##   glenn    5   0      9    A 0.2 21.9 4.4  18.8        0.20
##   glenn    6   9     34   Bt 0.3 18.9 4.5  27.5        0.20
## [... more horizons ...]
## 
## ----- Sites (6 / 10 rows  |  1 / 1 columns) -----
##         id
##     colusa
##      glenn
##      kings
##   mariposa
##  mendocino
##       napa
## [... more sites ...]
## 
## Spatial Data:
## [EMPTY]

The formula expresses the idea that a profile id defined by set of top and bottom depths.

NOTE: A character vector with same names has the same effect, and can be easier to “program” with than the formula-based syntax.

depths(sp4) <- c("id", "top", "bottom")

2.4.2 Promoting to “Spatial” SoilProfileCollection

You can also use the SoilProfileCollection to manage the information about a profile’s position on the Earth.

Chapter 4 will cover spatial data in greater detail, and the SoilProfileCollection Reference has a section on Spatial Data. For now know you can use the initSpatial<- method to define “X” and “Y” coordinate columns and the coordinate reference system in one line:

sp4$x <- runif(10); sp4$y <- runif(10) # dummy XY coordinates
initSpatial(sp4, crs = "EPSG:4326") <- ~ x + y

This is new syntax introduced in aqp 2.0, the older syntax uses the coordinates<- and proj4string<- methods.

2.4.2.1 Extracting Site and Horizon Data

You can extract values from the collection’s @site and @horizon slots using the site() and horizons() functions. These create data.frame objects that are separate from the SoilProfileCollection.

# extract site data from SPC into new data.frame 's'
# note that it only contains profile IDs
s <- site(sp4)
str(s)
## 'data.frame':    10 obs. of  1 variable:
##  $ id: chr  "colusa" "glenn" "kings" "mariposa" ...
# extract horizon data from SPC into new data.frame 'h'
h <- horizons(sp4)
str(h)
## 'data.frame':    30 obs. of  14 variables:
##  $ id         : chr  "colusa" "colusa" "colusa" "colusa" ...
##  $ name       : chr  "A" "ABt" "Bt1" "Bt2" ...
##  $ top        : int  0 3 8 30 0 9 0 4 13 0 ...
##  $ bottom     : int  3 8 30 42 9 34 4 13 40 3 ...
##  $ K          : num  0.3 0.2 0.1 0.1 0.2 0.3 0.2 0.6 0.8 0.6 ...
##  $ Mg         : num  25.7 23.7 23.2 44.3 21.9 18.9 12.1 12.1 17.7 28.3 ...
##  $ Ca         : num  9 5.6 1.9 0.3 4.4 4.5 1.4 7 4.4 5.8 ...
##  $ CEC_7      : num  23 21.4 23.7 43 18.8 27.5 23.7 18 20 29.3 ...
##  $ ex_Ca_to_Mg: num  0.35 0.23 0.08 0.01 0.2 0.2 0.58 0.51 0.25 0.2 ...
##  $ sand       : int  46 42 40 27 54 49 43 36 27 42 ...
##  $ silt       : int  33 31 28 18 20 18 55 49 45 26 ...
##  $ clay       : int  21 27 32 55 25 34 3 15 27 32 ...
##  $ CF         : num  0.12 0.27 0.27 0.16 0.55 0.84 0.5 0.75 0.67 0.25 ...
##  $ hzID       : chr  "1" "2" "3" "4" ...

2.4.2.2 Methods like data.frame

The base R functions for accessing and setting data.frame columns by name such as $ and [[ work for SoilProfileCollection objects, too. Review data.frame methods:

  • [[ and $: single columns in data.frame, by name
    • x[['variable']]
    • x$variable
  • [: combinations of rows and columns, by name or index
    • x[i, ]: specified rows, all columns
    • x[, j]: all rows, specified columns
    • x[i, j]: specified rows, specified columns

See Chapter 1 and the Chapter 2 Appendix for additional details and examples.

2.4.2.2.1 Column Access by Name: $ and [[
# sp4 is a SoilProfileCollection
sp4$clay
##  [1] 21 27 32 55 25 34  3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
sp4[['clay']]
##  [1] 21 27 32 55 25 34  3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
# horizon data.frame
h$clay
##  [1] 21 27 32 55 25 34  3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
h[['clay']]
##  [1] 21 27 32 55 25 34  3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
# use $<- or [[<- to set proportional clay content
sp4$clay <- sp4[['clay']] / 100

# undo what we did above; back to percentage
sp4[['clay']] <- sp4$clay * 100

# create new site variable ("numberone" recycled for all sites)
site(sp4)$newvar1 <- "numberone"

# create new horizon variable ("numbertwo" recycled for all horizons)
horizons(sp4)$newvar2 <- "numbertwo"
2.4.2.2.2 Row Access: [

The SoilProfileCollection also has [ (“single bracket”), but with a different interpretation from the [i, j] indexing of data.frame objects.

  • In a data.frame you have object[row, column, drop=TRUE]; the result is a data.frame (or a vector with default drop=TRUE).

  • In a SoilProfileCollection you have object[site, horizon]; the result is a SoilProfileCollection.

# i-index: first 2 profiles, all horizons
sp4[1:2, ]
## SoilProfileCollection with 2 profiles and 6 horizons
## profile ID: id  |  horizon ID: hzID 
## Depth range: 34 - 42 cm
## 
## ----- Horizons (6 / 6 rows  |  10 / 15 columns) -----
##      id hzID top bottom name   K   Mg  Ca CEC_7 ex_Ca_to_Mg
##  colusa    1   0      3    A 0.3 25.7 9.0  23.0        0.35
##  colusa    2   3      8  ABt 0.2 23.7 5.6  21.4        0.23
##  colusa    3   8     30  Bt1 0.1 23.2 1.9  23.7        0.08
##  colusa    4  30     42  Bt2 0.1 44.3 0.3  43.0        0.01
##   glenn    5   0      9    A 0.2 21.9 4.4  18.8        0.20
##   glenn    6   9     34   Bt 0.3 18.9 4.5  27.5        0.20
## 
## ----- Sites (2 / 2 rows  |  2 / 2 columns) -----
##      id   newvar1
##  colusa numberone
##   glenn numberone
## 
## Spatial Data:
## [EMPTY]
# j-index: all profiles; first 2 horizons of each profile
sp4[, 1:2]
## SoilProfileCollection with 10 profiles and 20 horizons
## profile ID: id  |  horizon ID: hzID 
## Depth range: 5 - 40 cm
## 
## ----- Horizons (6 / 20 rows  |  10 / 15 columns) -----
##      id hzID top bottom name   K   Mg  Ca CEC_7 ex_Ca_to_Mg
##  colusa    1   0      3    A 0.3 25.7 9.0  23.0        0.35
##  colusa    2   3      8  ABt 0.2 23.7 5.6  21.4        0.23
##   glenn    5   0      9    A 0.2 21.9 4.4  18.8        0.20
##   glenn    6   9     34   Bt 0.3 18.9 4.5  27.5        0.20
##   kings    7   0      4    A 0.2 12.1 1.4  23.7        0.58
##   kings    8   4     13  Bt1 0.6 12.1 7.0  18.0        0.51
## [... more horizons ...]
## 
## ----- Sites (6 / 10 rows  |  2 / 2 columns) -----
##         id   newvar1
##     colusa numberone
##      glenn numberone
##      kings numberone
##   mariposa numberone
##  mendocino numberone
##       napa numberone
## [... more sites ...]
## 
## Spatial Data:
## [EMPTY]

When you use the [ function, everything in the SoilProfileCollection is subset simultaneously depending on the constraints specified by the indices.

# First profile, first 2 horizons
horizons(sp4[1, 1:2])
##       id name top bottom   K   Mg  Ca CEC_7 ex_Ca_to_Mg sand silt clay   CF hzID   newvar2
## 1 colusa    A   0      3 0.3 25.7 9.0  23.0        0.35   46   33   21 0.12    1 numbertwo
## 2 colusa  ABt   3      8 0.2 23.7 5.6  21.4        0.23   42   31   27 0.27    2 numbertwo

All slots in the collection have a relationship to the site or i-index. When you remove sites (profiles), all associated records (e.g. spatial, diagnostics, horizons, etc.) in the object are removed.

Similarly, when all horizons are removed (say, you request the 6th j-index from a profile that has only 5 layers), the site index and all associated data are removed from the collection.

2.5 Exercise 1: Assemble a SoilProfileCollection from several CSV files

Link to exercise R code

Questions:

Run the code in the linked R file and answer these questions.

  1. How many profiles (sites) and horizons are in the granite SoilProfileCollection? How many in andesite? See length() and nrow() functions for SoilProfileCollection objects.

  2. Which profile (and which horizon in that profile) has the highest ratio of oxalate-extractable Fe to dithionite-citrate-extractable Fe (Fe_o_to_Fe_d)?

Send the results of questions (and any code you used) to your mentor.

2.6 Using the soilDB Package

The soilDB package for R provides functions for accessing data stored in NASIS, KSSL, SDA, SoilWeb, SoilGrids and other sources.

These high-level ‘fetch’ functions bundle or wrap lower-level get functions which access internal database interfaces to NASIS and other data sources. The ODBC connection to NASIS that you set up during the pre-course is an example of this internal database interface.

Basic data checks are run within fetch functions. These checks ensure the basic integrity of the data as it is queried and moved from its existing structure into an SPC. There are times when it is useful to use the lower-level get functions individually. They generally return single data.frame or list of data.frame.

You can set up scripts to make custom queries against these or other sources on your own – there is an example at the end of this section.

For now, we will start with the fetch functions and others that will get you a large variety of data you can use for soil and ecological site analyses.

2.6.1 soilDB functions for tabular data

soilDB functions are the quickest way to get up and running:

2.6.2 Open Database Connectivity (ODBC) Connection to NASIS

After setting up an ODBC connection, as you did as part of the pre-course, you can use R to access data from a selected set defined in your local NASIS database.

Does NASIS need to be open and running to query data using soilDB?

No, fetchNASIS() works whether the NASIS application is running or not. You just need to make sure that the data you want has been loaded into your selected set.

2.6.3 fetchNASIS()

The fetchNASIS() convenience function extracts data from a NASIS selected set via Structured Query Language (SQL).

Note that the import process in fetchNASIS(), and the other methods, is not comprehensive. It does not pull every column for every table related to pedon data out of NASIS.

Instead, it pulls essential / commonly used pedon and horizon data. Higher level functions like fetchNASIS() bundle a series of lower-level queries to get specific parts of the Pedon or Component data structures. Much of the nested complexity of NASIS is simplified in the resulting object.

Many-to-one relationships are “flattened” where possible by fetchNASIS(). This aggregates the data from various tables into one “site” record with related horizon records, per profile.

You can see the child tables that are aggregated using the get_extended_data_from_NASIS() method, which returns a named list of child table sources that can be joined to the SoilProfileCollection made with fetchNASIS() using the internal record IDs.

2.6.3.1 fetchNASIS arguments

fetchNASIS() has a number of different arguments:

  • from = ‘pedons’ or ‘components’ or ‘pedon_report’

    • This option allows you to select which data you want to load from NASIS. Choosing either ‘pedons’ or ‘components’ will load data from your local database. If ‘pedon_report’ is specified then it will load data from the text file generated by the NASIS report ‘fetchNASIS’ (run offline). This is useful for loading more than 20,000 pedons at one time, such for an entire Soil Survey Region.
  • url = string specifying the (temporary) URL for the NASIS pedon_report output generated by the fetchNASIS NASIS Report that can be run “Offline Against National Database” EXAMPLE OUTPUT (MT663)

  • SS = TRUE/FALSE

    • The Selected Set (SS) option allows you to choose whether you want the data to load from your current selected set in NASIS or from the local database tables. The default is set to TRUE so if unspecified fetchNASIS() will always load from the data in the selected set.
  • stringAsFactors = NULL

  • rmHzErrors = TRUE/FALSE

    • Setting this value to TRUE removes pedons that do not pass checks for horizon depth consistency. Default: FALSE
  • nullFragsAreZero = TRUE/FALSE

    • Setting this value to TRUE (the default) converts null entries for rock fragment volumes to 0. This is typically the right assumption because rock fragment data are typically populated only when observed. If you know that your data contain a combination of omitted information (e.g. no rock fragment volumes are populated) then consider setting this argument to FALSE.
  • soilColorState = ‘moist’ or ‘dry’

    • Select dry or moist colors to be converted and placed into a horizon-level attribute called soil_color. The default is set to ‘moist’ unless specified. Moist and dry colors are also stored in moist_soil_color and dry_soil_color.
  • lab = TRUE/FALSE

    • This option allows for loading the data associated with horizons that may be in the phlabresults table. The default is set to FALSE, which will not load records from the phlabresults table.
  • fill = TRUE/FALSE

    • This option adds horizons to pedons or components that lack horizon data in the database, preserving their “position” within the output SoilProfileCollection object. Default is FALSE such that profiles without horizons are omitted from the result.
  • dropAdditional = TRUE/FALSE

    • Used only for from='components' with duplicates = TRUE. Prevent “duplication” of mustatus == "additional" mapunits? Default: TRUE
  • dropNonRepresentative = TRUE/FALSE

    • Used only for from='components' with duplicates = TRUE. Prevent “duplication” of non-representative data mapunits? Default: TRUE
  • duplicates = FALSE

    • Used only for from='components'. Duplicate components for all instances of use (i.e. one for each legend data mapunit is used on; optionally for additional mapunits, and/or non-representative data mapunits?). This will include columns from get_component_correlation_data_from_NASIS_db() that identify which legend(s) each component is used on. NOTE: This requires that parent tables above “component” (such as legend, mapunit, correlation and datamapunit) are loaded in your NASIS database/selected set.
  • dsn = NULL

For more information on the data checks and adjusting the default options to fetchNASIS() function, see the following resource: Tips on Getting Data from NASIS into R.

2.6.4 The gopheridge Dataset

The gopheridge sample data set is a sample R object returned from fetchNASIS() in a self-contained .rda file stored in soilDB.

Open RStudio, and set up the environment by loading packages and the gopheridge sample dataset.

library(aqp)
library(soilDB)

# load example dataset
data(gopheridge, package = "soilDB")

# what kind of object is this?
class(gopheridge)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
# what does the internal structure look like?
str(gopheridge, 2)
## Formal class 'SoilProfileCollection' [package "aqp"] with 9 slots
##   ..@ idcol       : chr "peiid"
##   ..@ hzidcol     : chr "phiid"
##   ..@ depthcols   : chr [1:2] "hzdept" "hzdepb"
##   ..@ metadata    :List of 7
##   ..@ horizons    :'data.frame': 317 obs. of  73 variables:
##   ..@ site        :'data.frame': 52 obs. of  137 variables:
##   ..@ sp          :Formal class 'SpatialPoints' [package "sp"] with 3 slots
##   ..@ diagnostic  :'data.frame': 164 obs. of  4 variables:
##   ..@ restrictions:'data.frame': 56 obs. of  8 variables:

There are many columns in the data.frame objects within a fetchNASIS() SoilProfileCollection.

The following guide is part of the soilDB documentation and describes the columns that are returned from a fetchNASIS(from="pedons") call. In the future this guide will be extended to describe component data.

With siteNames() and horizonNames() we can view the names for eachslot respectively in the gopheridge object.

# the fields at the site and horizon levels within the SPC
siteNames(gopheridge)
##   [1] "peiid"                                    "siteiid"                                 
##   [3] "pedon_id"                                 "site_id"                                 
##   [5] "obs_date"                                 "utmzone"                                 
##   [7] "utmeasting"                               "utmnorthing"                             
##   [9] "x"                                        "y"                                       
##  [11] "horizdatnm"                               "x_std"                                   
##  [13] "y_std"                                    "gpspositionalerror"                      
##  [15] "describer"                                "pedonpurpose"                            
##  [17] "pedontype"                                "pedlabsampnum"                           
##  [19] "labdatadescflag"                          "tsectstopnum"                            
##  [21] "tsectinterval"                            "utransectid"                             
##  [23] "tsectkind"                                "tsectselmeth"                            
##  [25] "elev_field"                               "slope_field"                             
##  [27] "aspect_field"                             "plantassocnm"                            
##  [29] "earthcovkind1"                            "earthcovkind2"                           
##  [31] "erocl"                                    "bedrckdepth"                             
##  [33] "bedrckkind"                               "bedrckhardness"                          
##  [35] "hillslopeprof"                            "geomslopeseg"                            
##  [37] "shapeacross"                              "shapedown"                               
##  [39] "slopecomplex"                             "drainagecl"                              
##  [41] "flodfreqcl"                               "floddurcl"                               
##  [43] "flodmonthbeg"                             "pondfreqcl"                              
##  [45] "ponddurcl"                                "pondmonthbeg"                            
##  [47] "geomposhill"                              "geomposmntn"                             
##  [49] "geompostrce"                              "geomposflats"                            
##  [51] "swaterdepth"                              "surface_fine_gravel"                     
##  [53] "surface_gravel"                           "surface_cobbles"                         
##  [55] "surface_stones"                           "surface_boulders"                        
##  [57] "surface_channers"                         "surface_flagstones"                      
##  [59] "surface_parafine_gravel"                  "surface_paragravel"                      
##  [61] "surface_paracobbles"                      "surface_parastones"                      
##  [63] "surface_paraboulders"                     "surface_parachanners"                    
##  [65] "surface_paraflagstones"                   "surface_unspecified"                     
##  [67] "surface_total_frags_pct_nopf"             "surface_total_frags_pct"                 
##  [69] "slope_shape"                              "surface_fgravel"                         
##  [71] "classdate"                                "classifier"                              
##  [73] "classtype"                                "taxonname"                               
##  [75] "localphase"                               "taxonkind"                               
##  [77] "seriesstatus"                             "taxclname"                               
##  [79] "taxpartsize"                              "taxorder"                                
##  [81] "taxsuborder"                              "taxgrtgroup"                             
##  [83] "taxsubgrp"                                "soiltaxedition"                          
##  [85] "osdtypelocflag"                           "taxmoistcl"                              
##  [87] "taxtempregime"                            "taxfamother"                             
##  [89] "taxreaction"                              "taxfamhahatmatcl"                        
##  [91] "psctopdepth"                              "pscbotdepth"                             
##  [93] "selection_method"                         "ecositeid"                               
##  [95] "ecositenm"                                "ecositecorrdate"                         
##  [97] "es_classifier"                            "es_selection_method"                     
##  [99] "ochric.epipedon"                          "argillic.horizon"                        
## [101] "paralithic.contact"                       "lithic.contact"                          
## [103] "mollic.epipedon"                          "slickensides"                            
## [105] "calcic.horizon"                           "duripan"                                 
## [107] "cambic.horizon"                           "umbric.epipedon"                         
## [109] "anthropic.epipedon"                       "histic.epipedon"                         
## [111] "paralithic.materials"                     "lithologic.discontinuity"                
## [113] "andic.soil.properties"                    "densic.contact"                          
## [115] "abrupt.textural.change"                   "aquic.conditions"                        
## [117] "redox.depletions.with.chroma.2.or.less"   "redox.concentrations"                    
## [119] "reduced.matrix"                           "densic.materials"                        
## [121] "albic.horizon"                            "spodic.horizon"                          
## [123] "fibric.soil.materials"                    "hemic.soil.materials"                    
## [125] "sapric.soil.materials"                    "volcanic.glass"                          
## [127] "folistic.epipedon"                        "strongly.contrasting.particle.size.class"
## [129] "human.transported.material"               "albic.materials"                         
## [131] "secondary.carbonates"                     "landform_string"                         
## [133] "landscape_string"                         "microfeature_string"                     
## [135] "geomicrorelief_string"                    "pmkind"                                  
## [137] "pmorigin"
horizonNames(gopheridge)
##  [1] "phiid"                "peiid"                "hzname"               "genhz"               
##  [5] "hzdept"               "hzdepb"               "bounddistinct"        "boundtopo"           
##  [9] "clay"                 "silt"                 "sand"                 "fragvoltot"          
## [13] "texture"              "texcl"                "lieutex"              "phfield"             
## [17] "effclass"             "labsampnum"           "rupresblkdry"         "rupresblkmst"        
## [21] "rupresblkcem"         "stickiness"           "plasticity"           "ksatpedon"           
## [25] "texture_class"        "hzID"                 "d_r"                  "d_g"                 
## [29] "d_b"                  "d_hue"                "d_value"              "d_chroma"            
## [33] "d_sigma"              "m_r"                  "m_g"                  "m_b"                 
## [37] "m_hue"                "m_value"              "m_chroma"             "m_sigma"             
## [41] "moist_soil_color"     "dry_soil_color"       "soil_color"           "fine_gravel"         
## [45] "gravel"               "cobbles"              "stones"               "boulders"            
## [49] "channers"             "flagstones"           "parafine_gravel"      "paragravel"          
## [53] "paracobbles"          "parastones"           "paraboulders"         "parachanners"        
## [57] "paraflagstones"       "unspecified"          "total_frags_pct_nopf" "total_frags_pct"     
## [61] "art_fgr"              "art_gr"               "art_cb"               "art_st"              
## [65] "art_by"               "art_ch"               "art_fl"               "art_unspecified"     
## [69] "total_art_pct"        "huartvol_cohesive"    "huartvol_penetrable"  "huartvol_innocuous"  
## [73] "huartvol_persistent"

2.6.4.1 Make profile sketches

The plotSPC() or plot() function applied to a SoilProfileCollection object generates sketches based on horizon depths, designations, and colors. The SoilProfileCollection Reference contains many examples demonstrating way in which these sketches can be customized.

The fetchNASIS() function automatically converts moist Munsell colors into R-style colors, available in the soil_color horizon level attribute. An approximate color mixture is used when multiple colors per horizon are present.

See ?plotSPC for a detailed list of arguments and examples.

The Soil Profile Sketches tutorial contains additional examples that demonstrate various ways to customize soil profile sketches.

par(mar = c(1, 1, 1, 1))

# omitting pedon IDs and horizon designations
plotSPC(gopheridge,
     print.id = FALSE,
     name = NA,
     width = 0.3)
title('Pedons from the `gopheridge` sample dataset', line = -0.5)

Additional examples / documentation related to soil profile sketches:

2.6.4.2 Pedon Data Checks

When you load pedons using the fetchNASIS() function, the following data checks are performed:

  • Presence of multiple map datums. Results reported to the user and the data are not modified.

  • Inconsistent horizon boundaries. Pedons with inconsistent horizon boundaries are not loaded unless rmHzErrors = FALSE. In most cases, this occurs when the bottom depth of a horizon is not the same as the upper depth of the next lower horizon.

##   hzname top bot
## 1      A   0  30
## 2    Bt1  38  56
## 3    Bt2  56 121
## 4     Bk 121 135
## 5      R 135  NA

Note the issue above. The bottom depth of the A horizon and the upper depth of the Bt1 horizon should be the same: either 30 or 38 cm. The correct depth needs to be determined and fixed in the database.

  • Missing lower horizon depths. Offending horizons are fixed by replacing the missing bottom depth with the top depth plus 2 cm. In the case of the profile shown above, a bottom depth of 137 cm would be inserted where the depth is missing.

  • Sites missing pedon records. Data without corresponding horizons are not loaded.

2.6.4.3 Find Pedons with Errors

If errors in the pedon data are detected when loading data using fetchNASIS(), the following “get” commands can trace them back to the corresponding records in NASIS.

These access an option that is stored in a special object called an Environment associated with the soilDB package – they generally contain vectors of IDs.

  • get('sites.missing.pedons', envir = soilDB.env)
    • Returns user site ID for sites missing pedons
  • get('dup.pedon.ids', envir = soilDB.env)
    • Returns user pedon ID for sites with duplicate pedon ID
  • get('bad.pedon.ids', envir = soilDB.env)
    • Returns user pedon ID for pedons with inconsistent horizon depths
  • get('bad.horizons', envir = soilDB.env)
    • Returns a data.frame of horizon-level information for pedons with inconsistent horizon depths
get('sites.missing.pedons', envir = soilDB.env)
get('dup.pedon.ids', envir = soilDB.env)
get('bad.pedon.ids', envir = soilDB.env)
get('bad.horizons', envir = soilDB.env)

These get() calls access variables stored in the package environment soilDB.env. The variables only exist if there are “problems” / values found by the data checks – and if you fix the errors in the NASIS database and the checks don’t find any errors then nothing will be returned by these functions.

2.6.5 Further Reading / Reference

  • The SoilProfileCollection Reference contains examples based on the sp4 sample dataset. This is a good exercise for the class and reference for future work.

  • The Querying Soil Series Data document contains a more detailed explanation of how the data behind fetchOSD were created and how to interpret its results.

  • The Competing Soil Series document contains a number of examples related to getting OSD morphology using fetchOSD, laboratory data using fetchKSSL, and summary via profile sketches and slice-wise aggregation.

  • The “SPC Plotting Ideas” document outlines some advanced techniques for arranging and annotating soil profile sketches.

  • Several examples pertaining to SDA, soil texture, and soil profile sketches are explored in this Random Walk 001.

2.7 Working with Data in R

2.7.1 Summaries

Now that you’ve loaded some data, you can look at additional ways to summarize and interact with data elements.

2.7.1.1 table() and Cross Tabulation

The base R table() function is very useful for quick summary operations. It returns a named vector with the amount of each unique level of the a given vector.

The numeric vector of “counts” is commonly combined with other functions such as sort(), order(), prop.table(), is.na() to identify abundance, proportions, or missing data (NA).

# load required packages
library(aqp)
library(soilDB)

data("gopheridge", package = "soilDB")

# for these examples, we use the gopheridge object as our "selected set"
pedons <- gopheridge 

## you can use fetchNASIS to load your own data, like this:
# pedons <- fetchNASIS()

# summarize which soil taxa we have loaded
table(pedons$taxonname)
## 
## Gopheridge 
##         52
# sort taxonomic names in descending order
sort(table(pedons$taxonname), decreasing = TRUE)
## Gopheridge 
##         52
# could do the same thing for taxonomic subgroups 
table(pedons$taxsubgrp)
## 
## mollic haploxeralfs  typic haploxerepts  ultic haploxeralfs  ultic haploxerolls 
##                   1                   6                  44                   1
sort(table(pedons$taxsubgrp), decreasing = TRUE)
## 
##  ultic haploxeralfs  typic haploxerepts mollic haploxeralfs  ultic haploxerolls 
##                  44                   6                   1                   1

We can convert counts in the table() result into proportions with prop.table():

prop.table(table(pedons$taxsubgrp))
## 
## mollic haploxeralfs  typic haploxerepts  ultic haploxeralfs  ultic haploxerolls 
##          0.01923077          0.11538462          0.84615385          0.01923077

table() can be used to get counts over multiple dimensions of factor levels.

We call this process cross tabulation.

For instance, let’s cross tabulate taxonomic subgroup (taxsubgrp) and the particle size family class (taxpartsize) for pedons

table(pedons$taxsubgrp, pedons$taxpartsize)
##                      
##                       clayey-skeletal coarse-loamy fine fine-loamy loamy-skeletal
##   mollic haploxeralfs               0            0    0          0              1
##   typic haploxerepts                1            0    0          0              5
##   ultic haploxeralfs                2            0    1          1             40
##   ultic haploxerolls                0            1    0          0              0

As expected gopheridge the vast majority of pedons are Loamy-skeletal Ultic Haploxeralfs.

Since pedons contains site and horizon level data, when cross-tabulating we need to be sure pair a column from the site data with other site-level columns, and the same for horizon-level. This is because all arguments to table() must have the same length.

For example, let’s cross-tabulate horizon designation (hzname) with horizon texture class (texcl).

We can use the addmargins() function to add the row and column sums to the margins of the table for easier interpretation when there are many rows/columns in the result.

addmargins(table(pedons$hzname, pedons$texcl))
##        
##           c  cl   l scl sic sicl sil  sl Sum
##   2BCt5   0   1   0   0   0    0   0   0   1
##   2Bt1    0   0   3   0   0    0   0   0   3
##   2Bt2    1   2   1   1   0    0   0   0   5
##   2Bt3    1   3   1   0   0    0   0   0   5
##   2Bt4    2   0   0   0   0    0   0   0   2
##   2CBt    0   0   0   1   0    0   0   0   1
##   2Cr     0   0   0   0   0    0   0   0   0
##   2Crt    0   0   0   0   0    0   0   0   0
##   2R      0   0   0   0   0    0   0   0   0
##   A       0   0  33   0   0    0  12   3  48
##   A1      0   0   2   0   0    0   2   0   4
##   A2      0   0   2   0   0    0   2   0   4
##   A3      0   0   1   0   0    0   0   0   1
##   AB      0   0   2   0   0    0   2   0   4
##   BA      0   0  17   0   0    1   0   0  18
##   BC      0   0   4   0   0    0   0   0   4
##   BCt     0   2   2   1   0    0   1   0   6
##   Bt      0   0   1   1   1    0   0   0   3
##   Bt1     0   3  28   1   0    3   5   0  40
##   Bt2     3  15  14   1   1    3   1   0  38
##   Bt3     4   8   6   1   0    1   1   0  21
##   Bt4     1   2   2   0   0    0   0   0   5
##   Bw      0   0   6   0   0    0   1   0   7
##   Bw1     0   0   5   0   0    0   0   0   5
##   Bw2     0   0   5   0   0    0   0   0   5
##   Bw3     0   0   3   0   0    0   0   0   3
##   C       1   0   1   0   0    0   0   0   2
##   C/Brt   0   1   0   0   0    0   0   0   1
##   CBt     0   0   0   1   0    0   0   0   1
##   Cr      0   0   1   0   0    0   0   0   1
##   Crt     0   0   0   0   0    0   0   0   0
##   Ct      0   0   1   0   0    0   0   0   1
##   Oe      0   0   0   0   0    0   0   0   0
##   Oi      0   0   0   0   0    0   0   0   0
##   R       0   0   0   0   0    0   0   0   0
##   Sum    13  37 141   8   2    8  27   3 239

2.7.2 Missing Values

table(pedons$taxsubgrp, useNA = "ifany")
## 
## mollic haploxeralfs  typic haploxerepts  ultic haploxeralfs  ultic haploxerolls 
##                   1                   6                  44                   1
#  is.na(...) 
table(is.na(pedons$taxsubgrp))
## 
## FALSE 
##    52
#  is NOT NA !is.na(...)
table(!is.na(pedons$taxsubgrp))
## 
## TRUE 
##   52
# it can also be applied to horizon level columns in the SPC
sort(table(pedons$texture), decreasing=TRUE)
## 
##        BR         L      GR-L     GRV-L     CBV-L       SPM     GRX-L       SIL    GRV-CL    CBV-CL    GR-SIL 
##        58        36        33        24        18        14        12        12         9         8         7 
##     CBX-L    GRX-CL    CBX-CL   GRV-SIL        CL   GRV-SCL     GRX-C       MPM        SL      CB-L     GR-CL 
##         6         5         4         4         3         3         3         3         3         2         2 
##   GRX-SCL     PGR-C    PGRX-L      SICL    STV-CL     STV-L     STX-C     STX-L         C      CB-C     CB-CL 
##         2         2         2         2         2         2         2         2         1         1         1 
##    CB-SCL    CB-SIL   CBV-SIL   CBX-SCL      CN-L   CN-SICL     CNX-L  CNX-SICL     FLV-L      GR-C    GR-SIC 
##         1         1         1         1         1         1         1         1         1         1         1 
##  GRV-SICL   GRX-SIC   GRX-SIL  PCB-SICL PCBV-SICL     PCN-C   PCNX-CL    PGRV-C   PGRV-CL  PGRX-SCL  PGRX-SIL 
##         1         1         1         1         1         1         1         1         1         1         1 
##      ST-L     STV-C    STX-CL  STX-SICL 
##         1         1         1         1

2.7.3 Logical Operators

Logical operators act on vectors for the purposes of comparison.

  • == “EQUAL TO”

  • != “NOT EQUAL TO”

  • < LESS than

    • LESS than or equal to <=
  • > GREATER than

    • GREATER than or equal to >=
  • %in% Equivalent to IN () in SQL; same logic as match() but returns a boolean, not integer

    • Example: pedons$taxpartsize %in% c('loamy-skeletal', 'sandy-skeletal')

    • Returns a vector of TRUE/FALSE equal in length to left-hand side

  • & logical AND

  • | logical OR

2.7.4 Pattern Matching

The following examples use the grep() function to pattern match within the data, create an index of the SoilProfileCollection for records that match the specified pattern within that column, and then use that index to filter to specific sites and their corresponding profiles.

Patterns are specified using regular expression (REGEX) syntax.

This process can be applied to many different columns in the SPC based on how you need to filter the data. This example pattern matches on the tax_subgroup column, but another useful application might be to pattern match on geomorphology or parent material.

Say we want to see what the variation of particle size classes are within a specific subgroup? We can use grep() to create a row index, then apply that index to the SoilProfileCollection.

# create a numeric index for pedons with taxsubgroup containing 'typic'
idx <- grep('typic', pedons$taxsubgrp)
idx
## [1] 11 12 13 14 26 50
# use square bracket notation to subset 'typic' soils in `subset1` object
subset1 <- pedons[idx,]

# or use the index directly to summarize taxpartsize for 'typic' soils
sort(table(pedons$taxpartsize[idx]), decreasing = TRUE)
## 
##  loamy-skeletal clayey-skeletal 
##               5               1

Note: grep() below has an invert argument (default FALSE). This option is very useful for excluding the results of the pattern matching process by inverting whatever the result is. grepl() is the logical version of grep(), so you can invert it using the logical NOT operator: !.

Another method is to create an index using which() function. which() takes any logical vector (or expression), and it returns the indices (positions) where that expression returns TRUE. The use of which becomes more important when there are missing values (NA) in an expression.

Do a graphical check to see the “typic” profiles are selected.

Plot them in R using the SoilProfileCollection “plot” method (e.g., specialized version of the generic plot() function).

# adjust margins
par(mar=c(1,0,0,1))

# plot the first 10 profiles of subset1
plot(subset1[1:10, ], label = 'taxsubgrp', max.depth = 60)
title('Pedons with the word "typic" at subgroup-level of Soil Taxonomy', line=-2)

For more information on using regular expressions in grep() for pattern matching operations, see: Regular-expression-syntax.

Quick check: Compare or run these commands with some code, and review the documentation, to answer the questions.

  • True or False: grepl() returns a numeric vector
  • True or False: which(grepl('typic', pedons$taxsubgrp)) is the same as grep('typic', pedons$taxsubgrp).

2.7.4.1 REGEX Quick Start

  • . One character, any character

  • * Zero-or-more Quantifier (of previous token)

  • + One-or-more Quantifier (of previous token)

  • {n} quantifier where n is the the number of a match “repeats” (of previous token)

  • [A-Z!] ONE capital letter, or an exclamation mark

  • [0-9]{2} TWO numbers (using { quantifier)

  • | is equivalent to OR:

    • Example: grep('loamy|sandy', c("loamy-skeletal","sandy","sandy-skeletal"))
      • “loamy OR sandy”
  • ^ Anchor to beginning of string / line:

    • Example: grep('^sandy', c("loamy-skeletal","sandy","sandy-skeletal"))
      • STARTS WITH sandy”
  • $ Anchor to end of string / line:

    • Example: grep('skeletal$', c("loamy-skeletal","sandy","sandy-skeletal"))
      • ENDS WITH skeletal”
  • \\b Anchor to word boundary:

    • Example: grep('\\bmesic', c("mesic","thermic","isomesic"))
      • WORD STARTS WITH mesic” (e.g. not “isomesic”)

2.7.4.2 Resources for Regular Expressions

2.7.5 Filtering

A variety of methods are available to subset or “filter” R data objects, from a simple data.frame or vector, to something more complex like a Spatial object or a SoilProfileCollection.

You can index many R objects using numeric or logical expressions as above. There are also methods that make this process a little easier.

The base R method for this is subset() and it works on data.frame objects. It is nice because you can specify column names without explicitly referencing the data set, since subset uses non-standard evaluation of expressions passed as arguments.

2.7.5.1 Filtering with aqp::subset()

We use the SoilProfileCollection subset method, where we first specify a data (pedons) object then we can write expressions for the columns that exist in that object.

Here, we combine two logical expressions to find taxsubgrp containing "alfs" (Alfisols) with obsdate before January 1st, 2010.

subset2 <- subset(pedons, grepl("alfs", taxsubgrp) & obs_date < as.POSIXlt("2010-01-01"))

# check taxonomic range of particle size classes in the data
# overwhelmingly these are described as loamy-skeletal ultic haploxeralfs
sort(table(subset2$taxsubgrp), decreasing = TRUE)
## 
##  ultic haploxeralfs mollic haploxeralfs 
##                  28                   1
sort(table(subset2$taxpartsize), decreasing = TRUE)
## 
##  loamy-skeletal clayey-skeletal            fine      fine-loamy 
##              25               2               1               1
# check year described and taxpartsize
table(subset2$taxpartsize, substr(subset2$obs_date, 0, 4))
##                  
##                   2007 2008 2009
##   clayey-skeletal    1    0    1
##   fine               1    0    0
##   fine-loamy         1    0    0
##   loamy-skeletal    19    1    5
# a double equal sign '==' is used for exact character or numeric criteria
subset3 <- subset(subset2, taxpartsize == 'loamy-skeletal')

table(subset3$taxpartsize)
## 
## loamy-skeletal 
##             25
par(mar = c(0, 0, 2, 1))
plotSPC(subset3[1:12, ], print.id = FALSE)
title('Loamy-skeletal Ultic Haploxeralfs')

2.7.6 Dates and Times

Dates and times use special object types in R. The Unix time, also known as “Posix time,” is a system for describing a point in time. Unix epoch is a whole number value that is the number of seconds elapsed since the 00:00:00 UTC on 1 January 1970 minus leap seconds.

We can use logical comparison operators on dates and times if their string representation such as "01/01/1970" is converted to a common base R UNIX time representation known as POSIXlt or POSIXct.

This conversion accounts for important things such as timezone using your computer’s locale–which is important to keep in mind.

When converting to POSIX time several unambiguous (year-month-day) date time formats can be detected.

For instance, if you want to convert a date in the common month-day-year format, you need to specify the format argument:

as.POSIXct(24*60*60, origin = "1970-01-01", tz = "UTC")
## [1] "1970-01-02 UTC"

By default the timezone will match your current timezone. Dates without times are treated as being at midnight UTC.

You can customize the timezone with tz argument:

as.POSIXlt("01/01/1970", tz = "UTC", format = "%m/%d/%Y")
## [1] "1970-01-01 UTC"

POSIXlt and POSIXct objects can be formatted with the format() function. strptime() can be used to parse character vectors into date/times. You use as.POSIXlt() with character input, and as.POSIXct() with numeric input.

R also has the Date class which can be used for formatting calendar dates. You can convert POSIXlt/POSIXct objects to Date with as.Date()

2.8 Exercise 2: Generalized Horizons with Loafercreek

Here we will introduce the concept of using regular expressions to apply “Generalized Horizon Labels” based on the field soil description horizon designations.

This demonstrates one way of grouping horizon data for determining the Range in Characteristics of layers within a Soil Series or a SSURGO component.

For the exercise during class, we will give you some time to read over the materials and apply the process to the sample loafercreek data. The code to apply the process to loafercreek is given.

Then we ask that you apply a similar process to your own data, adjusting your generalized horizon patterns as needed for your local soils. This may take more time than we have during class itself, so you should follow up as needed with your mentor to complete. Chapter 3 Exploratory Data Analysis will get deeper into some of the topics that are referenced in the loafercreek code, such as summary statistics on grouped data. You will send a table and profile plots to your mentor when you are done.

2.9 fetchNASIS() data checks

fetchNASIS() does a lot of the hard work for you by pulling out a variety of child tables and joining them to site/horizon tables

This results in a SoilProfileCollection object that has many of the NASIS data columns. In order to do this, a variety of processes are run. Output is generated that you should look at. We will walk through the most common parts.

Mixing of multiple colors (for “default” moist and dry colors for many:1 case)

mixing dry colors ... [33 of 4108 horizons]
mixing moist colors ... [60 of 4532 horizons]

Quality control and limited “filling” of fragments, horizon depth data, site data, tables

replacing missing lower horizon depths with top depth + 1cm ... [19 horizons]
-> QC: horizon errors detected:
    Use `get('bad.pedon.ids', envir=soilDB.env)` for pedon record IDs (peiid)
    Use `get('bad.horizons', envir=soilDB.env)` for horizon designations
-> QC: pedons missing bottom hz depths:
    Use `get('missing.bottom.depths', envir=soilDB.env)` for pedon record IDs (peiid)

Notes about default settings and handling of NULL (missing data elements or records)

NOTE: some records are missing surface fragment cover
NOTE: some records are missing rock fragment volume
NOTE: all records are missing artifact volume

2.9.1 Inspecting Results

Here we inspect occurrence of andic soil properties in MT647.

We will download this “selected set” from the course website as an .rda file to save you the effort of crafting your selected set just for this example.

example.data.dir <- "C:/workspace2"
example.data.path <- file.path(example.data.dir, "mt647.rda")

if (!dir.exists(example.data.dir))
  dir.create(example.data.dir, recursive = TRUE)

download.file("https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/book/02/mt647.rda", 
              destfile = example.data.path)

Downloading and installing the above .rda is equivalent to doing NASIS query “_PedonPC_Plus_DataDump_select” (MLRA04 Bozeman) for to fill your selected set for User Site ID: %MT647%, NASIS Site: MLRA04%, NASIS Group: 4-MIS Pedons, followed by mt647 <- fetchNASIS().

2.9.1.1 Load Example Data

To load the sample object data into R, just use load() and the path to the .rda file (example.data.path or "C:/workspace2/mt647.rda")

# load the sample data
example.data.dir <- "C:/workspace2"
load(file.path(example.data.dir, "mt647.rda"))
length(mt647)
## [1] 481
table(site(mt647)$andic.soil.properties, useNA = "ifany")
## 
## FALSE  TRUE  <NA> 
##     2    83   396
# get just the profiles with andic.soil.properties == TRUE
subset(mt647, andic.soil.properties)
## SoilProfileCollection with 83 profiles and 446 horizons
## profile ID: peiid  |  horizon ID: phiid 
## Depth range: 20 - 305 cm
## 
## ----- Horizons (6 / 446 rows  |  10 / 73 columns) -----
##   peiid   phiid hzdept hzdepb hzname texture genhz bounddistinct boundtopo clay
##  828140 4005861      0      2     Oe    <NA>  <NA>          <NA>      <NA>   NA
##  828140 4005863      2     13      E   CBV-L  <NA>         clear    smooth    5
##  828140 4005859     13     23    Bs1   CBV-L  <NA>         clear    smooth    8
##  828140 4005862     23     33    Bs2   CBV-L  <NA>       gradual    smooth    8
##  828140 4005864     33     69     BC GRX-FSL  <NA>       gradual    smooth   10
##  828140 4005860     69    152     2C GRX-FSL  <NA>          <NA>      <NA>   10
## [... more horizons ...]
## 
## ----- Sites (6 / 83 rows  |  10 / 105 columns) -----
##  siteiid  peiid pedon_id     site_id   obs_date utmzone utmeasting utmnorthing         x        y
##   845415 828140  P23-134 93MT6470134 1993-08-12      12   294091.0     5092141 -113.6569 45.95195
##   845429 828152  P92-052 92MT6470052 1992-07-30      12   274420.9     5123423 -113.9253 46.22694
##   845430 828153  P92-046 92MT6470046 1992-07-23      12   282680.7     5161752 -113.8361 46.57417
##   845432 828155  P93-053 93MT6470053 1993-08-10      12   259526.3     5079142 -114.0958 45.82389
##   845434 828157  P91-103 91MT6470103 1991-09-26      12   272221.3     5127708 -113.9558 46.26472
##   845451 828176  P89-011 89MT6470011 1989-08-02      12   276988.7     5110875 -113.8861 46.11500
## [... more sites ...]
## 
## Spatial Data:
## [EMPTY]

We can compare this to what we see in the NASIS Pedon Diagnostic Features table:

Any profiles that have have logic errors detected are stored in soilDB.env bad.pedon.ids variable after you run fetchNASIS. If this variable does not exist either you have not run fetchNASIS() in the current session or there are no errors

# these are the troublesome user pedon IDs
get('bad.pedon.ids', envir = soilDB.env)
##  [1] "P93-037" "P90-008" "P90-004" "P90-009" "P93-058" "P93-059" "P90-025" "P90-002" "P90-012" "P92-001"
## [11] "P92-085" "P91-105" "P90-019" "P75-006" "P90-010" "P90-015" "P91-094" "P92-029" "P92-076" "P93-026"
## [21] "P93-035" "P93-063" "P93-064" "P93-041" "P93-043" "P93-044" "P93-083" "P93-112" "P93-113" "P93-124"
## [31] "P93-001" "P96-007" "P91-025" "P93-078" "P92-044" "P91-112" "P92-038" "P90-018" "P93-057" "P93-084"
## [41] "P91-059" "P90-016" "P92-063" "P92-048" "P93-052" "F01-230" "F95-420" "F95-114" "F96-205"
# by default, rmHzErrors removes the "bad" illogical pedons
any(mt647$pedon_id %in% get('bad.pedon.ids', envir=soilDB.env))
## [1] FALSE

When fetchNASIS(..., rmHzErrors = TRUE) (the default), any horizons that were omitted from the SoilProfileCollection will be stored in the bad.horizons variable in the soilDB.env environment.

head(get('bad.horizons', envir=soilDB.env))
##      peiid   phiid pedon_id hzname hzdept hzdepb
## 67  868038 4270407  F01-230     Bw     NA     NA
## 68  868038 4270406  F01-230      E     NA     NA
## 95  868072 4270514  F95-114   <NA>     NA     NA
## 99  868048 4270437  F95-420   <NA>     NA     NA
## 111 868074 4270521  F96-205      C     NA     NA
## 112 868074 4270519  F96-205      B     NA     NA

2.9.1.2 Logic Checks for the SoilProfileCollection

The aqp package has several functions that do logic checks on SoilProfileCollection objects.

The main method that does this in aqp is checkHzDepthLogic() which returns a data.frame of results of running four logic tests on the horizon data from each profile.

Checks for:

  1. bottom depths less than top depth / bad top depth order ("depthLogic")

  2. bottom depths equal to top depth ("sameDepth")

  3. overlaps and gaps ("overlapOrGap")

  4. missing depths ("missingDepth")

logic_tests <- checkHzDepthLogic(mt647err)

# look at first few (look OK; valid == TRUE)
head(logic_tests)
##    peiid valid depthLogic sameDepth missingDepth overlapOrGap
## 1 828138  TRUE      FALSE     FALSE        FALSE        FALSE
## 2 828139  TRUE      FALSE     FALSE        FALSE        FALSE
## 3 828140  TRUE      FALSE     FALSE        FALSE        FALSE
## 4 828141  TRUE      FALSE     FALSE        FALSE        FALSE
## 5 828142  TRUE      FALSE     FALSE        FALSE        FALSE
## 6 828143  TRUE      FALSE     FALSE        FALSE        FALSE
# these all have overlapOrGap errors
head(logic_tests[!logic_tests$valid, ])
##     peiid valid depthLogic sameDepth missingDepth overlapOrGap
## 11 828148 FALSE      FALSE     FALSE        FALSE         TRUE
## 23 828160 FALSE      FALSE     FALSE        FALSE         TRUE
## 25 828162 FALSE      FALSE     FALSE        FALSE         TRUE
## 34 828171 FALSE      FALSE     FALSE        FALSE         TRUE
## 35 828172 FALSE      FALSE     FALSE        FALSE         TRUE
## 36 828173 FALSE      FALSE     FALSE        FALSE         TRUE
# join the logic test data into the site table
site(mt647err) <- logic_tests

Use the $valid vector in result to filter out profiles with missing depths (logic_tests$valid == FALSE)

bad.profiles <- subset(mt647err, !valid)
bad.profiles
## SoilProfileCollection with 49 profiles and 338 horizons
## profile ID: peiid  |  horizon ID: phiid 
## Depth range: 15 - 152 cm
## 
## ----- Horizons (6 / 338 rows  |  10 / 73 columns) -----
##   peiid   phiid hzdept hzdepb hzname texture genhz bounddistinct boundtopo clay
##  828148 4005908      0      3     Oe    <NA>  <NA>          <NA>      <NA>   NA
##  828148 4005907      3      5     Oi    <NA>  <NA>          <NA>      <NA>   NA
##  828148 4005905      5     15      E  CB-FSL  <NA>         clear    smooth    5
##  828148 4005911     15     23     Bs  CB-SIL  <NA>        abrupt    smooth    5
##  828148 4005909     23     25     2E CBV-FSL  <NA>         clear    smooth    8
##  828148 4005910     25     58     2B CBV-FSL  <NA>       gradual    smooth   10
## [... more horizons ...]
## 
## ----- Sites (6 / 49 rows  |  10 / 110 columns) -----
##   peiid siteiid pedon_id     site_id   obs_date utmzone utmeasting utmnorthing         x        y
##  828148  845423  P93-037 93MT6470037 1993-07-28      12   272869.5     5126695 -113.9469 46.25583
##  828160  845435  P90-008 90MT6470008 1990-07-12      12   274656.5     5129224 -113.9250 46.27917
##  828162  845437  P90-004 90MT6470004 1990-07-16      12   273885.5     5128079 -113.9344 46.26861
##  828171  845446  P90-009 90MT6470009 1990-07-24      12   274813.1     5129404 -113.9230 46.28083
##  828172  845447  P93-058 93MT6470058 1993-08-19      12   265401.3     5110256 -114.0356 46.10555
##  828173  845448  P93-059 93MT6470059 1993-08-19      12   265446.7     5110316 -114.0350 46.10611
## [... more sites ...]
## 
## Spatial Data:
## [EMPTY]

And also filter out the valid ones (logic_tests$valid == TRUE)

good.profiles <- subset(mt647err, logic_tests$valid)
good.profiles
## SoilProfileCollection with 481 profiles and 2536 horizons
## profile ID: peiid  |  horizon ID: phiid 
## Depth range: 14 - 1552 cm
## 
## ----- Horizons (6 / 2536 rows  |  10 / 73 columns) -----
##   peiid   phiid hzdept hzdepb hzname texture genhz bounddistinct boundtopo clay
##  828138 4005848      0      5      O    <NA>  <NA>          <NA>      <NA>   NA
##  828138 4005852      5     18      E    GR-L  <NA>         clear    smooth   12
##  828138 4005850     18     38    Bw1   GRV-L  <NA>       gradual      wavy   14
##  828138 4005849     38     51    Bw2 CBV-FSL  <NA>       gradual      wavy   12
##  828138 4005853     51     71     BC  CBV-SL  <NA>         clear      wavy   12
##  828138 4005851     71     81      R    <NA>  <NA>          <NA>      <NA>   NA
## [... more horizons ...]
## 
## ----- Sites (6 / 481 rows  |  10 / 110 columns) -----
##   peiid siteiid pedon_id     site_id   obs_date utmzone utmeasting utmnorthing         x        y
##  828138  845413  P91-043 91MT6470043 1991-07-11      12   268288.0     5068425 -113.9781 45.73056
##  828139  845414  P91-029 91MT6470029 1991-06-27      12   280623.2     5147302 -113.8561 46.44361
##  828140  845415  P23-134 93MT6470134 1993-08-12      12   294091.0     5092141 -113.6569 45.95195
##  828141  845416  P93-025 93MT6470025 1993-07-08      12   279148.0     5151960 -113.8775 46.48500
##  828142  845417  P93-075 93MT6470075 1993-09-14      12   261086.6     5080380 -114.0764 45.83556
##  828143  845418  P93-074 93MT6470074 1993-09-14      12   261036.4     5080197 -114.0769 45.83389
## [... more sites ...]
## 
## Spatial Data:
## [EMPTY]

2.10 Extended Data Functions

Additional data related to both site and horizon information can be fetched using the get_extended_data_from_NASIS() function. This function returns a named list() with several tables that fetchNASIS draws from–in addition to the typical Site/Pedon/Component/Horizon tables.

There are a variety of calculated fields that are included in the default fetchNASIS result based on the extended data–you can make use of these to simplify queries and aggregations for a variety of analyses… And then follow up with more detailed data as needed.

2.10.1 Elements of get_extended_data_from_NASIS()

  • Ecological Site History ("ecositehistory")

  • Diagnostic Features ("diagnostic")

  • Diagnostic Feature TRUE/FALSE Summary ("diagHzBoolean")

  • Restrictions ("restriction")

  • Fragment and Texture Summaries

    • Horizon Fragments ("frag_summary")

    • Horizon Artifacts ("art_summary")

    • Surface Fragments ("surf_frag_summary")

    • Texture Class Modifiers ("texmodifier")

  • Geomorphic Table Summaries ("geomorph")

  • Parent Material Summaries ("pm")

  • Taxonomic History ("taxhistory")

  • Site Text Notes w/ Photo links("photo")

  • Horizon Structure ("struct")

  • Horizon Designation ("hzdesgn")

2.10.2 Load Example Data

Below is a summary of additional information that can be readily brought into R from your NASIS selected set via the get_extended_data_from_NASIS() function.

To download the sample 2015MT663% data from the course page with R:

example.data.dir <- "C:/workspace2"
example.data.path <- file.path(example.data.dir, "mt663.rda")

if (!dir.exists(example.data.dir))
  dir.create(example.data.dir, recursive = TRUE)

download.file("https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/book/02/mt663.rda", 
              destfile = example.data.path)

Before continuing, imagine opening the NASIS client, populating your selected set with 2015MT663% using a query like “NSSC Pangaea – POINT-Pedon/Site by User Pedon ID

Load the data like we did above.

# load the sample data
example.data.dir <- "C:/workspace2"
load(file.path(example.data.dir, "mt663.rda"))
## fetch extended site and horizon data from local NASIS
# mt663ext <- get_extended_data_from_NASIS_db()

We could use the get_extended_data_from_NASIS_db() function if 2015MT663% or other data were in the selected set, but we will use the mt663ext data we loaded from the .rda file.

The column names are the names of variables that you could join to your site or horizon data by various means. Generally these variable names, with a few exceptions, mirror the NASIS 7 data model names.

# site and pedon related extended data

# list all dataframes in the extended data
str(mt663ext, 1)
## List of 14
##  $ ecositehistory   :'data.frame':   0 obs. of  5 variables:
##  $ diagnostic       :'data.frame':   292 obs. of  4 variables:
##  $ diagHzBoolean    :'data.frame':   115 obs. of  20 variables:
##  $ restriction      :'data.frame':   11 obs. of  8 variables:
##  $ frag_summary     :'data.frame':   561 obs. of  18 variables:
##  $ art_summary      :'data.frame':   561 obs. of  14 variables:
##  $ surf_frag_summary:'data.frame':   134 obs. of  10 variables:
##  $ texmodifier      :'data.frame':   622 obs. of  5 variables:
##  $ geomorph         :'data.frame':   241 obs. of  9 variables:
##  $ taxhistory       :'data.frame':   115 obs. of  21 variables:
##  $ photo            :'data.frame':   793 obs. of  4 variables:
##  $ pm               :'data.frame':   179 obs. of  10 variables:
##  $ struct           :'data.frame':   444 obs. of  6 variables:
##  $ hzdesgn          :'data.frame':   561 obs. of  19 variables:
# vegetation data summary
colnames(mt663ext$ecositehistory) 
## [1] "siteiid"         "ecositeid"       "ecositenm"       "ecositecorrdate" "es_classifier"
# diagnostic features
colnames(mt663ext$diagnostic) 
## [1] "peiid"    "featkind" "featdept" "featdepb"
# surface rock fragments
colnames(mt663ext$surf_frag_summary)
##  [1] "peiid"               "surface_fgravel"     "surface_gravel"      "surface_cobbles"    
##  [5] "surface_stones"      "surface_boulders"    "surface_channers"    "surface_flagstones" 
##  [9] "surface_paragravel"  "surface_paracobbles"
# geomorphic description
colnames(mt663ext$geomorph)
## [1] "peiid"          "geomicrorelief" "geommicelev"    "geomfmod"       "geomfname"      "geomfeatid"    
## [7] "existsonfeat"   "geomfiidref"    "geomftname"
# taxonomic history data
colnames(mt663ext$taxhistory)
##  [1] "peiid"          "classdate"      "classifier"     "classtype"      "taxonname"      "localphase"    
##  [7] "taxonkind"      "seriesstatus"   "taxclname"      "taxpartsize"    "taxorder"       "taxsuborder"   
## [13] "taxgrtgroup"    "taxsubgrp"      "soiltaxedition" "osdtypelocflag" "taxmoistcl"     "taxtempregime" 
## [19] "taxfamother"    "psctopdepth"    "pscbotdepth"
# linked photo stored in site textnotes
colnames(mt663ext$photo) 
## [1] "siteiid"   "recdate"   "textcat"   "imagepath"
# site parent materials
colnames(mt663ext$pm)
##  [1] "siteiid"      "seqnum"       "pmorder"      "pmdept"       "pmdepb"       "pmmodifier"   "pmgenmod"    
##  [8] "pmkind"       "pmorigin"     "pmweathering"
###
### horizon related extended data
### 

# rock fragments 
colnames(mt663ext$frag_summary) 
##  [1] "phiid"                "fine_gravel"          "gravel"               "cobbles"             
##  [5] "stones"               "boulders"             "channers"             "flagstones"          
##  [9] "parafine_gravel"      "paragravel"           "paracobbles"          "parastones"          
## [13] "paraboulders"         "parachanners"         "paraflagstones"       "unspecified"         
## [17] "total_frags_pct_nopf" "total_frags_pct"
# soil texture modifers
colnames(mt663ext$texmodifier)
## [1] "peiid"  "phiid"  "phtiid" "seqnum" "texmod"
# soil structure data
colnames(mt663ext$struct) 
## [1] "phiid"         "structgrade"   "structsize"    "structtype"    "structid"      "structpartsto"

2.10.3 Visualizing Common Landforms

The following code generates a simple graphical summary of the 10 most commonly occurring "landform_string" (a calculated field in fetchNASIS()) to inspect which are the most common.

# load data from a NASIS selected set (or sample object)
pedons <- mt663

# create 'lf' object of landform factors sorted in descending order
lf <- sort(table(pedons$landform_string), decreasing = TRUE)

# plot top 10 or length, whichever is shorter
Hmisc::dotchart2(lf[1:pmin(10, length(lf))],
                 col = 'black',
                 xlim = c(0, max(lf)),
                 cex.labels = 0.75)

For a challenge and to further inspect your own data try the above code with some other summaries of geomorphic data produced by fetchNASIS().

You can swap landform_string for: landscape_string (landscape), hillslopeprof (2D), geomposmntn, geomposhill, geompostrce, geomposflats (3D), slope_shape, shapeacross, shapedown (slope shape across/down), microfeature_string (microfeature), or geomicrorelief_string (site observation microrelief).

2.10.4 Diagnostic Features

2.10.4.1 Boolean Diagnostic Features in fetchNASIS

If diagnostic features are populated in the Pedon Diagnostic Features table in NASIS, then Boolean (TRUE or FALSE) fields are created for each diagnostic feature type found in the data brought in by fetchNASIS.

These fields can be used to model presence / absence of a diagnostic soil feature by extracting the site data from the SoilProfileCollection with site().

2.10.4.2 Thickness from Diagnostic Features Table

The following is an example of how you could use the diagnostic features (if populated!) from the extended data to determine the thickness of a diagnostic feature of interest.

# get diagnostic features associated with pedons loaded from selected set
d <- diagnostic_hz(mt663)

# summary of the diagnostic features in your data!
unique(d$featkind)
##  [1] "ochric epipedon"          "cambic horizon"           "lithic contact"          
##  [4] "mollic epipedon"          "argillic horizon"         "redox concentrations"    
##  [7] "andic soil properties"    "secondary carbonates"     "sapric soil materials"   
## [10] "aquic conditions"         "reduced matrix"           "albic horizon"           
## [13] "spodic horizon"           "glossic horizon"          "spodic materials"        
## [16] "lithologic discontinuity" "densic materials"         "umbric epipedon"         
## [19] "albic materials"          NA
# tabulate
sort(table(droplevels(factor(d$featkind))), decreasing = TRUE)
## 
##          ochric epipedon           cambic horizon         argillic horizon          mollic epipedon 
##                       61                       54                       43                       42 
##    andic soil properties           lithic contact     secondary carbonates          umbric epipedon 
##                       30                       20                        7                        7 
##            albic horizon           spodic horizon          glossic horizon           reduced matrix 
##                        6                        4                        3                        3 
##    sapric soil materials lithologic discontinuity          albic materials         aquic conditions 
##                        3                        2                        1                        1 
##         densic materials     redox concentrations         spodic materials 
##                        1                        1                        1
# subset argillic horizons
d <- d[d$featkind == 'argillic horizon', ]

# create a new column and subtract the upper from the lower depth
d$argillic_thickness_cm <- d$featdepb - d$featdept

# create another new column with the upper depth to the diagnostic feature
d$depth_to_argillic_cm <- d$featdept

# omit NA values
d <- na.omit(d)

# subset to pedon records IDs and calculated thickness
d <- d[, c('peiid', 'argillic_thickness_cm', 'depth_to_argillic_cm')]
head(d)
##      peiid argillic_thickness_cm depth_to_argillic_cm
## 7  1092610                    56                   30
## 24 1092617                    38                   34
## 26 1092618                    29                   23
## 28 1092619                    38                   32
## 30 1092620                    29                   24
## 33 1092621                    23                   19
# left-join with existing site data
site(mt663) <- d

# plot as histogram
par(mar = c(4.5, 4.5, 1, 1))

# note additional arguments to adjust figure labels
hist(
  mt663$argillic_thickness_cm, 
  xlab = 'Thickness of argillic (cm)', 
  main = '',
  las = 1
)

hist(
  mt663$depth_to_argillic_cm, 
  xlab = 'Depth to argillic top depth (cm)', 
  main = '',
  las = 1
)

Quick check: What can you do with the boolean diagnostic feature data stored in the site table of a fetchNASIS SoilProfileCollection?

2.10.4.3 Diagnostic Feature Diagrams

# work up diagnostic plot based on the mt663 dataset loaded above
library(aqp)
library(soilDB)
library(sharpshootR)

# can limit which diagnostic features to show by setting 'v' manually
v <- c('ochric.epipedon', 'mollic.epipedon', 'andic.soil.properties', 
       'argillic.horizon', 'cambic.horizon', 
       'lithic.contact')

# the default concatenated landform_string may have multiple levels
#  depending on how the geomorphic tables were populated
#  these are concatenated using the ampersand (&) character
#  so take the first string split using ampersand as a delimiter
mt663$first_landform <- sapply(strsplit(mt663$landform_string, "&"), 
                                  function(x) x[[1]])

# plot with diagnostic features ordered according to co-occurrence
# v: site-level attributes to consider
# k: number of clusters to identify
diagnosticPropertyPlot(
  mt663[1:30, ], v = v, k = 5,
  grid.label = 'site_id', 
  dend.label = 'first_landform', 
  sort.vars = TRUE
)

2.11 Exercise 3: Diagnostic Horizons in Your Own Data

Use the following script to generate a diagnostic-feature diagram for the pedon data you’ve loaded from your NASIS selected set. Alternately, you may use the MT663 data from the example above, just substitute the object mt663 for f.

You can select a subset of desired diagnostic properties or use all diagnostic feature columns.

library(aqp)
library(soilDB)
library(sharpshootR)

# Load data
f <- fetchNASIS(from = 'pedons')

# ... May need to use subset() to reduce the number of pedons!

# get all diagnostic feature columns from site data 
# by pattern matching on '[.]' in the site attribute names
# this is not a generic rule, but works here
idx <- grep('[.]', siteNames(f))
v <- siteNames(f)[idx]

# inspect v
v

# insert diagnostics of interest from the possible list in 'v'
v <- c('ochric.epipedon', 'cambic.horizon', 
       'argillic.horizon', 'paralithic.contact',
       'lithic.contact')

# generate diagnostic property diagram
diagnosticPropertyPlot(
  f, v = v, k = 5, 
  grid.label = 'site_id', 
  dend.label = 'taxonname'
)

For more information on generating diagnostic feature diagrams, see the following tutorial:

Questions:

Use the results of site(), horizons(), diagnostic_hz() to answer the following questions about your own data. These functions return data.frame objects from a SoilProfileCollection derived from your local NASIS data.

  1. How many pedons do you have in your selected set? Check the number of rows in the site portion of the SoilProfileCollection with nrow(), or use the method length() on the collection.

  2. How many diagnostic features do you have for those pedons? How many different "featkind"? You can calculate the number of unique values with with length() and unique()

  3. Interpret: What diagnostic features did you choose to plot? Did you notice any patterns in the diagnostic features that tend to occur together?

Send your mentor a copy of your results and the commands you used.

2.12 Custom Queries to Local NASIS Database

fetchNASIS() and related convenience functions are wrappers around commonly used chunks of SQL (Structured Query Language). Queries of the NASIS local database can be written in T-SQL which is the dialect of SQL used to communicate with Microsoft SQL Server. This is the connection that you set for the pre-course.

To create a DBIConnection object that can access the NASIS local database, use dbConnectNASIS() or simply NASIS(). This will look for an ODBC connection of the name "nasis_local" and will authenticate with the read-only credentials.

You can also pass an existing DBIConnection or path to SQLite file as the dsn argument to connect to an alternate source. The default driver type used for NASIS connections is OdbcConnection from the odbc package.

To query or execute SQL commands on a connection you created, you can use dbGetQuery(connection, query), where connection is your DBIConnection object and query is the SQL to execute.

The soilDB package also includes a helper function dbQueryNASIS(), which functions just like dbGetQuery() except the default is to close the connection after returning the result. This can be convenient for interactive use or cases where a function needs to only execute a single query, as it will prevent you from leaving connections open accidentally.

2.12.1 Example: Site Soil Temperature Data from CA792 (Sequoia-Kings Canyon National Park)

The following example will return all records in your selected set sitesoiltemp table, along with a couple of fields from the site, siteobs, and pedon tables. This is a convenient way to collect all of the field-based soil temperature data associated with the pedons in your selected set for further analysis.

You can use the CA792 (Sequoia-Kings Canyon National Park) pedons as an example. Use a query that searches user pedon ID for the following pattern %CA792% to download and populate a selected set in the NASIS client.

library(soilDB)

# write query as a character string
q <- "SELECT siteiid as siteiid, peiid, usiteid as site_id, 
             upedonid as pedon_id, obsdate as obs_date,
             soitemp, soitempdep FROM site_View_1
       INNER JOIN siteobs_View_1 ON site_View_1.siteiid = siteobs_View_1.siteiidref
       LEFT OUTER JOIN sitesoiltemp_View_1 ON siteobs_View_1.siteobsiid = sitesoiltemp_View_1.siteobsiidref
       LEFT OUTER JOIN pedon_View_1 ON siteobs_View_1.siteobsiid = pedon_View_1.siteobsiidref
      ORDER BY obs_date, siteiid;"

# setup connection to local NASIS
channel <- NASIS()

# exec query
d <- dbQueryNASIS(channel, q)

The functions dbConnectNASIS() (alias NASIS()) and dbQueryNASIS() allow you to create a connection to the NASIS local database and send queries to that connection, respectively. By default, dbQueryNASIS() will close your connection after completing the query; you can change this by setting close=FALSE.

# check results
str(d)

# remove records missing values
d <- na.omit(d)

# tabulate unique soil depths
table(d$soitempdep)

# extract doy of year
d$doy <- as.integer(format(d$obs_date, "%j"))

# when where measurements collected?
hist(
  d$doy,
  xlim = c(1, 366),
  las = 1,
  main = 'Soil Temperature Measurements',
  xlab = 'Day of Year'
)

# soil temperature by day of year
plot(
  soitemp ~ doy,
  data = d,
  main = 'Soil Temperature Measurements (CA792)\nNASIS "Site Soil Temperature" table',
  type = 'p',
  pch = 21, 
  bg = 'royalblue',
  xlim = c(1, 366),
  ylim = c(-1, 26),
  xlab = 'Day of Year',
  ylab = 'Soil Temperature at 50cm (deg C)',
  las = 1
)

# vernal equinox, summer solstice, autumnal equinox, winter solstice
x <- as.Date(c('2022-03-20', '2022-06-21', '2022-09-23', '2022-12-21'))

# convert dates -> Julian date, or day-of-year
doy <- as.integer(format(x, "%j"))

# add vertical lines
abline(v = doy, lty = 3, col = grey(0.45))

# annotate
# pos argument: left-center offset
text(
  x = doy, 
  y = -1, 
  labels = c('vernal equinox', 'summer solstice', 'autumnal equinox', 'winter solstice'), 
  pos = 2,
  cex = 0.75,
  font = 3
)

# box/whisker plot showing distribution of day-of-year 
# when measurements (pedon descriptions) were collected
boxplot(d$doy, at = 26, horizontal = TRUE, add = TRUE, lty = 1, width = 1, col = 'white', axes = FALSE)

+