Chapter 2 The Data We Use
2.1 Objectives (The Data We Use)
- Expand on basic R skills from Chapter 1
- Inspect and work with different data types
- Perform operations on data such as filtering and aggregation
- Begin to explore regular expression (regex) patterns for text data
- Learn how functions can be used to bundle operations
- Work with Soil Data Sources and Structures
- Use the soilDB package to load data into R
- Understand the SoilProfileCollection (SPC) object
- Learn about the data checks in the fetchNASIS() function
2.2 The Structure of Soil Data
What if you could extract, organize, and visualize data from NASIS and many other commonly used soil database sources with a couple of lines of code?
The aqp (Algorithms for Quantitative Pedology) and soilDB packages enable data to be fetched from various sources and cast into a SoilProfileCollection (SPC) object. Tabular and spatial data objects fetched via soilDB and processed using aqp methods can simplify the process of working with commonly used soil data.
2.2.1 Package References
The manual pages for soilDB
and aqp
are accessible (click index at the bottom of the Help tab in RStudio) by entering the following into the R console:
# load the libraries
library(aqp)
library(soilDB)
2.2.2 Importance of Pedon Data
The importance of pedon data for present and future work cannot be overstated. These data represent decades of on-the-ground observations of the soil resource for a given area.
As difficult as it may be to take the time to enter legacy pedon data, it is vitally important that we capture this resource and get these data into NASIS as an archive of point observations.
2.2.3 Some Issues With Pedon Data
Making and documenting observations of soil requires hard work. Digging is difficult, and writing soil descriptions is time consuming!
Our confidence in observations typically weakens with the depth of the material described.
- If we acknowledge this, which we must, then how do we deal with it in pedon data?
- A cutoff depth, for example 100 cm, can be used to truncate observations to a zone of greater confidence.
- Show the relative confidence of the data with depth.
2.3 Challenges with Pedon Data
- Consistency
- Missing data
- Confidence in the observations
- Uncertainty with depth
- Description style differences
- Depth described, horizonation usage styles
- Legacy data vintage
- Decadal span of data
- Taxonomy updates, horizon nomenclature changes
- Location confidence
- Origin of the location information
- Datum used for data collection
- Accuracy for GPS values at the time of data collection
2.4 The SoilProfileCollection
The SoilProfileCollection
class (SPC) provided by the aqp
package is a specialized structure for soil data analysis. It simplifies the process of working with collections of data associated with soil profiles, e.g., site-level, horizon-level, spatial, diagnostic horizons, and other metadata.
A SoilProfileCollection
is similar to the NASIS Site/Pedon “object” in that it provides generalizations, specific routines and rules about data tables and their relationships.
The SoilProfileCollection
is an S4 R object. S4 objects have slots. Of primary importance are the slots for site-level and horizon-level data.
In many ways the SPC is more adaptable than the NASIS “Pedon” concept because it is more general. However, the SPC is not as expressive as the complex hierarchy of objects in NASIS, which are more aligned with data archival vs. analysis.
2.4.1 SoilProfileCollection methods
Many “familiar” methods are defined for the SoilProfileCollection
object. Some are unique, and others operate like more common functions of vector and data.frame
objects, such as nrow()
(“how many horizons?”) or length()
(“how many sites/pedons?”).
Perhaps most importantly, when you access the site data (with site(<object>)
) or the horizon data (with horizons(<object>)
) of a SoilProfileCollection
, you get a data.frame
object that you can use like any other you might use or make in R.
2.4.1.1 Promoting a data.frame to SoilProfileCollection
The SoilProfileCollection
object is a collection of 1-dimensional profile descriptions, of the type conventionally described on a Form 232, or of tabular data returned from the laboratory.
The object is “horizon data forward” in that you start with the layers, and can add or create site-level attributes by normalization, joins, and calculation.
Most of the time if you are using your NASIS data, or an official database, there are defined ways of getting the data “into” an SPC. For example, fetchOSD
returns a SoilProfileCollection
that has been assembled from horizon and site level attributes gleaned from the OSDs text, Soil Classification database, and other sources.
In the pre-course, we had you set up a process so you could connect to your local NASIS instance to “fetch” data and have methods like fetchNASIS
put things together for you.
This input to make a SoilProfileCollection
can be represented most simply as a data.frame
with unique site or profile ID and depth combinations for each horizon or layer–for example, a subset of the phorizon
or chorizon
table in NASIS.
A simple demonstration of “tabular horizon data” is the sp4
data set bundled with aqp
: some serpentine soil profiles stored in a data.frame
in the aqp
package (after McGahan et al., 2009).
library(aqp)
# Load sample serpentine soil data (McGahan et al., 2009)
data(sp4, package = "aqp")
# this is a data.frame
# same as if loaded from CSV file etc.
class(sp4)
## [1] "data.frame"
# inspect the first couple of rows
head(sp4)
## id name top bottom K Mg Ca CEC_7 ex_Ca_to_Mg sand silt clay CF
## 1 colusa A 0 3 0.3 25.7 9.0 23.0 0.35 46 33 21 0.12
## 2 colusa ABt 3 8 0.2 23.7 5.6 21.4 0.23 42 31 27 0.27
## 3 colusa Bt1 8 30 0.1 23.2 1.9 23.7 0.08 40 28 32 0.27
## 4 colusa Bt2 30 42 0.1 44.3 0.3 43.0 0.01 27 18 55 0.16
## 5 glenn A 0 9 0.2 21.9 4.4 18.8 0.20 54 20 25 0.55
## 6 glenn Bt 9 34 0.3 18.9 4.5 27.5 0.20 49 18 34 0.84
To convert this horizon data into a SoilProfileCollection
, we need to identify three parameters: idname
, top
, and bottom
. These parameters refer to the columns of unique profile IDs, top depths and bottom depths, respectively.
There are a couple of important constraints and considerations:
- records (rows) represent horizons
- profiles are uniquely identified by a column (user pedon ID, pedon record ID, etc.)
- profile IDs cannot contain missing values (NA)
- horizon top and bottom depths are identified by columns
- ideally there are no gaps, overlap, or missing top/bottom depths (more on that later)
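These constraints can be verified with a few lines of base R before promoting a data.frame. Below is a minimal sketch using a made-up horizon table (the aqp function checkHzDepthLogic() performs a more thorough version of these tests):

```r
# toy horizon table: profile "a" has a gap between 30 and 38 cm
h <- data.frame(
  id     = c("a", "a", "a", "b", "b"),
  top    = c(0, 10, 38, 0, 20),
  bottom = c(10, 30, 60, 20, 45)
)

# within each profile, the bottom depth of one horizon should
# equal the top depth of the next, and nothing should be missing
check_depths <- function(x) {
  x <- x[order(x$top), ]
  all(!is.na(x$id)) &&
    all(!is.na(x$top)) && all(!is.na(x$bottom)) &&
    all(head(x$bottom, -1) == tail(x$top, -1))
}

sapply(split(h, h$id), check_depths)
# profile "a" fails (gap between 30 and 38 cm); profile "b" passes
```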
Use a formula to specify column names in the data.frame
, in this case "id"
, "top"
and "bottom"
.
# profile ID ~ top depth + bottom depth
depths(sp4) <- id ~ top + bottom
# note new class
class(sp4)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
# compact summary
sp4
## SoilProfileCollection with 10 profiles and 30 horizons
## profile ID: id | horizon ID: hzID
## Depth range: 16 - 49 cm
##
## ----- Horizons (6 / 30 rows | 10 / 14 columns) -----
## id hzID top bottom name K Mg Ca CEC_7 ex_Ca_to_Mg
## colusa 1 0 3 A 0.3 25.7 9.0 23.0 0.35
## colusa 2 3 8 ABt 0.2 23.7 5.6 21.4 0.23
## colusa 3 8 30 Bt1 0.1 23.2 1.9 23.7 0.08
## colusa 4 30 42 Bt2 0.1 44.3 0.3 43.0 0.01
## glenn 5 0 9 A 0.2 21.9 4.4 18.8 0.20
## glenn 6 9 34 Bt 0.3 18.9 4.5 27.5 0.20
## [... more horizons ...]
##
## ----- Sites (6 / 10 rows | 1 / 1 columns) -----
## id
## colusa
## glenn
## kings
## mariposa
## mendocino
## napa
## [... more sites ...]
##
## Spatial Data:
## [EMPTY]
The formula expresses the idea that a profile id is defined by a set of top and bottom depths.
NOTE: A character vector with the same names has the same effect, and can be easier to “program” with than the formula-based syntax.
depths(sp4) <- c("id", "top", "bottom")
2.4.2 Promoting to “Spatial” SoilProfileCollection
You can also use the SoilProfileCollection
to manage the information about a profile’s position on the Earth.
Chapter 4 will cover spatial data in greater detail, and the SoilProfileCollection Reference has a section on Spatial Data. For now, know that you can use the initSpatial<- method to define “X” and “Y” coordinate columns and the coordinate reference system in one line:
sp4$x <- runif(10); sp4$y <- runif(10) # dummy XY coordinates
initSpatial(sp4, crs = "EPSG:4326") <- ~ x + y
This is new syntax introduced in aqp 2.0; the older syntax uses the coordinates<- and proj4string<- methods.
2.4.2.1 Extracting Site and Horizon Data
You can extract values from the collection’s @site
and @horizon
slots using the site()
and horizons()
functions. These create data.frame
objects that are separate from the SoilProfileCollection
.
# extract site data from SPC into new data.frame 's'
# note that it only contains profile IDs
s <- site(sp4)
str(s)
## 'data.frame': 10 obs. of 1 variable:
## $ id: chr "colusa" "glenn" "kings" "mariposa" ...
# extract horizon data from SPC into new data.frame 'h'
h <- horizons(sp4)
str(h)
## 'data.frame': 30 obs. of 14 variables:
## $ id : chr "colusa" "colusa" "colusa" "colusa" ...
## $ name : chr "A" "ABt" "Bt1" "Bt2" ...
## $ top : int 0 3 8 30 0 9 0 4 13 0 ...
## $ bottom : int 3 8 30 42 9 34 4 13 40 3 ...
## $ K : num 0.3 0.2 0.1 0.1 0.2 0.3 0.2 0.6 0.8 0.6 ...
## $ Mg : num 25.7 23.7 23.2 44.3 21.9 18.9 12.1 12.1 17.7 28.3 ...
## $ Ca : num 9 5.6 1.9 0.3 4.4 4.5 1.4 7 4.4 5.8 ...
## $ CEC_7 : num 23 21.4 23.7 43 18.8 27.5 23.7 18 20 29.3 ...
## $ ex_Ca_to_Mg: num 0.35 0.23 0.08 0.01 0.2 0.2 0.58 0.51 0.25 0.2 ...
## $ sand : int 46 42 40 27 54 49 43 36 27 42 ...
## $ silt : int 33 31 28 18 20 18 55 49 45 26 ...
## $ clay : int 21 27 32 55 25 34 3 15 27 32 ...
## $ CF : num 0.12 0.27 0.27 0.16 0.55 0.84 0.5 0.75 0.67 0.25 ...
## $ hzID : chr "1" "2" "3" "4" ...
2.4.2.2 Methods like data.frame
The base R functions for accessing and setting data.frame
columns by name such as $
and [[
work for SoilProfileCollection
objects, too. Review data.frame
methods:
- [[ and $: single columns in a data.frame, by name: x[['variable']] or x$variable
- [: combinations of rows and columns, by name or index
  - x[i, ]: specified rows, all columns
  - x[, j]: all rows, specified columns
  - x[i, j]: specified rows, specified columns
See Chapter 1 and the Chapter 2 Appendix for additional details and examples.
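A quick refresher on these data.frame indexing patterns, using a small made-up table:

```r
d <- data.frame(id = c("p1", "p2", "p3"), clay = c(21, 27, 32))

d$clay          # column by name, as a vector: 21 27 32
d[["clay"]]     # same result as $
d[1:2, ]        # first two rows, all columns
d[, "clay"]     # all rows, one column (dropped to a vector by default)
d[2, "clay"]    # a single value: 27
```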
2.4.2.2.1 Column Access by Name: $ and [[
# sp4 is a SoilProfileCollection
sp4$clay
## [1] 21 27 32 55 25 34 3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
sp4[['clay']]
## [1] 21 27 32 55 25 34 3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
# horizon data.frame
h$clay
## [1] 21 27 32 55 25 34 3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
h[['clay']]
## [1] 21 27 32 55 25 34 3 15 27 32 25 31 33 13 21 23 15 17 12 19 14 14 22 25 40 51 67 24 25 32
# use $<- or [[<- to set proportional clay content
sp4$clay <- sp4[['clay']] / 100
# undo what we did above; back to percentage
sp4[['clay']] <- sp4$clay * 100
# create new site variable ("numberone" recycled for all sites)
site(sp4)$newvar1 <- "numberone"
# create new horizon variable ("numbertwo" recycled for all horizons)
horizons(sp4)$newvar2 <- "numbertwo"
2.4.2.2.2 Row Access: [
The SoilProfileCollection
also has [
(“single bracket”), but with a different interpretation from the [i, j]
indexing of data.frame
objects.
- In a data.frame you have object[row, column, drop=TRUE]; the result is a data.frame (or a vector with the default drop=TRUE).
- In a SoilProfileCollection you have object[site, horizon]; the result is a SoilProfileCollection.
# i-index: first 2 profiles, all horizons
sp4[1:2, ]
## SoilProfileCollection with 2 profiles and 6 horizons
## profile ID: id | horizon ID: hzID
## Depth range: 34 - 42 cm
##
## ----- Horizons (6 / 6 rows | 10 / 15 columns) -----
## id hzID top bottom name K Mg Ca CEC_7 ex_Ca_to_Mg
## colusa 1 0 3 A 0.3 25.7 9.0 23.0 0.35
## colusa 2 3 8 ABt 0.2 23.7 5.6 21.4 0.23
## colusa 3 8 30 Bt1 0.1 23.2 1.9 23.7 0.08
## colusa 4 30 42 Bt2 0.1 44.3 0.3 43.0 0.01
## glenn 5 0 9 A 0.2 21.9 4.4 18.8 0.20
## glenn 6 9 34 Bt 0.3 18.9 4.5 27.5 0.20
##
## ----- Sites (2 / 2 rows | 2 / 2 columns) -----
## id newvar1
## colusa numberone
## glenn numberone
##
## Spatial Data:
## [EMPTY]
# j-index: all profiles; first 2 horizons of each profile
sp4[, 1:2]
## SoilProfileCollection with 10 profiles and 20 horizons
## profile ID: id | horizon ID: hzID
## Depth range: 5 - 40 cm
##
## ----- Horizons (6 / 20 rows | 10 / 15 columns) -----
## id hzID top bottom name K Mg Ca CEC_7 ex_Ca_to_Mg
## colusa 1 0 3 A 0.3 25.7 9.0 23.0 0.35
## colusa 2 3 8 ABt 0.2 23.7 5.6 21.4 0.23
## glenn 5 0 9 A 0.2 21.9 4.4 18.8 0.20
## glenn 6 9 34 Bt 0.3 18.9 4.5 27.5 0.20
## kings 7 0 4 A 0.2 12.1 1.4 23.7 0.58
## kings 8 4 13 Bt1 0.6 12.1 7.0 18.0 0.51
## [... more horizons ...]
##
## ----- Sites (6 / 10 rows | 2 / 2 columns) -----
## id newvar1
## colusa numberone
## glenn numberone
## kings numberone
## mariposa numberone
## mendocino numberone
## napa numberone
## [... more sites ...]
##
## Spatial Data:
## [EMPTY]
When you use the [
function, everything in the SoilProfileCollection
is subset simultaneously depending on the constraints specified by the indices.
# First profile, first 2 horizons
horizons(sp4[1, 1:2])
## id name top bottom K Mg Ca CEC_7 ex_Ca_to_Mg sand silt clay CF hzID newvar2
## 1 colusa A 0 3 0.3 25.7 9.0 23.0 0.35 46 33 21 0.12 1 numbertwo
## 2 colusa ABt 3 8 0.2 23.7 5.6 21.4 0.23 42 31 27 0.27 2 numbertwo
All slots in the collection have a relationship to the site or i-index. When you remove sites (profiles), all associated records (e.g. spatial, diagnostics, horizons, etc.) in the object are removed.
Similarly, when all horizons are removed (say, you request the 6th j-index from a profile that has only 5 layers), the site index and all associated data are removed from the collection.
2.5 Exercise 1: Assemble a SoilProfileCollection from several CSV files
Questions:
Run the code in the linked R file and answer these questions.
1. How many profiles (sites) and horizons are in the granite SoilProfileCollection? How many in andesite? See the length() and nrow() functions for SoilProfileCollection objects.
2. Which profile (and which horizon in that profile) has the highest ratio of oxalate-extractable Fe to dithionite-citrate-extractable Fe (Fe_o_to_Fe_d)?
Send the results of questions (and any code you used) to your mentor.
2.6 Using the soilDB Package
The soilDB
package for R provides functions for accessing data stored in NASIS, KSSL, SDA, SoilWeb, SoilGrids and other sources.
These high-level ‘fetch’ functions bundle or wrap lower-level get
functions which access internal database interfaces to NASIS and other data sources. The ODBC connection to NASIS that you set up during the pre-course is an example of this internal database interface.
Basic data checks are run within fetch
functions. These checks ensure the basic integrity of the data as it is queried and moved from its existing structure into an SPC. There are times when it is useful to use the lower-level get
functions individually. They generally return a single data.frame or a list of data.frame objects.
You can set up scripts to make custom queries against these or other sources on your own – there is an example at the end of this section.
For now, we will start with the fetch
functions and others that will get you a large variety of data you can use for soil and ecological site analyses.
2.6.1 soilDB functions for tabular data
soilDB
functions are the quickest way to get up and running:
- fetchNASIS()
  - Gets and re-packages data from a local NASIS database.
  - soilDB Vignette: Columns in fetchNASIS(from="pedons")
- fetchVegdata()
  - Gets Vegetation Plot and related/child tables into a list from a local NASIS database.
- fetchNASISLabData()
  - Gets KSSL laboratory pedon/horizon layer data from a local NASIS database.
- SDA_query()
  - Submits queries to the Soil Data Access system.
  - Can be used to access SSURGO, STATSGO (spatial and tabular), and Lab DataMart snapshots.
- fetchLDM()
  - Gets KSSL data from the Lab Data Mart snapshot in Soil Data Access.
- fetchSDA()
  - Fetches legend/mapunit/component/horizon data from Soil Data Access.
- fetchKSSL()
  - Gets KSSL data from the SoilWeb system via BBOX, MLRA, or series name query.
- fetchOSD()
  - Fetches a limited subset of horizon- and site-level attributes for named soil series from the SoilWeb system.
- OSDquery()
  - Full-text searching of OSD sections.
- fetchSCAN()
  - Queries soil and climate data from USDA-NRCS SCAN stations.
- fetchHenry()
  - Downloads data from the Henry Mount Soil Climate Database.
- fetchPedonPC()
  - Fetches commonly used site and horizon data from a PedonPC (MS Access) database.
2.6.2 Open Database Connectivity (ODBC) Connection to NASIS
After setting up an ODBC connection, as you did as part of the pre-course, you can use R to access data from a selected set defined in your local NASIS database.
Does NASIS need to be open and running to query data using soilDB?
No, fetchNASIS()
works whether the NASIS application is running or not. You just need to make sure that the data you want has been loaded into your selected set.
2.6.3 fetchNASIS()
The fetchNASIS() convenience function extracts data from a NASIS selected set via Structured Query Language (SQL).
Note that the import process in fetchNASIS()
, and the other methods, is not comprehensive. It does not pull every column for every table related to pedon data out of NASIS.
Instead, it pulls essential / commonly used pedon and horizon data. Higher level functions like fetchNASIS()
bundle a series of lower-level queries to get specific parts of the Pedon or Component data structures. Much of the nested complexity of NASIS is simplified in the resulting object.
Many-to-one relationships are “flattened” where possible by fetchNASIS()
. This aggregates the data from various tables into one “site” record with related horizon records, per profile.
You can see the child tables that are aggregated using the get_extended_data_from_NASIS()
method, which returns a named list of child table sources that can be joined to the SoilProfileCollection
made with fetchNASIS()
using the internal record IDs.
2.6.3.1 fetchNASIS arguments
fetchNASIS()
has a number of different arguments:
from = ‘pedons’ or ‘components’ or ‘pedon_report’
- This option allows you to select which data you want to load from NASIS. Choosing either ‘pedons’ or ‘components’ will load data from your local database. If ‘pedon_report’ is specified, it will load data from the text file generated by the NASIS report ‘fetchNASIS’ (run offline). This is useful for loading more than 20,000 pedons at one time, such as for an entire Soil Survey Region.
url = string specifying the (temporary) URL for the NASIS pedon_report
- The URL points to output generated by the fetchNASIS NASIS report, which can be run “Offline Against National Database”. EXAMPLE OUTPUT (MT663)
SS = TRUE/FALSE
- The Selected Set (SS) option allows you to choose whether you want the data to load from your current selected set in NASIS or from the local database tables. The default is set to TRUE, so if unspecified fetchNASIS() will always load from the data in the selected set.
stringsAsFactors = NULL
- This option is no longer used. See soilDB::NASISDomainsAsFactor().
rmHzErrors = TRUE/FALSE
- Setting this value to TRUE removes pedons that do not pass checks for horizon depth consistency. Default: FALSE.
nullFragsAreZero = TRUE/FALSE
- Setting this value to TRUE (the default) converts null entries for rock fragment volumes to 0. This is typically the right assumption because rock fragment data are typically populated only when observed. If you know that your data contain a combination of omitted information (e.g. no rock fragment volumes are populated), then consider setting this argument to FALSE.
soilColorState = ‘moist’ or ‘dry’
- Select dry or moist colors to be converted and placed into a horizon-level attribute called soil_color. The default is ‘moist’ unless specified. Moist and dry colors are also stored in moist_soil_color and dry_soil_color.
lab = TRUE/FALSE
- This option allows for loading the data associated with horizons that may be in the phlabresults table. The default is FALSE, which will not load records from the phlabresults table.
fill = TRUE/FALSE
- This option adds horizons to pedons or components that lack horizon data in the database, preserving their “position” within the output SoilProfileCollection object. The default is FALSE, such that profiles without horizons are omitted from the result.
dropAdditional = TRUE/FALSE
- Used only for from='components' with duplicates = TRUE. Prevent “duplication” of mustatus == "additional" mapunits? Default: TRUE.
dropNonRepresentative = TRUE/FALSE
- Used only for from='components' with duplicates = TRUE. Prevent “duplication” of non-representative data mapunits? Default: TRUE.
duplicates = TRUE/FALSE
- Used only for from='components'. Setting this value to TRUE duplicates components for all instances of use (i.e. one for each legend data mapunit is used on; optionally for additional mapunits, and/or non-representative data mapunits). This will include columns from get_component_correlation_data_from_NASIS_db() that identify which legend(s) each component is used on. Default: FALSE. NOTE: This requires that parent tables above “component” (such as legend, mapunit, correlation and datamapunit) are loaded in your NASIS database/selected set.
dsn = NULL
- Optional: custom path to an SQLite snapshot, or a DBIConnection object to a database with the NASIS schema. See soilDB::createStaticNASIS().
For more information on the data checks and adjusting the default options of the fetchNASIS() function, see the following resource: Tips on Getting Data from NASIS into R.
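To make the nullFragsAreZero behavior described above concrete, here is a base R sketch of the assumption it encodes (this is an illustration, not the actual soilDB code):

```r
# rock fragment volumes as they might come out of the database
gravel <- c(15, NA, 5, NA)

# nullFragsAreZero = TRUE treats unpopulated volumes as 0,
# on the assumption that fragments are recorded only when observed
ifelse(is.na(gravel), 0, gravel)
# [1] 15  0  5  0
```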
2.6.4 The gopheridge Dataset
The gopheridge sample data set is an R object returned by fetchNASIS(), stored as a self-contained .rda file in soilDB.
Open RStudio, and set up the environment by loading packages and the gopheridge
sample dataset.
library(aqp)
library(soilDB)
# load example dataset
data(gopheridge, package = "soilDB")
# what kind of object is this?
class(gopheridge)
## [1] "SoilProfileCollection"
## attr(,"package")
## [1] "aqp"
# what does the internal structure look like?
str(gopheridge, 2)
## Formal class 'SoilProfileCollection' [package "aqp"] with 9 slots
## ..@ idcol : chr "peiid"
## ..@ hzidcol : chr "phiid"
## ..@ depthcols : chr [1:2] "hzdept" "hzdepb"
## ..@ metadata :List of 7
## ..@ horizons :'data.frame': 317 obs. of 73 variables:
## ..@ site :'data.frame': 52 obs. of 137 variables:
## ..@ sp :Formal class 'SpatialPoints' [package "sp"] with 3 slots
## ..@ diagnostic :'data.frame': 164 obs. of 4 variables:
## ..@ restrictions:'data.frame': 56 obs. of 8 variables:
There are many columns in the data.frame objects within a fetchNASIS() SoilProfileCollection
.
The following guide is part of the soilDB documentation and describes the columns that are returned from a fetchNASIS(from="pedons")
call. In the future this guide will be extended to describe component data.
With siteNames() and horizonNames() we can view the column names for each slot in the gopheridge object.
# the fields at the site and horizon levels within the SPC
siteNames(gopheridge)
## [1] "peiid" "siteiid"
## [3] "pedon_id" "site_id"
## [5] "obs_date" "utmzone"
## [7] "utmeasting" "utmnorthing"
## [9] "x" "y"
## [11] "horizdatnm" "x_std"
## [13] "y_std" "gpspositionalerror"
## [15] "describer" "pedonpurpose"
## [17] "pedontype" "pedlabsampnum"
## [19] "labdatadescflag" "tsectstopnum"
## [21] "tsectinterval" "utransectid"
## [23] "tsectkind" "tsectselmeth"
## [25] "elev_field" "slope_field"
## [27] "aspect_field" "plantassocnm"
## [29] "earthcovkind1" "earthcovkind2"
## [31] "erocl" "bedrckdepth"
## [33] "bedrckkind" "bedrckhardness"
## [35] "hillslopeprof" "geomslopeseg"
## [37] "shapeacross" "shapedown"
## [39] "slopecomplex" "drainagecl"
## [41] "flodfreqcl" "floddurcl"
## [43] "flodmonthbeg" "pondfreqcl"
## [45] "ponddurcl" "pondmonthbeg"
## [47] "geomposhill" "geomposmntn"
## [49] "geompostrce" "geomposflats"
## [51] "swaterdepth" "surface_fine_gravel"
## [53] "surface_gravel" "surface_cobbles"
## [55] "surface_stones" "surface_boulders"
## [57] "surface_channers" "surface_flagstones"
## [59] "surface_parafine_gravel" "surface_paragravel"
## [61] "surface_paracobbles" "surface_parastones"
## [63] "surface_paraboulders" "surface_parachanners"
## [65] "surface_paraflagstones" "surface_unspecified"
## [67] "surface_total_frags_pct_nopf" "surface_total_frags_pct"
## [69] "slope_shape" "surface_fgravel"
## [71] "classdate" "classifier"
## [73] "classtype" "taxonname"
## [75] "localphase" "taxonkind"
## [77] "seriesstatus" "taxclname"
## [79] "taxpartsize" "taxorder"
## [81] "taxsuborder" "taxgrtgroup"
## [83] "taxsubgrp" "soiltaxedition"
## [85] "osdtypelocflag" "taxmoistcl"
## [87] "taxtempregime" "taxfamother"
## [89] "taxreaction" "taxfamhahatmatcl"
## [91] "psctopdepth" "pscbotdepth"
## [93] "selection_method" "ecositeid"
## [95] "ecositenm" "ecositecorrdate"
## [97] "es_classifier" "es_selection_method"
## [99] "ochric.epipedon" "argillic.horizon"
## [101] "paralithic.contact" "lithic.contact"
## [103] "mollic.epipedon" "slickensides"
## [105] "calcic.horizon" "duripan"
## [107] "cambic.horizon" "umbric.epipedon"
## [109] "anthropic.epipedon" "histic.epipedon"
## [111] "paralithic.materials" "lithologic.discontinuity"
## [113] "andic.soil.properties" "densic.contact"
## [115] "abrupt.textural.change" "aquic.conditions"
## [117] "redox.depletions.with.chroma.2.or.less" "redox.concentrations"
## [119] "reduced.matrix" "densic.materials"
## [121] "albic.horizon" "spodic.horizon"
## [123] "fibric.soil.materials" "hemic.soil.materials"
## [125] "sapric.soil.materials" "volcanic.glass"
## [127] "folistic.epipedon" "strongly.contrasting.particle.size.class"
## [129] "human.transported.material" "albic.materials"
## [131] "secondary.carbonates" "landform_string"
## [133] "landscape_string" "microfeature_string"
## [135] "geomicrorelief_string" "pmkind"
## [137] "pmorigin"
horizonNames(gopheridge)
## [1] "phiid" "peiid" "hzname" "genhz"
## [5] "hzdept" "hzdepb" "bounddistinct" "boundtopo"
## [9] "clay" "silt" "sand" "fragvoltot"
## [13] "texture" "texcl" "lieutex" "phfield"
## [17] "effclass" "labsampnum" "rupresblkdry" "rupresblkmst"
## [21] "rupresblkcem" "stickiness" "plasticity" "ksatpedon"
## [25] "texture_class" "hzID" "d_r" "d_g"
## [29] "d_b" "d_hue" "d_value" "d_chroma"
## [33] "d_sigma" "m_r" "m_g" "m_b"
## [37] "m_hue" "m_value" "m_chroma" "m_sigma"
## [41] "moist_soil_color" "dry_soil_color" "soil_color" "fine_gravel"
## [45] "gravel" "cobbles" "stones" "boulders"
## [49] "channers" "flagstones" "parafine_gravel" "paragravel"
## [53] "paracobbles" "parastones" "paraboulders" "parachanners"
## [57] "paraflagstones" "unspecified" "total_frags_pct_nopf" "total_frags_pct"
## [61] "art_fgr" "art_gr" "art_cb" "art_st"
## [65] "art_by" "art_ch" "art_fl" "art_unspecified"
## [69] "total_art_pct" "huartvol_cohesive" "huartvol_penetrable" "huartvol_innocuous"
## [73] "huartvol_persistent"
2.6.4.1 Make profile sketches
The plotSPC()
or plot()
function applied to a SoilProfileCollection
object generates sketches based on horizon depths, designations, and colors. The SoilProfileCollection
Reference contains many examples demonstrating ways in which these sketches can be customized.
The fetchNASIS()
function automatically converts moist Munsell colors into R-style colors, available in the soil_color
horizon level attribute. An approximate color mixture is used when multiple colors per horizon are present.
See ?plotSPC
for a detailed list of arguments and examples.
The Soil Profile Sketches tutorial contains additional examples that demonstrate various ways to customize soil profile sketches.
par(mar = c(1, 1, 1, 1))
# omitting pedon IDs and horizon designations
plotSPC(gopheridge,
print.id = FALSE,
name = NA,
width = 0.3)
title('Pedons from the `gopheridge` sample dataset', line = -0.5)
Additional examples / documentation related to soil profile sketches:
2.6.4.2 Pedon Data Checks
When you load pedons using the fetchNASIS()
function, the following data checks are performed:
Presence of multiple map datums. Results reported to the user and the data are not modified.
Inconsistent horizon boundaries. Pedons with inconsistent horizon boundaries are not loaded unless
rmHzErrors = FALSE
. In most cases, this occurs when the bottom depth of a horizon is not the same as the upper depth of the next lower horizon.
## hzname top bot
## 1 A 0 30
## 2 Bt1 38 56
## 3 Bt2 56 121
## 4 Bk 121 135
## 5 R 135 NA
Note the issue above. The bottom depth of the A horizon and the upper depth of the Bt1 horizon should be the same: either 30 or 38 cm. The correct depth needs to be determined and fixed in the database.
Missing lower horizon depths. Offending horizons are fixed by replacing the missing bottom depth with the top depth plus 2 cm. In the case of the profile shown above, a bottom depth of 137 cm would be inserted where the depth is missing.
Sites missing pedon records. Data without corresponding horizons are not loaded.
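The depth checks described above are easy to reproduce in base R. This sketch flags the inconsistent A/Bt1 boundary from the example profile and applies the top + 2 cm repair for a missing bottom depth (column names are illustrative):

```r
# the example profile from above
hz <- data.frame(
  hzname = c("A", "Bt1", "Bt2", "Bk", "R"),
  top    = c(0, 38, 56, 121, 135),
  bot    = c(30, 56, 121, 135, NA)
)

# check: each bottom depth should equal the next top depth
ok <- head(hz$bot, -1) == tail(hz$top, -1)
ok
# [1] FALSE  TRUE  TRUE  TRUE   (the A/Bt1 boundary is inconsistent)

# repair: replace a missing bottom depth with top depth + 2 cm
idx <- is.na(hz$bot)
hz$bot[idx] <- hz$top[idx] + 2
hz$bot[5]
# [1] 137
```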
2.6.4.3 Find Pedons with Errors
If errors in the pedon data are detected when loading data using fetchNASIS()
, the following “get” commands can trace them back to the corresponding records in NASIS.
These commands access variables stored in a special object, an R environment, associated with the soilDB package – they generally contain vectors of IDs.
get('sites.missing.pedons', envir = soilDB.env)
- Returns user site ID for sites missing pedons
get('dup.pedon.ids', envir = soilDB.env)
- Returns user pedon ID for sites with duplicate pedon ID
get('bad.pedon.ids', envir = soilDB.env)
- Returns user pedon ID for pedons with inconsistent horizon depths
get('bad.horizons', envir = soilDB.env)
- Returns a data.frame of horizon-level information for pedons with inconsistent horizon depths
get('sites.missing.pedons', envir = soilDB.env)
get('dup.pedon.ids', envir = soilDB.env)
get('bad.pedon.ids', envir = soilDB.env)
get('bad.horizons', envir = soilDB.env)
These get()
calls access variables stored in the package environment soilDB.env
. The variables only exist if the data checks found problems; if you fix the errors in the NASIS database and the checks find no errors, then nothing will be returned by these functions.
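Because these variables exist only when problems were found, get() throws an error when a check passed. The base R function get0() returns NULL instead, which can make scripted checks more convenient. The sketch below uses a stand-in environment and a made-up pedon ID in place of soilDB.env:

```r
# stand-in for soilDB.env; in real use, query soilDB.env directly
demo.env <- new.env()
assign('bad.pedon.ids', '2015MT0990001', envir = demo.env)

# present: returns the stored IDs
get0('bad.pedon.ids', envir = demo.env)

# absent: returns NULL instead of throwing an error
get0('sites.missing.pedons', envir = demo.env, inherits = FALSE)
```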
2.6.5 Further Reading / Reference
- The SoilProfileCollection Reference contains examples based on the sp4 sample dataset. This is a good exercise for the class and a reference for future work.
- The Querying Soil Series Data document contains a more detailed explanation of how the data behind fetchOSD were created and how to interpret its results.
- The Competing Soil Series document contains a number of examples related to getting OSD morphology using fetchOSD, laboratory data using fetchKSSL, and summary via profile sketches and slice-wise aggregation.
- The “SPC Plotting Ideas” document outlines some advanced techniques for arranging and annotating soil profile sketches.
- Several examples pertaining to SDA, soil texture, and soil profile sketches are explored in Random Walk 001.
2.7 Working with Data in R
2.7.1 Summaries
Now that you’ve loaded some data, you can look at additional ways to summarize and interact with data elements.
2.7.1.1 table()
and Cross Tabulation
The base R table() function is very useful for quick summary operations. It returns a named vector with the count of each unique value in a given vector.
The numeric vector of "counts" is commonly combined with other functions such as sort(), order(), prop.table(), and is.na() to identify abundance, proportions, or missing data (NA).
# load required packages
library(aqp)
library(soilDB)
data("gopheridge", package = "soilDB")
# for these examples, we use the gopheridge object as our "selected set"
pedons <- gopheridge
## you can use fetchNASIS to load your own data, like this:
# pedons <- fetchNASIS()
# summarize which soil taxa we have loaded
table(pedons$taxonname)
##
## Gopheridge
## 52
# sort taxonomic names in descending order
sort(table(pedons$taxonname), decreasing = TRUE)
## Gopheridge
## 52
# could do the same thing for taxonomic subgroups
table(pedons$taxsubgrp)
##
## mollic haploxeralfs typic haploxerepts ultic haploxeralfs ultic haploxerolls
## 1 6 44 1
sort(table(pedons$taxsubgrp), decreasing = TRUE)
##
## ultic haploxeralfs typic haploxerepts mollic haploxeralfs ultic haploxerolls
## 44 6 1 1
We can convert counts in the table()
result into proportions with prop.table()
:
prop.table(table(pedons$taxsubgrp))
##
## mollic haploxeralfs typic haploxerepts ultic haploxeralfs ultic haploxerolls
## 0.01923077 0.11538462 0.84615385 0.01923077
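Proportions are often easier to read as rounded percentages; wrapping the same call (a small convenience, not part of the original example):

```r
# express subgroup proportions as percentages
round(prop.table(table(pedons$taxsubgrp)) * 100, 1)
```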
table()
can be used to get counts over multiple dimensions of factor levels.
We call this process cross tabulation.
For instance, let's cross-tabulate taxonomic subgroup (taxsubgrp) and particle-size family class (taxpartsize) for pedons:
table(pedons$taxsubgrp, pedons$taxpartsize)
##
## clayey-skeletal coarse-loamy fine fine-loamy loamy-skeletal
## mollic haploxeralfs 0 0 0 0 1
## typic haploxerepts 1 0 0 0 5
## ultic haploxeralfs 2 0 1 1 40
## ultic haploxerolls 0 1 0 0 0
As expected for gopheridge, the vast majority of pedons are loamy-skeletal Ultic Haploxeralfs.
Since pedons contains both site- and horizon-level data, when cross-tabulating we need to be sure to pair site-level columns with other site-level columns, and horizon-level columns with other horizon-level columns. This is because all arguments to table() must have the same length.
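You can verify the length difference directly: site-level vectors have one value per pedon, while horizon-level vectors have one value per horizon.

```r
# one value per pedon (site-level)
length(pedons$taxsubgrp)

# one value per horizon (horizon-level)
length(pedons$hzname)
```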
For example, let’s cross-tabulate horizon designation (hzname
) with horizon texture class (texcl
).
We can use the addmargins()
function to add the row and column sums to the margins of the table for easier interpretation when there are many rows/columns in the result.
addmargins(table(pedons$hzname, pedons$texcl))
##
## c cl l scl sic sicl sil sl Sum
## 2BCt5 0 1 0 0 0 0 0 0 1
## 2Bt1 0 0 3 0 0 0 0 0 3
## 2Bt2 1 2 1 1 0 0 0 0 5
## 2Bt3 1 3 1 0 0 0 0 0 5
## 2Bt4 2 0 0 0 0 0 0 0 2
## 2CBt 0 0 0 1 0 0 0 0 1
## 2Cr 0 0 0 0 0 0 0 0 0
## 2Crt 0 0 0 0 0 0 0 0 0
## 2R 0 0 0 0 0 0 0 0 0
## A 0 0 33 0 0 0 12 3 48
## A1 0 0 2 0 0 0 2 0 4
## A2 0 0 2 0 0 0 2 0 4
## A3 0 0 1 0 0 0 0 0 1
## AB 0 0 2 0 0 0 2 0 4
## BA 0 0 17 0 0 1 0 0 18
## BC 0 0 4 0 0 0 0 0 4
## BCt 0 2 2 1 0 0 1 0 6
## Bt 0 0 1 1 1 0 0 0 3
## Bt1 0 3 28 1 0 3 5 0 40
## Bt2 3 15 14 1 1 3 1 0 38
## Bt3 4 8 6 1 0 1 1 0 21
## Bt4 1 2 2 0 0 0 0 0 5
## Bw 0 0 6 0 0 0 1 0 7
## Bw1 0 0 5 0 0 0 0 0 5
## Bw2 0 0 5 0 0 0 0 0 5
## Bw3 0 0 3 0 0 0 0 0 3
## C 1 0 1 0 0 0 0 0 2
## C/Brt 0 1 0 0 0 0 0 0 1
## CBt 0 0 0 1 0 0 0 0 1
## Cr 0 0 1 0 0 0 0 0 1
## Crt 0 0 0 0 0 0 0 0 0
## Ct 0 0 1 0 0 0 0 0 1
## Oe 0 0 0 0 0 0 0 0 0
## Oi 0 0 0 0 0 0 0 0 0
## R 0 0 0 0 0 0 0 0 0
## Sum 13 37 141 8 2 8 27 3 239
2.7.2 Missing Values
The useNA argument to table() and the is.na() function help locate missing data:
table(pedons$taxsubgrp, useNA = "ifany")
##
## mollic haploxeralfs typic haploxerepts ultic haploxeralfs ultic haploxerolls
## 1 6 44 1
# is.na(...)
table(is.na(pedons$taxsubgrp))
##
## FALSE
## 52
# is NOT NA !is.na(...)
table(!is.na(pedons$taxsubgrp))
##
## TRUE
## 52
# it can also be applied to horizon level columns in the SPC
sort(table(pedons$texture), decreasing=TRUE)
##
## BR L GR-L GRV-L CBV-L SPM GRX-L SIL GRV-CL CBV-CL GR-SIL
## 58 36 33 24 18 14 12 12 9 8 7
## CBX-L GRX-CL CBX-CL GRV-SIL CL GRV-SCL GRX-C MPM SL CB-L GR-CL
## 6 5 4 4 3 3 3 3 3 2 2
## GRX-SCL PGR-C PGRX-L SICL STV-CL STV-L STX-C STX-L C CB-C CB-CL
## 2 2 2 2 2 2 2 2 1 1 1
## CB-SCL CB-SIL CBV-SIL CBX-SCL CN-L CN-SICL CNX-L CNX-SICL FLV-L GR-C GR-SIC
## 1 1 1 1 1 1 1 1 1 1 1
## GRV-SICL GRX-SIC GRX-SIL PCB-SICL PCBV-SICL PCN-C PCNX-CL PGRV-C PGRV-CL PGRX-SCL PGRX-SIL
## 1 1 1 1 1 1 1 1 1 1 1
## ST-L STV-C STX-CL STX-SICL
## 1 1 1 1
2.7.3 Logical Operators
Logical operators act on vectors for the purposes of comparison.
- == "EQUAL TO"
- != "NOT EQUAL TO"
- < LESS than
- <= LESS than or equal to
- > GREATER than
- >= GREATER than or equal to
- %in% Equivalent to IN () in SQL; same logic as match() but returns a boolean, not an integer
  - Example: pedons$taxpartsize %in% c('loamy-skeletal', 'sandy-skeletal') returns a vector of TRUE/FALSE equal in length to the left-hand side
- & logical AND
- | logical OR
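For example, %in% returns a logical vector the same length as its left-hand side:

```r
# each element of the left-hand vector is tested for membership
c('fine-loamy', 'loamy-skeletal', 'clayey-skeletal') %in%
  c('loamy-skeletal', 'sandy-skeletal')
## [1] FALSE  TRUE FALSE
```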
2.7.4 Pattern Matching
The following examples use the grep()
function to pattern match within the data, create an index of the SoilProfileCollection
for records that match the specified pattern within that column, and then use that index to filter to specific sites and their corresponding profiles.
Patterns are specified using regular expression (REGEX) syntax.
This process can be applied to many different columns in the SPC, depending on how you need to filter the data. This example pattern matches on the taxsubgrp column, but another useful application might be to pattern match on geomorphology or parent material.
Say we want to see the variation in particle-size classes within a specific subgroup. We can use grep() to create a row index, then apply that index to the SoilProfileCollection.
# create a numeric index for pedons with taxsubgroup containing 'typic'
idx <- grep('typic', pedons$taxsubgrp)
idx
## [1] 11 12 13 14 26 50
# use square bracket notation to subset 'typic' soils in `subset1` object
subset1 <- pedons[idx, ]
# or use the index directly to summarize taxpartsize for 'typic' soils
sort(table(pedons$taxpartsize[idx]), decreasing = TRUE)
##
## loamy-skeletal clayey-skeletal
## 5 1
Note: grep() has an invert argument (default FALSE). This argument is very useful for excluding the results of the pattern matching process by inverting the result. grepl() is the logical version of grep(), so you can invert it using the logical NOT operator: !.
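For example, both of these exclude the 'typic' subgroups matched earlier (using the pedons object from above):

```r
# grep() with invert = TRUE returns the non-matching indices
not.typic <- pedons[grep('typic', pedons$taxsubgrp, invert = TRUE), ]

# equivalently, negate the logical vector returned by grepl()
not.typic <- pedons[!grepl('typic', pedons$taxsubgrp), ]
```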
Another method is to create an index using the which() function. which() takes any logical vector (or expression) and returns the indices (positions) where that expression is TRUE. The use of which() becomes more important when there are missing values (NA) in an expression.
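A small vector illustrates the difference: logical indexing propagates NA, while which() silently drops it.

```r
x <- c(1, NA, 3)

# logical index: the NA comparison yields NA in the result
x[x > 1]
## [1] NA  3

# which() returns only the positions that are definitely TRUE
x[which(x > 1)]
## [1] 3
```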
Do a graphical check to confirm that the "typic" profiles were selected. Plot them in R using the SoilProfileCollection "plot" method (i.e., a specialized version of the generic plot() function).
# adjust margins
par(mar=c(1,0,0,1))
# plot the first 10 profiles of subset1
plot(subset1[1:10, ], label = 'taxsubgrp', max.depth = 60)
title('Pedons with the word "typic" at subgroup-level of Soil Taxonomy', line=-2)
For more information on using regular expressions in grep()
for pattern matching operations, see: Regular-expression-syntax.
Quick check: Run these commands, experiment with the code, and review the documentation to answer the questions.
- True or False:
grepl()
returns a numeric vector - True or False:
which(grepl('typic', pedons$taxsubgrp))
is the same asgrep('typic', pedons$taxsubgrp)
.
2.7.4.1 REGEX Quick Start
- . One character, any character
- * Zero-or-more quantifier (of previous token)
- + One-or-more quantifier (of previous token)
- {n} Quantifier where n is the number of times a match "repeats" (of previous token)
- [A-Z!] ONE capital letter, or an exclamation mark
- [0-9]{2} TWO numbers (using the {n} quantifier)
- | is equivalent to OR
  - Example: grep('loamy|sandy', c("loamy-skeletal","sandy","sandy-skeletal")) - "loamy OR sandy"
- ^ Anchor to beginning of string / line
  - Example: grep('^sandy', c("loamy-skeletal","sandy","sandy-skeletal")) - "STARTS WITH sandy"
- $ Anchor to end of string / line
  - Example: grep('skeletal$', c("loamy-skeletal","sandy","sandy-skeletal")) - "ENDS WITH skeletal"
- \\b Anchor to word boundary
  - Example: grep('\\bmesic', c("mesic","thermic","isomesic")) - "WORD STARTS WITH mesic" (e.g. not "isomesic")
2.7.4.2 Resources for Regular Expressions
https://regex101.com/ & https://regexr.com/ - Online regular expression testers
http://www.regular-expressions.info/quickstart.html - One-page regular expression quick start guide
2.7.5 Filtering
A variety of methods are available to subset or “filter” R data objects, from a simple data.frame
or vector, to something more complex like a Spatial
object or a SoilProfileCollection
.
You can index many R objects using numeric or logical expressions as above. There are also methods that make this process a little easier.
The base R method for this is subset()
and it works on data.frame
objects. It is nice because you can specify column names without explicitly referencing the data set, since subset uses non-standard evaluation of expressions passed as arguments.
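A minimal data.frame example of subset()'s non-standard evaluation (toy data, for illustration only):

```r
# columns are referenced by bare name, not df$clay
df <- data.frame(hzname = c("A", "Bt1", "Bt2"),
                 clay = c(18, 25, 32))
subset(df, clay > 20)
##   hzname clay
## 2    Bt1   25
## 3    Bt2   32
```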
2.7.5.1 Filtering with aqp::subset()
We use the SoilProfileCollection
subset
method, where we first specify a data (pedons
) object then we can write expressions for the columns that exist in that object.
Here, we combine two logical expressions to find taxsubgrp
containing "alfs"
(Alfisols) with obsdate
before January 1st, 2010.
subset2 <- subset(pedons, grepl("alfs", taxsubgrp) & obs_date < as.POSIXlt("2010-01-01"))
# check taxonomic range of particle size classes in the data
# overwhelmingly these are described as loamy-skeletal ultic haploxeralfs
sort(table(subset2$taxsubgrp), decreasing = TRUE)
##
## ultic haploxeralfs mollic haploxeralfs
## 28 1
sort(table(subset2$taxpartsize), decreasing = TRUE)
##
## loamy-skeletal clayey-skeletal fine fine-loamy
## 25 2 1 1
# check year described and taxpartsize
table(subset2$taxpartsize, substr(subset2$obs_date, 1, 4))
##
## 2007 2008 2009
## clayey-skeletal 1 0 1
## fine 1 0 0
## fine-loamy 1 0 0
## loamy-skeletal 19 1 5
# a double equal sign '==' is used for exact character or numeric criteria
subset3 <- subset(subset2, taxpartsize == 'loamy-skeletal')
table(subset3$taxpartsize)
##
## loamy-skeletal
## 25
par(mar = c(0, 0, 2, 1))
plotSPC(subset3[1:12, ], print.id = FALSE)
title('Loamy-skeletal Ultic Haploxeralfs')
2.7.6 Dates and Times
Dates and times use special object types in R. Unix time, also known as "POSIX time," is a system for describing a point in time: a Unix epoch value is a whole number of seconds elapsed since 00:00:00 UTC on 1 January 1970, excluding leap seconds.
We can use logical comparison operators on dates and times if their string representation, such as "01/01/1970", is converted to a common base R representation: POSIXlt or POSIXct.
This conversion accounts for important details such as time zone, which by default comes from your computer's locale, so keep that in mind.
When converting to POSIX time, several unambiguous (year-month-day) date-time formats can be detected automatically. Numeric input is interpreted as seconds relative to an origin date:
as.POSIXct(24*60*60, origin = "1970-01-01", tz = "UTC")
## [1] "1970-01-02 UTC"
By default the time zone will match your current time zone, and dates without times are treated as being at midnight. For other date formats, such as the common month-day-year format, you need to specify the format argument; you can also customize the time zone with the tz argument:
as.POSIXlt("01/01/1970", tz = "UTC", format = "%m/%d/%Y")
## [1] "1970-01-01 UTC"
POSIXlt and POSIXct objects can be formatted with the format() function, and strptime() can be used to parse character vectors into dates/times. Use as.POSIXlt() with character input and as.POSIXct() with numeric input.
R also has the Date class, which can be used for calendar dates without times. You can convert POSIXlt/POSIXct objects to Date with as.Date().
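Putting this together, converted dates support the comparison operators from earlier (a small sketch):

```r
d1 <- as.POSIXct("2009-06-01", tz = "UTC")
d2 <- as.POSIXct("2010-01-01", tz = "UTC")

# logical comparison works once both sides are POSIXct
d1 < d2
## [1] TRUE

# format() converts back to a custom character representation
format(d1, "%m/%d/%Y")
## [1] "06/01/2009"
```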
2.8 Exercise 2: Generalized Horizons with Loafercreek
Here we will introduce the concept of using regular expressions to apply “Generalized Horizon Labels” based on the field soil description horizon designations.
This demonstrates one way of grouping horizon data for determining the Range in Characteristics of layers within a Soil Series or a SSURGO component.
For the exercise during class, we will give you some time to read over the materials and apply the process to the sample loafercreek
data. The code to apply the process to loafercreek
is given.
Then we ask that you apply a similar process to your own data, adjusting your generalized horizon patterns as needed for your local soils. This may take more time than we have during class itself, so follow up with your mentor as needed to complete it. Chapter 3 (Exploratory Data Analysis) goes deeper into some of the topics referenced in the loafercreek code, such as summary statistics on grouped data. Send a table and profile plots to your mentor when you are done.
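As a preview of the idea, aqp::generalize.hz() maps horizon designations to generalized labels using regex patterns. A minimal sketch using the loafercreek sample data (the patterns here are illustrative, not the ones used in the exercise):

```r
library(aqp)
library(soilDB)
data("loafercreek", package = "soilDB")

# assign a generalized label wherever hzname matches each pattern
loafercreek$genhz <- generalize.hz(
  loafercreek$hzname,
  new = c('A', 'Bt', 'Cr', 'R'),
  pat = c('^A', '^Bt', 'Cr', '^R')
)

# tabulate the resulting generalized horizon labels
table(loafercreek$genhz)
```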
2.9 fetchNASIS()
data checks
fetchNASIS() does a lot of the hard work for you by pulling out a variety of child tables and joining them to the site/horizon tables.
This results in a SoilProfileCollection
object that has many of the NASIS data columns. In order to do this, a variety of processes are run. Output is generated that you should look at. We will walk through the most common parts.
Mixing of multiple colors (for “default” moist and dry colors for many:1 case)
mixing dry colors ... [33 of 4108 horizons]
mixing moist colors ... [60 of 4532 horizons]
Quality control and limited “filling” of fragments, horizon depth data, site data, tables
replacing missing lower horizon depths with top depth + 1cm ... [19 horizons]
-> QC: horizon errors detected:
Use `get('bad.pedon.ids', envir=soilDB.env)` for pedon record IDs (peiid)
Use `get('bad.horizons', envir=soilDB.env)` for horizon designations
-> QC: pedons missing bottom hz depths:
Use `get('missing.bottom.depths', envir=soilDB.env)` for pedon record IDs (peiid)
Notes about default settings and handling of NULL
(missing data elements or records)
NOTE: some records are missing surface fragment cover
NOTE: some records are missing rock fragment volume
NOTE: all records are missing artifact volume
2.9.1 Inspecting Results
Here we inspect occurrence of andic soil properties in MT647.
We will download this “selected set” from the course website as an .rda file to save you the effort of crafting your selected set just for this example.
example.data.dir <- "C:/workspace2"
example.data.path <- file.path(example.data.dir, "mt647.rda")
if (!dir.exists(example.data.dir))
dir.create(example.data.dir, recursive = TRUE)
download.file("https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/book/02/mt647.rda",
destfile = example.data.path)
Downloading and loading the above .rda is equivalent to running the NASIS query "_PedonPC_Plus_DataDump_select" (MLRA04 Bozeman) to fill your selected set with User Site ID: %MT647%, NASIS Site: MLRA04%, NASIS Group: 4-MIS Pedons, followed by mt647 <- fetchNASIS().
2.9.1.1 Load Example Data
To load the sample object data into R, just use load()
and the path to the .rda file (example.data.path
or "C:/workspace2/mt647.rda"
)
# load the sample data
example.data.dir <- "C:/workspace2"
load(file.path(example.data.dir, "mt647.rda"))
length(mt647)
## [1] 481
table(site(mt647)$andic.soil.properties, useNA = "ifany")
##
## FALSE TRUE <NA>
## 2 83 396
# get just the profiles with andic.soil.properties == TRUE
subset(mt647, andic.soil.properties)
## SoilProfileCollection with 83 profiles and 446 horizons
## profile ID: peiid | horizon ID: phiid
## Depth range: 20 - 305 cm
##
## ----- Horizons (6 / 446 rows | 10 / 73 columns) -----
## peiid phiid hzdept hzdepb hzname texture genhz bounddistinct boundtopo clay
## 828140 4005861 0 2 Oe <NA> <NA> <NA> <NA> NA
## 828140 4005863 2 13 E CBV-L <NA> clear smooth 5
## 828140 4005859 13 23 Bs1 CBV-L <NA> clear smooth 8
## 828140 4005862 23 33 Bs2 CBV-L <NA> gradual smooth 8
## 828140 4005864 33 69 BC GRX-FSL <NA> gradual smooth 10
## 828140 4005860 69 152 2C GRX-FSL <NA> <NA> <NA> 10
## [... more horizons ...]
##
## ----- Sites (6 / 83 rows | 10 / 105 columns) -----
## siteiid peiid pedon_id site_id obs_date utmzone utmeasting utmnorthing x y
## 845415 828140 P23-134 93MT6470134 1993-08-12 12 294091.0 5092141 -113.6569 45.95195
## 845429 828152 P92-052 92MT6470052 1992-07-30 12 274420.9 5123423 -113.9253 46.22694
## 845430 828153 P92-046 92MT6470046 1992-07-23 12 282680.7 5161752 -113.8361 46.57417
## 845432 828155 P93-053 93MT6470053 1993-08-10 12 259526.3 5079142 -114.0958 45.82389
## 845434 828157 P91-103 91MT6470103 1991-09-26 12 272221.3 5127708 -113.9558 46.26472
## 845451 828176 P89-011 89MT6470011 1989-08-02 12 276988.7 5110875 -113.8861 46.11500
## [... more sites ...]
##
## Spatial Data:
## [EMPTY]
We can compare this to what we see in the NASIS Pedon Diagnostic Features table:
Any profiles that have logic errors detected are stored in the soilDB.env variable bad.pedon.ids after you run fetchNASIS(). If this variable does not exist, either you have not run fetchNASIS() in the current session, or there are no errors.
# these are the troublesome user pedon IDs
get('bad.pedon.ids', envir = soilDB.env)
## [1] "P93-037" "P90-008" "P90-004" "P90-009" "P93-058" "P93-059" "P90-025" "P90-002" "P90-012" "P92-001"
## [11] "P92-085" "P91-105" "P90-019" "P75-006" "P90-010" "P90-015" "P91-094" "P92-029" "P92-076" "P93-026"
## [21] "P93-035" "P93-063" "P93-064" "P93-041" "P93-043" "P93-044" "P93-083" "P93-112" "P93-113" "P93-124"
## [31] "P93-001" "P96-007" "P91-025" "P93-078" "P92-044" "P91-112" "P92-038" "P90-018" "P93-057" "P93-084"
## [41] "P91-059" "P90-016" "P92-063" "P92-048" "P93-052" "F01-230" "F95-420" "F95-114" "F96-205"
# by default, rmHzErrors removes the "bad" illogical pedons
any(mt647$pedon_id %in% get('bad.pedon.ids', envir=soilDB.env))
## [1] FALSE
When fetchNASIS(..., rmHzErrors = TRUE) is used (the default), any horizons that were omitted from the SoilProfileCollection will be stored in the bad.horizons variable in the soilDB.env environment.
head(get('bad.horizons', envir=soilDB.env))
## peiid phiid pedon_id hzname hzdept hzdepb
## 67 868038 4270407 F01-230 Bw NA NA
## 68 868038 4270406 F01-230 E NA NA
## 95 868072 4270514 F95-114 <NA> NA NA
## 99 868048 4270437 F95-420 <NA> NA NA
## 111 868074 4270521 F96-205 C NA NA
## 112 868074 4270519 F96-205 B NA NA
2.9.1.2 Logic Checks for the SoilProfileCollection
The aqp
package has several functions that do logic checks on SoilProfileCollection
objects.
The main method that does this in aqp
is checkHzDepthLogic()
which returns a data.frame of results of running four logic tests on the horizon data from each profile.
Checks for:
- bottom depths less than top depth / bad top depth order ("depthLogic")
- bottom depths equal to top depth ("sameDepth")
- overlaps and gaps ("overlapOrGap")
- missing depths ("missingDepth")
logic_tests <- checkHzDepthLogic(mt647err)
# look at first few (look OK; valid == TRUE)
head(logic_tests)
## peiid valid depthLogic sameDepth missingDepth overlapOrGap
## 1 828138 TRUE FALSE FALSE FALSE FALSE
## 2 828139 TRUE FALSE FALSE FALSE FALSE
## 3 828140 TRUE FALSE FALSE FALSE FALSE
## 4 828141 TRUE FALSE FALSE FALSE FALSE
## 5 828142 TRUE FALSE FALSE FALSE FALSE
## 6 828143 TRUE FALSE FALSE FALSE FALSE
# these all have overlapOrGap errors
head(logic_tests[!logic_tests$valid, ])
## peiid valid depthLogic sameDepth missingDepth overlapOrGap
## 11 828148 FALSE FALSE FALSE FALSE TRUE
## 23 828160 FALSE FALSE FALSE FALSE TRUE
## 25 828162 FALSE FALSE FALSE FALSE TRUE
## 34 828171 FALSE FALSE FALSE FALSE TRUE
## 35 828172 FALSE FALSE FALSE FALSE TRUE
## 36 828173 FALSE FALSE FALSE FALSE TRUE
# join the logic test data into the site table
site(mt647err) <- logic_tests
Use the $valid vector in the result to subset the profiles that have logic errors (logic_tests$valid == FALSE):
bad.profiles <- subset(mt647err, !valid)
bad.profiles
## SoilProfileCollection with 49 profiles and 338 horizons
## profile ID: peiid | horizon ID: phiid
## Depth range: 15 - 152 cm
##
## ----- Horizons (6 / 338 rows | 10 / 73 columns) -----
## peiid phiid hzdept hzdepb hzname texture genhz bounddistinct boundtopo clay
## 828148 4005908 0 3 Oe <NA> <NA> <NA> <NA> NA
## 828148 4005907 3 5 Oi <NA> <NA> <NA> <NA> NA
## 828148 4005905 5 15 E CB-FSL <NA> clear smooth 5
## 828148 4005911 15 23 Bs CB-SIL <NA> abrupt smooth 5
## 828148 4005909 23 25 2E CBV-FSL <NA> clear smooth 8
## 828148 4005910 25 58 2B CBV-FSL <NA> gradual smooth 10
## [... more horizons ...]
##
## ----- Sites (6 / 49 rows | 10 / 110 columns) -----
## peiid siteiid pedon_id site_id obs_date utmzone utmeasting utmnorthing x y
## 828148 845423 P93-037 93MT6470037 1993-07-28 12 272869.5 5126695 -113.9469 46.25583
## 828160 845435 P90-008 90MT6470008 1990-07-12 12 274656.5 5129224 -113.9250 46.27917
## 828162 845437 P90-004 90MT6470004 1990-07-16 12 273885.5 5128079 -113.9344 46.26861
## 828171 845446 P90-009 90MT6470009 1990-07-24 12 274813.1 5129404 -113.9230 46.28083
## 828172 845447 P93-058 93MT6470058 1993-08-19 12 265401.3 5110256 -114.0356 46.10555
## 828173 845448 P93-059 93MT6470059 1993-08-19 12 265446.7 5110316 -114.0350 46.10611
## [... more sites ...]
##
## Spatial Data:
## [EMPTY]
And keep only the valid ones (logic_tests$valid == TRUE):
good.profiles <- subset(mt647err, logic_tests$valid)
good.profiles
## SoilProfileCollection with 481 profiles and 2536 horizons
## profile ID: peiid | horizon ID: phiid
## Depth range: 14 - 1552 cm
##
## ----- Horizons (6 / 2536 rows | 10 / 73 columns) -----
## peiid phiid hzdept hzdepb hzname texture genhz bounddistinct boundtopo clay
## 828138 4005848 0 5 O <NA> <NA> <NA> <NA> NA
## 828138 4005852 5 18 E GR-L <NA> clear smooth 12
## 828138 4005850 18 38 Bw1 GRV-L <NA> gradual wavy 14
## 828138 4005849 38 51 Bw2 CBV-FSL <NA> gradual wavy 12
## 828138 4005853 51 71 BC CBV-SL <NA> clear wavy 12
## 828138 4005851 71 81 R <NA> <NA> <NA> <NA> NA
## [... more horizons ...]
##
## ----- Sites (6 / 481 rows | 10 / 110 columns) -----
## peiid siteiid pedon_id site_id obs_date utmzone utmeasting utmnorthing x y
## 828138 845413 P91-043 91MT6470043 1991-07-11 12 268288.0 5068425 -113.9781 45.73056
## 828139 845414 P91-029 91MT6470029 1991-06-27 12 280623.2 5147302 -113.8561 46.44361
## 828140 845415 P23-134 93MT6470134 1993-08-12 12 294091.0 5092141 -113.6569 45.95195
## 828141 845416 P93-025 93MT6470025 1993-07-08 12 279148.0 5151960 -113.8775 46.48500
## 828142 845417 P93-075 93MT6470075 1993-09-14 12 261086.6 5080380 -114.0764 45.83556
## 828143 845418 P93-074 93MT6470074 1993-09-14 12 261036.4 5080197 -114.0769 45.83389
## [... more sites ...]
##
## Spatial Data:
## [EMPTY]
2.10 Extended Data Functions
Additional data related to both site and horizon information can be fetched using the get_extended_data_from_NASIS_db() function. This function returns a named list() with several tables that fetchNASIS draws from, in addition to the typical Site/Pedon/Component/Horizon tables.
There are a variety of calculated fields included in the default fetchNASIS result based on the extended data. You can make use of these to simplify queries and aggregations for a variety of analyses, and then follow up with more detailed data as needed.
2.10.1 Elements of get_extended_data_from_NASIS_db()
- Ecological Site History ("ecositehistory")
- Diagnostic Features ("diagnostic")
- Diagnostic Feature TRUE/FALSE Summary ("diagHzBoolean")
- Restrictions ("restriction")
- Fragment and Texture Summaries
  - Horizon Fragments ("frag_summary")
  - Horizon Artifacts ("art_summary")
  - Surface Fragments ("surf_frag_summary")
  - Texture Class Modifiers ("texmodifier")
- Geomorphic Table Summaries ("geomorph")
- Parent Material Summaries ("pm")
- Taxonomic History ("taxhistory")
- Site Text Notes w/ Photo links ("photo")
- Horizon Structure ("struct")
- Horizon Designation ("hzdesgn")
2.10.2 Load Example Data
Below is a summary of additional information that can be readily brought into R from your NASIS selected set via the get_extended_data_from_NASIS_db() function.
To download the sample 2015MT663%
data from the course page with R:
example.data.dir <- "C:/workspace2"
example.data.path <- file.path(example.data.dir, "mt663.rda")
if (!dir.exists(example.data.dir))
dir.create(example.data.dir, recursive = TRUE)
download.file("https://github.com/ncss-tech/stats_for_soil_survey/raw/master/data/book/02/mt663.rda",
destfile = example.data.path)
Before continuing, imagine opening the NASIS client and populating your selected set with 2015MT663% using a query like "NSSC Pangaea – POINT-Pedon/Site by User Pedon ID".
Load the data like we did above.
# load the sample data
example.data.dir <- "C:/workspace2"
load(file.path(example.data.dir, "mt663.rda"))
## fetch extended site and horizon data from local NASIS
# mt663ext <- get_extended_data_from_NASIS_db()
We could use the get_extended_data_from_NASIS_db()
function if 2015MT663%
or other data were in the selected set, but we will use the mt663ext
data we loaded from the .rda file.
The column names are the names of variables that you could join to your site or horizon data by various means. Generally these variable names, with a few exceptions, mirror the NASIS 7 data model names.
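For example, horizon-level extended data can be joined to the horizon table by the phiid key. A sketch, assuming the mt663 and mt663ext objects loaded above:

```r
library(aqp)

# extract the horizon data.frame from the SoilProfileCollection
h <- horizons(mt663)

# left-join structure records by horizon record ID (phiid);
# horizons with multiple structure records will be duplicated
h2 <- merge(h, mt663ext$struct, by = "phiid", all.x = TRUE)
```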
# site and pedon related extended data
# list all dataframes in the extended data
str(mt663ext, 1)
## List of 14
## $ ecositehistory :'data.frame': 0 obs. of 5 variables:
## $ diagnostic :'data.frame': 292 obs. of 4 variables:
## $ diagHzBoolean :'data.frame': 115 obs. of 20 variables:
## $ restriction :'data.frame': 11 obs. of 8 variables:
## $ frag_summary :'data.frame': 561 obs. of 18 variables:
## $ art_summary :'data.frame': 561 obs. of 14 variables:
## $ surf_frag_summary:'data.frame': 134 obs. of 10 variables:
## $ texmodifier :'data.frame': 622 obs. of 5 variables:
## $ geomorph :'data.frame': 241 obs. of 9 variables:
## $ taxhistory :'data.frame': 115 obs. of 21 variables:
## $ photo :'data.frame': 793 obs. of 4 variables:
## $ pm :'data.frame': 179 obs. of 10 variables:
## $ struct :'data.frame': 444 obs. of 6 variables:
## $ hzdesgn :'data.frame': 561 obs. of 19 variables:
# vegetation data summary
colnames(mt663ext$ecositehistory)
## [1] "siteiid" "ecositeid" "ecositenm" "ecositecorrdate" "es_classifier"
# diagnostic features
colnames(mt663ext$diagnostic)
## [1] "peiid" "featkind" "featdept" "featdepb"
# surface rock fragments
colnames(mt663ext$surf_frag_summary)
## [1] "peiid" "surface_fgravel" "surface_gravel" "surface_cobbles"
## [5] "surface_stones" "surface_boulders" "surface_channers" "surface_flagstones"
## [9] "surface_paragravel" "surface_paracobbles"
# geomorphic description
colnames(mt663ext$geomorph)
## [1] "peiid" "geomicrorelief" "geommicelev" "geomfmod" "geomfname" "geomfeatid"
## [7] "existsonfeat" "geomfiidref" "geomftname"
# taxonomic history data
colnames(mt663ext$taxhistory)
## [1] "peiid" "classdate" "classifier" "classtype" "taxonname" "localphase"
## [7] "taxonkind" "seriesstatus" "taxclname" "taxpartsize" "taxorder" "taxsuborder"
## [13] "taxgrtgroup" "taxsubgrp" "soiltaxedition" "osdtypelocflag" "taxmoistcl" "taxtempregime"
## [19] "taxfamother" "psctopdepth" "pscbotdepth"
# linked photo stored in site textnotes
colnames(mt663ext$photo)
## [1] "siteiid" "recdate" "textcat" "imagepath"
# site parent materials
colnames(mt663ext$pm)
## [1] "siteiid" "seqnum" "pmorder" "pmdept" "pmdepb" "pmmodifier" "pmgenmod"
## [8] "pmkind" "pmorigin" "pmweathering"
###
### horizon related extended data
###
# rock fragments
colnames(mt663ext$frag_summary)
## [1] "phiid" "fine_gravel" "gravel" "cobbles"
## [5] "stones" "boulders" "channers" "flagstones"
## [9] "parafine_gravel" "paragravel" "paracobbles" "parastones"
## [13] "paraboulders" "parachanners" "paraflagstones" "unspecified"
## [17] "total_frags_pct_nopf" "total_frags_pct"
# soil texture modifiers
colnames(mt663ext$texmodifier)
## [1] "peiid" "phiid" "phtiid" "seqnum" "texmod"
# soil structure data
colnames(mt663ext$struct)
## [1] "phiid" "structgrade" "structsize" "structtype" "structid" "structpartsto"
2.10.3 Visualizing Common Landforms
The following code generates a simple graphical summary of the 10 most commonly occurring "landform_string"
(a calculated field in fetchNASIS()
) to inspect which are the most common.
# load data from a NASIS selected set (or sample object)
pedons <- mt663

# create 'lf' object of landform factors sorted in descending order
lf <- sort(table(pedons$landform_string), decreasing = TRUE)

# plot top 10 or length, whichever is shorter
Hmisc::dotchart2(lf[1:pmin(10, length(lf))],
                 col = 'black',
                 xlim = c(0, max(lf)),
                 cex.labels = 0.75)
For a challenge and to further inspect your own data try the above code with some other summaries of geomorphic data produced by fetchNASIS()
.
You can swap landform_string
for: landscape_string
(landscape), hillslopeprof
(2D), geomposmntn
, geomposhill
, geompostrce
, geomposflats
(3D), slope_shape
, shapeacross
, shapedown
(slope shape across/down), microfeature_string
(microfeature), or geomicrorelief_string
(site observation microrelief).
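For instance, swapping in hillslopeprof (assuming it is populated in your data) follows the same pattern:

```r
# tabulate 2D hillslope position and plot the most common values
lf2 <- sort(table(pedons$hillslopeprof), decreasing = TRUE)

Hmisc::dotchart2(lf2[1:pmin(10, length(lf2))],
                 col = 'black',
                 xlim = c(0, max(lf2)),
                 cex.labels = 0.75)
```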
2.10.4 Diagnostic Features
2.10.4.1 Boolean Diagnostic Features in fetchNASIS
If diagnostic features are populated in the Pedon Diagnostic Features table in NASIS, then Boolean (TRUE
or FALSE
) fields are created for each diagnostic feature type found in the data brought in by fetchNASIS
.
These fields can be used to model presence / absence of a diagnostic soil feature by extracting the site data from the SoilProfileCollection
with site()
.
2.10.4.2 Thickness from Diagnostic Features Table
The following is an example of how you could use the diagnostic features (if populated!) from the extended data to determine the thickness of a diagnostic feature of interest.
# get diagnostic features associated with pedons loaded from selected set
d <- diagnostic_hz(mt663)
# summary of the diagnostic features in your data!
unique(d$featkind)
## [1] "ochric epipedon" "cambic horizon" "lithic contact"
## [4] "mollic epipedon" "argillic horizon" "redox concentrations"
## [7] "andic soil properties" "secondary carbonates" "sapric soil materials"
## [10] "aquic conditions" "reduced matrix" "albic horizon"
## [13] "spodic horizon" "glossic horizon" "spodic materials"
## [16] "lithologic discontinuity" "densic materials" "umbric epipedon"
## [19] "albic materials" NA
# tabulate
sort(table(droplevels(factor(d$featkind))), decreasing = TRUE)
##
## ochric epipedon cambic horizon argillic horizon mollic epipedon
## 61 54 43 42
## andic soil properties lithic contact secondary carbonates umbric epipedon
## 30 20 7 7
## albic horizon spodic horizon glossic horizon reduced matrix
## 6 4 3 3
## sapric soil materials lithologic discontinuity albic materials aquic conditions
## 3 2 1 1
## densic materials redox concentrations spodic materials
## 1 1 1
# subset argillic horizons
d <- d[d$featkind == 'argillic horizon', ]

# create a new column and subtract the upper from the lower depth
d$argillic_thickness_cm <- d$featdepb - d$featdept

# create another new column with the upper depth to the diagnostic feature
d$depth_to_argillic_cm <- d$featdept

# omit NA values
d <- na.omit(d)

# subset to pedon record IDs and calculated thickness
d <- d[, c('peiid', 'argillic_thickness_cm', 'depth_to_argillic_cm')]
head(d)
## peiid argillic_thickness_cm depth_to_argillic_cm
## 7 1092610 56 30
## 24 1092617 38 34
## 26 1092618 29 23
## 28 1092619 38 32
## 30 1092620 29 24
## 33 1092621 23 19
# left-join with existing site data
site(mt663) <- d
# plot as histogram
par(mar = c(4.5, 4.5, 1, 1))
# note additional arguments to adjust figure labels
hist(
$argillic_thickness_cm,
mt663xlab = 'Thickness of argillic (cm)',
main = '',
las = 1
)
hist(
mt663$depth_to_argillic_cm,
xlab = 'Depth to top of argillic (cm)',
main = '',
las = 1
)
Quick check: What can you do with the boolean diagnostic feature data stored in the site table of a fetchNASIS SoilProfileCollection?
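One possibility, sketched below under the assumption that fetchNASIS() populated boolean site-level columns (e.g. argillic.horizon) from the diagnostic features table: the TRUE/FALSE values can be tabulated directly, or used to subset the collection.

```r
# sketch, assuming mt663 was loaded with fetchNASIS() as above and
# the boolean column 'argillic.horizon' is present in the site data

# how many pedons have an argillic horizon?
table(site(mt663)$argillic.horizon)

# keep only the profiles with an argillic horizon
mt663.argillic <- subset(mt663, argillic.horizon)
```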
2.10.4.3 Diagnostic Feature Diagrams
# work up diagnostic plot based on the mt663 dataset loaded above
library(aqp)
library(soilDB)
library(sharpshootR)
# can limit which diagnostic features to show by setting 'v' manually
v <- c('ochric.epipedon', 'mollic.epipedon', 'andic.soil.properties',
       'argillic.horizon', 'cambic.horizon',
       'lithic.contact')
# the default concatenated landform_string may have multiple levels
# depending on how the geomorphic tables were populated
# these are concatenated using the ampersand (&) character
# so take the first string split using ampersand as a delimiter
mt663$first_landform <- sapply(strsplit(mt663$landform_string, "&"),
                               function(x) x[[1]])
# plot with diagnostic features ordered according to co-occurrence
# v: site-level attributes to consider
# k: number of clusters to identify
diagnosticPropertyPlot(
mt663[1:30, ], v = v, k = 5,
grid.label = 'site_id',
dend.label = 'first_landform',
sort.vars = TRUE
)
2.11 Exercise 3: Diagnostic Horizons in Your Own Data
Use the following script to generate a diagnostic-feature diagram for the pedon data you’ve loaded from your NASIS selected set. Alternately, you may use the MT663 data from the example above; just substitute the object mt663 for f.
You can select a subset of desired diagnostic properties or use all diagnostic feature columns.
library(aqp)
library(soilDB)
library(sharpshootR)
# Load data
f <- fetchNASIS(from = 'pedons')
# ... May need to use subset() to reduce the number of pedons!
# get all diagnostic feature columns from site data
# by pattern matching on '[.]' in the site attribute names
# this is not a generic rule, but works here
idx <- grep('[.]', siteNames(f))
v <- siteNames(f)[idx]
# inspect v
v
# insert diagnostics of interest from the possible list in 'v'
v <- c('ochric.epipedon', 'cambic.horizon',
       'argillic.horizon', 'paralithic.contact',
       'lithic.contact')
# generate diagnostic property diagram
diagnosticPropertyPlot(
f, v = v, k = 5,
grid.label = 'site_id',
dend.label = 'taxonname'
)
For more information on generating diagnostic feature diagrams, see the following tutorial:
Questions:
Use the results of site(), horizons(), and diagnostic_hz() to answer the following questions about your own data. These functions return data.frame objects from a SoilProfileCollection derived from your local NASIS data.
- How many pedons do you have in your selected set? Check the number of rows in the site portion of the SoilProfileCollection with nrow(), or use the length() method on the collection.
- How many diagnostic features do you have for those pedons? How many different "featkind"? You can calculate the number of unique values with length() and unique().
- Interpret: What diagnostic features did you choose to plot? Did you notice any patterns in the diagnostic features that tend to occur together?
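A minimal sketch of these checks, assuming your selected set was loaded as shown above (f <- fetchNASIS(from = 'pedons')):

```r
# number of pedons: profiles in the collection,
# or equivalently rows in the site table
length(f)
nrow(site(f))

# diagnostic feature records, and number of distinct feature kinds
d <- diagnostic_hz(f)
nrow(d)
length(unique(d$featkind))
```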
Send your mentor a copy of your results and the commands you used.
2.12 Custom Queries to Local NASIS Database
fetchNASIS() and related convenience functions are wrappers around commonly used chunks of SQL (Structured Query Language). Queries of the NASIS local database can be written in T-SQL, the dialect of SQL used to communicate with Microsoft SQL Server. This is the connection that you set up for the pre-course.
To create a DBIConnection object that can access the NASIS local database, use dbConnectNASIS() or simply NASIS(). This will look for an ODBC connection named "nasis_local" and will authenticate with the read-only credentials.
You can also pass an existing DBIConnection or a path to a SQLite file as the dsn argument to connect to an alternate source. The default driver type used for NASIS connections is OdbcConnection from the odbc package.
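For example (the SQLite file path below is a hypothetical placeholder, not a real path):

```r
library(soilDB)

# default: uses the ODBC connection named "nasis_local"
con <- NASIS()

# alternate source: a NASIS SQLite snapshot, via the dsn argument
# (replace the placeholder path with the location of your own snapshot)
con2 <- NASIS(dsn = "C:/path/to/nasis_snapshot.sqlite")
```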
To query or execute SQL commands on a connection you created, you can use dbGetQuery(connection, query), where connection is your DBIConnection object and query is the SQL to execute.
The soilDB package also includes a helper function, dbQueryNASIS(), which works just like dbGetQuery() except that by default it closes the connection after returning the result. This is convenient for interactive use, or for functions that only need to execute a single query, as it prevents you from accidentally leaving connections open.
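A short sketch of reusing a single connection across several queries by setting close = FALSE, then closing it manually (the two queries shown are illustrative only):

```r
library(soilDB)

con <- NASIS()

# keep the connection open between queries with close = FALSE
sites  <- dbQueryNASIS(con, "SELECT siteiid, usiteid FROM site_View_1;", close = FALSE)
pedons <- dbQueryNASIS(con, "SELECT peiid, upedonid FROM pedon_View_1;", close = FALSE)

# close the connection once all queries are done
DBI::dbDisconnect(con)
```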
2.12.1 Example: Site Soil Temperature Data from CA792 (Sequoia-Kings Canyon National Park)
The following example will return all records in the sitesoiltemp table of your selected set, along with a couple of fields from the site, siteobs, and pedon tables. This is a convenient way to collect all of the field-based soil temperature data associated with the pedons in your selected set for further analysis.
You can use the CA792 (Sequoia-Kings Canyon National Park) pedons as an example. Use a query that searches user pedon ID for the pattern %CA792% to download and populate a selected set in the NASIS client.
library(soilDB)
# write query as a character string
q <- "SELECT siteiid as siteiid, peiid, usiteid as site_id,
upedonid as pedon_id, obsdate as obs_date,
soitemp, soitempdep FROM site_View_1
INNER JOIN siteobs_View_1 ON site_View_1.siteiid = siteobs_View_1.siteiidref
LEFT OUTER JOIN sitesoiltemp_View_1 ON siteobs_View_1.siteobsiid = sitesoiltemp_View_1.siteobsiidref
LEFT OUTER JOIN pedon_View_1 ON siteobs_View_1.siteobsiid = pedon_View_1.siteobsiidref
ORDER BY obs_date, siteiid;"
# setup connection to local NASIS
channel <- NASIS()
# exec query
d <- dbQueryNASIS(channel, q)
The functions dbConnectNASIS() (alias NASIS()) and dbQueryNASIS() allow you to create a connection to the NASIS local database and send queries to that connection, respectively. By default, dbQueryNASIS() will close your connection after completing the query; you can change this by setting close = FALSE.
# check results
str(d)
# remove records missing values
d <- na.omit(d)
# tabulate unique soil depths
table(d$soitempdep)
# extract day of year
d$doy <- as.integer(format(d$obs_date, "%j"))
# when were measurements collected?
hist(
d$doy,
xlim = c(1, 366),
las = 1,
main = 'Soil Temperature Measurements',
xlab = 'Day of Year'
)
# soil temperature by day of year
plot(
soitemp ~ doy,
data = d,
main = 'Soil Temperature Measurements (CA792)\nNASIS "Site Soil Temperature" table',
type = 'p',
pch = 21,
bg = 'royalblue',
xlim = c(1, 366),
ylim = c(-1, 26),
xlab = 'Day of Year',
ylab = 'Soil Temperature at 50cm (deg C)',
las = 1
)
# vernal equinox, summer solstice, autumnal equinox, winter solstice
x <- as.Date(c('2022-03-20', '2022-06-21', '2022-09-23', '2022-12-21'))
# convert dates -> Julian date, or day-of-year
doy <- as.integer(format(x, "%j"))
# add vertical lines
abline(v = doy, lty = 3, col = grey(0.45))
# annotate
# pos argument: left-center offset
text(
x = doy,
y = -1,
labels = c('vernal equinox', 'summer solstice', 'autumnal equinox', 'winter solstice'),
pos = 2,
cex = 0.75,
font = 3
)
# box/whisker plot showing distribution of day-of-year
# when measurements (pedon descriptions) were collected
boxplot(d$doy, at = 26, horizontal = TRUE, add = TRUE, lty = 1, width = 1, col = 'white', axes = FALSE)