Stephen Roecker and Tom D’Avello
2021-02-03
hzname | clay_l | clay_r | clay_h | texture |
---|---|---|---|---|
Ap | 7 | 18 | 26.1 | sil |
Bt1 | 24 | 27 | 35.0 | sicl |
2Bt2 | 27 | 31 | 35.0 | cl |
2BCt | 15 | 22 | 25.0 | l |
2Cd | 10 | 15 | 20.0 | l |
Parameter | NASIS | Description | R function |
---|---|---|---|
Mean | RV | arithmetic average | mean() |
Median | RV | middle value, 50% quantile | median() |
Mode | RV | most frequent value | sort(table(), decreasing = TRUE)[1] |
Standard Deviation | L & H | variation | sd() |
Quantiles | L & H | value at or below which a given proportion (p) of the observations fall | quantile() |
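
As a minimal sketch of the functions listed in the table, the example below applies them to the clay_r and texture values from the example horizon table. The choice of the 10th and 90th percentiles as candidate low and high values is an assumption made for illustration, not a rule stated here.

```r
# Representative clay values (clay_r) and textures from the example horizon table
clay    <- c(18, 27, 31, 22, 15)
texture <- c("sil", "sicl", "cl", "l", "l")

clay_mean   <- mean(clay)    # arithmetic average
clay_median <- median(clay)  # middle value (50% quantile)
clay_sd     <- sd(clay)      # standard deviation (variation around the mean)

# Quantiles: the 10% and 90% probabilities are illustrative assumptions
clay_q <- quantile(clay, probs = c(0.1, 0.5, 0.9))

# Mode: most frequent value, shown here for the categorical texture column
texture_mode <- names(sort(table(texture), decreasing = TRUE))[1]

print(c(mean = clay_mean, median = clay_median, sd = clay_sd))
print(clay_q)
print(texture_mode)
```

The mode is computed on texture rather than clay because a continuous variable with few repeated values rarely has a meaningful most frequent value.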
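library(aqp)    # horizons() accessor for SoilProfileCollection objects
library(soilDB) # provides the loafercreek sample dataset
library(dplyr)  # select() and the %>% pipe
data("loafercreek")
h <- horizons(loafercreek)
h$texture_class <- factor(h$texture_class)
h %>%
  select(clay, phfield, total_frags_pct, texture_class) %>%
  summary()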
## clay phfield total_frags_pct texture_class
## Min. :10.00 Min. :4.90 Min. : 0.00 l :265
## 1st Qu.:18.00 1st Qu.:6.00 1st Qu.: 0.00 br :105
## Median :22.00 Median :6.30 Median : 5.00 cl :101
## Mean :23.63 Mean :6.18 Mean :13.88 sil : 56
## 3rd Qu.:28.00 3rd Qu.:6.50 3rd Qu.:20.00 spm : 22
## Max. :60.00 Max. :7.00 Max. :95.00 (Other): 52
## NA's :167 NA's :381 NA's : 25
##
## br c cb cl gr l pg scl sic sicl sil sl spm <NA>
## 2BCt 0 6 0 3 0 0 0 0 0 0 0 0 0 0
## 2Bt 0 5 0 1 0 0 0 0 0 0 0 0 0 0
## A 0 0 0 1 0 97 0 0 0 1 29 7 0 3
## BA 0 0 0 0 0 2 0 0 0 0 0 0 0 0
## BCt 0 2 0 8 0 7 0 1 0 1 1 0 0 1
## Bt 0 0 0 17 0 37 0 0 0 0 1 1 0 0
## Bt1 0 1 0 13 0 57 0 3 0 1 12 0 0 2
## Bt2 0 4 0 40 0 45 0 4 2 6 8 0 0 0
## Bt3 0 0 0 6 0 2 0 0 0 0 0 0 0 0
## Cr 55 1 1 1 3 1 1 1 0 0 0 0 0 10
## Oi 0 0 0 0 0 0 0 0 0 0 0 0 4 0
## R 43 0 0 0 0 0 0 0 0 0 0 0 0 2
## <NA> 7 0 0 11 0 17 0 0 0 0 5 0 18 7
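One option for handling missing values is to exclude every row that contains one using na.exclude(), such as h2 <- na.exclude(h). However, this can be wasteful because it removes entire rows (e.g., horizons) even when a row has only one missing value. It is often better to create a temporary copy of the variable in question and remove the missing values from that copy, such as clay <- na.exclude(h$clay).
Another option is to replace the missing values with a substitute value, such as h$clay <- ifelse(is.na(h$clay), 0, h$clay) # or h$clay[is.na(h$clay)] <- 0
Finally, many functions have an argument for dealing with missing values, such as na.rm.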
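
The options for handling missing values can be compared side by side. The following is a small sketch on a hypothetical clay vector (the values are made up for illustration), showing how each choice changes the resulting mean.

```r
# Hypothetical clay values with two missing observations
clay <- c(18, NA, 31, 22, NA)

# Option 1: drop the missing values from a temporary copy
clay_complete <- na.exclude(clay)
n_complete    <- length(clay_complete)

# Option 2: replace the missing values with a substitute (zero here)
clay_zero <- ifelse(is.na(clay), 0, clay)

# Option 3: ask the summary function to skip missing values
mean_default <- mean(clay)                # NA, because missing values propagate
mean_narm    <- mean(clay, na.rm = TRUE)  # mean of the three observed values
mean_zero    <- mean(clay_zero)           # lower, because zeros count as observations

print(c(default = mean_default, na.rm = mean_narm, zero_filled = mean_zero))
```

Replacing missing values with zero pulls the mean downward, so that substitution is only appropriate when zero is a plausible value for the property.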
Plot Types | Description |
---|---|
Bar | a plot where each bar represents the frequency of observations for a ‘group’ |
Histogram | a plot where each bar represents the frequency of observations for a ‘given range of values’ |
Density | an estimation of the frequency distribution based on the sample data |
Quantile-Quantile | a plot of the actual data values against a normal distribution |
Box-Whisker | a visual representation of median, quartiles, symmetry, skewness, and outliers |
Scatter & Line | a graphical display of one variable plotted on the x axis and another on the y axis |
Plot Types | Base R | lattice | ggplot geoms |
---|---|---|---|
Bar | barplot() | barchart() | geom_bar() |
Histogram | hist() | histogram() | geom_histogram() |
Density | plot(density()) | densityplot() | geom_density() |
Quantile-Quantile | qqnorm() | qqmath() | geom_qq() |
Box-Whisker | boxplot() | bwplot() | geom_boxplot() |
Scatter & Line | plot() | xyplot() | geom_point() |
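
To make the base R column of the table concrete, here is a rough sketch that draws each plot type once. The clay, ph, and texture values are simulated and purely hypothetical; they stand in for real horizon data.

```r
# Simulated (hypothetical) horizon data for illustration only
set.seed(1)
clay    <- round(rnorm(50, mean = 24, sd = 6))
ph      <- round(rnorm(50, mean = 6.2, sd = 0.4), 1)
texture <- factor(sample(c("l", "cl", "sil"), 50, replace = TRUE))

old_par <- par(mfrow = c(2, 3))                      # 2 x 3 grid of panels
barplot(table(texture), main = "Bar")                # frequency per group
hist(clay, main = "Histogram")                       # frequency per value range
plot(density(clay), main = "Density")                # estimated distribution
qqnorm(clay, main = "Quantile-Quantile")             # data vs. normal quantiles
qqline(clay)                                         # reference line for normality
boxplot(clay ~ texture, main = "Box-Whisker")        # median, quartiles, outliers
plot(clay, ph, main = "Scatter")                     # one variable against another
par(old_par)                                         # restore plotting settings
```

The lattice and ggplot2 functions in the other columns take the same data but use a formula or aesthetic mapping interface instead.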
Healy, K., 2018. Data Visualization: a practical introduction. Princeton University Press. http://socviz.co/
Helsel, D.R., and R.M. Hirsch, 2002. Statistical Methods in Water Resources. Techniques of Water Resources Investigations, Book 4, Chapter A3. U.S. Geological Survey. 522 pages. http://pubs.usgs.gov/twri/twri4a3/
Kabacoff, R.I., 2015. R in Action. Manning Publications Co. Shelter Island, NY. https://www.statmethods.net/
Kabacoff, R.I., 2018. Data Visualization in R. https://rkabacoff.github.io/datavis/
Peng, R. D., 2016. Exploratory Data Analysis with R. Leanpub. https://bookdown.org/rdpeng/exdata/
Wilke, C.O., 2019. Fundamentals of Data Visualization. O’Reilly Media, Inc. https://serialmentor.com/dataviz/