Generalize a vector of horizon names, based on new classes and REGEX patterns. Alternatively, create a new column ghl in a SoilProfileCollection (requires a horizon designation name to be defined for the collection, see details).
generalize.hz(
x,
new,
pattern,
non.matching.code = "not-used",
hzdepm = NULL,
ordered = !missing(hzdepm),
...
)
# S4 method for character
generalizeHz(
x,
new,
pattern,
non.matching.code = "not-used",
hzdepm = NULL,
ordered = !missing(hzdepm),
...
)
# S4 method for SoilProfileCollection
generalizeHz(
x,
new,
pattern,
non.matching.code = "not-used",
hzdepm = NULL,
ordered = !missing(hzdepm),
ghl = "genhz",
...
)
x: a character vector of horizon names or a SoilProfileCollection
new: a character vector of new horizon classes
pattern: a character vector of REGEX patterns, same length as new
non.matching.code: label used for any horizon not matching any pattern
hzdepm: a numeric vector of horizon mid-points; NA values in hzdepm will result in non.matching.code (or NA if not defined) in result
ordered: by default, the result is an ordered factor when hzdepm is defined
...: additional arguments passed to grep(), such as perl=TRUE for advanced REGEX
ghl: Generalized Horizon Designation column name (to be created/updated when x is a SoilProfileCollection)
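The hzdepm argument expects horizon mid-points, which can be computed from top and bottom depths. The base-R sketch below uses made-up depths and shows one plausible way to turn mid-points into an ordered factor. How generalizeHz actually derives the level order from hzdepm is not described on this page, so ordering by median mid-point is an assumption made for illustration only.

```r
# toy horizon data, invented for illustration
hz     <- c("Oi", "A", "Bt", "Oe", "A", "Bw")
top    <- c(0, 3, 20, 0, 5, 25)
bottom <- c(3, 20, 60, 5, 25, 70)

# horizon mid-points: the kind of values hzdepm expects
mid <- (top + bottom) / 2

# generalized labels, assigned by hand here
gen <- c("O", "A", "B", "O", "A", "B")

# assumption: order levels by the median mid-point of each label
med <- tapply(mid, gen, median)
lev <- names(sort(med))

res <- factor(gen, levels = lev, ordered = TRUE)
print(res)
```

Sorting by depth rather than alphabetically places O before A and B, which is the order usually wanted in depth-wise summaries.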
An (ordered) factor of the same length as x (if character), or with one element per horizon in x (if SoilProfileCollection).
When x is a SoilProfileCollection, the ghl column will be updated with the factor results. This requires that the "horizon designation name" metadata be defined for the collection, because it identifies the column that holds the input designations.
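The matching step can be sketched in base R. This is an illustration, not the package's implementation: the helper name generalize_sketch is hypothetical, and the assumption that patterns are applied in order, with later matches overwriting earlier ones, is inferred from the example output on this page.

```r
# Hypothetical helper, not part of aqp: base-R sketch of pattern-based relabeling
generalize_sketch <- function(x, new, pattern,
                              non.matching.code = "not-used", ...) {
  stopifnot(length(new) == length(pattern))

  # start with every horizon name flagged as non-matching
  g <- rep(non.matching.code, times = length(x))

  # apply patterns in order; a later match overwrites an earlier one (assumption)
  for (i in seq_along(new)) {
    g[grep(pattern[i], x, ...)] <- new[i]
  }

  # unordered factor: levels follow `new`, with the non-matching code last
  factor(g, levels = unique(c(new, non.matching.code)))
}

hz <- c("Oi", "A1", "AB", "Bt1", "2C", "3Ab")
res <- generalize_sketch(hz,
                         new = c("O", "A", "B", "C"),
                         pattern = c("O", "^A", "^B", "C"))
print(data.frame(hz, res))
```

Names such as "3Ab" fall through to the non-matching code because the anchored patterns only match at the start of the name. Cross-tabulating the result against the input is a quick way to spot such cases.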
data(sp1)
# check original distribution of hz designations
table(sp1$name)
#>
#>   2C  2C1  2C2  3Ab 3Bwb   3C  3Cb    A   A1   A2   A3   AB  AB1  AB2  AB3   BA
#>    2    2    2    1    2    2    1    4    4    4    1    5    1    1    1    2
#>   Bt  Bt1  Bt2  Bw1  Bw2  Bw3    C   C1   C2 Oa/A   Oe   Oi   Rt
#>    1    3    3    3    3    1    1    2    2    1    1    3    1
# generalize
sp1$genhz <- generalize.hz(sp1$name,
new=c('O','A','B','C','R'),
pattern=c('O', '^A','^B','C','R'))
# see how we did / what we missed
table(sp1$genhz, sp1$name)
#>
#>            2C 2C1 2C2 3Ab 3Bwb 3C 3Cb A A1 A2 A3 AB AB1 AB2 AB3 BA Bt Bt1 Bt2
#>   O         0   0   0   0    0  0   0 0  0  0  0  0   0   0   0  0  0   0   0
#>   A         0   0   0   0    0  0   0 4  4  4  1  5   1   1   1  0  0   0   0
#>   B         0   0   0   0    0  0   0 0  0  0  0  0   0   0   0  2  1   3   3
#>   C         2   2   2   0    0  2   1 0  0  0  0  0   0   0   0  0  0   0   0
#>   R         0   0   0   0    0  0   0 0  0  0  0  0   0   0   0  0  0   0   0
#>   not-used  0   0   0   1    2  0   0 0  0  0  0  0   0   0   0  0  0   0   0
#>
#>            Bw1 Bw2 Bw3 C C1 C2 Oa/A Oe Oi Rt
#>   O          0   0   0 0  0  0    1  1  3  0
#>   A          0   0   0 0  0  0    0  0  0  0
#>   B          3   3   1 0  0  0    0  0  0  0
#>   C          0   0   0 1  2  2    0  0  0  0
#>   R          0   0   0 0  0  0    0  0  0  1
#>   not-used   0   0   0 0  0  0    0  0  0  0
## a more advanced example, requires perl=TRUE
# example data
x <- c('A', 'AC', 'Bt1', '^AC', 'C', 'BC', 'CB')
# new labels
n <- c('A', '^AC', 'C')
# patterns:
# "A anywhere in the name"
# "A at the start of the name" (an unescaped ^ is a REGEX anchor; a literal caret would be '\\^A')
# "C anywhere in name, but without preceding A"
p <- c('A', '^A', '(?<!A)C')
# note additional argument
res <- generalize.hz(x, new = n, pattern=p, perl=TRUE)
# double-check: cross-tabulate generalized labels against the original names
table(res, x)
#>           x
#> res        A AC BC Bt1 C CB ^AC
#>   A        0  0  0   0 0  0   1
#>   ^AC      1  1  0   0 0  0   0
#>   C        0  0  1   0 1  1   0
#>   not-used 0  0  0   1 0  0   0
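In a regular expression an unescaped ^ anchors the match to the start of the string, which is why the '^A' pattern in the example above relabels 'A' and 'AC' rather than '^AC'. The base-R sketch below compares the anchored and escaped forms using only grepl(), with no aqp functions involved. Whether a literal caret was intended in the example is an assumption.

```r
x <- c('A', 'AC', 'Bt1', '^AC', 'C', 'BC', 'CB')

# unescaped caret: an anchor, matches names that start with "A"
anchored <- x[grepl('^A', x, perl = TRUE)]

# escaped caret: matches a literal "^A" anywhere in the name
literal <- x[grepl('\\^A', x, perl = TRUE)]

# negative lookbehind: "C" not immediately preceded by "A" (needs perl = TRUE)
c.no.a <- x[grepl('(?<!A)C', x, perl = TRUE)]

print(anchored)
print(literal)
print(c.no.a)
```

Testing each pattern against the raw names in this way, before passing it on, makes it easier to catch a pattern that matches more or less than intended.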