Encode data frame column using external crosswalk file.
encodefrom(
  .data,
  var,
  cw_file,
  raw,
  clean,
  label,
  delimiter = NULL,
  sheet = NULL,
  case_ignore = TRUE,
  ignore_tibble = FALSE
)

encodefrom_(
  .data,
  var,
  cw_file,
  raw,
  clean,
  label,
  delimiter = NULL,
  sheet = NULL,
  case_ignore = TRUE,
  ignore_tibble = FALSE
)
.data: Data frame or tbl_df.
var: Column name of the vector to be encoded.
cw_file: Either a data frame object or a string with the path to an external crosswalk file. The crosswalk must have columns representing raw (current) vector values, clean (new) vector values, and labels for values. Values in the raw and clean columns must be unique (1:1 match) or an error will be thrown. Acceptable file types include: delimited (.csv, .tsv, or other), R (.rda, .rdata, .rds), or Stata (.dta).
raw: Name of the column in cw_file that contains the values in the current vector.
clean: Name of the column in cw_file that contains the new values for the vector.
label: Name of the column in cw_file with labels for the new values.
delimiter: String delimiter used to parse cw_file. Only necessary if using a delimited file that isn't comma-separated or tab-separated (the function guesses those from the file ending).
sheet: Specify the sheet if cw_file is an Excel file and the required sheet isn't the first one.
case_ignore: Ignore case when matching current (raw) vector name with new (clean) column name.
ignore_tibble: Ignore the status of .data as a tbl_df and return the vector as a factor rather than a labelled vector.
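The 1:1 requirement on the raw and clean columns can be checked before calling the function. The following is a minimal base R sketch of such a check; `check_crosswalk_unique()` and the toy crosswalks are hypothetical names made up for illustration and are not part of the package, whose own check may differ.

```r
# Hypothetical helper (not part of the package): TRUE when neither the raw
# nor the clean column of the crosswalk contains a repeated value.
check_crosswalk_unique <- function(cw, raw, clean) {
  dup_raw <- anyDuplicated(cw[[raw]]) > 0
  dup_clean <- anyDuplicated(cw[[clean]]) > 0
  !(dup_raw || dup_clean)
}

# Toy crosswalks invented for this sketch
good_cw <- data.frame(raw = c("a", "b", "c"), clean = c(1, 2, 3),
                      stringsAsFactors = FALSE)
bad_cw <- data.frame(raw = c("a", "a", "c"), clean = c(1, 2, 3),
                     stringsAsFactors = FALSE)

ok_good <- check_crosswalk_unique(good_cw, "raw", "clean")
ok_bad <- check_crosswalk_unique(bad_cw, "raw", "clean")
```

Here `bad_cw` repeats a raw value, so it fails the check; a crosswalk like this would be expected to trigger the error described for cw_file.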
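Vector that is either a factor or labelled, depending on the data input and options: a tbl_df input gives a labelled vector unless ignore_tibble = TRUE, in which case a factor is returned.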
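To make the two return types concrete, here is a base R sketch of the mapping a crosswalk describes. It is an illustration under assumptions, not the package's implementation: the toy crosswalk reuses the column names from the examples but is typed in by hand, and `x`, `idx`, `codes`, and `f` are names chosen for this sketch.

```r
# Toy crosswalk typed in by hand (not the packaged crosswalk data)
cw <- data.frame(
  stname = c("Kentucky", "Tennessee", "Virginia"),
  stfips = c(21, 47, 51),
  stabbr = c("KY", "TN", "VA"),
  stringsAsFactors = FALSE
)

# Current (raw) values to encode
x <- c("Virginia", "Kentucky", "Tennessee", "Kentucky")

# Row of the crosswalk that each current value matches
idx <- match(x, cw$stname)

# Clean codes: the underlying values a labelled-style result would hold
codes <- cw$stfips[idx]

# Factor-style result: raw values as levels, displayed with the labels
f <- factor(x, levels = cw$stname, labels = cw$stabbr)
```

In this sketch `codes` holds the numeric clean values and `f` holds the labels, which mirrors the difference between the state2 and state3 columns in the examples after labels are removed.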
encodefrom_(): Standard evaluation version of encodefrom (var, raw, clean, and label must be strings when using this version).
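A short sketch of the standard evaluation form, with the column names quoted. It assumes these functions and the stcrosswalk data come from the crosswalkr package, which this page does not name, so the call is guarded and skipped when that package is not installed.

```r
# Assumption: encodefrom_() and stcrosswalk are provided by crosswalkr;
# the guard skips the example when that package is not available.
if (requireNamespace("crosswalkr", quietly = TRUE)) {
  library(crosswalkr)

  df <- data.frame(state = c("Kentucky", "Tennessee", "Virginia"),
                   stringsAsFactors = FALSE)
  cw <- get(data(stcrosswalk))

  # var, raw, clean, and label are passed as strings in this version
  df$state2 <- encodefrom_(df, "state", cw, "stname", "stfips", "stabbr")
}
```

The quoted form is convenient when the column names are held in variables, for example inside a loop or another function.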
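# Example data frame and its tibble equivalent
df <- data.frame(state = c('Kentucky','Tennessee','Virginia'),
                 stfips = c(21,47,51),
                 cenregnm = c('South','South','South'))
df_tbl <- tibble::as_tibble(df)
# Load the stcrosswalk crosswalk data
cw <- get(data(stcrosswalk))
# Data frame input returns a factor
df$state2 <- encodefrom(df, state, cw, stname, stfips, stabbr)
# Tibble input returns a labelled vector unless ignore_tibble = TRUE
df_tbl$state2 <- encodefrom(df_tbl, state, cw, stname, stfips, stabbr)
df_tbl$state3 <- encodefrom(df_tbl, state, cw, stname, stfips, stabbr,
                            ignore_tibble = TRUE)
# Show the labelled column as a factor
haven::as_factor(df_tbl)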
#> # A tibble: 3 × 5
#>   state     stfips cenregnm state2 state3
#>   <chr>      <dbl> <chr>    <fct>  <fct>
#> 1 Kentucky      21 South    KY     KY
#> 2 Tennessee     47 South    TN     TN
#> 3 Virginia      51 South    VA     VA
haven::zap_labels(df_tbl)
#> # A tibble: 3 × 5
#>   state     stfips cenregnm state2 state3
#>   <chr>      <dbl> <chr>     <int> <fct>
#> 1 Kentucky      21 South        21 KY
#> 2 Tennessee     47 South        47 TN
#> 3 Virginia      51 South        51 VA