Encode data frame column using external crosswalk file.

encodefrom(
  .data,
  var,
  cw_file,
  raw,
  clean,
  label,
  delimiter = NULL,
  sheet = NULL,
  case_ignore = TRUE,
  ignore_tibble = FALSE
)

encodefrom_(
  .data,
  var,
  cw_file,
  raw,
  clean,
  label,
  delimiter = NULL,
  sheet = NULL,
  case_ignore = TRUE,
  ignore_tibble = FALSE
)

Arguments

.data

Data frame or tbl_df

var

Column name of vector to be encoded

cw_file

Either a data frame object or a string with the path to an external crosswalk file. The crosswalk must have columns representing raw (current) vector values, clean (new) vector values, and labels for values. Values in the raw and clean columns must be unique (1:1 match) or an error will be thrown. Acceptable file types include: delimited (.csv, .tsv, or other), R (.rda, .rdata, .rds), or Stata (.dta).

raw

Name of column in cw_file that contains values in current vector.

clean

Name of column in cw_file that contains new values for vector.

label

Name of column in cw_file with labels for new values.

delimiter

String delimiter used to parse cw_file. Only necessary for a delimited file that is neither comma-separated nor tab-separated; for those two formats the function guesses the delimiter from the file ending.

sheet

Sheet to read if cw_file is an Excel file and the required sheet isn't the first one.

case_ignore

Ignore case when matching current (raw) vector name with new (clean) column name.

ignore_tibble

Ignore .data status as tbl_df and return vector as a factor rather than labelled vector.
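
As a rough illustration of the crosswalk structure described for cw_file, raw, clean, and label, the sketch below builds a small crosswalk data frame by hand and checks the 1:1 requirement before it would be passed to encodefrom. The helper is_one_to_one is hypothetical and not part of the package; it is only a base R pre-check and is not necessarily how the function validates its input.

```r
# Hand-built crosswalk: one row per raw value, with its clean value and label.
# Column names mirror those used in the Examples section.
cw <- data.frame(
  stname = c("Kentucky", "Tennessee", "Virginia"),  # raw (current) values
  stfips = c(21, 47, 51),                           # clean (new) values
  stabbr = c("KY", "TN", "VA"),                     # labels for clean values
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): TRUE when neither the raw
# column nor the clean column contains a repeated value.
is_one_to_one <- function(cw, raw, clean) {
  anyDuplicated(cw[[raw]]) == 0 && anyDuplicated(cw[[clean]]) == 0
}

ok <- is_one_to_one(cw, "stname", "stfips")

# Appending a repeated row breaks the 1:1 match.
cw_bad <- rbind(cw, cw[1, ])
bad <- is_one_to_one(cw_bad, "stname", "stfips")
```

Here ok is TRUE and bad is FALSE; a crosswalk like cw_bad is the kind of input the cw_file description says will raise an error.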

Value

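Vector that is either a factor or a labelled vector, depending on the data input and options: a factor when .data is a plain data frame or when ignore_tibble = TRUE, and a labelled vector when .data is a tbl_df.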
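
One way to confirm which kind of vector comes back is to inspect it directly. The sketch below assumes the crosswalkr, tibble, and haven packages are installed and reuses the stcrosswalk data set from the Examples section; the object names f, l, and f2 are illustrative only.

```r
library(crosswalkr)

cw <- get(data(stcrosswalk))
df <- data.frame(state = c("Kentucky", "Tennessee", "Virginia"))
df_tbl <- tibble::as_tibble(df)

# Plain data frame input
f <- encodefrom(df, state, cw, stname, stfips, stabbr)

# Tibble input, default options
l <- encodefrom(df_tbl, state, cw, stname, stfips, stabbr)

# Tibble input, but with tibble status ignored
f2 <- encodefrom(df_tbl, state, cw, stname, stfips, stabbr,
                 ignore_tibble = TRUE)

# Inspect the returned types
is.factor(f)
haven::is.labelled(l)
is.factor(f2)
```

Going by the argument descriptions and the printed output in the Examples section, all three checks should return TRUE.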

Functions

  • encodefrom_(): Standard evaluation version of encodefrom (var, raw, clean, and label must be strings when using this version)
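
The standard evaluation version is mainly useful when column names are held in variables, for example inside a loop or a wrapper function. The sketch below assumes the crosswalkr package is installed and reuses the stcrosswalk data set from the Examples section; var_name is an illustrative variable, not something the package defines.

```r
library(crosswalkr)

cw <- get(data(stcrosswalk))
df <- data.frame(state = c("Kentucky", "Tennessee", "Virginia"))

# The column to encode is stored as a string, so the standard evaluation
# version is used; raw, clean, and label are passed as strings too.
var_name <- "state"
df$state2 <- encodefrom_(df, var_name, cw,
                         raw = "stname",
                         clean = "stfips",
                         label = "stabbr")
```

Because df is a plain data frame here, state2 should come back as a factor, matching the unquoted encodefrom call in the Examples section.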

Examples

df <- data.frame(state = c('Kentucky','Tennessee','Virginia'),
                 stfips = c(21,47,51),
                 cenregnm = c('South','South','South'))

df_tbl <- tibble::as_tibble(df)

cw <- get(data(stcrosswalk))

df$state2 <- encodefrom(df, state, cw, stname, stfips, stabbr)
df_tbl$state2 <- encodefrom(df_tbl, state, cw, stname, stfips, stabbr)
df_tbl$state3 <- encodefrom(df_tbl, state, cw, stname, stfips, stabbr,
                            ignore_tibble = TRUE)

haven::as_factor(df_tbl)
#> # A tibble: 3 × 5
#>   state     stfips cenregnm state2 state3
#>   <chr>      <dbl> <chr>    <fct>  <fct> 
#> 1 Kentucky      21 South    KY     KY    
#> 2 Tennessee     47 South    TN     TN    
#> 3 Virginia      51 South    VA     VA    
haven::zap_labels(df_tbl)
#> # A tibble: 3 × 5
#>   state     stfips cenregnm state2 state3
#>   <chr>      <dbl> <chr>     <int> <fct> 
#> 1 Kentucky      21 South        21 KY    
#> 2 Tennessee     47 South        47 TN    
#> 3 Virginia      51 South        51 VA