Process a range of check-all-that-apply response columns for correct tabulation.
Source: R/survey_utils.R
check_all_recode.Rd
Some survey software returns check-all-that-apply response columns where missing values could indicate either that the respondent skipped the question entirely or that they did not select that particular answer choice. To count the responses properly, respondents who did not check any of the choices (i.e., they skipped the question) should not be counted in the denominator, assuming that the choices were completely exhaustive or that there was an NA option.
This function takes a data.frame and range of columns containing all answer choices to a check-all-that-apply question and updates the columns in the data.frame to contain one of three values: 1 if the choice was selected; 0 if the respondent chose another option but not this one; or NA if the respondent skipped the question (i.e., they did not select any of the choices) and thus their response is truly missing.
It also takes the single text value in each column and adds it as a label attribute to that data.frame column.
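The three-value recoding described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `recode_sketch()` and the `toy` data are hypothetical names introduced here, and the sketch assumes every column holds a single repeated text value.

```r
# Hypothetical helper, not the package's implementation: recode a set of
# check-all-that-apply columns to 1 / 0 / NA using base R only.
recode_sketch <- function(dat, cols) {
  # TRUE for respondents who left every listed column empty (skipped the question)
  skipped <- rowSums(!is.na(dat[cols])) == 0
  for (col in cols) {
    original <- dat[[col]]
    # The single non-missing text value in the column becomes its label
    label <- unique(original[!is.na(original)])
    recoded <- as.numeric(!is.na(original))  # 1 if selected, 0 otherwise
    recoded[skipped] <- NA                   # truly missing: question skipped
    attr(recoded, "label") <- label
    dat[[col]] <- recoded
  }
  dat
}

toy <- data.frame(
  q1_1 = c("a", "a", NA, NA),
  q1_2 = c("b", NA, "b", NA),
  stringsAsFactors = FALSE
)
out <- recode_sketch(toy, c("q1_1", "q1_2"))
as.vector(out$q1_1)      # 1 1 0 NA
attr(out$q1_2, "label")  # "b"
```

The skipped-row check runs once, before any column is changed, so the recoding of one column cannot affect how another is recoded.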
This function accommodates an open-response column, to get the correct denominator when some respondents have skipped all check variables but written something in. This passing over of the offered choices is an implicit rejection of them, not a "missing." Such a text variable will throw a warning, which may be okay, and will then be recoded into a binary 1/0 variable indicating a response and assigned the label "Other". Consider preserving the original respondent text values as a separate column before this point if needed.
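As a rough base R sketch of the open-response case (not the package's own code), the snippet below copies the write-in text to a separate column first and then collapses the original to 1/0. The `survey` data and the `q1_other_text` column name are assumptions made for illustration.

```r
# Assumed toy data; q1_other holds free text rather than one repeated value.
survey <- data.frame(
  q1_1 = c("a", NA, NA),
  q1_other = c(NA, "not any of these", NA),
  stringsAsFactors = FALSE
)

# Keep the original write-in text before it is collapsed to 1/0
# (q1_other_text is a hypothetical column name).
survey$q1_other_text <- survey$q1_other

cols <- c("q1_1", "q1_other")
# A write-in counts as answering, so only rows empty in every column are skipped
skipped <- rowSums(!is.na(survey[cols])) == 0

for (col in cols) {
  answered <- as.numeric(!is.na(survey[[col]]))
  answered[skipped] <- NA
  survey[[col]] <- answered
}
attr(survey$q1_other, "label") <- "Other"

survey$q1_1  # 1 0 NA
```

The second respondent wrote in a response without checking `q1_1`, so they get a 0 there and stay in the denominator. The third respondent left everything blank and is NA throughout.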
check_all_recode() prepares the data.frame for a call to its sister function check_all_count(). The label attribute is accessed by that function.
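To illustrate why the recoding and the labels matter for tabulation, here is a base R sketch that counts selections against the number of respondents who answered the question. It does not call check_all_count(), whose interface is not shown here; the `recoded` data and `summary_tbl` are illustrative names.

```r
# Recoded columns as described above: 1 = selected, 0 = not selected, NA = skipped
recoded <- data.frame(
  q1_1 = c(1, 1, 1, NA, 0),
  q1_2 = c(1, 1, 0, NA, 0)
)
attr(recoded$q1_1, "label") <- "a"
attr(recoded$q1_2, "label") <- "b"

# Read each column's label attribute to name the rows of the summary
labels <- vapply(recoded, function(col) attr(col, "label"), character(1))
n_selected <- vapply(recoded, function(col) sum(col, na.rm = TRUE), numeric(1))
# Denominator: respondents who answered the question (non-NA rows)
n_answered <- vapply(recoded, function(col) sum(!is.na(col)), numeric(1))

summary_tbl <- data.frame(
  response = unname(labels),
  n = unname(n_selected),
  percent = unname(n_selected / n_answered),
  stringsAsFactors = FALSE
)
summary_tbl
```

With the skipped respondent excluded, choice "a" is selected by 3 of 4 respondents (0.75) rather than 3 of 5.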
Arguments
- dat
a data.frame with survey data
- ...
unquoted variable names containing the answer choices. Can be specified as a range, e.g., q1_1:q1_5, or using other helper functions from dplyr::select().
- set_labels
should the label attribute of the columns be overwritten with the column text? Leave this as TRUE unless the columns already have label attributes you don't wish to overwrite.
Value
the original data.frame with the specified column range updated, and with label attributes on the questions.
Examples
x <- data.frame( # 4th person didn't respond at all
  unrelated = 1:5,
  q1_1 = c("a", "a", "a", NA, NA),
  q1_2 = c("b", "b", NA, NA, NA),
  q1_3 = c(NA, NA, "c", NA, NA),
  q1_other = c(NA, "something else", NA, NA, "not any of these")
)
x |>
  check_all_recode(q1_1:q1_other)
#> Warning: column 4 has multiple values besides NA; not sure which is the question text. Guessing this an "Other (please specify)" column.
#> unrelated q1_1 q1_2 q1_3 q1_other
#> 1 1 1 1 0 0
#> 2 2 1 1 0 1
#> 3 3 1 0 1 0
#> 4 4 NA NA NA NA
#> 5 5 0 0 0 1
# You can use any of the dplyr::select() helpers to identify the columns:
x |>
  check_all_recode(contains("q1"))
#> Warning: column 4 has multiple values besides NA; not sure which is the question text. Guessing this an "Other (please specify)" column.
#> unrelated q1_1 q1_2 q1_3 q1_other
#> 1 1 1 1 0 0
#> 2 2 1 1 0 1
#> 3 3 1 0 1 0
#> 4 4 NA NA NA NA
#> 5 5 0 0 0 1