ctesta01/matrix_single_answer_results.md

## matrix_single_answer_results.md

      
    Raw
  

              matrix_single_answer_results.md
            
          
    This page was last updated on July 21st, 2016.
The commit this reflects the code from is:
82af5
Matrix Single Answer

Matrix questions use "choice" to indicate a horizontal question component
and "answer" to indicate a vertical option. For example, if a
matrix questions options were formatted like the following table, the
"choices" would be the list of "apple", "orange" and "banana". Similarly, the
"answers" are the 1, 2, and 3 options a respondent could select for each choice.


1
2
3


apple


orange


banana


Additionally, the question[['Payload']][['RecodeValues']] assign custom indexing
to the indexing of the question[['Payload']][['Answers']] list. Similarly,
the question[['Payload']][['ChoiceDataExportTags']] assign custom naming for the question
parts listed in the question[['Payload']][['Choices']] list.


Line 35, R/results_generation.R is where we begin.
matrix_single_answer_results <- function(question) {


First, we determine the factors that we will be using to table the responses to each column.
  if ("RecodeValues" %in% names(question[['Payload']]) && length(question[['Payload']][['RecodeValues']]) > 0) {
    factors <- unlist(question[['Payload']][['RecodeValues']])
  } else if ("AnswerOrder" %in% names(question[['Payload']]) && length(question[['Payload']][['AnswerOrder']]) > 0){
    factors <- unlist(question[['Payload']][['AnswerOrder']])
  } else {
    factors <- names(question[['Payload']][['Answers']])
  }


We select the numeric response columns as the columns which don't include "TEXT" in the name,
then we table the respondents by the factors for each response column.
  orig_responses <- question[['Responses']]
  not_text_columns <- which(sapply(colnames(orig_responses), function(x) !(grepl("TEXT", x))))
  orig_responses <- orig_responses[, not_text_columns]
  responses <- sapply(orig_responses, function(x) table(factor(x, factors)))


To get the total number of valid respondents for a given column, we
take the number of responses greater than or equal to zero for each
column.
  N <- sapply(orig_responses, function(x) suppressWarnings(strtoi(length(which(as.integer(as.character(x)) >= 0)))))


Here, we transpose the responses data frame so that we have each
answer in the first entry of a column, and below it the number of respondents
choosing that answer for a given question part (corresponding to the row, also called a question choice).
  responses <- t(responses)


We use the data export tag to construct a list of possible variations
of the data export tag that should be stripped from the column names.
  export_tag <- question[['Payload']][['DataExportTag']]
  export_tags <- c(export_tag, gsub("#", "_", export_tag), gsub("-", "_", export_tag))
  export_tags <- sapply(export_tags, function(x) paste0(x, "[-#_]"))
  export_tags <- paste(export_tags, collapse="|")
  choice_export_tags_with_underscores <- sapply(question[['Payload']][['ChoiceDataExportTags']], function(x) gsub("-", "_", x))
  response_names_without_export_tag <- gsub(export_tags, "", names(orig_responses))
If the question we're processing was derived originally from a Side-by-Side question,
then the response column will be prepended with an "AnswerDataExportTag" -- let's get rid of that.
  if ("AnswerDataExportTag" %in% names(question[['Payload']])) {
    response_names_without_export_tag <- gsub(paste0("_", question[['Payload']][['AnswerDataExportTag']]), "", response_names_without_export_tag)
  }
Next, we break into three different logical cases: whether or not there are recode values, and if there are recode values, whether any are less than zero.
  if ("RecodeValues" %in% names(question[['Payload']]) && length(question[['Payload']][['RecodeValues']]) > 0) {
    if (any(question[['Payload']][['RecodeValues']] < 0)) {
If there are recode values, and there are recode values less than zero, then we will handle the
answers corresponding to the negative recode values as "NA-like" responses.
Since all that should be left in the column names at this point are the recode values,
we can simply select the columns with negative recode values and separate them into their
own data frame called na_columns, and remove them from the rest of the responses.
      na_choices <- question[['Payload']][['RecodeValues']][which(question[['Payload']][['RecodeValues']] < 0)]
      na_columns <- responses[, unlist(na_choices), drop=FALSE]
      responses <- responses[,!(colnames(responses) %in% na_choices)]
Next, we distinguish between the valid respondents and the total respondents. N will continue
to be used as our valid respondents, but we introduce a new variable to use for the total number
of respondents: total_N.
      total_N <- sapply(orig_responses, function(x) strtoi(length(which(x != -99 & x != ""))))


Now we can calculate the percentages of non-applicable respondents.
      for (i in 1:ncol(na_columns)) {
        for (j in 1:nrow(na_columns)) {
          na_columns[j,i] <- percent0(strtoi(na_columns[j,i]) /
                                        total_N[i])
        }
      }
We use the recode values to translate to answer indices, and then we use those indices to get the
answer text for each column of the na_columns.
      colnames(na_columns) <- sapply(colnames(na_columns), function(x)
        names(question[['Payload']][['RecodeValues']][which(question[['Payload']][['RecodeValues']] == x)])[[1]])
      colnames(na_columns) <- sapply(colnames(na_columns), function(x) question[['Payload']][['Answers']][[x]][[1]])
    }
The last bracket closed the conditional code block used for questions with NA choices, so we're moving
into code that runs on all questions with recode values.
We use the recode values and the answer indices to get the answer text in the names of the
columns.
    answers_uncoded <- sapply(colnames(responses), function(x)
      names(question[['Payload']][['RecodeValues']][which(question[['Payload']][['RecodeValues']] == x)])[[1]])
    answers <- lapply(answers_uncoded, function(x) question[['Payload']][['Answers']][[x]][[1]])
    answers <- unlist(answers, use.names = FALSE)
  } else {
Now, we move into the conditional code block for questions that have no recode values.
For those questions, we directly use the indexing and text in question[['Payload']][['Answers']]
  } else {
    answers <- sapply(colnames(responses), function(x) question[['Payload']][['Answers']][[x]][[1]])
  }
Now, we can clean the