Skip to content

Instantly share code, notes, and snippets.

@ctesta01
Last active July 21, 2016 16:40
Show Gist options
  • Save ctesta01/f22eeb46cf5aca8cfc964f2613970be3 to your computer and use it in GitHub Desktop.
Save ctesta01/f22eeb46cf5aca8cfc964f2613970be3 to your computer and use it in GitHub Desktop.
QualtricsTools: How Matrix SA Questions Work

This page was last updated on July 21st, 2016.

The commit this reflects the code from is: 82af5

Matrix Single Answer

Matrix questions use "choice" to indicate a horizontal question component and "answer" to indicate a vertical option. For example, if a matrix questions options were formatted like the following table, the "choices" would be the list of "apple", "orange" and "banana". Similarly, the "answers" are the 1, 2, and 3 options a respondent could select for each choice.

1 2 3
apple
orange
banana

Additionally, the question[['Payload']][['RecodeValues']] assign custom indexing to the indexing of the question[['Payload']][['Answers']] list. Similarly, the question[['Payload']][['ChoiceDataExportTags']] assign custom naming for the question parts listed in the question[['Payload']][['Choices']] list.

Line 35, R/results_generation.R is where we begin.

matrix_single_answer_results <- function(question) {

First, we determine the factors that we will be using to table the responses to each column.

  if ("RecodeValues" %in% names(question[['Payload']]) && length(question[['Payload']][['RecodeValues']]) > 0) {
    factors <- unlist(question[['Payload']][['RecodeValues']])
  } else if ("AnswerOrder" %in% names(question[['Payload']]) && length(question[['Payload']][['AnswerOrder']]) > 0){
    factors <- unlist(question[['Payload']][['AnswerOrder']])
  } else {
    factors <- names(question[['Payload']][['Answers']])
  }

We select the numeric response columns as the columns which don't include "TEXT" in the name, then we table the respondents by the factors for each response column.

  orig_responses <- question[['Responses']]
  not_text_columns <- which(sapply(colnames(orig_responses), function(x) !(grepl("TEXT", x))))
  orig_responses <- orig_responses[, not_text_columns]
  responses <- sapply(orig_responses, function(x) table(factor(x, factors)))

To get the total number of valid respondents for a given column, we take the number of responses greater than or equal to zero for each column.

  N <- sapply(orig_responses, function(x) suppressWarnings(strtoi(length(which(as.integer(as.character(x)) >= 0)))))

Here, we transpose the responses data frame so that we have each answer in the first entry of a column, and below it the number of respondents choosing that answer for a given question part (corresponding to the row, also called a question choice).

  responses <- t(responses)

We use the data export tag to construct a list of possible variations of the data export tag that should be stripped from the column names.

  export_tag <- question[['Payload']][['DataExportTag']]
  export_tags <- c(export_tag, gsub("#", "_", export_tag), gsub("-", "_", export_tag))
  export_tags <- sapply(export_tags, function(x) paste0(x, "[-#_]"))
  export_tags <- paste(export_tags, collapse="|")
  choice_export_tags_with_underscores <- sapply(question[['Payload']][['ChoiceDataExportTags']], function(x) gsub("-", "_", x))
  response_names_without_export_tag <- gsub(export_tags, "", names(orig_responses))

If the question we're processing was derived originally from a Side-by-Side question, then the response column will be prepended with an "AnswerDataExportTag" -- let's get rid of that.

  if ("AnswerDataExportTag" %in% names(question[['Payload']])) {
    response_names_without_export_tag <- gsub(paste0("_", question[['Payload']][['AnswerDataExportTag']]), "", response_names_without_export_tag)
  }

Next, we break into three different logical cases: whether or not there are recode values, and if there are recode values, whether any are less than zero.

  if ("RecodeValues" %in% names(question[['Payload']]) && length(question[['Payload']][['RecodeValues']]) > 0) {
    if (any(question[['Payload']][['RecodeValues']] < 0)) {

If there are recode values, and there are recode values less than zero, then we will handle the answers corresponding to the negative recode values as "NA-like" responses. Since all that should be left in the column names at this point are the recode values, we can simply select the columns with negative recode values and separate them into their own data frame called na_columns, and remove them from the rest of the responses.

      na_choices <- question[['Payload']][['RecodeValues']][which(question[['Payload']][['RecodeValues']] < 0)]
      na_columns <- responses[, unlist(na_choices), drop=FALSE]
      responses <- responses[,!(colnames(responses) %in% na_choices)]

Next, we distinguish between the valid respondents and the total respondents. N will continue to be used as our valid respondents, but we introduce a new variable to use for the total number of respondents: total_N.

      total_N <- sapply(orig_responses, function(x) strtoi(length(which(x != -99 & x != ""))))

Now we can calculate the percentages of non-applicable respondents.

      for (i in 1:ncol(na_columns)) {
        for (j in 1:nrow(na_columns)) {
          na_columns[j,i] <- percent0(strtoi(na_columns[j,i]) /
                                        total_N[i])
        }
      }

We use the recode values to translate to answer indices, and then we use those indices to get the answer text for each column of the na_columns.

      colnames(na_columns) <- sapply(colnames(na_columns), function(x)
        names(question[['Payload']][['RecodeValues']][which(question[['Payload']][['RecodeValues']] == x)])[[1]])
      colnames(na_columns) <- sapply(colnames(na_columns), function(x) question[['Payload']][['Answers']][[x]][[1]])
    }

The last bracket closed the conditional code block used for questions with NA choices, so we're moving into code that runs on all questions with recode values. We use the recode values and the answer indices to get the answer text in the names of the columns.

    answers_uncoded <- sapply(colnames(responses), function(x)
      names(question[['Payload']][['RecodeValues']][which(question[['Payload']][['RecodeValues']] == x)])[[1]])
    answers <- lapply(answers_uncoded, function(x) question[['Payload']][['Answers']][[x]][[1]])
    answers <- unlist(answers, use.names = FALSE)
  } else {

Now, we move into the conditional code block for questions that have no recode values. For those questions, we directly use the indexing and text in question[['Payload']][['Answers']]

  } else {
    answers <- sapply(colnames(responses), function(x) question[['Payload']][['Answers']][[x]][[1]])
  }

Now, we can clean the

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment