Learn how to collect a series of columns that that contains target text in the header. Link to Example
You have a database-like table in a Google spreadsheet that is the output of a Google Form that asks questions on a variety of topics. The form itself is organized into sections, where the the first question ("What grade are you in?") will then jump to the relevant section with grade-specific questions. Each section has some questions that repeat ("English") but others that do not ("Design" because students don't take Design until 9th Grade). The result is the following:
( First section in form ) ( Second section in form )
Grade | English | Math | Science | English | Math | Science | Design
------------------------------------------------------------------------
8 | 2 | 4 | 3 | | | |
9 | | | | 5 | 5 | 1 | 3
And you'd like to make a tab that only contains the scores for English, one for Math, one for Science, and so on. Perhaps because you want to analyze the numbers (in this example, a Likert scale)
While this may be a trivial copy and paste job for the example given, I recently worked with a table with more than 400 columns and 200 rows of data.
The first thing to do is to output the target columns (in row 1 of the responses
tab) so that it lists the value of the column on the left, and the column number on the right:
=TRANSPOSE( { responses!B1:1 ; ARRAYFORMULA(COLUMN(responses!B1:1)) } )
so that we get this:
English | 2
Math | 3
Science | 4
English | 5
Math | 6
Science | 7
Design | 8
We then use that as the source for a QUERY where we can filter with contains '<text>'
. First let's dissect what's going on in that first formula. We use a literal array structure { "one" ; "two" }
in order to lay out the values of the columns responses!B1:1
on top and the column numbers on the bottom.
={ responses!B1:1 ; ARRAYFORMULA(COLUMN(responses!B1:1)) }
results in
English | Math | Science | English | Math | Science | Design
---------------------------------------------------------------
2 | 3 | 4 | 5 | 5 | 1 | 3
We have to use ARRAYFORMULA
outside the COLUMN(responses!B1:1)
because we are using a range and we need all of the values in the range. Without the ARRAYFORMULA
, it'll only produce {2}
but we need {2, 3, 4, …}
. A final TRANSPOSE
will reshape the data so that the info is aligned by row.
The query's select phrase will be:
"select Col2 where Col1 contains 'English'"
So that it will output only the columns that contain the phrase "English." We just have to reshape the data again so that the column numbers go across; we'll need another TRANSPOSE
. The following formula just outputs the columns that have English as the header:
=TRANSPOSE( QUERY(TRANSPOSE({responses!B1:1 ; ARRAYFORMULA(COLUMN(responses!B1:1))}), "select Col2 where Col1 contains 'English'") )
We'll use that to generate the headers that appear after Grade below. The idea is to get the output to look like this:
Grade | 2 | 5 |
-----------------------------
8 | 6 | |
9 | | 5 |
Now that we have the column numbers of our target headers output, we can use the RXCY
range specification to collect the responses we are looking for. Where B2 = 2, above, we can display the entire column with the following range specification, which is resolved from a string using the [https://support.google.com/docs/answer/3093377](INDIRECT formula). You would put this in the first data row (data row 2 in this example):
=INDIRECT("responses!R2C" & B1 & ":C" & B1, FALSE)
You can use right-fill to apply it to every column thereafter.
This step is optional, but probably highly preferred. We'd like to get it to look like this in the end:
Grade | Response | 2 | 5 |
----------------------------------------
8 | 6 | 6 | |
9 | 5 | | 5 |
That is, all the responses are collected into one column. Easy for filtering and analysis!
Here's the trick, where C2 is the start of the data that runs across:
=FILTER(C2:2, NOT(ISBLANK(C2:2)))
Fill down on every row thereafter, and you'll have your data all neatly formatted.