Skip to content

Instantly share code, notes, and snippets.

@brainysmurf
Last active September 24, 2023 12:06
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save brainysmurf/85ee08a48a6f482f1cab78da9cffc774 to your computer and use it in GitHub Desktop.
Save brainysmurf/85ee08a48a6f482f1cab78da9cffc774 to your computer and use it in GitHub Desktop.
Learn how to identify and output all columns that have certain text

GSheets Formula Tipizoid #1

Learn how to collect a series of columns that that contains target text in the header. Link to Example

The data

You have a database-like table in a Google spreadsheet that is the output of a Google Form that asks questions on a variety of topics. The form itself is organized into sections, where the the first question ("What grade are you in?") will then jump to the relevant section with grade-specific questions. Each section has some questions that repeat ("English") but others that do not ("Design" because students don't take Design until 9th Grade). The result is the following:

        (  First section in form   ) (    Second section in form   )

  Grade |  English | Math | Science | English | Math | Science |  Design
------------------------------------------------------------------------
   8    |     2    |   4  |    3    |         |      |         |
   9    |          |      |         |    5    |  5   |    1    |    3

And you'd like to make a tab that only contains the scores for English, one for Math, one for Science, and so on. Perhaps because you want to analyze the numbers (in this example, a Likert scale)

While this may be a trivial copy and paste job for the example given, I recently worked with a table with more than 400 columns and 200 rows of data.

The solution

Output the column header and column number in a table

The first thing to do is to output the target columns (in row 1 of the responses tab) so that it lists the value of the column on the left, and the column number on the right:

=TRANSPOSE( { responses!B1:1 ; ARRAYFORMULA(COLUMN(responses!B1:1)) } )

so that we get this:

English | 2
Math    | 3
Science | 4
English | 5
Math    | 6
Science | 7
Design  | 8

We then use that as the source for a QUERY where we can filter with contains '<text>' . First let's dissect what's going on in that first formula. We use a literal array structure { "one" ; "two" } in order to lay out the values of the columns responses!B1:1 on top and the column numbers on the bottom.

={ responses!B1:1 ; ARRAYFORMULA(COLUMN(responses!B1:1)) }

results in

  English |  Math | Science | English | Math | Science | Design
---------------------------------------------------------------
     2    |    3  |   4     |    5    |  5   |    1    |   3 

We have to use ARRAYFORMULA outside the COLUMN(responses!B1:1) because we are using a range and we need all of the values in the range. Without the ARRAYFORMULA, it'll only produce {2} but we need {2, 3, 4, …}. A final TRANSPOSE will reshape the data so that the info is aligned by row.

The query's select phrase will be:

"select Col2 where Col1 contains 'English'"

So that it will output only the columns that contain the phrase "English." We just have to reshape the data again so that the column numbers go across; we'll need another TRANSPOSE. The following formula just outputs the columns that have English as the header:

=TRANSPOSE( QUERY(TRANSPOSE({responses!B1:1 ; ARRAYFORMULA(COLUMN(responses!B1:1))}), "select Col2 where Col1 contains 'English'") )

We'll use that to generate the headers that appear after Grade below. The idea is to get the output to look like this:

Grade |     2    |     5    |
-----------------------------
8     |     6    |          |
9     |          |     5    |

Now that we have the column numbers of our target headers output, we can use the RXCY range specification to collect the responses we are looking for. Where B2 = 2, above, we can display the entire column with the following range specification, which is resolved from a string using the [https://support.google.com/docs/answer/3093377](INDIRECT formula). You would put this in the first data row (data row 2 in this example):

=INDIRECT("responses!R2C" & B1 & ":C" & B1, FALSE)

You can use right-fill to apply it to every column thereafter.

Use horizontal filter to shift the data into one column

This step is optional, but probably highly preferred. We'd like to get it to look like this in the end:

Grade | Response |     2    |     5    |
----------------------------------------
8     |     6    |     6    |          |
9     |     5    |          |     5    |

That is, all the responses are collected into one column. Easy for filtering and analysis!

Here's the trick, where C2 is the start of the data that runs across:

=FILTER(C2:2, NOT(ISBLANK(C2:2)))

Fill down on every row thereafter, and you'll have your data all neatly formatted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment