Skip to content

Instantly share code, notes, and snippets.

@clarkbw
Last active December 1, 2017 18:58
Show Gist options
  • Save clarkbw/f0aa27e5d1635b62d49d7dab4f31e842 to your computer and use it in GitHub Desktop.
Save clarkbw/f0aa27e5d1635b62d49d7dab4f31e842 to your computer and use it in GitHub Desktop.
Useful snippets for mozilla re:dash

re:dash

github | redash.io

Helpful Resources

google spreadsheets : database

You can use Google Spreadshets as a query resource which allows you to do additional calculations and other processing of data which is then collected into re:dash to build your dashboards. The disadvantage of Google Spreadsheets is that unlike other queries you will need to update the spread data manually instead of having re:dash re-run the query on a repeating schedule.

NOTE: The first row of your spreadsheet should be the column names

Here is how to query your spreadsheet:

  • Share your spreadsheet with gspread@pipeline-sql-prod.iam.gserviceaccount.com (edit privs?)
  • (agree to the warning about this being an external to the Mozilla account)
  • In the re:dash query use the ID of the spreadsheet and the (zero based) index of the sheet ID|SHEET_INDEX
  • The ID can be found in the URL: https://docs.google.com/spreadsheets/d/1-DaBLABLAmu_lVGunk8I/edit has an ID of 1-DaBLABLAmu_lVGunk8I
  • The sheet index can be omitted and the query will use the first (left most) sheet, to use the second sheet you would write 1-DaBLABLAmu_lVGunk8I|1. The sheet can be named anything, the sheet index is just like an array index of all the sheets, if you rearrange the order of your sheets you will need to update the index values you use.
  • Run your query

presto : database

Presto docs | Presto Functions

Use this snippet to see all the tables available.

show tables

You can download this data as a CSV / XSL file so you always have it handy.

longitudinal : table

This is a sample (1%) of all users (all channels) over time (180 days). If you're unsure which db to use, this is the one you're looking for.

The longitudinal dataset is logically organized as a table where rows represent profiles and columns represent the various metrics (e.g. startup time). Each field of the table contains a list of values, one per Telemetry submission received for that profile.

Here's why you want to use this table

  • Easier reporting of profile level metrics by grouping all data for a client-id in the same row
  • Samples to 1% of all recent profiles, which will reduce query computation time and save resources (compared to main_summary)
  • Profiles across all channels with up to 180 days of data

Get Started Examples

desc longitudinal

This will list all the available columns in the table.

SELECT * 
FROM longitudinal 
LIMIT 1000

This will query a sample of the table, by adding the LIMIT 1000 you can run test queries much faster and remove the limit when you're ready to run the query for real.

SELECT * 
FROM longitudinal 
-- LIMIT 1000

SQL uses the -- at the start of a line to comment out the line, remember?

» system : column

System contains a number of elements related to the Firefox operating system.

The following example queries to group systems by memory for systems running in WoW64 mode.

SELECT count(client_id) as clients, system[1].memory_mb as memory_mb
FROM longitudinal
WHERE system[1].is_wow64 = TRUE -- 32-bit Firefox on 64-bit OS
GROUP BY system[1].memory_mb

see all system fields

» build : column

Build contains a number of elements related to the build of Firefox running on the system.

The following example queries to group all systems by architecture.

SELECT count(client_id) as clients, build[1].architecture as architecture
FROM longitudinal
GROUP BY build[1].architecture -- 'x86-64' equals 64-bit Firefox (on 64-bit OS)

see all build fields

» system_gfx : column

System GFX contains a number of elements related to the graphics system of Firefox.

The following example queries to group all systems by the number of monitors connected.

SELECT count(client_id) as clients, cardinality(system_gfx[1].monitors) as monitors
FROM longitudinal
GROUP BY cardinality(system_gfx[1].monitors)
HAVING cardinality(system_gfx[1].monitors) > 0 -- removing odd headless

see all system_gfx fields

client_count : table

Client count is a subset of the main_summary database for querying counts like, How many users with e10s are on each channel?

desc client_count
SELECT * FROM client_count LIMIT 100

» activity_date : column

Limit the activity to a range that makes sense for your query. Start with when the telemetry probe went live, like '2016-02-00' in the example. Then cut off the last two weeks so your graph doesn't trail off, looking like total doom to come. from_iso8601_date(activity_date) < current_date - interval '14' day

WHERE activity_date > '2016-02-00' AND from_iso8601_date(activity_date) < current_date - interval '14' day
Do Not Want WANT
do not want WANT

main_summary : table

This is the full database, you probably don't want to use this unless you like waiting 30+ min for your queries to finish. On the plus side this db returns the most normal SQL compared to others which return array values.

desc main_summary
SELECT * FROM main_summary LIMIT 100
@digitarald
Copy link

@clarkbw, here a more readable line for clamping on dates:

AND from_iso8601_date(activity_date) < current_date - interval '4' day

@clarkbw
Copy link
Author

clarkbw commented Aug 19, 2016

Nice, updated! Thanks @digitarald

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment