clarkbw/snippets.md

## snippets.md

      
    Raw
  

              snippets.md
            
          
    re:dash

github | redash.io
Helpful Resources

Dashboards and Query code snippets in the re:dash demo
Telemetry meets SQL by rvitillo
Custom Dashboards with re:dash
Longitudinal Examples
Longitudinal Dataset Tutorial in Python

google spreadsheets : database

You can use Google Spreadshets as a query resource which allows you to do additional calculations and other processing of data which is then collected into re:dash to build your dashboards.  The disadvantage of Google Spreadsheets is that unlike other queries you will need to update the spread data manually instead of having re:dash re-run the query on a repeating schedule.

NOTE: The first row of your spreadsheet should be the column names

Here is how to query your spreadsheet:

Share your spreadsheet with gspread@pipeline-sql-prod.iam.gserviceaccount.com (edit privs?)
(agree to the warning about this being an external to the Mozilla account)
In the re:dash query use the ID of the spreadsheet and the (zero based) index of the sheet ID|SHEET_INDEX
The ID can be found in the URL: https://docs.google.com/spreadsheets/d/1-DaBLABLAmu_lVGunk8I/edit has an ID of 1-DaBLABLAmu_lVGunk8I
The sheet index can be omitted and the query will use the first (left most) sheet, to use the second sheet you would write 1-DaBLABLAmu_lVGunk8I|1.  The sheet can be named anything, the sheet index is just like an array index of all the sheets, if you rearrange the order of your sheets you will need to update the index values you use.
Run your query

presto : database

Presto docs | Presto Functions
Use this snippet to see all the tables available.
show tables
You can download this data as a CSV / XSL file so you always have it handy.
longitudinal : table

This is a sample (1%) of all users (all channels) over time (180 days).  If you're unsure which db to use, this is the one you're looking for.

The longitudinal dataset is logically organized as a table where rows represent profiles and columns represent the various metrics (e.g. startup time). Each field of the table contains a list of values, one per Telemetry submission received for that profile.

Here's why you want to use this table

Easier reporting of profile level metrics by grouping all data for a client-id in the same row
Samples to 1% of all recent profiles, which will reduce query computation time and save resources (compared to main_summary)
Profiles across all channels with up to 180 days of data

Get Started Examples
desc longitudinal
This will list all the available columns in the table.
SELECT * 
FROM longitudinal 
LIMIT 1000
This will query a sample of the table, by adding the LIMIT 1000 you can run test queries much faster and remove the limit when you're ready to run the query for real.
SELECT * 
FROM longitudinal 
-- LIMIT 1000
SQL uses the -- at the start of a line to comment out the line, remember?
» system : column

System contains a number of elements related to the Firefox operating system.
The following example queries to group systems by memory for systems running in WoW64 mode.
SELECT count(client_id) as clients, system[1].memory_mb as memory_mb
FROM longitudinal
WHERE system[1].is_wow64 = TRUE -- 32-bit Firefox on 64-bit OS
GROUP BY system[1].memory_mb
see all system fields
» build : column

Build contains a number of elements related to the build of Firefox running on the system.
The following example queries to group all systems by architecture.
SELECT count(client_id) as clients, build[1].architecture as architecture
FROM longitudinal
GROUP BY build[1].architecture -- 'x86-64' equals 64-bit Firefox (on 64-bit OS)
see all build fields
» system_gfx : column

System GFX contains a number of elements related to the graphics system of Firefox.
The following example queries to group all systems by the number of monitors connected.
SELECT count(client_id) as clients, cardinality(system_gfx[1].monitors) as monitors
FROM longitudinal
GROUP BY cardinality(system_gfx[1].monitors)
HAVING cardinality(system_gfx[1].monitors) > 0 -- removing odd headless
see all system_gfx fields
client_count : table

Client count is a subset of the main_summary database for querying counts like, How many users with e10s are on each channel?
desc client_count
SELECT * FROM client_count LIMIT 100
» activity_date : column

Limit the activity to a range that makes sense for your query.  Start with when the telemetry probe went live, like '2016-02-00' in the example.  Then cut off the last two weeks so your graph doesn't trail off, looking like total doom to come. from_iso8601_date(activity_date) < current_date - interval '14' day
WHERE activity_date > '2016-02-00' AND from_iso8601_date(activity_date) < current_date - interval '14' day


Do Not Want
WANT


main_summary : table

This is the full database, you probably don't want to use this unless you like waiting 30+ min for your queries to finish.  On the plus side this db returns the most normal SQL compared to others which return array values.
desc main_summary
SELECT * FROM main_summary LIMIT 100