kylebrandt/gist:b17ff133127b88165fd22a65a0424e81

## gistfile1.md

      
    Raw
  

              gistfile1.md
            
          
    Support Graphing of Long (a.k.a. Tall) and Wide Format Time Series Dataframes

Use Case

Currently our SQL plugins derive things from the special names of the columns. Instead we can use more conventional Long and Wide formats and derive the format from the schema. This will result in a more conventional SQL experience for users. The amount of code plugin authors need to create will be reduced.
Taking a MySQL Example:
SELECT
  $__timeGroup(createdAt,'$summarize') as time_sec,
  avg(value) as value,
  hostname as metric
FROM 
  grafana_metric
WHERE
  $__timeFilter(createdAt) AND
  measurement = 'logins.count' AND
  hostname IN($host)
GROUP BY 1, 3
ORDER BY 1
Which becomes:
SELECT
  UNIX_TIMESTAMP(createdAt) DIV 60 * 60 as time_sec,
  avg(value) as value,
  hostname as metric
FROM 
  grafana_metric
WHERE
  createdAt BETWEEN FROM_UNIXTIME(1580908377) AND FROM_UNIXTIME(1580985308) AND
  measurement = 'logins.count' AND
  hostname IN('10.1.100.1','10.1.100.10','server1','server2')
GROUP BY 1, 3
ORDER BY 1
Produces Long Format Data:
"Time";"Metric";"Value"
"2020-02-06 10:35:00";"server2";"445.36"
"2020-02-06 10:35:00";"server1";"445.85"
"2020-02-06 10:35:00";"10.1.100.10";"445.64"
"2020-02-06 10:35:00";"10.1.100.1";"445.59"
"2020-02-06 10:34:00";"server2";"446.28"
"2020-02-06 10:34:00";"server1";"446.25"
"2020-02-06 10:34:00";"10.1.100.10";"446.56"
"2020-02-06 10:34:00";"10.1.100.1";"446.72"

In order to minimally impact existing queries, we can update timeGroup to select time column. Users will still have to update when not using macros. Will also need to make sure we can always select into time objects from our all of our sql datasources to have a proper schema.
So the SQL query becomes the following (having sql create time):
SELECT
  FROM_UNIXTIME(UNIX_TIMESTAMP(createdAt) DIV 60 * 60) as time_sec,
  avg(value) as value,
  hostname as metric
FROM 
  grafana_metric
WHERE
  createdAt BETWEEN FROM_UNIXTIME(1580908377) AND FROM_UNIXTIME(1580985308) AND
  measurement = 'logins.count' AND
  hostname IN('10.1.100.1','10.1.100.10','server1','server2')
GROUP BY 1, 3
ORDER BY 1
Distinguishing Aspects Formats


Both:

Have Time column
Have one or more Numeric Field


Long

Will have String Fields (in Result)
Time will be repeated (Not detectable in schema, may be sparse)
Have no Labels on the Dataframe's Fields.


Wide

Will Not have string Fields
Time (should) not be repeated


How Long Format Time Series format is interpreted

Reference from azure data explorer datasource: https://github.com/grafana/azure-data-explorer-datasource#time-series-queries

Has one Time column.
Number type columns are "Metrics". The name of the column is the name of the metric.
String type columns are Key/Value pair identifiers (a.k.a labels, tags). The column name is the identifer key, the value in each row is identifier Value.
Each unique time series in the response is the unique combination of metric and Key/Value pair identifiers.

How Wide Format Time Series format is interpret


Has one Time column.
Has one or more Number value columns. The Unique name of the column is the Metric.
If the column has a Labels property, they become Key/Value pair identifiers. And the Name does not have to be unique if Name+Labels is unique.

Misc notes and Reference


Alerting and GEL will need same interpretation ability.
There are some other types to support. Both those involve nested arrays most likely and dataframe does have those yet.
Wide vs Long is determined by format of query result, not format of data in database.
There may be cases where there is a second time column, and maybe that should be treated as a graph-able value.
We should not force these schemas to always be time series. Assumption is the context is one of time series.
A reference on wide vs long: https://docs.timescale.com/latest/introduction/data-model