Skip to content

Instantly share code, notes, and snippets.

@jdaigle
Last active November 1, 2021 22:31
Show Gist options
  • Save jdaigle/bdf55889b0935b3f6775 to your computer and use it in GitHub Desktop.
Save jdaigle/bdf55889b0935b3f6775 to your computer and use it in GitHub Desktop.
Stats Aggregation and SQL Time Series Database
  • We can implement something like https://github.com/etsy/statsd. This will collect and aggregate metrics.
  • At pre-defined intervals (say... every 10 seconds), these metrics are flushed to a backend store
  • We could use SQL Server to store the metrics:

#SQL Schema

  • Each series is a seperate table containing a timestamp epoch (bigint) and a value (float)
  • Rows are inserted, timestamp is the PK

#Example Query

This will query data from two series, between two times, aggregated to the hour

WITH data AS (
SELECT * FROM Series1
UNION
SELECT * FROM Series2
)
SELECT
    (Timestamp - (Timestamp % 3600)),
    COUNT(*),
    AVG(Value)
FROM data
WHERE Timestamp >= 100000 AND Timestamp <= 200000
GROUP BY (Timestamp - (Timestamp % 3600))
ORDER BY (Timestamp - (Timestamp % 3600))

Here is a neat query that joins two series, and then does some aggregations based on calculations between the series. Basically, filter each series and then FULL OUTER JOIN on timestamp columns. Then aggregate.

; WITH 
 _series1 AS (SELECT (Timestamp - (Timestamp % 3600)) AS [Hour], * FROM Series1 WHERE Timestamp >= 100000 AND Timestamp <= 200000)
,_series2 AS (SELECT (Timestamp - (Timestamp % 3600)) AS [Hour], * FROM Series2 WHERE Timestamp >= 100000 AND Timestamp <= 200000)
SELECT
    COALESCE(_series1.[Hour], _series2.[Hour])
    ,COUNT(*)
    ,MIN(_series1.Value - _series2.Value)
FROM _series1 FULL OUTER JOIN _series2 ON _series1.Timestamp = _series2.Timestamp
GROUP BY COALESCE(_series1.[Hour], _series2.[Hour])
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment