@dcramer
Last active December 12, 2015 12:39
Track last notified
===================
Alert model
-> binds to many users
- alert_date
Alert on Threshold
==================
Signifies an increase of N% from a value greater than 0.
If an alert is generated, it then sets a key signifying that a potential alert has happened.
- That key expires in 60 seconds???
- key is bound to alert params (e.g. team-wide alert) and the normalized minute
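A minimal sketch of how such a key could be built. The key format, the ``normalize_minute`` and ``potential_alert_key`` names, and the use of an MD5 digest are all assumptions for illustration; the notes only say the key is bound to the alert params and the normalized minute.

```python
import hashlib

# The notes leave the exact expiry open ("60 seconds???"), so treat this as a guess.
KEY_TTL_SECONDS = 60


def normalize_minute(timestamp):
    """Round a Unix timestamp down to the start of its minute."""
    return int(timestamp // 60) * 60


def potential_alert_key(alert_params, timestamp):
    """Build a key from the alert params and the normalized minute.

    Hypothetical key layout; params are sorted so that equal params
    always produce the same key regardless of dict ordering.
    """
    encoded = "&".join(
        "{}={}".format(name, alert_params[name]) for name in sorted(alert_params)
    )
    digest = hashlib.md5(encoded.encode("utf-8")).hexdigest()
    return "alert:potential:{}:{}".format(digest, normalize_minute(timestamp))
```

Two incr calls in the same minute with the same params map to the same key, so only the first one would need to trigger the baseline check. Setting the key with an expiry of ``KEY_TTL_SECONDS`` is left to whatever store is used.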
When a potential alert happens, we check for a baseline (15 minutes of history?) to ensure that
the anomaly really is something meaningful. The primary purpose is to look for gaps here.
e.g. if our 15 minute interval is [..., 100, 100, 100, 0, 100] we don't want to alert when it
went from 0 to 100, since it was sitting around 100 for most of the time.
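One way the baseline check could look, as a rough sketch. Using the median of the history as the baseline is an assumption (the notes do not say how the baseline is computed); it is chosen here because a single gap does not drag it down. ``is_meaningful_spike`` and ``threshold_pct`` are hypothetical names.

```python
from statistics import median


def is_meaningful_spike(history, current, threshold_pct=100.0):
    """Return True if `current` is at least `threshold_pct` percent above the baseline.

    `history` holds the per-minute counts before the current minute.
    The baseline is the median of that history (an assumption), so one
    gap such as [100, 100, 100, 0] still yields a baseline of 100.
    """
    if not history:
        return False
    baseline = median(history)
    if baseline <= 0:
        # The threshold is defined as an increase from >0, so there is
        # nothing to compare against here.
        return False
    increase_pct = (current - baseline) / baseline * 100.0
    return increase_pct >= threshold_pct
```

With the interval from the example, ``is_meaningful_spike([100, 100, 100, 0], 100)`` is False: the jump from 0 to 100 is measured against a baseline of 100, not against the gap.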
Costs:
- No additional writes on incr calls
- Two additional gets per incr (one for the "have we tried to alert" and one for "previous value")
- Entire range call to check on an alert
We could avoid doing this in realtime if we just check periodically, which removes the two additional
gets per call:
- Store a sorted set per project
- Each sorted set contains the number of events seen in the interval (1 minute)
- An additional set contains the number of unique events seen
- Every minute we iterate this sorted set (we can exploit the queue just like buffers to avoid crons)
- We clear the results immediately to no-op any concurrent tasks that might try to run
- The task fires off a set of subtasks that individually check each project
- Each project's value is compared to the historic value in the last N minutes (15m for redis counters or
a period of time using the SQL counters)
- We only alert if an alert has not been seen on this condition in the last N minutes
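The periodic flow above, sketched with an in-memory stand-in instead of Redis so it runs on its own. Everything here is an assumption about how the pieces fit together: ``PendingCounts`` plays the role of the sorted set (collapsed to one project -> count mapping), ``check_projects`` plays the role of the task plus its per-project subtasks, the baseline is a plain mean, and the cooldown defaults to 15 minutes.

```python
from collections import defaultdict


class PendingCounts(object):
    """In-memory stand-in for the per-minute counts (project id -> events seen)."""

    def __init__(self):
        self._counts = defaultdict(int)

    def incr(self, project_id, amount=1):
        self._counts[project_id] += amount

    def drain(self):
        # Hand back the current counts and clear them immediately, so a
        # concurrent task that runs right after sees nothing to do.
        drained, self._counts = dict(self._counts), defaultdict(int)
        return drained


def check_projects(pending, history, last_alerted, now,
                   threshold_pct=100.0, cooldown=15 * 60):
    """Compare each project's count to its recent history; return projects to alert on."""
    alerts = []
    for project_id, count in pending.drain().items():
        past = history.get(project_id, [])
        baseline = sum(past) / float(len(past)) if past else 0.0
        recently_alerted = now - last_alerted.get(project_id, float("-inf")) < cooldown
        if baseline > 0 and not recently_alerted:
            if (count - baseline) / baseline * 100.0 >= threshold_pct:
                alerts.append(project_id)
                last_alerted[project_id] = now
        # Record this interval so later checks can use it as history.
        history.setdefault(project_id, []).append(count)
    return alerts
```

In a real setup the drain would need to be atomic on the Redis side (for example a pipelined read and delete), which this sketch does not attempt to show.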
Notes:
- Nydus optimizes out multiple writes/gets, so it's not as expensive as it looks
- Values that are not set need to be treated as missing data, and we either need to ignore them or normalize them to the
average of the before/after points
- Celery has ``expires=datetime.now() + timedelta(days=1)`` on tasks
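For the missing-data note above, a small sketch of the "normalize to the average of the before/after points" option. Representing an unset value as ``None`` and the ``fill_gaps`` name are assumptions; the other option from the notes (ignoring unset values) would simply filter them out instead.

```python
def fill_gaps(values):
    """Replace each None with the average of the nearest set values on either side.

    If only one side has a set value, that value is used as-is. If no
    value is set at all, the gap is left as None.
    """
    filled = list(values)
    for index, value in enumerate(values):
        if value is not None:
            continue
        # Look outward in the original series, so that neighbouring gaps
        # are filled from real data points rather than from each other.
        before = next((v for v in reversed(values[:index]) if v is not None), None)
        after = next((v for v in values[index + 1:] if v is not None), None)
        neighbours = [v for v in (before, after) if v is not None]
        if neighbours:
            filled[index] = sum(neighbours) / float(len(neighbours))
    return filled
```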

dcramer commented Feb 12, 2013

Of note, the "two additional gets" are actually optimized by Nydus: they are either two concurrent gets on different nodes or a single pipelined get.
