@MisterRayCo
Created February 21, 2017 20:25
This gist is an API POST that creates a new screenboard in your Datadog account to display a few basic outlier and anomaly detection scenarios, using the aws.elb.request_count metric.
api_key=XXXXXXX
app_key=XXXXXXX
curl -X POST -H "Content-type: application/json" \
-d '{
"width": 1024,
"height": 768,
"board_title": "Demo - Anomaly & Outlier",
"widgets": [{
"type": "timeseries",
"title": true,
"title_size": 16,
"title_align": "left",
"title_text": "ELB Request Count + Basic Bound 2",
"height": 13,
"width": 38,
"y": 8,
"x": 1,
"timeframe": "1d",
"tile_def": {
"viz": "timeseries",
"requests": [{
"q": "anomalies(avg:aws.elb.request_count{*}.as_count(), \u0027basic\u0027, 2)",
"aggregator": "avg",
"type": "line",
"conditional_formats": []
}], "autoscale": true
}
},{
"type": "timeseries",
"title": true,
"title_size": 16,
"title_align": "left",
"title_text": "ELB Request Count + Basic Bound 5",
"height": 13,
"width": 38,
"y": 25,
"x": 1,
"timeframe": "1d",
"tile_def": {
"viz": "timeseries",
"requests": [{
"q": "anomalies(avg:aws.elb.request_count{*}.as_count(), \u0027basic\u0027, 5)",
"aggregator": "avg",
"conditional_formats": [],
"type": "line"
}], "autoscale": true
}
},{
"type": "timeseries",
"title": true,
"title_size": 16,
"title_align": "left",
"title_text": "ELB Request Count + Agile Bound 2",
"height": 13,
"width": 38,
"y": 42,
"x": 1,
"timeframe": "1d",
"tile_def": {
"viz": "timeseries",
"requests": [{
"q": "anomalies(avg:aws.elb.request_count{*}.as_count(), \u0027agile\u0027, 2)",
"aggregator": "avg",
"conditional_formats": [],
"type": "line"
}], "autoscale": true
}
},{
"type": "timeseries",
"title": true,
"title_size": 16,
"title_align": "left",
"title_text": "ELB Request Count + Robust Bound 2",
"height": 13,
"width": 38,
"y": 59,
"x": 1,
"timeframe": "1d",
"tile_def": {
"viz": "timeseries",
"requests": [{
"q": "anomalies(avg:aws.elb.request_count{*}.as_count(), \u0027robust\u0027, 2)",
"aggregator": "avg",
"conditional_formats": [],
"type": "line"
}], "autoscale": true
}
},{
"type": "timeseries",
"title": true,
"title_size": 16,
"title_align": "left",
"title_text": "ELB Request Count + Adaptive Bound 2",
"height": 13,
"width": 38,
"y": 76,
"x": 1,
"timeframe": "1d",
"tile_def": {
"viz": "timeseries",
"requests": [{
"q": "anomalies(avg:aws.elb.request_count{*}.as_count(), \u0027adaptive\u0027, 2)",
"aggregator": "avg",
"conditional_formats": [],
"type": "line"
}], "autoscale": true
}
},{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"center",
"title_text":"",
"height":5,
"bgcolor":"blue",
"html":"[Intro to Anomaly Detection Blog Post](https://www.datadoghq.com/blog/introducing-anomaly-detection-datadog/)",
"y":1,
"x":1,
"font_size":"18",
"tick":false,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"center",
"title_text":"",
"height":5,
"bgcolor":"blue",
"html":"[Anomaly Detection Docs](http://docs.datadoghq.com/guides/anomalies/)",
"y":1,
"x":41,
"font_size":"18",
"tick":false,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"left",
"title_text":"",
"height":15,
"bgcolor":"pink",
"html":"Basic uses a simple lagging rolling quantile computation to determine the range of expected values. It adjusts quickly to changing conditions but has no knowledge of seasonality or long-term trends.",
"y":8,
"x":41,
"font_size":"16",
"tick":true,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"left",
"title_text":"",
"height":15,
"bgcolor":"pink",
"html":"Agile is a robust version of the seasonal autoregressive integrated moving average (SARIMA) algorithm. It is sensitive to seasonality but can also quickly adjust to level shifts in the metric\u2014for instance, if a code change increases the baseline level of requests per second.",
"y":42,
"x":41,
"font_size":"14",
"tick":true,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"left",
"title_text":"",
"height":15,
"bgcolor":"pink",
"html":"Robust is a seasonal-trend decomposition algorithm that works best for seasonal metrics that have a relatively level baseline. Its predictions are very stable, so its forecast won\u2019t be unduly influenced by long-lasting anomalies.",
"y":59,
"x":41,
"font_size":"16",
"tick":true,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"left",
"title_text":"",
"height":15,
"bgcolor":"pink",
"html":"Adaptive uses an online learning algorithm to readily adjust its predictions in response to changes. It is best used for metrics whose behavior is not consistent enough for agile or robust alerts.",
"y":76,
"x":41,
"font_size":"16",
"tick":true,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"left",
"title_text":"",
"height":15,
"bgcolor":"yellow",
"html":"The bounds parameter in the query editor determines the tolerance of the anomaly detection algorithm, and hence the width of the \u201cnormal\u201d gray band. You can think of these bounds as deviations from the predicted timeseries value. For most timeseries, setting the bounds to 2 or 3 will capture most \u201cnormal\u201d points in the gray band. The two graphs to the left show the basic algorithm with bounds set to 2 (narrower) and 5 (wider).",
"y":25,
"x":41,
"font_size":"14",
"tick":true,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"center",
"title_text":"",
"height":5,
"bgcolor":"blue",
"html":"[Outlier Detection Docs](http://docs.datadoghq.com/guides/outliers/)",
"y":1,
"x":83,
"font_size":"18",
"tick":false,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"title_align":"left",
"title_text":"ELB Request Count + Outlier MAD, tol3 pct10",
"height":13,
"tile_def":{
"viz":"timeseries",
"requests":[
{
"q":"outliers(avg:aws.elb.request_count{*} by {host}.as_count(), \u0027MAD\u0027, 3, 10)",
"aggregator":"avg",
"conditional_formats":[
],
"type":"line",
"style":{
"palette":"orange"
}
}
],
"autoscale":true
},
"width":38,
"timeframe":"1d",
"y":8,
"x":83,
"legend_size":"0",
"type":"timeseries",
"legend":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"center",
"title_text":"",
"height":5,
"bgcolor":"blue",
"html":"[Outlier Intro Blog](https://www.datadoghq.com/blog/introducing-outlier-detection-in-datadog/) & [Algorithm Blog Post](https://www.datadoghq.com/blog/outlier-detection-algorithms-at-datadog/)",
"y":1,
"x":123,
"font_size":"18",
"tick":false,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"title_align":"left",
"title_text":"ELB Request Count + Outlier MAD, tol5 pct10",
"height":13,
"tile_def":{
"viz":"timeseries",
"requests":[
{
"q":"outliers(avg:aws.elb.request_count{*} by {host}.as_count(), \u0027MAD\u0027, 5, 10)",
"aggregator":"avg",
"conditional_formats":[
],
"type":"line",
"style":{
"palette":"orange"
}
}
],
"autoscale":true
},
"width":38,
"timeframe":"1d",
"y":25,
"x":83,
"legend_size":"0",
"type":"timeseries",
"legend":false
},
{
"title_size":16,
"title":true,
"title_align":"left",
"title_text":"ELB Request Count + Outlier MAD, tol3 pct80",
"height":13,
"tile_def":{
"viz":"timeseries",
"requests":[
{
"q":"outliers(avg:aws.elb.request_count{*} by {host}.as_count(), \u0027MAD\u0027, 3, 80)",
"aggregator":"avg",
"conditional_formats":[
],
"type":"line",
"style":{
"palette":"orange"
}
}
],
"autoscale":true
},
"width":38,
"timeframe":"1d",
"y":42,
"x":83,
"legend_size":"0",
"type":"timeseries",
"legend":false
},
{
"title_size":16,
"title":true,
"title_align":"left",
"title_text":"ELB Request Count + Outlier DBSCAN, tol5",
"height":13,
"tile_def":{
"viz":"timeseries",
"requests":[
{
"q":"outliers(avg:aws.elb.request_count{*} by {host}.as_count(), \u0027DBSCAN\u0027, 5)",
"aggregator":"avg",
"conditional_formats":[
],
"type":"line",
"style":{
"palette":"orange"
}
}
],
"autoscale":true
},
"width":38,
"timeframe":"1d",
"y":59,
"x":83,
"legend_size":"0",
"type":"timeseries",
"legend":false
},
{
"title_size":16,
"title":true,
"title_align":"left",
"title_text":"ELB Request Count + Outlier DBSCAN, tol5",
"height":13,
"tile_def":{
"viz":"timeseries",
"requests":[
{
"q":"outliers(avg:aws.elb.request_count{*} by {host}.as_count(), \u0027DBSCAN\u0027, 5)",
"aggregator":"avg",
"conditional_formats":[
],
"type":"line",
"style":{
"palette":"orange"
}
}
],
"autoscale":true
},
"width":38,
"timeframe":"1d",
"y":76,
"x":83,
"legend_size":"0",
"type":"timeseries",
"legend":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"left",
"title_text":"",
"height":15,
"bgcolor":"pink",
"html":"DBSCAN, a popular density-based clustering algorithm, works by greedily agglomerating points that are close to each other. Clusters with few points in them are considered outliers. Traditionally, DBSCAN takes: 1) a parameter \u03b5 that specifies a distance threshold under which two points are considered to be close; and 2) the minimum number of points that have to be within a point\u2019s \u03b5-radius before that point can start agglomerating. ",
"y":59,
"x":123,
"font_size":"14",
"tick":true,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"left",
"title_text":"",
"height":15,
"bgcolor":"yellow",
"html":"In our case, the data set encompasses all points in every time series within the selected time window. We take the MAD of all the points, then multiply it by a normalizing constant and a tolerance parameter. The constant normalizes MAD so that it is comparable to the standard deviation of the normal distribution. The tolerance parameter then specifies how many \u201cdeviations\u201d a point has to be away from the median for it to be considered an outlier.",
"y":25,
"x":123,
"font_size":"14",
"tick":true,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"left",
"title_text":"",
"height":15,
"bgcolor":"pink",
"html":"The Median Absolute Deviation is a robust measure of variability, and can be viewed as the robust analogue for standard deviation. Robust statistics describe data in such a way that they are not unduly influenced by outliers. For a given set of data D = {d1, \u2026, dn}, the deviations are the difference between each di and median(D). The MAD is then the median of the absolute values of all the deviations. For example if D = {1, 2, 3, 4, 5, 6, 100}, then the median is 4, the deviations are {-3, -2, -1, 0, 1, 2, 96}, and the MAD is the median of {0, 1, 1, 2, 2, 3, 96}, which is 2. (Note that the standard deviation by contrast is 33.8.)",
"y":8,
"x":123,
"font_size":"16",
"tick":true,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"left",
"title_text":"",
"height":15,
"bgcolor":"yellow",
"html":"Now, to mark a time series as an outlier, we use a second parameter, pct. If more than pct% of a particular series\u2019 points are considered outliers, then the whole series is marked as an outlier. The graph to the left shows MAD with a tolerance of 3 and pct of 80 applied to the ELB request count by host.",
"y":42,
"x":123,
"font_size":"14",
"tick":true,
"type":"note",
"width":38,
"auto_refresh":false
},
{
"title_size":16,
"title":true,
"refresh_every":30000,
"tick_pos":"50%",
"title_align":"left",
"tick_edge":"left",
"text_align":"left",
"title_text":"",
"height":15,
"bgcolor":"yellow",
"html":"The only parameter we take is tolerance, the constant by which the initial threshold is multiplied to yield DBSCAN\u2019s distance parameter \u03b5. You should set the tolerance parameter depending on how similarly you expect your group of hosts to behave\u2014larger values allow for more tolerance in how much a host can deviate from its peers.\n",
"y":76,
"x":123,
"font_size":"14",
"tick":true,
"type":"note",
"width":38,
"auto_refresh":false
}
]
}' \
"https://app.datadoghq.com/api/v1/screen?api_key=${api_key}&application_key=${app_key}"
@MisterRayCo

Put your API and application keys in lines 1 and 2, and make sure the URL on the last line of the script uses the same keys.
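
The MAD notes on the screenboard describe the arithmetic only in words, so here is a minimal Python sketch of it. It reproduces the worked example from the note (D = {1, 2, 3, 4, 5, 6, 100}) and then applies the tolerance and pct rules to a few series. The function names, the host data, and the constant 1.4826 (the usual factor that makes MAD comparable to a normal standard deviation) are assumptions for illustration; the notes do not say which constant Datadog uses or how its implementation is written.

```python
from statistics import median

# Assumed normalizing constant: the standard factor that scales MAD to match
# the standard deviation of a normal distribution. Not taken from the notes.
NORMALIZING_CONSTANT = 1.4826


def mad(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def outlier_series(series_by_name, tolerance, pct):
    """Return the names of series with more than pct% outlying points.

    A point is outlying when it is further from the median of all points
    than tolerance * NORMALIZING_CONSTANT * MAD(all points).
    """
    all_points = [p for pts in series_by_name.values() for p in pts]
    center = median(all_points)
    threshold = tolerance * NORMALIZING_CONSTANT * mad(all_points)
    flagged = []
    for name, pts in series_by_name.items():
        outlying = sum(1 for p in pts if abs(p - center) > threshold)
        if 100.0 * outlying / len(pts) > pct:
            flagged.append(name)
    return flagged


# Made-up request counts for three hypothetical hosts.
hosts = {
    "host-a": [10, 11, 10, 12, 11],
    "host-b": [11, 10, 12, 11, 10],
    "host-c": [30, 31, 11, 29, 32],
}

print(mad([1, 2, 3, 4, 5, 6, 100]))                # 2, as in the note
print(outlier_series(hosts, tolerance=3, pct=10))  # ['host-c']
print(outlier_series(hosts, tolerance=3, pct=80))  # []
```

In this made-up data, exactly 80% of host-c's points are outlying. It is flagged at pct 10 but not at pct 80, because the rule requires more than pct% of the points to be outliers.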