One-step-ahead predictions for timeseries using exponential smoothing
Forecast threads a smooth curve through a noisy timeseries in a way that lets you visualize trends, cycles, and anomalies. It can be used as part of an automatic anomaly detection system for metric timeseries (that is, sequences of timestamped numerical values).
It accomplishes this using a variation of Holt-Winters forecasting -- more generally known as exponential smoothing. Forecast decomposes a noisy signal into level, trend, repetitive "seasonal" effects, and unexplained variation or noise. The result is a smoothed version of the signal that may be used to forecast future values or detect unexpected variation. This approach has been used successfully for network anomaly detection with other monitoring tools. Here we implement the method.
In this example we fit seasonal curves to repeating daily and weekly patterns, and then subtract these patterns from the data. The remaining signal can be used as an input to a simple anomaly detector, without giving false-alarms for expected changes (like decreased levels over the weekend, or increases during daylight hours).
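The decomposition described above can be sketched in Python. This is a simplified additive Holt-Winters update, not the module's actual implementation; the smoothing factors `alpha`, `beta`, `gamma` and the season length are illustrative assumptions:

```python
def holt_winters(ys, period, alpha=0.5, beta=0.05, gamma=0.1):
    """One-step-ahead additive Holt-Winters smoothing.

    Returns preds, where preds[i] is the forecast of ys[i] made from
    ys[:i]. Seasonal offsets repeat every `period` samples.
    """
    level = ys[0]
    trend = 0.0
    season = [0.0] * period              # additive seasonal offsets
    preds = []
    for i, y in enumerate(ys):
        s = season[i % period]
        preds.append(level + trend + s)  # forecast before observing y
        last_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[i % period] = gamma * (y - level) + (1 - gamma) * s
    return preds

# residual = observation - prediction; a large residual flags an anomaly
ys = [10, 12, 11, 13, 10, 12, 11, 30, 10, 12]
residuals = [y - p for y, p in zip(ys, holt_winters(ys, period=4))]
```

The spike to 30 produces a much larger residual than the repeating ups and downs around it, which is exactly the property the anomaly detector relies on.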
import 'https://gist.githubusercontent.com/welch/b18d75bba184c441253c/raw/6b5b56d4677e416ef4a085fa6004f6f001ed1af9/sources.juttle' as sources;
import 'https://gist.githubusercontent.com/welch/6f2053f2bfc7d9e7b4d9/raw/b1e46167d05c7e49233db4613437283dcb1a3c4e/trend.juttle' as trend;
const start = :2014-01-01: ;
const dt = :60s: ;
sources.ripple_cpu -from start -to start + :1h:
| trend.change -in 'cpu' -dt dt -t0 start -out 'change'
| split
| @timechart -title "60s change" -display.dataDensity 0
// simulated sources for demos and tests
//
// Exported subs:
//   bumpy_cpu: a 10-minute cpu metric with daily variation
//   ripple_cpu: a 10-second cpu metric with minute-variation
//
// Exported functions:
//
// Exported reducers:
//
// model trends in a series via linear regression.
//
// Exported subs:
//   fit: do a least-squares fit of a line to successive batches of points,
//     and calculate the trend's value at each point. A fit is computed for each batch.
//     ... | fit -in 'cpu' -every :2h: -over :8h: -t0 :2014-01-01: -out 'trend';
//   fit_initial: do a least-squares fit of a line to an initial window of points,
//     and calculate this trend's value at subsequent points. A fit is computed once.
//     ... | fit_initial -in 'cpu' -over :2h: -t0 :2014-01-01: -out 'trend';
//   fit_trailing: do a least-squares fit of a line to a moving window of points
In this example we use trend estimation as a robust way to estimate the rate of change of a metric at a point in time. The constant `every` controls both the interval of data used for each estimate and the frequency at which the estimate is updated. The trend module (more typically used over long intervals of time) is applied here to a very short interval. The slope of the fitted trend is returned by trend.rate, and the trended change over an interval by trend.change (which is in the same units as the input field, often more convenient for alerting and display).

Try different durations for `every` to see its effect. The simulated cpu emits a new value every 10 seconds, so `every` should be at least :20s: to provide enough samples to fit a line. Longer windows give smoother derivative curves. In this example, an `every` between :45s: and :2m: does a good job of highlighting the big breaks in the cpu curve while ignoring the noise.
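The rate estimate underlying trend.rate and trend.change can be sketched as an ordinary least-squares line fit over a window of timestamped samples. This Python sketch is illustrative (the sample times and values are made up); the actual module fits against times relative to its `-t0` origin:

```python
def fit_slope(ts, ys):
    """Least-squares slope of ys versus ts (units of value per second)."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    num = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    den = sum((t - mt) ** 2 for t in ts)
    return num / den

# samples every 10 seconds over a 60-second window
ts = [0, 10, 20, 30, 40, 50]
ys = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]
rate = fit_slope(ts, ys)   # slope of the fitted line, ~0.1 units/second
change = rate * 60         # trended change over the 60s window
```

With only two samples the fit degenerates to the line through them, which is why the window should span at least a few samples, as noted above.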
[
  { "source_type": "metric", "time": "2014-01-01T00:00:00.000Z", "name": "C750.<test:runid>.live", "space": "default", "value": 10 },
  { "source_type": "metric", "time": "2014-01-01T00:00:01.000Z", "name": "C750.<test:runid>.live", "space": "default", "value": 20 },
  { "source_type": "metric", "time": "2014-01-01T00:00:02.000Z", "name": "C750.<test:runid>.live", "space": "default", "value": 30 },
  { "source_type": "metric", "time": "2014-01-01T00:00:03.000Z", "name": "C750.<test:runid>.live", "space": "default", "value": 40 },
  { "source_type": "metric", "time": "2014-01-01T00:00:04.000Z", "name": "C750.<test:runid>.live", "space": "default", "value": 50 }
]
// 4-way right outer join of a point stream of ids against tables of personal information. | |
// | |
// The points in the "tables" all have the same timestamp. | |
// For the join, the ID in each emitter point | |
// is matched against each table, and an output point is created that is the union of all | |
// matching points. This demonstrates partial joins when not all tables have an entry for | |
// an ID. There are no matches at all for ID 5, so that point is passed through unchanged. | |
//--------------------------------------------------------------------------- | |
const name = [ | |
{time:"1970-01-01T00:00:00.000Z", "id":1, "name":"fred"}, |
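The join semantics described in the comments above can be sketched in Python. The tables and ids here are hypothetical (only id 1/"fred" appears in the source); each output point is the union of all table rows matching the point's id, and a point with no match in any table passes through unchanged:

```python
# Hypothetical lookup tables keyed on "id"; "wilma", "Bedrock", and the
# ages are invented for illustration.
name = {1: {"name": "fred"}, 2: {"name": "wilma"}}
age  = {1: {"age": 41},      3: {"age": 9}}
town = {2: {"town": "Bedrock"}}

def right_outer_join(points, *tables):
    """Union each point with every matching table row; pass unmatched
    points through unchanged."""
    out = []
    for pt in points:
        merged = dict(pt)
        for table in tables:
            row = table.get(pt["id"])
            if row is not None:
                merged.update(row)
        out.append(merged)
    return out

points = [{"id": 1}, {"id": 2}, {"id": 5}]
result = right_outer_join(points, name, age, town)
# id 5 matches no table, so its point is passed through as just {"id": 5}
```

This mirrors the partial-join behavior: id 2 picks up fields from only the tables that know about it, and id 5 emerges untouched.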
[ | |
{"time": "1980-09-25T14:01:00Z", "count": 182.478}, | |
{"time": "1980-09-25T14:02:00Z", "count": 176.231}, | |
{"time": "1980-09-25T14:03:00Z", "count": 183.917}, | |
{"time": "1980-09-25T14:04:00Z", "count": 177.798}, | |
{"time": "1980-09-25T14:05:00Z", "count": 165.469}, | |
{"time": "1980-09-25T14:06:00Z", "count": 181.878}, | |
{"time": "1980-09-25T14:07:00Z", "count": 184.502}, | |
{"time": "1980-09-25T14:08:00Z", "count": 183.303}, | |
{"time": "1980-09-25T14:09:00Z", "count": 177.578}, |