First of all, the major change was in how we use the anomaly scores. The monitor's plots clearly show that anomalies do occur, but we don't use them in a practical manner. The threshold line at an anomaly score of 0.6 is there for visual purposes only, as it would catch too many events if we used it as an actual threshold. So one way to look at it is to seek patterns within the anomaly scores. The first thing I thought of was to use the frequency of anomalies beyond the threshold as a metric, since it is more probable that something is wrong if we get many anomalous patterns in a short period of time. But the key word here is probable, so why not use a probability distribution to estimate the likelihood of something really being anomalous? Interestingly, the people at Grok also realized that the raw anomaly score is not a very good metric, as we can see in this excellent talk by Grok's engineer Subutai. As they released
```dockerfile
FROM allanino/nupic

# Clone Cerebro repository
RUN git clone https://github.com/numenta/nupic.cerebro.git /usr/local/src/nupic.cerebro

# Install dependencies
# Install Mongo
RUN \
  apt-get install -y libevent-dev;\
  apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10;\
```