A proposal for a new, simplified alarm evaluation engine for Monasca. It builds on the existing aggregation engine (https://github.com/monasca/monasca-aggregator) and allows a simpler implementation of a thresholding engine that shares its syntax with the query language.
It would involve two or three new components:
- transformation engine: evaluates rules stored in a database
  - rules can be dynamically created
  - evaluates rules of two forms:
    - `function(timeseries, constant...)`
      - e.g. `delta(metric{foo=bar})`
    - `function(timeseries_a, timeseries_b)`
      - e.g. `metric{foo=bar} > metric{foo=baz}`
      - e.g. `metric{foo=bar} / metric{foo=baz}`
  - each rule executes a single function
  - each rule outputs a single time series
  - could be implemented as an extension or modification of the aggregation engine
- alarm evaluation engine: triggers events when a metric meets a condition
  - only one form: `function(timeseries, constant)`
  - more complex rules will be decomposed into the supported form as needed
- expression compiler: complex expressions are compiled and decomposed into transformation and alarm rules at creation time

For example, consider the following Prometheus expression:

```
sum(delta(request_total_time{app="ms-api-api"}[1h])) by (path, method)
/
sum(delta(request_count{app="ms-api-api"}[1h])) by (path, method)
> 5.0
```
The compiler would decompose this expression into five transformation rules and one alarm rule:

```
temp1 = delta(request_total_time{app="ms-api-api"}[1h])
temp2 = sum(temp1) by (path, method)
temp3 = delta(request_count{app="ms-api-api"}[1h])
temp4 = sum(temp3) by (path, method)
temp5 = temp2 / temp4
if temp5 > 5.0:
    alarm()
```

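
The decomposed rules can be sketched in Python as a chain of single-function rules. This is a minimal, hypothetical sketch. The in-memory `Series`/`Vector` types, the helper names (`delta`, `sum_by`, `divide`, `threshold`, `lbl`) and the sample data are assumptions for illustration, not the aggregation engine's API. It also assumes that the label selector (`{app="ms-api-api"}`) and the `[1h]` window have already been applied before samples reach `delta`.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical in-memory types (assumptions, not Monasca's data model).
Labels = FrozenSet[Tuple[str, str]]               # e.g. {("path", "/"), ("method", "GET")}
Series = Dict[Labels, List[Tuple[float, float]]]  # label set -> (timestamp, value) samples in the window
Vector = Dict[Labels, float]                      # label set -> one value per evaluation


def delta(series: Series) -> Vector:
    """function(timeseries): last minus first sample within the (pre-selected) window."""
    return {lbls: samples[-1][1] - samples[0][1]
            for lbls, samples in series.items() if len(samples) >= 2}


def sum_by(vec: Vector, keys: Tuple[str, ...]) -> Vector:
    """function(timeseries, constant...): sum, keeping only the labels named in `keys`."""
    out: Dict[Labels, float] = defaultdict(float)
    for lbls, value in vec.items():
        out[frozenset((k, v) for k, v in lbls if k in keys)] += value
    return dict(out)


def divide(a: Vector, b: Vector) -> Vector:
    """function(timeseries_a, timeseries_b): pairwise on identical label sets."""
    return {lbls: a[lbls] / b[lbls] for lbls in a.keys() & b.keys() if b[lbls] != 0}


def threshold(vec: Vector, limit: float) -> List[Labels]:
    """Alarm rule function(timeseries, constant): label sets whose value exceeds `limit`."""
    return [lbls for lbls, value in vec.items() if value > limit]


def evaluate(total_time: Series, count: Series) -> List[Labels]:
    # One function per rule, evaluated in dependency order.
    temp1 = delta(total_time)
    temp2 = sum_by(temp1, ("path", "method"))
    temp3 = delta(count)
    temp4 = sum_by(temp3, ("path", "method"))
    temp5 = divide(temp2, temp4)
    return threshold(temp5, 5.0)  # alarm() would fire once per returned label set


def lbl(**kv: str) -> Labels:
    return frozenset(kv.items())


# Made-up samples: two instances serve one endpoint, one instance serves another.
total_time = {
    lbl(path="/v2.0/metrics", method="GET", instance="a"): [(0, 100.0), (3600, 700.0)],
    lbl(path="/v2.0/metrics", method="GET", instance="b"): [(0, 0.0), (3600, 500.0)],
    lbl(path="/v2.0/alarms", method="GET", instance="a"): [(0, 0.0), (3600, 40.0)],
}
count = {
    lbl(path="/v2.0/metrics", method="GET", instance="a"): [(0, 10.0), (3600, 110.0)],
    lbl(path="/v2.0/metrics", method="GET", instance="b"): [(0, 0.0), (3600, 100.0)],
    lbl(path="/v2.0/alarms", method="GET", instance="a"): [(0, 0.0), (3600, 20.0)],
}
alarms = evaluate(total_time, count)
# Only /v2.0/metrics GET alarms: 1100 / 200 = 5.5 > 5.0, while 40 / 20 = 2.0.
print(alarms)
```

Each `tempN` is produced by exactly one function call, which mirrors the one-function, one-output shape of a transformation rule. The alarm rule is the only step that yields events rather than a new series.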
- Rules are implicitly evaluated in order
- Rules are implicitly deduplicated
- Potentially simplified clustering
  - The cost of each rule can be expected to be roughly equal
  - Without changes to keying in the API, all cluster members would need to see every metric
    - this puts an upper bound on throughput of ~250k metrics/sec with `confluent_kafka`
- Builds on the existing aggregation engine. New requirements would include:
  - Dynamic rules
  - New functions, particularly of the `function(timeseries_a, timeseries_b)` form
- The expression compiler would probably be part of the API
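
Both the in-order evaluation and the deduplication of rules would follow naturally if the compiler walks the expression tree in post-order and keys each rule on its sub-expression. The sketch below is one hypothetical way to do that. The `Call` node type, the `decompose`/`compile_alarm` helpers and the rule tuple layout are illustrative assumptions, not an existing Monasca or Prometheus API. Parsing the expression text into `Call` nodes is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rule = Tuple[str, str, tuple]  # hypothetical layout: (output series name, function, inputs)


@dataclass(frozen=True)
class Call:
    """Hypothetical parsed-expression node: one function applied to arguments.

    Arguments are nested Call nodes or constants (selectors, label tuples, numbers).
    frozen=True makes nodes hashable, so identical sub-expressions compare equal.
    """
    func: str
    args: tuple


def decompose(expr: Call, seen: Dict[Call, str], rules: List[Rule]) -> str:
    """Emit one transformation rule per distinct sub-expression; return its output name."""
    if expr in seen:  # identical sub-expression already has a rule: reuse it
        return seen[expr]
    # Post-order: child rules are appended before the rule that consumes them,
    # so `rules` is always in a valid evaluation order.
    inputs = tuple(decompose(a, seen, rules) if isinstance(a, Call) else a
                   for a in expr.args)
    name = f"temp{len(rules) + 1}"
    rules.append((name, expr.func, inputs))
    seen[expr] = name
    return name


def compile_alarm(lhs: Call, op: str, limit: float) -> Tuple[List[Rule], Tuple[str, str, float]]:
    """Split `lhs <op> limit` into transformation rules plus a single alarm rule."""
    seen: Dict[Call, str] = {}
    rules: List[Rule] = []
    output = decompose(lhs, seen, rules)
    return rules, (output, op, limit)


by = ("path", "method")
expr = Call("div", (
    Call("sum_by", (Call("delta", ('request_total_time{app="ms-api-api"}[1h]',)), by)),
    Call("sum_by", (Call("delta", ('request_count{app="ms-api-api"}[1h]',)), by)),
))
rules, alarm = compile_alarm(expr, ">", 5.0)
for rule in rules:
    print(rule)
print(alarm)
```

Applied to the example expression, this yields the same `temp1` to `temp5` chain plus a `temp5 > 5.0` alarm rule. In this sketch, deduplication only applies within a single expression. Sharing `seen` and `rules` across all stored alarms, rather than creating them per call, is one assumed way to extend it across alarms.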