@joost-de-vries
Last active August 10, 2022 11:19
Trying out replacing a yaml DSL with Deno and Typescript

yaml, yaml everywhere!

For our services we of course make sure to record the right metrics in Prometheus. Which means creating recording rules for

  • 7 services,
  • two environments,
  • from both a latency and a throughput perspective,
  • on service internals like: outward http, circuitbreakers, persistence (mongo, hikari connection pooling), logging, exceptions, and more
  • different percentiles and
  • intervals and windows

All possible combinations add up to a total of 4600 lines of yaml per environment.
The Prometheus recording rules are themselves yaml, embedded in turn in the yaml for a Kubernetes operator.

A DSL for yaml

The way we do that is with code like this, written in a DSL for generating configuration yaml called jSonnet.

[
  common.prometheus.rule(
    kind='Prometheus/Rule',
    name='foobar-prometheus-rules-latency',
    groups=[
      common.prometheus.group(
        name=std.join('_', ['latency', r.interval]),
        interval=r.interval,
        rules=[
          recording.prometheus.rules.recording(
            record=std.join(':', ['job', 'latency', q.name, p, w]),
            expr=q.expr % [p, a, w]
          )
          for q in queries.latency
          for p in percentiles
          for a in apps
          for w in r.windows
        ]
      )
      for r in ranges
    ]
  ),
]

Producing yaml like the following:

# yaml-language-server: $schema=https://json.schemastore.org/prometheus.rules.json
groups:
  - name: latency_1m
    interval: 1m
    rules:
      - record: 'job:latency:http:90:1m'
        expr: >-
          histogram_quantile(0.90,
          sum(rate(http_server_requests_seconds_bucket{job="abcService",
          uri!~".*/(test|admin|internal|\*\*).*|root"}[1m])) by (le, job,
          service, uri, method, status))
      - record: 'job:latency:http:90:5m'
        expr: >-
          histogram_quantile(0.90,
          sum(rate(http_server_requests_seconds_bucket{job="abcService",
          uri!~".*/(test|admin|internal|\*\*).*|root"}[5m])) by (le, job,
          service, uri, method, status))

I had to extend our code, so I took a bit of time to understand it. I ran the generate.sh script, which unexpectedly took 1 minute 40 seconds. And I inspected the result to understand the meaning of the code.
The for part looks a lot like list comprehensions in Python, which are a bit like set definitions in high school math. The result is a list of values for all the combinations of queries, percentiles, apps etc.
And the % operator is C-like string formatting.
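To make that concrete: Jsonnet's % substitutes values positionally, much like C's printf. A rough TypeScript sketch of that behavior (the fmt helper is hypothetical, purely for illustration):

```typescript
// Hypothetical helper mimicking Jsonnet's C-style `%` formatting:
// each "%s" placeholder is replaced by the next value in order.
function fmt(template: string, values: (string | number)[]): string {
  let i = 0;
  return template.replace(/%s/g, () => String(values[i++]));
}

const expr = fmt('histogram_quantile(0.%s, rate(x{job="%s"}[%s]))', [
  90,
  "abcService",
  "1m",
]);
console.log(expr);
// → histogram_quantile(0.90, rate(x{job="abcService"}[1m]))
```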

Deno and generator functions

Looking at those list comprehensions I had a feeling that Typescript generator functions could probably do something similar.
And I've been wanting to try out Deno since it sounds promising for scripting kind of programming. I.e. stuff for which it's overkill to set up a whole build file and source file directory structure. And that starts instantly. Neither of which JVM based programming, which I normally use a lot, is well known for.

In an evening I ported the logic from the yaml DSL to Deno and Typescript. Experiencing the joy of programming again: trying out some new tools for a well defined, isolated problem. Without getting bogged down in side issues.
And the tools just worked!
Reminding me of what drew me to programming when I was in high school. :-)

I implemented some pure Typescript functions that take parametrized recording rules and the lists of services, percentiles, ranges and windows and produce an iterator of concrete recording rule values. Using generator functions to create JS object values for all possible combinations. I.e. for the cartesian product.

function* rule(
  groupName: GroupName,
  queries: Query[],
  percentiles: Percentile[],
  apps: string[],
  windows: TimeFrame[],
): Generator<RecordingRule> {
  for (const query of queries) {
    for (const percentile of percentiles) {
      for (const app of apps) {
        for (const window of windows) {
          yield query.recordingRule(groupName, app, percentile, window)
        }
      }
    }
  }
}

The function* denotes a generator function. And yield is where the invocation returns a value, continuing from there on the next invocation. So it lazily returns a RecordingRule on every invocation. Very much like an iterator.
Looks really obvious, right? I like that.
You can easily turn the generator into an array with the spread operator:

[...rule(groupName, queries, percentiles, apps, range.windows)]
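As a self-contained sketch of what this buys you — with a stubbed-down Query and string rules standing in for the richer types in the gist — the spread yields exactly the cartesian product:

```typescript
type GroupName = string;
type Percentile = string;
type TimeFrame = string;
// Stubbed-down Query; the real gist type carries more structure.
interface Query {
  recordingRule(g: GroupName, app: string, p: Percentile, w: TimeFrame): string;
}

function* rule(
  groupName: GroupName,
  queries: Query[],
  percentiles: Percentile[],
  apps: string[],
  windows: TimeFrame[],
): Generator<string> {
  // Nested loops + yield = lazy cartesian product of all parameters.
  for (const query of queries) {
    for (const percentile of percentiles) {
      for (const app of apps) {
        for (const window of windows) {
          yield query.recordingRule(groupName, app, percentile, window);
        }
      }
    }
  }
}

const q: Query = {
  recordingRule: (g, app, p, w) => `job:${g}:${app}:${p}:${w}`,
};
const rules = [...rule("latency", [q], ["90", "99"], ["a", "b"], ["1m", "5m"])];
console.log(rules.length); // 1 × 2 × 2 × 2 = 8 combinations
```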

Typescript template strings are used to parametrize the rule expressions, which gives a bit more control than the C-style formatting.

  latencyQuery({
    name: "http",
    expr: (percentile, app, window) =>
      `histogram_quantile(0.${percentile}, sum(rate(http_server_requests_seconds_bucket{job="${app}", uri!~".*/(test|admin|internal|\\*\\*).*|root"}[${window}])) by (le, job, service, uri, method, status))`,
  })
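Calling such a template function with concrete parameters produces the final PromQL string. A simplified sketch (the expr below is trimmed down from the article's version, for illustration only):

```typescript
// Simplified template-literal expr; the real one carries the full
// uri-exclusion regex and `by` clause from the article.
const expr = (percentile: string, app: string, window: string) =>
  `histogram_quantile(0.${percentile}, sum(rate(http_server_requests_seconds_bucket{job="${app}"}[${window}])) by (le, job))`;

console.log(expr("90", "abcService", "1m"));
```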

Using Deno and Typescript generator functions worked out very well. More on that below. And generating the files took 0.2 seconds. As you'd expect.

You can find the code here

As I started to do this 'lab journal' write up of my findings I decided to look up the DSL lib and read a bit of its documentation. Turns out it's not at all the little hobby project I thought it might be, given the slowness. It's called jSonnet. And it was created by Google. And they use it a lot to manage yaml configuration files. Of which I'm sure Google has many.
If I'd known this I perhaps wouldn't have thought "Let's see if I can replace this in an evening of fiddling". :-)

jSonnet

jSonnet is a proper programming language.

  • json: It is a superset of json and it generates json. In that sense it reminds me of Lisp.
  • functional: It's a pure functional language. Free of side effects. Which fits the use case of generating configurations well.
  • array comprehensions: the array comprehensions mentioned above are indeed for creating cartesian products of output values. The docs mention how they're semantically equivalent to flatMap calls, as I expected. And thus equivalent to, for instance, Scala for comprehensions.
  • configuration files: it's geared for Kubernetes style configuration files

Turns out my alternative implementation has similar characteristics.

  • json: Typescript is javascript at runtime. So the JS objects my code produces are obviously equivalent to Json. It's nice when your generator logic is semantically close or identical to the output.
  • functional: for this kind of thing pure functions are just easier to understand. Thinking in transformations instead of processing steps.
  • array comprehensions: Generator functions take the imperative for loop that every programmer intuitively understands. Give it a little twist, and make it a tool for side effect free transformations. Easily creating cartesian-product logic that everybody understands. Of course there are libraries that offer some form of do notation or list comprehensions for Typescript. Instead I chose to stick with generator functions and the iterators they produce because it's more idiomatic Typescript. So it should be accessible for every programmer.
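That equivalence between array comprehensions, flatMap, and nested generator loops can be sketched in a few lines (illustrative values, not the gist's real data):

```typescript
// A comprehension over two lists desugars to flatMap + map...
const percentiles = ["90", "99"];
const windows = ["1m", "5m"];

const viaFlatMap = percentiles.flatMap((p) => windows.map((w) => `${p}:${w}`));

// ...which yields the same cartesian product as plain nested loops.
const viaLoops: string[] = [];
for (const p of percentiles) {
  for (const w of windows) {
    viaLoops.push(`${p}:${w}`);
  }
}
console.log(viaFlatMap); // ["90:1m", "90:5m", "99:1m", "99:5m"]
```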

Wrapping up

I really enjoyed working with Deno.

  • It starts fast: coming from JVM programming it's such a luxury to have your code executed in under a second.
  • It's Typescript first. And Typescript is a great language. A joy to program in.
  • No build file and directory structure required. Just run the code. No horrible webpack build files. Or Maven pom.xml. No convoluted directory structures. Even tsconfig.json is not required. Things that would distract from what I wanted to achieve.
  • Top level await. Essential for scripting. No 'on error' callbacks and such.
  • No npm 'node_modules' directory. Dependencies are urls to the Typescript files to import. That took a bit of getting used to. It's definitely nice and simple for small things.
  • Can be compiled to an executable. Which is great for use in infrastructure. No dependencies that have to be installed first. The executable size is in the tens of megabytes.
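A minimal sketch of what top-level await looks like in practice (the loadConfig function here is hypothetical, standing in for e.g. a fetch or file read):

```typescript
// Hypothetical async config loader, just to illustrate the shape.
async function loadConfig(): Promise<{ services: number }> {
  return { services: 7 };
}

// Top-level await: no main() wrapper, no .then() chains, no error callbacks.
const config = await loadConfig();
console.log(`generating rules for ${config.services} services`);
```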

jSonnet vs my bit of code

  • jSonnet is data. Which my code is not. From the docs I understand that Google envisions applications using jSonnet configurations directly. Similar to HOCON for example. That's not something you'd do with my code.
  • jSonnet is a DSL. So people will have to learn the specific semantics of the DSL. While the meaning of Typescript code will be known beforehand by most developers. That's a standard tradeoff for external DSLs.
  • My code can be published and consumed as a library. Where jSonnet code easily ends up being copied and modified.
  • jSonnet is not statically typed.
  • jSonnet being a DSL, you need an IDE plugin, which gives syntax highlighting and code navigation. But obviously Typescript offers a much bigger tooling ecosystem. Which is handy for things like debugging.
  • I can't explain why the jSonnet code is so slow: 500 times slower than my naive implementation. Apparently the poor performance of jSonnet is well known, because at Databricks they created an alternative JVM based implementation.
