Apiman metrics can collect arbitrary headers (and query params) as per user configuration (once the relevant PR is merged, anyway!).
There are also a bunch of standard metrics that are always collected, so those are simple (i.e. always non-null columns)
For example, let's imagine the user decides they want to capture the X-Request-Id
header in the request and response, and X-Foo
in only the request.
In standard InfluxDB/QuestDB you can just dynamically add and remove columns for each one with some reasonable name.
Now they get something like:
headers.request.foo | headers.request.x-request-id | headers.response.x-request-id | lots of other fixed columns |
---|---|---|---|
Hello | Bazzy | Foo | (not important for this) |
Hello2 | Something | Great! | ... |
At some future point, the user decides they want to stop capturing x-request-id
, but continue to capture x-foo
.
Now (AIUI) for future requests you end up in the database with...
headers.request.foo | headers.request.x-request-id | headers.response.x-request-id | lots of other fixed columns |
---|---|---|---|
Hello3 | NULL | NULL | |
Hello4 | NULL | NULL |
Another approach -- although it appears it is not supported from what I can understand from the QuestDB docs -- is that you are allowed to have a list of tuples as a value in a column; it is a kind of correlated sub-document, or reference to another table (not sure what the correct name is in QuestDB parlance). I don't know if this is allowed in InfluxDB either - but wanted to know if it's allowed in QuestDB and if it's a good idea.
So you end up with something like ....
headers.request | headers.response | lots of other fixed columns |
---|---|---|
[(x-request-id, Bazzy), (x-foo, Hello)] |
[(x-request-id, Foo)] |
(not important for this |
And when you decide to stop capturing X-Request-Id
in the future, it's just something like this.
headers.request | headers.response | lots of other fixed columns |
---|---|---|
[(x-foo, Hello)] |
[] |
(not important for this) |
Which avoids all of the null columns.
From the perspective of Apiman writing into the database, both are equally easy -- but I wanted to know whether one particular approach or another is better for users of that data (and the DB itself) by avoiding all of the null columns when dynamically growing and shrinking the fields captured.
(excluded types, etc, to keep noise down)
What about deformalizing storage altogether? E.g. store something like:
?
You can lean on JIT to search for things quickly and minimize storage.