Updated 2022-05-19
- Additional observations since reverting to in-memory indexing.
On March 12 2022 I noticed a post on Discord by Ukkopahis saying:
> I was thinking maybe `INFLUXDB_DATA_INDEX_VERSION=tsi1` should be added as default? (if this is what you are talking about)
Until then I hadn't really given too much thought to the question "how much RAM is InfluxDB using?" The post sent me on a little voyage of discovery.
A few facts to set the scene:
- My IOTstack platform is a 4GB Raspberry Pi 4 Model B Rev 1.1 running Raspbian GNU/Linux 10 (buster) from a 480GB SSD, as a 32-bit OS with a 64-bit kernel. And, yes, I should upgrade!
- `df -H` reports 3% utilisation of the SSD.
- `sudo du -sh IOTstack/volumes` reports 1.1GB.
- `sudo du -sh IOTstack/volumes/influxdb` reports 719MB.
- The largest database (in terms of rows) is a grid-power logger which gains a new row every 10 seconds. The oldest entry is from April 2018 and it currently has 12.8 million rows.
- The whole arrangement is standard no-frills MING:
	- sensors log via MQTT to Mosquitto;
	- Node-RED subscribes to Mosquitto topics and formats the payloads for insertion into InfluxDB (a hypothetical example of a single inserted point is shown after this list); and
	- Grafana displays charts based on what is stored in InfluxDB.
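To make that pipeline concrete: each reading ends up in InfluxDB as a single point. A purely hypothetical line-protocol example of the sort of row the grid-power logger might add every 10 seconds (the measurement, tag and field names here are invented for illustration, not taken from my actual databases):

```
grid_power,source=meter watts=1234.5 1652900000000000000
```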
After using `docker stats` for a few days and paying attention to what happened to memory utilisation if I restarted the container, I set up this small shell script:

```bash
#!/usr/bin/env bash
date
docker stats --no-stream influxdb
```
and hooked it to a crontab entry firing it every hour:
```
0 */1 * * * log_influx_ram >>./Logs/influx_ram.log 2>&1
```
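To turn that log into something chartable, each sample (the `date` line plus the `docker stats` output) needs to be reduced to a timestamp and a MEM % value. This is only a sketch, assuming the log layout produced by the script above and the usual column order of `docker stats`:

```bash
#!/usr/bin/env bash
# reduce influx_ram.log to "timestamp,mem%" pairs on stdout
awk '
  /^[A-Z][a-z][a-z] /   { stamp = $0; next }    # remember the most recent date line
  $2 == "influxdb"      { gsub(/%/, "", $7)     # strip the % sign
                          print stamp "," $7 }  # MEM % is the 7th whitespace field
' ./Logs/influx_ram.log
```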
We're the better part of 2 months down the track so it's time for some analysis.
The chart is a bit busy so let me break it down:
- The X axis is time. The Y axis is the "MEM %" value reported for InfluxDB by `docker stats`.
- The shaded area marked "A" is the observed behaviour while the environment variable `INFLUXDB_DATA_INDEX_VERSION` was omitted. In other words, the default of in-memory indexing was in force. During that time, I would occasionally restart the container by hand. The typical pattern was memory utilisation slowly growing over time into the 8..10% range, falling back to the 1..2% range after a restart.
- On March 29 I added `INFLUXDB_DATA_INDEX_VERSION=tsi1`. That is in force for the two areas marked "B" and "C".
- Above the shaded areas are two time ranges:
	- Prior to April 24, I would occasionally restart the container by hand. You can see that memory climbs into the sub-20% area, falling back to the 1..2% range after any restart.
	- On April 24, I added the crontab entry:

		```
		30 3 * * * docker-compose -f ./IOTstack/docker-compose.yml restart influxdb >>./Logs/influx_ram.log 2>&1
		```

		That does a better job of keeping memory utilisation below about 7%, at least for the remainder of the area marked "B".
- On May 1st I tried to delete some extraneous data that had made its way into one of the databases, courtesy of insufficient care taken when debugging a sketch. Influx would not let me delete the series because I had a mixture of index types. A Discord post by Ukkopahis the next day included the hint:

	> This won't migrate existing shards

	but at that time, I wasn't aware of this problem. There seem to be a few ways to migrate the shards (one of them, Influx's documented `influx_inspect buildtsi` procedure, is sketched just after this list). I just went with a script from IOTstackBackup:

	```
	$ iotstack_reload_influxdb
	```

	Thus, the area marked:

	- "A" is exclusively in-memory indexing;
	- "B" is a mixture of indexing types:
		- existing data continues to use in-memory indexing, while
		- newly-ingested data uses tsi1 indexing.
	- "C" is exclusively tsi1 indexing.
- For the area marked "C", the daily cron-job restarting the Influx container is still firing, but memory utilisation is never below about 12% while climbing into the low 20% range during the course of each day.
- After publishing the first version of this gist on May 13, I reverted to in-memory indexing and reloaded the databases again. This is the area marked "D". The daily cron-job is still firing.
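About those shard-migration options: the method Influx documents for 1.x is to stop the database and run `influx_inspect buildtsi` over the data and WAL directories. I did not use this method, but a rough sketch of how it might look in an IOTstack context follows. It assumes the usual IOTstack mapping of `./volumes/influxdb/data` onto `/var/lib/influxdb` inside the container; check your own `docker-compose.yml` before trusting the paths:

```bash
# stop InfluxDB so the shards are quiescent
docker-compose -f ./IOTstack/docker-compose.yml stop influxdb

# rebuild every shard's index as tsi1, using a throw-away container
# that mounts the same persistent store
docker run --rm \
  --entrypoint influx_inspect \
  -v "$HOME/IOTstack/volumes/influxdb/data:/var/lib/influxdb" \
  influxdb:1.8 \
  buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal

# bring InfluxDB back up
docker-compose -f ./IOTstack/docker-compose.yml start influxdb
```

The documentation cautions that `buildtsi` should run as the same user that runs `influxd`, so ownership of the rebuilt index files may need attention afterwards.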
What conclusions do I draw from this?
- On the face of it, `INFLUXDB_DATA_INDEX_VERSION=tsi1` results in worse memory utilisation. It looks to me like I'd be better off removing that option (the relevant `docker-compose.yml` fragment is sketched just after this list) and doing another `iotstack_reload_influxdb`.
- A daily restart via a cron-job certainly has the effect of keeping memory utilisation under control. I should keep that running.
- My perception is that the traces in "D" are higher, on average, than the right-hand end of "B". That perception is confirmed by a two-independent-sample t-test (equal variances). I don't quite know what to make of that, but it's still clear, to me, that in-memory indexing with a daily kick-in-the-pants from cron is as good a way as any of keeping InfluxDB 1.8 memory utilisation down.
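For anyone wanting to experiment, the option lives in the `environment` clause of the `influxdb` service definition in `docker-compose.yml`. This fragment is only a sketch; the surrounding lines will differ depending on how your service is defined:

```yaml
influxdb:
  image: "influxdb:1.8"
  # … other service settings …
  environment:
    - TZ=Etc/UTC
    # remove (or comment out) the next line to revert to the default in-memory index
    - INFLUXDB_DATA_INDEX_VERSION=tsi1
```

The change only takes effect when the container is recreated (e.g. `docker-compose up -d influxdb`) and, as noted above, existing shards keep their old index type until they are rebuilt or the databases are reloaded.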
Does my experience generalise or is it likely to be something to do with the size/structure of my databases? Honestly, I have no idea.
But, getting back to the trigger question of whether `INFLUXDB_DATA_INDEX_VERSION=tsi1` should be the IOTstack default, absent some other explanation of the behaviour I've discussed above, I'd be putting my vote in the box marked "no".
Since writing the above, I have done the following:

- turned the `cron` job that does a daily restart of the InfluxDB container on and off a couple of times to see what happens.

The graph is divided into three sections:

- red areas, where the `cron` reset is not operating; and
- a green area, where the `cron` reset is operating.

The general pattern for the entire graph is that RAM utilisation grows until some external event causes it to stop. External events include `sudo apt upgrade` doing something like updating `containerd.io`, which causes all containers to restart. The only real difference is that, for the green area, `cron` is triggering a restart every 24 hours.

If you compare this graph with the earlier one, keep in mind that the system generating the statistics for the old graph had 4GB RAM whereas the system behind this graph is 8GB. In essence, you need to adjust by a factor of 2. For example, the two peaks in the left-most red area are approaching 12%. That would correspond with ~24% for the earlier graph. And, indeed, you will see numbers of that order in the earlier graph.
In neither case (earlier chart, this chart) was InfluxDB running any continuous queries. The container is only:

- accepting inserts coming from Node-RED; and
- responding to queries from Grafana.
I think the most likely explanation is a memory leak somewhere. I could understand Influx running its own cache rather than relying on the operating system's virtual memory, but then I would expect RAM utilisation to climb to some level where there was a balance between its normal activities and the demands of the cache, after which it would plateau. The constant growth, unless some external event breaks the cycle, is what strongly suggests a leak. At least, it does to me.
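If I wanted to test the leak theory more directly (I haven't), one option would be to sample the Go runtime memory statistics that InfluxDB 1.x publishes on its `/debug/vars` endpoint, and watch whether the heap figures climb in step with the `docker stats` numbers. A minimal sketch, assuming the stock API port 8086 on the local host and `jq` installed (the script name is just something I made up for the example):

```bash
#!/usr/bin/env bash
# log_influx_heap - print a timestamped line of Go heap figures (bytes)
# taken from InfluxDB's expvar endpoint; suitable for the same hourly
# cron treatment as the docker stats logger above.
echo "$(date) $(curl -s http://localhost:8086/debug/vars |
  jq -r '[.memstats.HeapAlloc, .memstats.HeapInuse, .memstats.Sys] | @tsv')"
```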