On IOTstack, InfluxDB 1.8 & `INFLUXDB_DATA_INDEX_VERSION`

Updated 2022-05-19

  • Additional observations since reverting to in-memory indexing.

the trigger question

On March 12 2022 I noticed a post on Discord by Ukkopahis saying:

I was thinking maybe INFLUXDB_DATA_INDEX_VERSION=tsi1 should be added as default? (if this is what you are talking about)

Until then I hadn't really given too much thought to the question "how much RAM is InfluxDB using?" The post sent me on a little voyage of discovery.
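
If you want to check which indexing scheme your own copy of InfluxDB 1.8 is actually using, something like the following should do it. This is only a sketch: it assumes the container is named influxdb (as it is in IOTstack) and uses the image's stock /var/lib/influxdb layout.

# is the environment variable visible inside the container?
$ docker exec influxdb printenv INFLUXDB_DATA_INDEX_VERSION

# tsi1 shards keep their index on disk in an "index" directory,
# so any hits here mean at least some shards are using tsi1
$ docker exec influxdb find /var/lib/influxdb/data -type d -name index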

the environment

A few facts to set the scene:

  1. My IOTstack platform is a 4GB Raspberry Pi 4 Model B Rev 1.1 running Raspbian GNU/Linux 10 (buster) from a 480GB SSD, as a 32-bit OS with 64-bit kernel. And, yes, I should upgrade!

  2. df -H reports 3% utilisation of the SSD.

  3. sudo du -sh IOTstack/volumes reports 1.1GB.

  4. sudo du -sh IOTstack/volumes/influxdb reports 719MB.

  5. The largest database (in terms of rows) is a grid-power logger which gains a new row every 10 seconds. The oldest entry is in April 2018 and it currently has 12.8 million rows.

  6. The whole arrangement is standard no-frills MING:

    • sensors log via MQTT to Mosquitto
    • Node-RED subscribes to Mosquitto topics and formats for insertion into InfluxDB
    • Grafana displays charts based on what is stored in InfluxDB.

data acquisition

After using docker stats for a few days and paying attention to what happened to memory utilisation if I restarted the container, I set up this small shell script:

#!/usr/bin/env bash

date
docker stats --no-stream influxdb

and hooked it to a crontab entry firing it every hour:

0    */1  *    *    *    log_influx_ram >>./Logs/influx_ram.log 2>&1

data analysis

We're the better part of 2 months down the track so it's time for some analysis.

[chart: InfluxDB 1.8 Memory Utilisation]

The chart is a bit busy so let me break it down:

  1. The X axis is time. The Y axis is the "MEM %" value reported for InfluxDB by docker stats.

  2. The shaded area marked "A" is the observed behaviour while the environment variable INFLUXDB_DATA_INDEX_VERSION was omitted. In other words, the default of in-memory indexing was in force. During that time, I would occasionally restart the container by hand. The typical pattern was memory utilisation slowly growing over time into the 8..10% range, falling back to the 1..2% range after a restart.

  3. On March 29 I added INFLUXDB_DATA_INDEX_VERSION=tsi1. That is in force for the two areas marked "B" and "C".

  4. Above the shaded areas are two time ranges:

    • Prior to April 24, I would occasionally restart the container by hand. You can see that memory climbs into the sub-20% area, falling back to the 1..2% range after any restart.

    • On April 24, I added the crontab entry:

       30   3    *    *    *    docker-compose -f ./IOTstack/docker-compose.yml restart influxdb >>./Logs/influx_ram.log 2>&1
      

      That does a better job of keeping memory utilisation below about 7%, at least for the remainder of the area marked B.

  5. On May 1st I tried to delete some extraneous data that had made its way into one of the databases, courtesy of insufficient care taken when debugging a sketch. Influx would not let me delete the series because I had a mixture of index types. A Discord post by Ukkopahis the next day included the hint:

    This won't migrate existing shards

    but, at that time, I hadn't appreciated the implication: shards written before the change keep their original (in-memory) index. There seem to be a few ways to migrate existing shards (one possibility is sketched just before the conclusions); I simply went with a script from IOTstackBackup:

    $ iotstack_reload_influxdb

    Thus, the area marked:

    • "A" is exclusively in-memory indexing;

    • "B" is a mixture of indexing types:

      • existing data continues to use in-memory indexing, while
      • newly-ingested data uses tsi1 indexing.
    • "C" is exclusively tsi1 indexing.

  6. For the area marked "C", the daily cron-job restarting the Influx container is still firing, but memory utilisation never falls below about 12% and climbs into the low 20% range during the course of each day.

  7. After publishing the first version of this gist on May 13, I reverted to in-memory indexing and reloaded the databases again. This is the area marked "D". The daily cron-job is still firing.
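
For completeness: InfluxDB 1.8 also ships influx_inspect buildtsi, which converts existing in-memory shards to tsi1 in place. The following is only a sketch, not a recipe. It assumes IOTstack's usual mapping of ~/IOTstack/volumes/influxdb/data onto /var/lib/influxdb and the official influxdb:1.8 image; the database must not be running during the conversion, the tool asks for y/N confirmation, and file ownership needs care, which is why a reload via iotstack_reload_influxdb was the path of least resistance for me.

# stop the container so nothing is writing to the shards
$ docker-compose -f ~/IOTstack/docker-compose.yml stop influxdb

# run the converter from a throw-away container against the same volume
# (the -it is so you can answer the confirmation prompt)
$ docker run --rm -it -v ~/IOTstack/volumes/influxdb/data:/var/lib/influxdb influxdb:1.8 \
    influx_inspect buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal

# bring the container back
$ docker-compose -f ~/IOTstack/docker-compose.yml start influxdb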

conclusions

What conclusions do I draw from this?

  1. On the face of it, INFLUXDB_DATA_INDEX_VERSION=tsi1 results in worse memory utilisation. It looks to me like I'd be better off removing that option and doing another iotstack_reload_influxdb.

  2. A daily restart via a cron-job certainly has the effect of keeping memory utilisation under control. I should keep that running.

  3. My perception is that the traces in "D" are higher, on average, than the right-hand end of "B". That perception is confirmed by a two-independent-sample t-test (equal variances). I don't quite know what to make of that, but it is still clear to me that in-memory indexing with a daily kick-in-the-pants from cron is as good a way as any of keeping InfluxDB 1.8 memory utilisation down.

Does my experience generalise or is it likely to be something to do with the size/structure of my databases? Honestly, I have no idea.

But, getting back to the trigger question of whether INFLUXDB_DATA_INDEX_VERSION=tsi1 should be the IOTstack default, absent some other explanation of the behaviour I've discussed above, I'd be putting my vote in the box marked "no".

@Paraphraser (Author) commented:

Since writing the above, I have done the following:

  1. Upgraded the Pi to an 8GB Raspberry Pi 4 model B. This was for no reason other than I needed a new Pi, the supplier only had 8GB models, and I was (just) able to afford the asking price. I thought I may as well press it into service as my primary IoT host.
  2. Kept a general eye on InfluxDB as new updates are pulled down from DockerHub to see if memory utilisation is affected.
  3. Turned the cron job that does a daily restart of the InfluxDB container on and off a couple of times to see what happens.

[chart: InfluxDB RAM utilisation]

The graph is divided into three sections, distinguished by highlight colour:

  • red highlight: The daily cron reset is not operating.
  • green highlight: The daily cron reset is operating.

The general pattern for the entire graph is that RAM utilisation grows until some external event causes it to stop. External events include:

  • a new version of InfluxDB comes down on a "pull" so the container is recreated on the subsequent "up";
  • a sudo apt upgrade does something like update containerd.io, which causes all containers to restart; or
  • restarting the container manually for some reason (rare).

The only real difference is that, for the green area, cron is triggering a restart every 24 hours.

If you compare this graph with the earlier one, keep in mind that the system generating the statistics for the old graph had 4GB RAM whereas the system behind this graph is 8GB. In essence, you need to adjust by a factor of 2. For example, the two peaks in the left-most red area are approaching 12%. That would correspond with ~24% for the earlier graph. And, indeed, you will see numbers of that order in the earlier graph.

In neither case (earlier chart, this chart) was InfluxDB running any continuous queries. The container is only:

  1. Ingesting data (the maximum arrival rate is 1 row per 10 seconds);
  2. Responding to queries from Grafana (on-demand dashboards, no permanent kiosk displays); and
  3. Being asked to create a portable backup twice a day.
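
For reference, the twice-daily "portable backup" in point 3 boils down to something like the following (a sketch only; IOTstackBackup wraps this differently, and the backup path here is purely illustrative):

# ask the running daemon to write a portable backup inside the container
$ docker exec influxdb influxd backup -portable /var/lib/influxdb/backup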

I think the most likely explanation is a memory leak somewhere. I could understand Influx running its own cache rather than relying on virtual memory, but I would expect RAM utilisation to climb to a level where its normal activities and the demands of the cache were in balance, and then plateau. The constant growth, unless some external event breaks the cycle, is what strongly suggests a leak. At least, it does to me.

@Paraphraser (Author) commented:

For anyone who wants to try a daily reset themselves:

$ mkdir ~/Logs

Then use crontab -e and add:

# restart influxdb container to try to keep RAM low
3    3    *    *    *    docker restart influxdb >>./Logs/restart_influxdb.log 2>&1

I trigger it at 03:03 each day (to avoid hitting something that occurs on the hour). Pick a time that suits you.

If you don't want the log (which, unless there's a fault, only ever contains the word "influxdb"), send it to /dev/null.
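
In other words, the /dev/null variant of the same crontab entry would be:

# restart influxdb container to try to keep RAM low (no log)
3    3    *    *    *    docker restart influxdb >/dev/null 2>&1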

@Paraphraser (Author) commented:

And, to complete the picture, in case anyone wants to construct their own graphs, I use this script (named log_influx_ram):

#!/usr/bin/env bash

# timestamp in dd/mm/yyyy hh:mm:ss form
TIMESTAMP=$(date "+%d/%m/%Y %H:%M:%S")
# one-shot snapshot of the influxdb container's statistics
STATISTIC=$(docker stats --no-stream influxdb)
# with word-splitting collapsing the whitespace, field 23 is the "MEM %" column
MEMPC=$(echo $STATISTIC | cut -d " " -f 23)
# emit one CSV row: "timestamp",mem%
echo "\"$TIMESTAMP\",$MEMPC"

Note:

  • getting memory figures out of docker stats on a Raspberry Pi depends on appending the following to the single line in your /boot/cmdline.txt (reboot needed):

     cgroup_memory=1 cgroup_enable=memory
    

The script writes to stdout. You can test it by just running it:

$ log_influx_ram
"02/02/2023 14:25:54",2.01%

I drive the script with this crontab entry that fires every hour on the hour:

# log influxdb ram usage every hour
0    */1  *    *    *    log_influx_ram >>./Logs/influx_ram.csv 2>&1

The script is in my PATH but you could also use an absolute path.

Let it run for a few days then open the CSV in something like Excel and away you go.
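
As an aside, docker stats can be asked for just the memory percentage via its --format option, which avoids counting fields with cut. A one-liner along these lines should produce the same figure:

# same "MEM %" value, extracted by docker itself
$ docker stats --no-stream --format "{{.MemPerc}}" influxdb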
