Updated 2022-05-19
- Additional observations since reverting to in-memory indexing.
On March 12 2022 I noticed a post on Discord by Ukkopahis saying:
> I was thinking maybe `INFLUXDB_DATA_INDEX_VERSION=tsi1` should be added as default? (if this is what you are talking about)
Until then I hadn't really given too much thought to the question "how much RAM is InfluxDB using?" The post sent me on a little voyage of discovery.
A few facts to set the scene:
- My IOTstack platform is a 4GB Raspberry Pi 4 Model B Rev 1.1 running Raspbian GNU/Linux 10 (buster) from a 480GB SSD, as a 32-bit OS with a 64-bit kernel. And, yes, I should upgrade!
- `df -H` reports 3% utilisation of the SSD.
- `sudo du -sh IOTstack/volumes` reports 1.1GB.
- `sudo du -sh IOTstack/volumes/influxdb` reports 719MB.
- The largest database (in terms of rows) is a grid-power logger which gains a new row every 10 seconds. The oldest entry is from April 2018 and it currently has 12.8 million rows.
- The whole arrangement is standard no-frills MING:
	- sensors log via MQTT to Mosquitto;
	- Node-RED subscribes to Mosquitto topics and formats the payloads for insertion into InfluxDB (a hypothetical example of a single inserted point is shown after this list); and
	- Grafana displays charts based on what is stored in InfluxDB.
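To make that pipeline concrete: each reading ends up in InfluxDB as a single point. A purely hypothetical line-protocol example of the sort of row the grid-power logger might add every 10 seconds (the measurement, tag and field names here are invented for illustration, not taken from my actual databases):

```
grid_power,source=meter watts=1234.5 1652900000000000000
```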
After using `docker stats` for a few days and paying attention to what happened to memory utilisation if I restarted the container, I set up this small shell script:

```bash
#!/usr/bin/env bash
date
docker stats --no-stream influxdb
```
and hooked it to a crontab entry firing it every hour:
```
0 */1 * * * log_influx_ram >>./Logs/influx_ram.log 2>&1
```
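To turn that log into something chartable, each sample (the `date` line plus the `docker stats` output) needs to be reduced to a timestamp and a MEM % value. This is only a sketch, assuming the log layout produced by the script above and the usual column order of `docker stats`:

```bash
#!/usr/bin/env bash
# reduce influx_ram.log to "timestamp,mem%" pairs on stdout
awk '
  /^[A-Z][a-z][a-z] /   { stamp = $0; next }    # remember the most recent date line
  $2 == "influxdb"      { gsub(/%/, "", $7)     # strip the % sign
                          print stamp "," $7 }  # MEM % is the 7th whitespace field
' ./Logs/influx_ram.log
```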
We're the better part of 2 months down the track so it's time for some analysis.
The chart is a bit busy so let me break it down:
- The X axis is time. The Y axis is the "MEM %" value reported for InfluxDB by `docker stats`.
- The shaded area marked "A" is the observed behaviour while the environment variable `INFLUXDB_DATA_INDEX_VERSION` was omitted. In other words, the default of in-memory indexing was in force. During that time, I would occasionally restart the container by hand. The typical pattern was memory utilisation slowly growing over time into the 8..10% range, falling back to the 1..2% range after a restart.
- On March 29 I added `INFLUXDB_DATA_INDEX_VERSION=tsi1`. That is in force for the two areas marked "B" and "C".
- Above the shaded areas are two time ranges:
	- Prior to April 24, I would occasionally restart the container by hand. You can see that memory climbs into the sub-20% area, falling back to the 1..2% range after any restart.
	- On April 24, I added the crontab entry:

		```
		30 3 * * * docker-compose -f ./IOTstack/docker-compose.yml restart influxdb >>./Logs/influx_ram.log 2>&1
		```

		That does a better job of keeping memory utilisation below about 7%, at least for the remainder of the area marked "B".
- On May 1st I tried to delete some extraneous data that had made its way into one of the databases, courtesy of insufficient care taken when debugging a sketch. Influx would not let me delete the series because I had a mixture of index types. A Discord post by Ukkopahis the next day included the hint:

	> This won't migrate existing shards

	but at that time, I wasn't aware of this problem. There seem to be a few ways to migrate the shards (one of them, Influx's documented `influx_inspect buildtsi` procedure, is sketched just after this list). I just went with a script from IOTstackBackup:

	```
	$ iotstack_reload_influxdb
	```

	Thus, the area marked:

	- "A" is exclusively in-memory indexing;
	- "B" is a mixture of indexing types:
		- existing data continues to use in-memory indexing, while
		- newly-ingested data uses tsi1 indexing.
	- "C" is exclusively tsi1 indexing.
- For the area marked "C", the daily cron-job restarting the Influx container is still firing, but memory utilisation is never below about 12% while climbing into the low 20% range during the course of each day.
- After publishing the first version of this gist on May 13, I reverted to in-memory indexing and reloaded the databases again. This is the area marked "D". The daily cron-job is still firing.
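About those shard-migration options: the method Influx documents for 1.x is to stop the database and run `influx_inspect buildtsi` over the data and WAL directories. I did not use this method, but a rough sketch of how it might look in an IOTstack context follows. It assumes the usual IOTstack mapping of `./volumes/influxdb/data` onto `/var/lib/influxdb` inside the container; check your own `docker-compose.yml` before trusting the paths:

```bash
# stop InfluxDB so the shards are quiescent
docker-compose -f ./IOTstack/docker-compose.yml stop influxdb

# rebuild every shard's index as tsi1, using a throw-away container
# that mounts the same persistent store
docker run --rm \
  --entrypoint influx_inspect \
  -v "$HOME/IOTstack/volumes/influxdb/data:/var/lib/influxdb" \
  influxdb:1.8 \
  buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal

# bring InfluxDB back up
docker-compose -f ./IOTstack/docker-compose.yml start influxdb
```

The documentation cautions that `buildtsi` should run as the same user that runs `influxd`, so ownership of the rebuilt index files may need attention afterwards.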
What conclusions do I draw from this?
- On the face of it, `INFLUXDB_DATA_INDEX_VERSION=tsi1` results in worse memory utilisation. It looks to me like I'd be better off removing that option (the relevant `docker-compose.yml` fragment is sketched just after this list) and doing another `iotstack_reload_influxdb`.
- A daily restart via a cron-job certainly has the effect of keeping memory utilisation under control. I should keep that running.
- My perception is that the traces in "D" are higher, on average, than the right-hand end of "B". That perception is confirmed by a two-independent-sample t-test (equal variances). I don't quite know what to make of that, but it's still clear, to me, that in-memory indexing with a daily kick-in-the-pants from cron is as good a way as any of keeping InfluxDB 1.8 memory utilisation down.
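For anyone wanting to experiment, the option lives in the `environment` clause of the `influxdb` service definition in `docker-compose.yml`. This fragment is only a sketch; the surrounding lines will differ depending on how your service is defined:

```yaml
influxdb:
  image: "influxdb:1.8"
  # … other service settings …
  environment:
    - TZ=Etc/UTC
    # remove (or comment out) the next line to revert to the default in-memory index
    - INFLUXDB_DATA_INDEX_VERSION=tsi1
```

The change only takes effect when the container is recreated (e.g. `docker-compose up -d influxdb`) and, as noted above, existing shards keep their old index type until they are rebuilt or the databases are reloaded.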
Does my experience generalise or is it likely to be something to do with the size/structure of my databases? Honestly, I have no idea.
But, getting back to the trigger question of whether `INFLUXDB_DATA_INDEX_VERSION=tsi1` should be the IOTstack default, absent some other explanation of the behaviour I've discussed above, I'd be putting my vote in the box marked "no".
Since writing the above, I have done the following:

- turned the `cron` job that does a daily restart of the InfluxDB container on and off a couple of times to see what happens.

The graph is divided into three sections:

- red areas, where the `cron` reset is not operating; and
- a green area, where the `cron` reset is operating.

The general pattern for the entire graph is that RAM utilisation grows until some external event causes it to stop. External events include `sudo apt upgrade` doing something like updating `containerd.io`, which causes all containers to restart. The only real difference is that, for the green area, `cron` is triggering a restart every 24 hours.

If you compare this graph with the earlier one, keep in mind that the system generating the statistics for the old graph had 4GB RAM whereas the system behind this graph is 8GB. In essence, you need to adjust by a factor of 2. For example, the two peaks in the left-most red area are approaching 12%. That would correspond with ~24% for the earlier graph. And, indeed, you will see numbers of that order in the earlier graph.
In neither case (earlier chart, this chart) was InfluxDB running any continuous queries. The container is only:

- accepting inserts coming from Node-RED; and
- responding to queries from Grafana.
I think the most likely explanation is a memory leak somewhere. I could understand Influx running its own cache rather than relying on the operating system's virtual memory, but then I would expect RAM utilisation to climb to some level where there was a balance between its normal activities and the demands of the cache, after which it would plateau. The constant growth, unless some external event breaks the cycle, is what strongly suggests a leak. At least, it does to me.
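If I wanted to test the leak theory more directly (I haven't), one option would be to sample the Go runtime memory statistics that InfluxDB 1.x publishes on its `/debug/vars` endpoint, and watch whether the heap figures climb in step with the `docker stats` numbers. A minimal sketch, assuming the stock API port 8086 on the local host and `jq` installed (the script name is just something I made up for the example):

```bash
#!/usr/bin/env bash
# log_influx_heap - print a timestamped line of Go heap figures (bytes)
# taken from InfluxDB's expvar endpoint; suitable for the same hourly
# cron treatment as the docker stats logger above.
echo "$(date) $(curl -s http://localhost:8086/debug/vars |
  jq -r '[.memstats.HeapAlloc, .memstats.HeapInuse, .memstats.Sys] | @tsv')"
```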