Skip to content

Instantly share code, notes, and snippets.

@Paraphraser
Last active February 21, 2024 21:54
Show Gist options
  • Star 45 You must be signed in to star a gist
  • Fork 5 You must be signed in to fork a gist
  • Save Paraphraser/c9db25d131dd4c09848ffb353b69038f to your computer and use it in GitHub Desktop.
Save Paraphraser/c9db25d131dd4c09848ffb353b69038f to your computer and use it in GitHub Desktop.
Efficient recipe for getting MQTT data into an InfluxDB database using Node-Red

Recipe: from MQTT to InfluxDB via Node-Red

  • 2023-12-02 revise graphics to correspond with InfluxDB-in node, and explain the pros and cons of InfluxDB 1.8 vs InfluxDB 2.

Introduction

Getting data produced by IoT sensors into a database is practically a mandatory step before effective visualisation (eg dashboards).

This recipe shows you how to get an MQTT payload into an InfluxDB database using three Node-Red nodes. It makes the following assumptions:

  • A client device of some kind publishing data to a topic via the MQTT protocol;
  • Mosquitto (MQTT broker);
  • Node-Red subscribing to the topic; and
  • InfluxDB running and accessible to Node-Red.

This recipe also show how to map between JSON keys and database field and tag names.

Some words about InfluxDB

At the time of writing (December 2023), InfluxData supports both InfluxDB 1.8 and InfluxDB 2.

InfluxDB 3 has been announced but is not yet generally available.

If you are beginning your IoT journey, you may wonder which version you should choose. Absent useful guidance, you may well reason like this:

Version 2.0 is obviously later than 1.8 therefore 2.0 must be "better".

In some ways, InfluxDB 2 can be seen as a natural progression from InfluxDB 1.8. In other ways, the two database systems are like chalk and cheese. From a user perspective, one of the most obvious differences between the two systems is the query language:

  • InfluxDB 1.8 uses InfluxQL natively with Flux being an experimental add-on which needs to be enabled;
  • InfluxDB 2 uses Flux natively with partial support for InfluxQL.

One consequence of the difference in query language affects Grafana. Its GUI has native point-and-click support for constructing InfluxQL queries that will run against InfluxDB 1.8 but has no equivalent support for constructing Flux queries that run against InfluxDB 2.

There is a workaround. It involves using the InfluxDB 2 web GUI to construct queries in Flux, which you then copy and paste into Grafana. This definitely works but the lack of native support for Flux in Grafana means that, if anything goes wrong, you often have to start from scratch.

This workaround may seem like an acceptable compromise but, eventually, you will encounter situations where you will want to manipulate the data in your databases. Common examples are:

  • deleting junk that crept in when a sensor went berserk; and
  • reformatting, splitting or merging tables to better reflect how your sensors operate and/or to fix poor design decisions (we all make those).

To manipulate your data, you will need to use the query language.

If you are familiar with SQL then you will likely be comfortable with InfluxQL. While InfluxQL is not the same as SQL and does have its own peculiarities, it has sufficient syntactic similarities with SQL that you can usually figure out how to make it do what you need it to do. Plus, if you get stuck, you can generally find a close example by Googling and go from there.

Flux, on the other hand, is its own language which … how can I put this politely … defies description. In my experience the official documentation is quite short on useful examples and Googling mostly fares no better.

To put this problem more succinctly: you are far more likely to be able to get help with InfluxQL than you are with Flux. Whether that matters is something only you will know.

The last point is that InfluxData has announced InfluxDB 3. The announcement includes:

  1. This statement which appears in a popup overlay in the Flux documentation:

    The future of Flux

    Flux is going into maintenance mode. You can continue using it as you currently are without any changes to your code.

  2. This statement which appears in the InfluxDB Clustered documentation:

    … brings with it native SQL support and improved InfluxQL performance.

I take those two statements to mean that Flux is a technological dead-end and, because of its dependence on Flux, so too is InfluxDB 2.

If you have a good reason for preferring InfluxDB 2 then it's your system and your rules. But please don't say you weren't warned.

This remainder of this gist focuses on InfluxDB 1.8 exclusively.

Task goal

Given this MQTT message structure:

  topic: /site/topic
message: {"b": true, "i": 123, "r": 456.78, "s": "hello world", "t": "tagValue"}

the task is to configure the Node-Red instance to:

  1. Listen for the topic "/site/topic";
  2. Parse the JSON-format message payload; and
  3. Add a row to a measurement (table) named "example" in an InfluxDB database named "test".

This recipe also demonstrates how to use abbreviated keys in a JSON message and map those to more meaningful field names (attributes) in the InfluxDB database:

JSON Key Influx Field Influx Tag Expected Value Implied type
b flag true Boolean
i discrete 123 Integer
r continuous 456.78 Real
s message "hello world" String
t identity "tagValue" String

There is nothing magical about either these JSON keys, or the InfluxDB field or tag names. JSON keys do not have to be single characters, and InfluxDB names do not have to be long, meaningful or descriptive. It is just that, in general, IoT devices are memory-constrained so short strings are to be preferred over longer strings; while "meaningful and descriptive" are considered best practice in a database environment.

There is also nothing magical about the choice of these implied data types. Booleans, numbers (both integers and reals) and strings are just what you usually need to pass between a client device and a database.

Preparation

client device (simulated)

During testing you will find it useful to have a computer with an MQTT client installed. Explaining the full how-to of this is beyond the scope of this recipe so you should start at mosquitto.org/download.

On Linux, it is not immediately obvious that you need either or both of the following:

$ sudo apt install mosquitto-clients
$ sudo apt install mosquitto

The first command installs the MQTT publishing and subscribing command-line clients. You will need this package to get the mosquitto_pub command.

The second command installs the MQTT broker. You will only need this if you do not have another MQTT broker running somewhere (eg in a Docker container).

InfluxDB

If your copy of InfluxDB is running inside a Docker container, consider adding this alias statement to your .profile or .bashrc:

$ alias influx='docker exec -it influxdb influx -precision=rfc3339'

That alias allows you to connect to the "influx" command line interface (CLI) simply by typing:

$ influx

By default, "influx" displays time as nanoseconds since 1970-01-01 UTC. The -precision=rfc3339 argument displays time in human-readable form.

The "test" database must be created by hand. If you omit this step you will get an error from Node-Red. You can initialise the database like this:

$ influx
> CREATE DATABASE test
> quit

Node-Red

  • Launch your browser and connect to your Node-Red server.
  • Use the main menu (three horizontal bars "≡" at the top, right of the Node-Red window) to open the Palette Manager:
    • select the "Nodes" tab and check whether node-red-contrib-influxdb is already installed. If it is not installed,
    • select the "Install" tab, then search for and install node-red-contrib-influxdb. If you prefer to install contributions from the command line, do that.

A three-node flow

Back in the Node-Red main window, click the "+" button to add a new empty flow. The default title will be something like "Flow 1". Double-click the title and rename the flow with something more descriptive, like "Influx Test". The actual name is not important.

1. "mqtt in" node

Drag an "mqtt in" node onto the canvas. Double-click to open and configure as follows:

mqtt-in-node configuration

  1. Select the option to add a new mqtt-broker.
  2. Click the pencil icon to open the server properties panel.
  3. Give the server a meaningful name (eg "Docker MQTT").
  4. Supply the network path to the host running Mosquitto:
    • In a Docker environment, this will be the name of the container running Mosquitto (eg "mosquitto").
    • In a non-Docker environment where Node-Red and Mosquitto are running on the same host, this will be the loopback address 127.0.0.1.
    • If Node-Red and Mosquitto are running on different hosts then this will be a static IP address or the fully-qualified domain name of the host running Mosquitto.
  5. Click "Add".
  6. Enter the topic string ("/site/topic").
  7. Set the "Output" popup to "a parsed JSON object".
  8. Enter a name for the node. This appears in the schematic and it is good practice to repeat the topic string.
  9. Click "Done" to complete the node setup.

All other fields can either be left at their default settings or changed to suit your requirements.

The purpose of this node is to:

  1. Listen to MQTT messages directed to "/site/topic"; and
  2. Convert the JSON string in the MQTT message body to a JavaScript object representation.

In other words, given an input of the JSON string specified in the task goal, the output from the node will be:

msg.payload = {
    b: true,
    i: 123,
    r: 456.78,
    s: "hello world"
    t: "tagValue"
}

tip - avoiding a common mistake

One common mistake is skipping step 7 above. Please go back and double-check that you have set the "Output" popup to "a parsed JSON object".

2. "change" node

Drag a "change" node onto the canvas.

Connect the outlet of the "mqtt in" node to the inlet of the "change" node.

Double-click the "change" node to open and configure as follows:

change-node configuration

  1. Enter a name for the node. This appears in the schematic and it is good practice to summarise the purpose of the node.

  2. A new change node contains a single rule to "Set msg.payload" but where the data type of the "to" field defaults to a string. Change the popup menu to a Java expression.

  3. Click the ellipsis ("…") to open the expression editor.

  4. Copy the expression below and paste it into this window.

    [
        {
            "flag": msg.payload.b,
            "discrete": msg.payload.i,
            "continuous": msg.payload.r,
            "message": msg.payload.s
        },{
            "identity": msg.payload.t
        }
    ]
    
  5. Click "Done".

  6. Click "Done" to complete the node setup.

The purpose of this node is to provide a cross-walk between the JSON keys ("b", "i", "r", "s" and "t"), and the field and tag names you need in the InfluxDB database. The basic pattern is:

[
   {
      fieldName : msg.payload.key,
      ...
   },{
      tagName : msg.payload.key,
      ...
   }
]

If you only want to pass fields, then omit the square brackets and the elements that describe the tag(s), like this:

{
   fieldName : msg.payload.key,
   ...
}

Note that it is feasible to omit this "change" node entirely. If you do that the JSON keys in the Node-Red "payload" variable will become the field names in the database. Before you take that shortcut, consider:

  • If an MQTT client starts providing unexpected JSON keys, those can easily pollute your InfluxDB database. Using a cross-walk between the expected JSON keys and the field and tag names you want in your database provides an effective barrier against unexpected keys.

  • You may wish to include keys in your MQTT payload that you do not want to wind up in your database. Good examples are millis() uptime, free heap and version numbers. Such values are usually ephemeral and only of interest at the moment when they are produced (and might only be produced when you are actively debugging your MQTT client). You can always see such values by subscribing to the MQTT feed or attaching a Debug node to the "mqtt in" node.

  • A "change" node simplifies the process of adding new tags and fields. You can:

    • Add a new key+value pair to the JSON payload being produced by your MQTT client, then
    • Attach a Debug node to the "mqtt in" node to confirm that you are receiving the expected data, then
    • Change the cross-walk when you are ready to start importing the data into your database.
  • If you want to include both tags and fields in your database, you really only have two options, either:

    • your MQTT client has to format the JSON payload correctly before transmission, or
    • you need a "change" node to implement a cross-walk.
  • Opting to do the work in your MQTT client effectively rules out the tactical use of ephemeral values and a step-wise refinement approach to development if you need to add new fields.

Given the output from the "mqtt in" node, the Javascript expression in the "change" node will result in:

msg.payload = [
   {
      flag: true,
      discrete: 123,
      continuous: 456.78,
      message: "hello world"
   },{
      identity: "tagValue"
   }
]

3. "influxdb out" node

Drag an "influxdb out" node onto the canvas.

Can't find the "influxdb out" node in the palette? Double-check that you installed "node-red-contrib-influxdb" as described above.

Connect the outlet of the "change" node to the inlet of the "influxdb out" node.

Double-click the "influxdb out" node to open and configure as follows:

influxdb-out-node configuration

  1. Enter a name for the node. This appears in the schematic. It is good practice to summarise the purpose of the node.
  2. From the "Server" popup menu, choose "Add new influxdb...".
  3. Click the pencil icon to open the server properties panel.
  4. I recommend leaving the "Name" field blank. If you do then the "Server" field in the previous panel will take on the "host:port/database" appearance shown at the end of the dotted line in the figure. You lose that valuable synopsis by supplying a name in this field.
  5. Set the Version popup to "1.x" (this gist does not cover InfluxDB 2; please see Some words about InfluxDB if you want to understand why).
  6. Supply the network path to the host running InfluxDB:
    • In a Docker environment, this will be the name of the container running InfluxDB (eg "influxdb").
    • In a non-Docker environment where Node-Red and InfluxDB are running on the same host, this will be the loopback address 127.0.0.1.
    • If Node-Red and InfluxDB are running on different hosts then this will be a static IP address or the fully-qualified domain name of the host running InfluxDB.
  7. The name of the InfluxDB database. This needs to be the same as you created earlier ("CREATE DATABASE test").
  8. Click "Add".
  9. Supply the name of the measurement you want to write to. This is analogous to a "table" in SQL parlance. The recommended name for this tutorial is "example".
  10. Click "Done" to complete the node setup.

All other fields can be left at their default settings or changed to suit your requirements.

Warning: InfluxDB database connections are global to Node-Red. Suppose you have an existing flow connected to the "test" database. When you create a new flow for a new database, it is very tempting to copy the "influxdb out" node from the old flow, paste it into the new flow, open it, click the pencil icon, and just change the database name. If you do that, you will break your old flow because it will refer to the new database. Always start from scratch by dragging a new "influxdb out" node onto the canvas.

Given the output from the "change" node, the practical effect of this node is:

$ influx
> USE test
> INSERT example,identity=tagValue flag=true,discrete=123,continuous=456.78,message="hello world"
> quit

Saving your work

Click the Deploy button near the top, right of the canvas.

Testing your work

Add two "debug" nodes to the canvas. Double-click each in turn and set its Output to "complete msg object". Connect the outlet of the "mqtt in" node to the first "debug" node, and the outlet of the "change" node to the second "debug" node. The final result should look something like this:

Node-Red test flow

Select the Debug panel (the controls below "Deploy").

Click "Deploy" to activate. Any errors will show up in the Debug panel.

Copy the following text to the clipboard then paste it into a text editor.

mosquitto_pub -h host -t '/site/topic' -m '{"b": true, "i": 123, "r": 456.78, "s": "hello world", "t": "tagValue"}'

Edit the "host" field to point to the server running your Mosquitto broker. This might be an IP address or a fully-qualified domain name.

Paste the text into a Terminal window on your client device and press return.

If all goes well, you will get two debug messages from Node-Red. The first is from the "mqtt in" node confirming receipt of the JSON payload:

▿ object
  topic: "/site/topic"
▿ payload: object
   b: true
   i: 123
   r: 456.78
   s: "hello world"
   t: "tagValue"

and the second is from the "change" node showing the effect of the cross-walk:

▿ object
  topic: "/site/topic"
▿ payload: array[2]
  ▿0: object
   flag: true
   discrete: 123
   continuous: 456.78
   message: "hello world"
  ▿1: object
   identity: "tagValue"

To confirm that the data made it all the way to the InfluxDB database:

$ influx
> USE test

> show measurements
name: measurements
name
----
example

> show series
key
---
example,identity=tagValue

> show tag keys
name: example
tagKey
------
identity

> show field keys
name: example
fieldKey   fieldType
--------   ---------
continuous float
discrete   float
flag       boolean
message    string

> select * from example
name: example
time                           continuous discrete flag identity message
----                           ---------- -------- ---- -------- -------
2020-02-12T03:56:07.844235334Z 456.78     123      true tagValue hello world

> quit

Cleaning up

You can either delete or deactivate the "debug" nodes in the Node-Red flow.

When you no longer need the test database, you can remove it like this:

$ influx
> DROP DATABASE test
> quit
@ignaciosala
Copy link

Hi Phill! Thanks again for your help! Everything work out perfect!!! This is the end result:

Screen Shot 2020-09-24 at 1 33 14 AM

@Paraphraser
Copy link
Author

Brilliant! Right now my Hiking (class 1 electricity meter) is in a box connected to a power point. It really only sees voltage and frequency because there's nothing connected on the output side, so that's all the ESP8266 project connected to it via an RS485 adapter sees. That project "works" but I had to add a lot of code to detect problems - which I ultimately traced to the RS485 adapter.

I have a duplicate meter that was installed in the house distribution board by an electrician which will see whole-of-house including whatever the solar system pushes back into the grid. I also have a new ESP32 project at breadboard stage sitting on my desk, using a new kind of RS485 adapter which is far more reliable. When I get some time ... I'll be able to follow your lead. 🤓

@dalklein
Copy link

dalklein commented Dec 8, 2020

Thank you Phill for writing up these topics. Not nearly done, but getting dangerous real fast! mqtt to nodered to influx to grafana is working! Thanks!
image

@syswombat
Copy link

Hello
i have a couple of question but the first "problem" i just get all the Message from 433MHz.
so i would like to use a Switch to tell unknown and known device
and the splitt the known device by ID. At this time would be just two known device
could you tell how to do this splitting?

have a nice day
vinc

@Paraphraser
Copy link
Author

I need to see some actual data. See the earlier discussion above about setting up the mosquitto clients and running the mosquitto_sub command to capture the MQTT packets arriving at your broker. Collect a couple of lines showing (a) the known device and (b) an unknown device, and I will see if I can work it out. I don't need pages of data. I only need one MQTT message from each device.

@syswombat
Copy link

syswombat commented Dec 18, 2020

Hello
thanks for your time

This are 3 example, both Oregon i know them, other not but still ok just to know it is there.
The time is 1h behind too.
The ID i am not sure where give this Number, but it will change as soon i have to change the battery - i just have tested it.

i would say a way to go

switch Message
case id = 126 go out 1
case id = 16 go out 2
case id = other go out 3

out 1 and 2 should then go to infulxDB
out 1 and 2 check Battery_Ok if 0 then send telegram

ok this time it worked to switch it

[{"id":"256b14f0.81a1dc","type":"mqtt in","z":"95ec0171.e3383","name":"","topic":"/iot12/433MHz","qos":"0","datatype":"json","broker":"fdd3c090.0e24","x":150,"y":300,"wires":[["92eda324.b8c89","c1745410.f9e998"]]},{"id":"c1745410.f9e998","type":"switch","z":"95ec0171.e3383","name":"switch by ID","property":"payload.id","propertyType":"msg","rules":[{"t":"eq","v":"246","vt":"str"},{"t":"eq","v":"126","vt":"str"},{"t":"else"}],"checkall":"true","repair":false,"outputs":3,"x":420,"y":320,"wires":[["db2d768a.165bc8"],["f5f2994b.a68ff8"],["d4ba2e5f.1a8cd"]]},{"id":"db2d768a.165bc8","type":"debug","z":"95ec0171.e3383","name":"","active":false,"tosidebar":true,"console":true,"tostatus":true,"complete":"payload","targetType":"msg","statusVal":"payload","statusType":"auto","x":700,"y":280,"wires":[]},{"id":"f5f2994b.a68ff8","type":"debug","z":"95ec0171.e3383","name":"","active":false,"tosidebar":true,"console":true,"tostatus":true,"complete":"payload","targetType":"msg","statusVal":"payload","statusType":"auto","x":700,"y":360,"wires":[]},{"id":"d4ba2e5f.1a8cd","type":"debug","z":"95ec0171.e3383","name":"","active":true,"tosidebar":true,"console":true,"tostatus":true,"complete":"payload","targetType":"msg","statusVal":"payload","statusType":"auto","x":700,"y":440,"wires":[]},{"id":"fdd3c090.0e24","type":"mqtt-broker","name":"localHost - iot1","broker":"10.18.42.175","port":"1883","clientid":"","usetls":false,"compatmode":true,"keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthPayload":"","closeTopic":"","closeQos":"0","closePayload":"","willTopic":"","willQos":"0","willPayload":""}]

========================================
18.12.2020, 18:10:08 node: RAW msg.payload
/iot12/433MHz : msg.payload : Object
object
time: "2020-12-18 17:10:09"
brand: "OS"
model: "Oregon-THGR122N"
id: 126
channel: 2
battery_ok: 1
temperature_C: 6.8
humidity: 73


18.12.2020, 18:10:18node: RAW msg.payload
/iot12/433MHz : msg.payload : Object
object
time: "2020-12-18 17:10:19"
brand: "OS"
model: "Oregon-THGR122N"
id: 16
channel: 1
battery_ok: 1
temperature_C: 21.1
humidity: 40


18.12.2020, 18:12:16 node: RAW msg.payload
/iot12/433MHz : msg.payload : Object
object
time: "2020-12-18 17:12:16"
model: "TFA-Pool"
id: 142
channel: 1
battery_ok: 1
temperature_C: 8.1
mic: "CHECKSUM"


@Paraphraser
Copy link
Author

When I received the email copy of your last post, it didn't have your flow as part of it so I went ahead and created a test flow to your specs. I was about to paste that here, then I saw your flow so I pasted your flow into my Node-Red.

Our solutions were identical, save that you specified IDs 126 and 16, while your flow tests for 246 and 126. I mention that because I don't see 246 in your test data while I do see 16.

Oh, I edited your comment to wrap three back-ticks around your JSON. For future reference (not just if you post something here but elsewhere too), it makes flows more human-friendly if you do two things:

  1. When you export from Node-Red, turn on the "formatted" option in the "Export nodes" dialog. That gives you a well-formed output spanning a number of lines (instead of a single line that scrolls off to infinity); and
  2. Use the triple back-ticks before and after the paste so that markdown format makes it clear where the JSON begins and ends.

Here's your flow implementing those recommendations:

[
    {
        "id": "c1745410.f9e998",
        "type": "switch",
        "z": "a1af091f.5b352",
        "name": "switch by ID",
        "property": "payload.id",
        "propertyType": "msg",
        "rules": [
            {
                "t": "eq",
                "v": "246",
                "vt": "str"
            },
            {
                "t": "eq",
                "v": "126",
                "vt": "str"
            },
            {
                "t": "else"
            }
        ],
        "checkall": "true",
        "repair": false,
        "outputs": 3,
        "x": 430,
        "y": 260,
        "wires": [
            [
                "db2d768a.165bc8"
            ],
            [
                "f5f2994b.a68ff8"
            ],
            [
                "d4ba2e5f.1a8cd"
            ]
        ]
    }
]

Anyway, I'm glad that you solved the problem. Node-Red is a steep learning curve, isn't it? But it's fun too.

@syswombat
Copy link

Yes i am happy it start to work like a wish, but still many problems to solve ;-)
the ID change when changing the Battery on the device - i have tested it this is ther reason the ID change.

to bring it no to Telegram is another chalange...
on the video and description it looks easy but then just small things change and i am blocked!

have a nice day keep healty
vinc

@jabss
Copy link

jabss commented Jan 3, 2021

Hi,
Thanks for the excellent explanation.
I've just learned from your gist what a measurement is (I was using DBs as single-table).
I'm already able to insert information in the influxDB via node red, but I was wondering if there would be any best-practice do to it.

I'm having issues trying to create the grafana dashboards and therefore suspecting that maybe I'm inserting the values as string while maybe it should be as integer/floats, so this might the the opportunity to (re)make all as it should be.

Anyway, I have a couple of questions I'd like to ask your kind opinion:

  1. Are the mentioned json keys (b, i, r, s, t) an industry best practice (and therefore found on IoT devices) or are they your personal implementation/recommendation?
  2. Why the change node implements an array with two objects? Ie. Any reason why "identity" is separated?
  3. For the case where there are several sensors (temperature/humidity) around the house, what would be the DB best-practices?
    One table (measurement) for each device? / The same table for all temperature devices and another for all humidity ones? / Any other setup?
  4. For logging boolean sensors (like PIR and magnetic sensors) is a database like influxdb also recommended?

Thanks,
Joaoabs

@jabss
Copy link

jabss commented Jan 3, 2021

Hi Phill! Thanks again for your help! Everything work out perfect!!! This is the end result:

Screen Shot 2020-09-24 at 1 33 14 AM

Wow, that looks really great. Is that done by grafana?
If so, would you be able to share how you did that please?

I already have an influx db being populated with power information, but I'm not being able to build such a dashboard.
I believe I'd have in influxdb all the information required to build it (example printout below), just need some directions to build the dashboard.

Thanks,
Joaoabs

> use power
Using database power
> select * from power
name: power
time                activeaparentpower current exportactiveenergy importactiveenergy power   powerfactor reactiveaparentpower totalactiveenergy voltage
----                ------------------ ------- ------------------ ------------------ -----   ----------- -------------------- ----------------- -------
1609635431945968365 2769.65            11.27   252806             22743666           2765.10 1.00        -157.10              22996472          246.90
1609635731940326138 2806.40            11.72   252806             22743890           2788.90 0.99        -312.10              22996698          240.80
1609636031946486351 2537.83            10.74   252806             22744110           2530.20 1.00        -195.30              22996916          236.90
1609636331958220821 779.58             3.47    252806             22744242           747.80  0.96        -220.00              22997048          238.00
1609636631959526265 1497.89            6.13    252806             22744352           1466.30 0.98        -305.70              22997158          245.60
1609636931959220801 1650.54            6.68    252806             22744486           1644.20 1.00        -144.20              22997292          251.40

@Paraphraser
Copy link
Author

@jabss

Are the mentioned json keys (b, i, r, s, t) an industry best practice (and therefore found on IoT devices) or are they your personal implementation/recommendation?

Neither. They are just example names, corresponding with:

  • "b" for boolean (ie typical values are "true" and "false")
  • "i" for integer (ie typical values are whole numbers)
  • "r" for real (ie typical values include decimal points)
  • "s" for string (ie anything "encapsulated in quotes")
  • "t" also a string but intended to become a tag

Here are some examples of payload data coming from two of my own sensors:

{"v":244.60,"f":49.99,"a":0.00,"e":512.94,"eMask":0,"upTime":42928,"heap":41680}
{"v":246.0,"ac":300,"day":1.3,"sum":43254.5,"grid":-8,"p1":328,"p2":381,"batt":68}

You'll see I use a mixture of single-character keys and longer keys:

  • The first payload comes from an electricity meter. The values are volts, frequency, amps, accumulated kWh, an error mask, the uptime of the Arduino doing the work, and the free heap on the Arduino. The last three never get into any database but it's useful to see them when I subscribe to MQTT traffic using mosquitto_sub.
  • The second payload comes from a solar inverter. The values are volts, AC watts being generated at the moment (it's very cloudy), the accumulated kWh generated today, the running sum of kWh generated by this inverter since installed, watts to/from the grid (negative means from grid - like I said very cloudy), p1 and p2 are the DC watts being generated by the two strings of solar panels, and batt is the charge state of the batteries (68% full).

There's really no hard and fast rule for any of this. If you want the field key in your database to be "volts" there's no reason why you should use "v" or "volts" or "voltage" in the MQTT JSON payload. I offered "memory constraints" in an IoT device as one reason for using abbreviations rather than full names. Another is that shorter payloads are better when I'm using mosquitto_sub to monitor things (which I do quite a lot - my first debugging step if I think something might be wrong somewhere). Even if you were 100% consistent about JSON keys being equal to database keys, there are still good reasons to use a cross-walk, including:

  • filtering out things like "eMask", "uptime" and so on from the above,
  • the ease with which you can split into fields and tags.
  • doing things like splitting values from a single payload across different databases. An example is a battery-powered temperature sensor. You want to track the voltage as it runs down and the depletion curve is probably of some interest so you can predict when it is likely to run out but "vBat" doesn't really belong with temperature or humidity, plus you've also probably got a mixture of sensors, some battery powered, some not. Two separate paths through two separate cross-walks leading to two separate InfluxDB-out nodes is a better answer.

Why the change node implements an array with two objects? Ie. Any reason why "identity" is separated?

Yes. Because "identity" is a tag. Influx makes a distinction between "fields" and "tags". Suppose you have three temperature sensors. You've got one "outside", another in your "kitchen" and the third is in your "dining room". You might be seeing MQTT flows like this:

/home/temperature/outside {"temp": 19.6}
/home/temperature/kitchen {"temp": 22.6}
/home/temperature/diningRoom {"temp": 21.8}

Alternatively, it might be arranged like this:

/home/temperature {"location": "outside", "temp": 19.6}
/home/temperature {"location": "kitchen", "temp": 22.6}
/home/temperature {"location": "diningRoom", "temp": 21.8}

The incoming format really doesn't matter. When you prepare this data for insertion into Influx, you have two choices:

  • You can store everything as "fields", or
  • You can store variable data as "fields" and invariant data as "tags".

In this example, the temperatures are always going to be changing. You're interested in both the current measurement (what's the temperature in the kitchen now?), the differences over time (is it getting warmer or cooler outside?) and historical information (what was today's maximum temperature outside?).

Conversely, unless you move a sensor, the locations (outside, kitchen, diningRoom) are never going to change. The location-name is always the same on every MQTT message. It's invariant and a candidate for a tag.

Even if you do move a sensor (eg relocate the kitchen sensor to a bedroom), it might be the same physical device but it's a different logical device giving you a new data series, so it really needs a new name. You can't ask "is the temperature in the bedroom hotter than this time last year?" if the sensor was in the kitchen this time last year. Well, you can ask. But it's not a sensible question and it's better if your database returns a NaN or other error than an answer which is meaningless. Garbage in, garbage out!

So, why store something as a tag when a field will do the same job? It's about efficiency. If you define something as a tag, it is only stored once. The value is implied for every row in the series. It still behaves like a field for queries. In something like:

SELECT * FROM temperature WHERE location = 'kitchen'

it matters not at all whether location is a tag or a field. But, as your database grows and grows, that query is a lot more efficient if location is a tag, than if it is a field.

I suppose another way to think of it is like a sub-select in SQL:

SELECT min(temp), max(temp) FROM (SELECT temp FROM temperature WHERE location = 'kitchen')

In that expression, the query engine would create a temporary result set containing just temperature values for kitchen, then the outer query would look for the min() and max() values in that result set. My impression is that Influx tags essentially pre-split the data so the "sub selects" are an implicit part of the database. Another SQL analogue might be that you are indexing by location.

I don't know any of that for hard fact. I'm just guessing from what I see as I fiddle about.

A word to the wise:

  • right back at the get-go I did not understand tags. I've been using SQL for a long time and I couldn't immediately see the point of tags so I shoved the issue to back of mind. I suffered a lot of pain putting everything right once the penny finally dropped. Please try to avoid making the same mistake. If you're thinking something like, "I still don't really understand, I'll stick with fields for now" then my advice is to take the time to understand it now.

For the case where there are several sensors (temperature/humidity) around the house, what would be the DB best-practices? One table (measurement) for each device? / The same table for all temperature devices and another for all humidity ones? / Any other setup?

Probably:

  • one database called something like "weather"
  • one measurement for each distinct metric:
    • temperature
    • humidity
    • pressure
    • etc
  • one series for each contributing sensor - still using the example from above:
    • temperature,location=outside
    • temperature,location=kitchen
    • temperature,location=diningRoom

By the way, suppose you embed the location in the topic:

/home/temperature/outside

and you want to get access to "outside", the way I do it is to use a "change" node immediately after the MQTT-in node:

  • Set msg.location
  • to the Javascript expression $split(msg.topic,'/')[3]

Not "msg.payload.location", just "msg.location" because we don't want to interfere with the payload.

Using "3" as the index might seem counter-intuitive but it's correct. The string is "split" about the "/" so you wind up with:

  • Index 0 = null string
  • Index 1 = home
  • Index 2 = temperature
  • Index 3 = outside

Then, in the change node you use to set up the Influx insert, you'd do something like this:

[
    {
        "temp": msg.payload.temp
    },{
        "location": msg.location
    }
]

I tried doing the $split in the above but couldn't get it to work for some reason so I went with a separate change node. That's turned out to be handy because you can then more easily "switch" if you want to do things like "if outside falls below 5° then send me an alert".

For logging boolean sensors (like PIR and magnetic sensors) is a database like influxdb also recommended?

Influx is a time-series database. Time is the primary key of every table. While there may be situations where you are only interested in "now", it seems to me that I'm always asking questions like, "when was the door with that trigboard last opened?" or "has the postie opened the mailbox today to deliver something?" To me, it seems natural to log every event.

Although I haven't used them, I believe Influx's "retention policies" help with the question of "when is data so old as to be useless?". For example, if I think that I am never going to be interested in whether the mailbox was opened more than a week ago, I could set a retention policy to discard events more than one week old.

I'm having issues trying to create the grafana dashboards and therefore suspecting that maybe I'm inserting the values as string while maybe it should be as integer/floats, so this might the the opportunity to (re)make all as it should be.

I don't know. I suppose it's possible. Have you listed what Influx thinks are the data types? Here's an example:

$ influx
> USE weather
> SHOW FIELD KEYS

name: humidity
fieldKey fieldType
-------- ---------
humidity float

name: pressure
fieldKey         fieldType
--------         ---------
localPressure    float
seaLevelPressure float
trend            string

name: rainfall
fieldKey        fieldType
--------        ---------
rainGaugeClicks float

name: temperature
fieldKey fieldType
-------- ---------
temp     float

name: wind
fieldKey  fieldType
--------  ---------
compass   string
gustSpeed float
heading   float
windSpeed float

You can also do a "SHOW TAG KEYS" but it doesn't really tell you much because I'm pretty sure tags are always strings.

In Grafana, you might try the old programmer's trick of multiplying by 1.0. In many cases, even if "temp" was a string instead of a float, something like:

SELECT 1.0 * temp from temperature

would force an expression evaluation. If "temp" didn't evaluate to a number then it would be an implementation dependency as to whether the expression evaluated as zero, a NaN or chucked an error. If you're going to try this from the CLI, also stick a limit on the end, as in:

SELECT 1.0 * temp from temperature limit 5

You could be certain it was working by changing the coefficient to 2.0 and verifying that the numbers doubled.

@Paraphraser
Copy link
Author

Wow, that looks really great. Is that done by grafana?
If so, would you be able to share how you did that please?

@jabss - that isn't mine - you'd have to ask @ignaciosala

Mine are a bit simpler. This integrates 5-minute data from my solar inverter with 10-second data from probing an electricity meter:

BAE1B751-410E-4055-916F-B4085D63F87E

The meter is attached to a wall plug so it's only good for voltage and frequency. I had an electrician install an identical meter in the distribution board so I could measure whole of house but I haven't gotten around to using that yet. When I do it'll probably be a dashboard more like @ignaciosala

My main interest with the original implementation was tracking the ghastly mains power foisted on us. Australia adopted 230V in 2003. Our power should always be in the green band with very few excursions into the blue band. As you can see, it's the other way around because the suppliers are more interested in maximising profits than expending $ changing transformer taps, and nobody in government has got the guts to do something simple and obvious like say, "if power spends more than 5% of its time in the blue band during any metering period, you can't charge for electricity for the whole of that metering period." That'd wise-'em-up something chronic. There'd be electricians in every street fixing the taps quicker than you could say "what electricity bill?".

We also get days so far into the red zone it's amazing appliances don't just up and melt:

ECDBFEE3-2029-44AF-A955-ED333B74F0CD

Rooftop solar gets blamed for this but, strangely enough, the worst power spikes we ever had were all on the same day:

time                          voltage
----                          -------
2018-05-20T02:18:36.105+10:00 300
2018-05-20T02:18:46.097+10:00 300.4
2018-05-20T07:16:55.562+10:00 300.9
2018-05-20T07:17:05.559+10:00 300.1

Not only did that blow off the top of my chart but any fool can see the sun ain't shining at 2am and, given May 20th is practically the start of winter in the Southern Hemisphere, the sun ain't heaving a whole lot of high-energy photons in our direction at 7.15 in the morning.

60D6B310-F2B3-471C-A474-A9359E0575E3

This is my weather thing. Sensors all over the joint gathering temperature, humidity and pressure. Most are ESP8266 or 32. A couple are battery powered with WhisperNodes (AtMega328p chips + LoRa radio + exceptionally efficient power utilisation). I just stuck a weather station kit in the back yard (replaced the 433Mhz guts with a WhisperNode) so I'm getting rain & wind but I have to figure out how to shoehorn wind into this (there doesn't seem to be a Grafana widget for a compass rose).

5E504C7B-EF61-4EE6-8476-9AA29D75265F

The particulate charts come from a PurpleAir. Those speak HTTP rather than MQTT but it was easy enough to figure out how to get the data into Influx.

@Paraphraser
Copy link
Author

Paraphraser commented Jul 13, 2021 via email

@Numex106
Copy link

@Paraphraser
I had two silly issues that are both fixed and it's working perfectly now, thanks!

  1. I forgot to select "complete msg object" in the debug node, leading to my initial question.
  2. I also didn't realize I had a pesky space at the end of my msg payload keys causing the change node to come up empty.

Thanks again for the fantastic write-up.

@NewHopeApiary
Copy link

Thanks for the write-up! This was very helpful for a newb.

(I did have to add a JSON node in between to get things to work right.)

@Paraphraser
Copy link
Author

@NewHopeApiary you probably missed a step when you first set up the MQTT-in node. If you look at "7" in the diagram you'll see the output needs to be set to "parsed JSON object".

Your way isn't "wrong" so there's no need to change it unless you want to. The result is exactly the same as you are getting by using the extra JSON node. It's just that "MQTT-in to JSON to rest of flow" is such a common pattern that the MQTT-in node lets you do it in a single step. Make sense?

@NewHopeApiary
Copy link

You are correct of course. :) I already had built out a flow and was just modifying it, so I missed that. Appreciate the reply, really wasn't expecting it. Again, thanks for sharing your knowledge!

@jpalecrim
Copy link

Hello @Paraphraser, Thanks a lot for your commitment and effort on this! This has been helping me a lot.
I'm facing a problem where i have 5 Weather Stations (IDs = WS1,WS2,WS3,WS4,WS5) sending readings from simple sensors.
I want to feed the InfluxDB with them and have a single dashboard page for each of them. This project could escalate more so i'm having troubles how to build a flow that could handle and differentiate the IDs so it can send the data for the correct Gauges and Database.

I saw above that you provided a simple flow in which you extract the Sensor ID and prepare the Influx Insert. Could you Explain a little more how can i Write the Change Node JSON for that? I'm not experienced with this language, to be honest. But it does seems to be a simple solution.

@Paraphraser
Copy link
Author

Basically, if the station identifier (WS1, WS2) etc is carried in the message payload, you can use it directly. The "Prepare influx insert" Change node is "set msg.payload to the JSON expression", where the expression field is something like:

[
    {
        "temperature": msg.payload.temp
    },{
        "station": msg.payload.station
    }
]

remembering that this double-array structure tells the Influx insert node that temperature is a field, and station is a tag.

However, if the station identifier is carried in the topic string like this:

/home/weatherstation/WS1

it seems that you need to use a separate Change node ahead of the "Prepare influx insert". I usually call that separate Change node "extract station". It is set up as "set msg.station to the JSON expression" where the expression field is something like:

$split(msg.topic,'/')[3]

Then, the expression in the "Prepare influx insert" node becomes:

[
    {
        "temperature": msg.payload.temp
    },{
        "station": msg.station
    }
]

I assume you see what's going on in:

$split(msg.topic,'/')[3]

It's saying "crack the topic string about the slash". If you use a leading slash on your topic (as in this example of "/home/weatherstation/WS1") you wind up with four elements in the array where the zeroth element is a null string so element 1 is "home" and element 3 is "WS1". If you don't use a leading slash (and opinions vary on whether you should or shouldn't) then a topic like "home/weatherstation/WS1" would see you aiming at element [2] for WS1. Make sense?

I also assume you can see that the "extract station" node is only setting the field "msg.station" and isn't interfering with "msg.payload", so msg.payload persists unchanged into the "Prepare influx insert" node where "msg.station" is also then available.

Last point. The MQTT in node should be a wildcard match. It should be looking for:

/home/weatherstation/+

Of course, it doesn't really matter where the station is:

/home/+/weatherstation/

will match on "/home/WS1/weatherstation". If the device reports against topics like this:

/home/WS1/weatherstation/temp
/home/WS1/weatherstation/humidity

then you'll probably be running two flows, one for temperature, one for humidity, with the temperature flow matching on:

/home/+/weatherstation/temp

I'm really making the point that you can adapt to whatever the sensor provides.

I hope that all makes sense.


I really do not understand why the whole thing can't be done in a single Change node. Something like:

[
    {
        "temperature": msg.payload.temp
    },{
        "station": $split(msg.topic,'/')[3]
    }
]

I tried a few variations on that theme but Node-RED always either grizzled about syntax errors or accepted the expression as valid syntax but it then didn't work. I gave up and took it to be a rule that I needed two nodes.

If you, or anyone else reading this, can nut-out how to do it all in a single node, please point me in the right direction. I use this double-Change-node pattern all over the place and it would be great if it could be reduced to a single Change node.

@tferrin
Copy link

tferrin commented Dec 29, 2022

@Paraphraser, Just a tip-of-the-hat "Thank You" to your gist. Your explanations of MQTT-relevant Node-Red nodes and answers to several questions has been extremely helpful. I especially value your tip about taking the time to figure out what you may not fully understand about InFlux DBs prior to savings lots of data. This caused me to re-configure my data flows into ones that are now more efficient. My only advice to others is to read (and re-read) this gist carefully because there are lots of details included here and it may take a few reads to fully comprehend all the subtle details.

@theDiverDK
Copy link

OMG this is an excellent description :)

After finding this, it was very easy to get it to actually work :)

Thanks

@F4Kbov
Copy link

F4Kbov commented Sep 28, 2023

Excellent gist. Some great tips in there. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment