This document summarizes my findings on potential problems with using the MQTT retain flag as the default for every topic in an MQTT application. MQTT retain is a useful feature if used properly. However, few guides warn about the risks it poses to developers and to users of the software. These risks tend to surface only after working with IoT applications for some time, reading through user forums and the MQTT specs, and gaining some experience with other middleware projects. I am writing this document mainly for myself, for future MQTT applications, so I can make better decisions about when to use the MQTT retain flag and when not to.
Note that I'm only considering the MQTT 5 spec here. Apart from this introduction, this document is not written as prose; treat it as a sheet of notes with links for making your own decisions. It is by no means complete, and comments that help improve it are welcome.
The MQTT 5 specification is very approachable. The section about the RETAIN flag can be found here: https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901104
- Good tutorial/explanation of different QoS settings in combination with retain:
- http://www.steves-internet-guide.com/mqtt-retained-messages-example/
- also contains a good explanation of where the retain flag can be used properly
Citing the MQTT 5 specification (https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html): "Retained messages are useful where publishers send state messages on an irregular basis. A new non-shared subscriber will receive the most recent state."
- Status topics with infrequent updates
- e.g. announcement, IP addresses, configuration, ...
- low-power clients that come online only infrequently while their last status usually remains valid (e.g. a door sensor)
- storing settings across sessions
- not intended by the MQTT specs, but done by some applications
- the retain flag is useful for the LWT (Last Will and Testament) to indicate the online status of a client
- but the LWT also works without the retain flag set (see [MQTT-3.1.2-14])
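The status-topic use cases above rely on a few broker rules from the MQTT 5 spec ([MQTT-3.3.1-5] to [MQTT-3.3.1-8]). The following is a toy in-memory model of those rules, not a real broker or client API: the class `RetainedStore` and the topic names are hypothetical, and wildcards, QoS, sessions and persistence are deliberately ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RetainedStore:
    """Hypothetical toy model of how an MQTT 5 broker keeps retained messages.

    Only exact topic names are supported (no wildcards); QoS, sessions and
    persistence are ignored.
    """

    retained: Dict[str, bytes] = field(default_factory=dict)

    def publish(self, topic: str, payload: bytes, retain: bool) -> None:
        if not retain:
            # [MQTT-3.3.1-8]: a non-retained PUBLISH neither stores nor
            # removes/replaces the existing retained message.
            return
        if payload == b"":
            # [MQTT-3.3.1-6]/[MQTT-3.3.1-7]: a zero-length retained payload
            # removes the retained message and is not stored itself.
            self.retained.pop(topic, None)
        else:
            # [MQTT-3.3.1-5]: the new retained message replaces the old one.
            self.retained[topic] = payload

    def subscribe(self, topic: str) -> Optional[bytes]:
        # A new subscriber immediately gets the most recent retained state.
        return self.retained.get(topic)


broker = RetainedStore()
broker.publish("home/door/state", b"closed", retain=True)   # sensor goes to sleep
broker.publish("home/door/state", b"open", retain=False)    # not retained
print(broker.subscribe("home/door/state"))  # b'closed'
broker.publish("home/door/state", b"", retain=True)         # explicit cleanup
print(broker.subscribe("home/door/state"))  # None
```

The second publish shows why mixing retained and non-retained publishes on one topic is risky: a late subscriber still sees `b'closed'` even though the last value sent was `b'open'`.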
- topic space clutter
- especially when combined with message persistence on the broker, as retained messages are never deleted unless explicitly cleared or a message expiry interval is set
- new messages without the retain flag don't remove the retained message from the server (see [MQTT-3.3.1-8]); only a retained message with a zero-length payload removes it ([MQTT-3.3.1-6])
- which means clients still receive the (outdated) retained message when they (re)subscribe
- subscribers may not be aware that a message is outdated
- for example if no LWT or heartbeat topic is used and the subscriber cannot know the state of the publishing client
- autodiscovery messages/topics for Home Assistant (hass) and Homie may linger and cause wrong autodiscovery
- messages don't contain a timestamp, so the age of a retained message is unknown
- can be fixed with an additional timestamp topic
- "ghost switching" if used incorrectly (e.g. a retained command message is replayed to a device every time it resubscribes)
- especially problematic for non-experts of the MQTT middleware
- many examples and references in user forums
- potential risk on the implementation side of mixing publishes with and without the retain flag, e.g. the paho Python module sets the `retain` flag per message (as an argument of `publish()`)
- Retained messages sent with QoS 0 (the default in many libraries) may be discarded by the broker at any time. If the message isn't re-sent periodically, a new subscriber to the topic might not receive a value until the publisher publishes again.
- citing the MQTT specification:
- "If the Server receives a PUBLISH packet with the RETAIN flag set to 1, and QoS 0 it SHOULD store the new QoS 0 message as the new retained message for that topic, but MAY choose to discard it at any time. If this happens there will be no retained message for that topic."
- retained messages cannot provide historic values (not even with QoS 2); the broker keeps only the most recent retained message per topic
- potential security risk, as mentioned in https://www.txone.com/blog/mqtt-series-2-potential-risks-of-exposed-mqtt-brokers/, although the claims are questionable
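One way to make the age of a retained value visible, sketched here as a variant of the timestamp-topic idea, is to embed the publish time in the payload itself. This assumes you control the payload format and that publisher and subscriber clocks are roughly synchronized; the JSON field names `value`/`ts` and the helpers `make_state_payload`/`read_state` are hypothetical.

```python
import json
import time
from typing import Optional, Tuple


def make_state_payload(value: str, now: Optional[float] = None) -> bytes:
    """Hypothetical helper: encode a state value with its publish time (Unix seconds)."""
    ts = time.time() if now is None else now
    return json.dumps({"value": value, "ts": ts}).encode("utf-8")


def read_state(payload: bytes, max_age_s: float,
               now: Optional[float] = None) -> Tuple[str, bool]:
    """Hypothetical helper: return (value, is_stale) for a timestamped payload."""
    data = json.loads(payload.decode("utf-8"))
    current = time.time() if now is None else now
    age = current - data["ts"]
    return data["value"], age > max_age_s


payload = make_state_payload("on", now=1_000.0)
print(read_state(payload, max_age_s=300, now=1_100.0))  # ('on', False)
print(read_state(payload, max_age_s=300, now=2_000.0))  # ('on', True)
```

A separate timestamp topic avoids changing the payload format, but then the two retained messages can get out of sync; putting both in one payload keeps them atomic.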
In short, consumer- and user-facing products don't use retained messages by default, but give expert users the option to enable them.
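A minimal sketch of this approach on the publisher side, which also avoids scattering the retain decision across individual publish calls: retain is off by default and only explicitly configured topics are retained. `RetainPolicy` is a hypothetical helper; the `publish` callable is assumed to accept `(topic, payload, qos=..., retain=...)`, which matches paho-mqtt's `Client.publish`, but a fake is used here so the example runs without a broker. QoS 1 is the default because the spec allows brokers to discard retained QoS 0 messages.

```python
from typing import Callable, FrozenSet

# Assumed to be compatible with paho-mqtt's Client.publish(topic, payload, qos, retain).
PublishFn = Callable[..., object]


class RetainPolicy:
    """Hypothetical helper that decides the retain flag in one place.

    Retain is off by default; only topics explicitly opted in (e.g. by an
    expert user via configuration) are published as retained.
    """

    def __init__(self, publish: PublishFn,
                 retained_topics: FrozenSet[str] = frozenset()) -> None:
        self._publish = publish
        self._retained_topics = frozenset(retained_topics)

    def publish(self, topic: str, payload: bytes, qos: int = 1) -> object:
        retain = topic in self._retained_topics
        return self._publish(topic, payload, qos=qos, retain=retain)


sent = []


def fake_publish(topic, payload, qos=0, retain=False):
    # Stand-in for a real client's publish method.
    sent.append((topic, payload, qos, retain))


policy = RetainPolicy(fake_publish, retained_topics=frozenset({"dev1/online"}))
policy.publish("dev1/online", b"true")
policy.publish("dev1/relay/cmnd", b"ON")
print(sent)
# [('dev1/online', b'true', 1, True), ('dev1/relay/cmnd', b'ON', 1, False)]
```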
Examples of products following this approach:
- Shelly MQTT guide
- https://shelly-api-docs.shelly.cloud/gen1/#mqtt-configuration
- citing the docs:
- "Default LWT topic and message are `shellies/<shellymodel>-<deviceid>/online`, `false`. If these are not set after a firmware upgrade -- perform a factory reset of the device. The LWT topic is retained on user configuration (if the Retain flag is set). However, we do not recommend using retained MQTT messages."
- Tasmota guide
- https://tasmota.github.io/docs/MQTT/#retained-mqtt-messages
- Explains problems that can be caused by retained messages and how to clean them up.
- remove with an empty retained message
- restart broker if no persistence is enabled
- MQTT 5 message expiry interval (https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901112), mentioned in https://dev.to/emqx/the-beginners-guide-to-mqtt-retained-messages-2no3
- manual cleanup (as explained by Tasmota docs for example)
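The effect of the MQTT 5 Message Expiry Interval on a retained message can be illustrated with a toy model: once the interval has passed, the retained message is discarded and new subscribers no longer receive it, and the interval forwarded to a subscriber is reduced by the time the message waited on the broker ([MQTT-3.3.2-6]). This is a sketch under assumptions, not broker code: `ExpiringRetainedStore` is hypothetical, time is passed in explicitly for testability, and real brokers may differ in when they purge expired messages from storage.

```python
import math
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class _Retained:
    payload: bytes
    stored_at: float
    expiry_interval: Optional[int]  # seconds; None means no expiry set


class ExpiringRetainedStore:
    """Hypothetical toy model of retained messages with a Message Expiry Interval."""

    def __init__(self) -> None:
        self._store: Dict[str, _Retained] = {}

    def publish_retained(self, topic: str, payload: bytes,
                         expiry_interval: Optional[int] = None,
                         now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if payload == b"":
            # Zero-length retained payload clears the topic.
            self._store.pop(topic, None)
        else:
            self._store[topic] = _Retained(payload, now, expiry_interval)

    def subscribe(self, topic: str,
                  now: Optional[float] = None) -> Optional[Tuple[bytes, Optional[int]]]:
        """Return (payload, remaining expiry interval) for a new subscriber, or None."""
        now = time.monotonic() if now is None else now
        msg = self._store.get(topic)
        if msg is None:
            return None
        if msg.expiry_interval is None:
            return msg.payload, None
        remaining = msg.expiry_interval - (now - msg.stored_at)
        if remaining <= 0:
            # Expired: discard it, so there is no retained message for the topic.
            del self._store[topic]
            return None
        # Forwarded interval = received interval minus time waited on the broker
        # (rounded up here so a still-valid message is never sent with 0).
        return msg.payload, math.ceil(remaining)


store = ExpiringRetainedStore()
store.publish_retained("dev1/ip", b"192.168.1.20", expiry_interval=3600, now=0.0)
print(store.subscribe("dev1/ip", now=600.0))   # (b'192.168.1.20', 3000)
print(store.subscribe("dev1/ip", now=4000.0))  # None
```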
The MQTT specs don't define who is responsible for cleaning up unused retained messages. However, it is good practice to at least tell users how to do this by hand when it is not done automatically (see the Tasmota guide on retained MQTT messages for a good example).
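A minimal manual-cleanup helper along the lines of the Tasmota instructions publishes a zero-length payload with the retain flag set for every topic to clear. `clear_retained` and the example topics are hypothetical; the list of topics is assumed to have been collected beforehand (e.g. by subscribing to a wildcard and noting messages that arrive with the retain flag set), and the `publish` callable is assumed to match paho-mqtt's `Client.publish(topic, payload, qos, retain)`. A fake publish function is used so the example runs without a broker.

```python
from typing import Callable, Iterable, List

# Assumed to be compatible with paho-mqtt's Client.publish(topic, payload, qos, retain).
PublishFn = Callable[..., object]


def clear_retained(topics: Iterable[str], publish: PublishFn,
                   prefix: str = "") -> List[str]:
    """Hypothetical helper: clear retained messages on topics starting with `prefix`.

    `topics` is assumed to be collected beforehand, e.g. from a wildcard
    subscription, recording every message received with the retain flag set.
    """
    cleared = []
    for topic in sorted(set(topics)):
        if topic.startswith(prefix):
            # retain=True with an empty payload removes the retained message
            # ([MQTT-3.3.1-6]); retain=False would leave it untouched.
            publish(topic, b"", qos=1, retain=True)
            cleared.append(topic)
    return cleared


calls = []
seen = ["tele/dev1/STATE", "tele/dev1/LWT", "other/topic", "tele/dev1/LWT"]
print(clear_retained(seen, lambda *args, **kwargs: calls.append((args, kwargs)),
                     prefix="tele/dev1/"))
# ['tele/dev1/LWT', 'tele/dev1/STATE']
```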
Other middleware frameworks solve the problem of getting the most recent value of a topic in different but similar ways. Note that the frameworks mentioned here are peer-to-peer (P2P) and therefore don't store anything on a broker.
- ROS1: latched topics (https://answers.ros.org/answers/360900/revisions/; not the official guide, but it has a good example)
- ROS2/DDS: QoS durability setting "transient local" (https://docs.ros.org/en/rolling/Concepts/About-Quality-of-Service-Settings.html)
- note that mapping this behavior onto other middleware frameworks can be difficult (see RobotWebTools/ros2-web-bridge#134)
- Machinetalk: full update message on connect (https://machinekoder.com/machinetalk-explained-part-4-hal-remote/)