@jbrzozoski
Last active April 20, 2022 16:06

This is ultra-low-priority junk I had stuck in my head last night, and I figured writing it all down might be the best way to stop thinking about it.

One of the issues with Sparkplug B is its support for systems with multiple application nodes, specifically when it comes to coordinating BIRTH messages.

These thoughts are not compatible with Sparkplug B, but are still interesting, possibly for a future generation... I'm trying to think them through before badgering people on the committee with them.

I'm contemplating what happens if you split the variable and non-variable portions of the NBIRTH into two messages, publishing the non-variable portion persistently, and using the LWT both to announce that the node has gone offline and to clear the persistent publish.

Let's call the new persistent topic NSCHEMA.

I'm trying to think through this setup:

  • When an edge node connects to a broker, it persistently publishes an NSCHEMA payload containing the non-variable parts of the NBIRTH (metric names, aliases, datatypes, unchanging properties, etc.)
  • The LWT would be a non-persistent message to NSCHEMA, with a payload indicating the node has gone offline
  • As part of the connection process, the node would be required to send an NDATA containing the current values for all metrics
  • Nodes would still support a "Node Control/Rebirth" metric which would trigger a resend of both the persistent NSCHEMA and a full NDATA
  • Nodes would also have a "Node Control/Resync" command, which would trigger a resend of a full NDATA
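
To make the split concrete, here is a minimal sketch of what a node might publish on connect. It is only an illustration: the dict layout, the `online` flag, and the helper names `split_birth` and `connect_plan` are made up for this example (real Sparkplug B payloads are protobuf-encoded), and putting NSCHEMA in the message-type slot of the existing `spBv1.0/<group>/<type>/<node>` topic layout is an assumption.

```python
# Hypothetical metric layout; real Sparkplug B payloads are protobuf-encoded.
STATIC_FIELDS = ("name", "alias", "datatype", "properties")


def split_birth(metrics):
    """Split NBIRTH-style metrics into a static schema and a full data set."""
    # The schema keeps only the fields that never change while connected.
    schema = [{k: m[k] for k in STATIC_FIELDS if k in m} for m in metrics]
    # The data part carries current values, keyed by alias.
    data = [{"alias": m["alias"], "value": m["value"]} for m in metrics]
    return schema, data


def connect_plan(group_id, node_id, metrics):
    """Return the publishes made on connect as (topic, payload, retain)."""
    schema, data = split_birth(metrics)
    base = f"spBv1.0/{group_id}"
    return [
        # Retained: late subscribers receive it automatically.
        (f"{base}/NSCHEMA/{node_id}", {"online": True, "metrics": schema}, True),
        # Not retained: a full NDATA with every current value.
        (f"{base}/NDATA/{node_id}", {"metrics": data}, False),
    ]


birth_metrics = [
    {"name": "Node Control/Rebirth", "alias": 0, "datatype": "Boolean", "value": False},
    {"name": "Node Control/Resync", "alias": 1, "datatype": "Boolean", "value": False},
    {"name": "Temperature", "alias": 2, "datatype": "Float", "value": 21.5,
     "properties": {"units": "C"}},
]
plan = connect_plan("Plant1", "Edge7", birth_metrics)
```

In this sketch a Rebirth would replay the whole plan, while a Resync would replay only the second entry.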

The benefits of this are hopefully obvious:

  • When a new application comes online and subscribes, it will automatically receive the NSCHEMA of all nodes that are currently online
  • If the application needs to know the full status of any of those nodes, it can request a resync
  • Even if the application doesn't need to know the full status or isn't allowed to publish, it can still understand new values as the nodes publish them
  • As nodes go offline, listening applications will be notified immediately, and the retained NSCHEMA will be cleared (decreasing broker load)
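
From the application side, the bookkeeping could look roughly like the sketch below. The `NodeTracker` class and the payload shape (a dict with a hypothetical `online` flag and a `metrics` list) are assumptions for illustration only. It treats an empty payload as "offline" too, since in MQTT a retained publish with a zero-length payload is what clears a retained message.

```python
class NodeTracker:
    """Tracks which nodes are online and decodes NDATA using their NSCHEMA."""

    def __init__(self):
        self.schemas = {}  # node_id -> {alias: metric definition}

    def on_nschema(self, node_id, payload):
        # An empty payload or online=False means the node has gone offline.
        if not payload or not payload.get("online", False):
            self.schemas.pop(node_id, None)
            return
        self.schemas[node_id] = {m["alias"]: m for m in payload["metrics"]}

    def on_ndata(self, node_id, payload):
        """Return {metric name: value}, or None if the node's schema is unknown."""
        schema = self.schemas.get(node_id)
        if schema is None:
            return None
        return {
            schema[m["alias"]]["name"]: m["value"]
            for m in payload["metrics"]
            if m["alias"] in schema
        }

    def online_nodes(self):
        return sorted(self.schemas)


tracker = NodeTracker()
# Delivered automatically on subscribe because NSCHEMA is retained.
tracker.on_nschema("Edge7", {
    "online": True,
    "metrics": [{"name": "Temperature", "alias": 2, "datatype": "Float"}],
})
# A partial NDATA can be decoded without ever requesting a resync.
decoded = tracker.on_ndata("Edge7", {"metrics": [{"alias": 2, "value": 22.0}]})
```

Note that the tracker never has to publish anything to make sense of new values; a resync request is only needed if it wants the full current state.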

I'm not sure how sub-devices work in this system. I think having a DSCHEMA makes sense, maybe published persistently as well, but those can't be cleared automatically by the LWT, since an MQTT connection carries only one will message. An application could still tell that the sub-devices are offline from the fact that it didn't receive an NSCHEMA for the parent node, but the broker could accumulate old DSCHEMA messages over time. Perhaps an application node could clear them based on some rules.
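
One possible cleanup rule, sketched under assumptions: treat any retained DSCHEMA whose parent node has no retained NSCHEMA as stale. The function name `stale_dschema_topics` is hypothetical, and the topic shapes `spBv1.0/<group>/NSCHEMA/<node>` and `spBv1.0/<group>/DSCHEMA/<node>/<device>` are guesses that mirror the existing Sparkplug B layout.

```python
def stale_dschema_topics(retained_topics):
    """Return DSCHEMA topics whose parent node has no retained NSCHEMA."""
    online = set()   # (group_id, node_id) pairs with an NSCHEMA present
    dschemas = []    # (topic, parent) for every DSCHEMA seen
    for topic in retained_topics:
        parts = topic.split("/")
        if len(parts) == 4 and parts[2] == "NSCHEMA":
            online.add((parts[1], parts[3]))
        elif len(parts) == 5 and parts[2] == "DSCHEMA":
            dschemas.append((topic, (parts[1], parts[3])))
    return sorted(t for t, parent in dschemas if parent not in online)


seen = [
    "spBv1.0/Plant1/NSCHEMA/Edge7",
    "spBv1.0/Plant1/DSCHEMA/Edge7/Pump1",
    "spBv1.0/Plant1/DSCHEMA/Edge9/Valve3",  # Edge9 has no NSCHEMA
    "spBv1.0/Plant2/DSCHEMA/Edge7/Pump1",   # same node id, different group
]
stale = stale_dschema_topics(seen)
```

An application would then publish an empty retained payload to each stale topic to clear it. MQTT gives no signal that all retained messages have arrived after subscribing, so a rule like this would probably need a settling delay before it runs.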

I'm also not sure how sequence numbers work in this system. They traditionally were reset when an NBIRTH was sent, but there is no NBIRTH message any more. Maybe the full NDATA should have a special flag indicating it is a complete resync, and also reset the sequence number when it gets sent?
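
A rough sketch of how a receiver might handle that flag: a flagged full NDATA re-anchors the expected sequence number, and anything out of order after that means the receiver should ask for a resync. The `SeqChecker` class and the `full_resync` parameter are hypothetical; the 0 to 255 wraparound matches how Sparkplug B sequence numbers behave today.

```python
SEQ_MODULO = 256  # Sparkplug B sequence numbers run 0..255 and then wrap


class SeqChecker:
    """Per-node sequence check where a flagged full NDATA resets the count."""

    def __init__(self):
        self.expected = None  # None means "not in sync yet"

    def accept(self, seq, full_resync=False):
        """Return True if the message is in order, False if a resync is needed."""
        if full_resync:
            # A complete resync re-anchors the sequence, like NBIRTH used to.
            self.expected = (seq + 1) % SEQ_MODULO
            return True
        if self.expected is None or seq != self.expected:
            # Gap or never synced: stay out of sync until the next full NDATA.
            self.expected = None
            return False
        self.expected = (seq + 1) % SEQ_MODULO
        return True


checker = SeqChecker()
results = [
    checker.accept(0, full_resync=True),  # full NDATA, sequence reset to 0
    checker.accept(1),                    # in order
    checker.accept(3),                    # seq 2 was missed
    checker.accept(4),                    # still out of sync
    checker.accept(0, full_resync=True),  # resync recovers
]
```

One consequence of this design is that an application which joins late stays "out of sync" until it sees a flagged NDATA, which is exactly the case where it would issue a Resync if it is allowed to publish.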

@jbrzozoski

I use "persistent" to mean MQTT "retain".
