jbrzozoski/sparkplug_schema_thoughts.md

## sparkplug_schema_thoughts.md

      
This is ultra-low-priority junk I had stuck in my head last night, and I figured writing it all down might be the best way to stop thinking about it.
One of the issues with Sparkplug B is its support for systems with multiple application nodes, specifically the coordination of BIRTH messages.
These thoughts are not compatible with Sparkplug B, but are still interesting, possibly for a future generation...  I'm trying to think them through before badgering people on the committee with them.
I'm contemplating what happens if you split the variable and non-variable portions of the NBIRTH into two messages, publishing the non-variable portion persistently (as a retained message, in MQTT terms), and using the LWT to both announce that the node has gone offline and clear the persistent publish.
Let's call the new persistent topic NSCHEMA.
I'm trying to think through this setup:

- When an edge node connects to a broker, it persistently publishes an NSCHEMA payload containing the non-variable parts of the NBIRTH (metric names, aliases, datatypes, unchanging properties, etc.)
- The LWT would be a non-persistent message to NSCHEMA, with a payload indicating the node has gone offline
- As part of the connection process, the node would be required to send an NDATA containing the current values for all metrics
- Nodes would still support a "Node Control/Rebirth" metric, which would trigger a resend of both the persistent NSCHEMA and a full NDATA
- Nodes would also have a "Node Control/Resync" command, which would trigger a resend of a full NDATA
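
The connection steps above can be sketched as a split of each metric into a schema part and a value part. This is only a rough illustration: the `Metric` class, `split_birth`, `connect_plan`, and the JSON payloads are made up for the example (real Sparkplug B payloads are protobuf), and `NSCHEMA` is the hypothetical message type from these notes. One assumption differs from the wording above: under MQTT 3.1.1 only a retained publish with a zero-length payload removes a retained message, so the sketch models the LWT as a retained empty payload, which online subscribers would still receive as the "offline" signal.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Metric:
    """One node metric; illustrative stand-in for a Sparkplug metric."""
    name: str
    alias: int
    datatype: str
    value: Any
    properties: dict = field(default_factory=dict)


def split_birth(metrics):
    """Split NBIRTH content into the unchanging part and the current values."""
    schema = [
        {"name": m.name, "alias": m.alias, "datatype": m.datatype,
         "properties": m.properties}
        for m in metrics
    ]
    data = [{"alias": m.alias, "value": m.value} for m in metrics]
    return schema, data


def topic(group, msg_type, node):
    # Sparkplug B topic layout; NSCHEMA itself is hypothetical.
    return f"spBv1.0/{group}/{msg_type}/{node}"


def connect_plan(group, node, metrics):
    """Describe the will and the publishes a node would make on connect."""
    schema, data = split_birth(metrics)
    return {
        # Assumption: retained + empty payload, so the broker drops the
        # retained NSCHEMA when the node disappears.
        "will": {"topic": topic(group, "NSCHEMA", node),
                 "payload": b"", "retain": True},
        "publishes": [
            {"topic": topic(group, "NSCHEMA", node),
             "payload": json.dumps({"metrics": schema}).encode(),
             "retain": True},
            {"topic": topic(group, "NDATA", node),
             "payload": json.dumps({"metrics": data}).encode(),
             "retain": False},
        ],
    }
```

In this sketch a "Rebirth" would replay both entries in `publishes`, while a "Resync" would replay only the NDATA entry.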

The benefits of this are hopefully obvious:

- When a new application comes online and subscribes, it will automatically receive the NSCHEMA of all nodes that are currently online
- If the application needs to know the full status of any of those nodes, it can request a resync
- Even if the application doesn't need to know the full status or isn't allowed to publish, it can still understand new values as the nodes publish them
- As nodes go offline, listening applications will be notified immediately, and the retained NSCHEMA will be cleared (decreasing broker load)
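
From the application side, the behavior listed above might look roughly like the following. `NodeTracker` and its JSON payload format are invented for illustration, and the sketch assumes an empty NSCHEMA payload is what signals "offline" (see the MQTT retained-message rules); none of this is part of Sparkplug B.

```python
import json


class NodeTracker:
    """Tracks node schemas and the latest known metric values."""

    def __init__(self):
        self.schemas = {}  # (group, node) -> {alias: schema entry}
        self.values = {}   # (group, node) -> {metric name: value}

    def on_message(self, topic, payload):
        parts = topic.split("/")
        if len(parts) < 4 or parts[0] != "spBv1.0":
            return
        group, msg_type, node = parts[1], parts[2], parts[3]
        key = (group, node)
        if msg_type == "NSCHEMA":
            if not payload:
                # Empty payload: node went offline, forget it.
                self.schemas.pop(key, None)
                self.values.pop(key, None)
                return
            metrics = json.loads(payload)["metrics"]
            self.schemas[key] = {m["alias"]: m for m in metrics}
            self.values[key] = {}
        elif msg_type == "NDATA":
            schema = self.schemas.get(key)
            if schema is None:
                return  # no schema yet, cannot decode aliases
            for m in json.loads(payload)["metrics"]:
                entry = schema.get(m["alias"])
                if entry is not None:
                    self.values[key][entry["name"]] = m["value"]

    def needs_resync(self, key):
        """True while some metric of an online node has no known value."""
        known = self.values.get(key, {})
        return any(m["name"] not in known
                   for m in self.schemas.get(key, {}).values())
```

An application that is allowed to publish could send the "Node Control/Resync" command whenever `needs_resync` is true; a listen-only application would simply fill in values as they arrive.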

I'm not sure how sub-devices work in this system. I think having a DSCHEMA makes sense, and maybe publishing it persistently, but those messages can't be automatically cleared by the LWT. An application could still tell that the sub-devices are offline by the fact that it didn't receive an NSCHEMA for the parent node, but the broker could accumulate old DSCHEMA messages over time. Perhaps an application node could clear them based on some rules.
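
One possible cleanup rule, sketched very loosely: treat any retained DSCHEMA whose parent node has no retained NSCHEMA as stale, and clear it with a retained empty publish. The function names and the `spBv1.0/<group>/DSCHEMA/<node>/<device>` topic shape are assumptions for this example, and a real rule would probably also want a grace period so a DSCHEMA that arrives before its parent's NSCHEMA isn't wiped by mistake.

```python
def stale_dschema_topics(retained_topics):
    """Return DSCHEMA topics whose parent node has no retained NSCHEMA."""
    online = set()
    devices = []
    for t in retained_topics:
        parts = t.split("/")
        if len(parts) == 4 and parts[2] == "NSCHEMA":
            online.add((parts[1], parts[3]))
        elif len(parts) == 5 and parts[2] == "DSCHEMA":
            devices.append((t, (parts[1], parts[3])))
    return sorted(t for t, parent in devices if parent not in online)


def clear_messages(topics):
    """Retained empty publishes that would remove the stale DSCHEMAs."""
    return [{"topic": t, "payload": b"", "retain": True} for t in topics]
```
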
I'm also not sure how sequence numbers work in this system. They traditionally were reset when an NBIRTH was sent, but there is no NBIRTH message any more. Maybe the full NDATA should carry a special flag indicating it is a complete resync, and sending it should also reset the sequence number?
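
To make the flag idea concrete, here is a small sketch of how it could behave, assuming the usual 0 to 255 wrapping counter. The `resync` field and both class names are hypothetical; the point is only that a flagged full NDATA restarts the count on the sender and re-anchors the receiver.

```python
class SeqSender:
    """Stamps outgoing NDATA headers with a wrapping sequence number."""

    def __init__(self):
        self._next = 0

    def stamp(self, resync=False):
        if resync:
            self._next = 0  # a full resync restarts the sequence
        seq = self._next
        self._next = (seq + 1) % 256
        return {"seq": seq, "resync": resync}


class SeqReceiver:
    """Accepts messages only while the sequence is unbroken."""

    def __init__(self):
        self._expected = None  # None means "waiting for a resync"

    def accept(self, header):
        if header["resync"]:
            self._expected = (header["seq"] + 1) % 256
            return True
        if self._expected is None or header["seq"] != self._expected:
            self._expected = None  # gap detected, a resync is needed
            return False
        self._expected = (header["seq"] + 1) % 256
        return True
```

A receiver that gets `False` back would be the natural place to issue the "Node Control/Resync" command.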