title |
---|
Elements of Messaging and Eventing Platforms |
This document provides a brief overview of the essential elements of a messaging and eventing platform and how they relate to each other.
Message and event brokers accept, store, and forward messages. When we only consider trivial applications, the differences between event and message brokers appears to be very minute since an event stream broker and a queue-based broker are both capable of getting a message from a producer to a consumer. Yet, event streams, queueing, and pub/sub services implement very different patterns and are generally implemented in distinct products and/or cloud services.
A simple task queue helps with managing discrete jobs that need to be performed reliably.
A simple task queue broker provides focused functionality for this: Messages can be sent into a queue and then delivered under a lock and subsequently either settled by being removed from the queue or put back for another processing attempt.
Delivery attempts are tracked and exceeding a threshold moves the message to a dead-letter queue. To allow for setting processing deadlines, messages can have an assigned expiration time after which they are no longer delivered.
Many of the discrete job scenarios do not require any assurances about the relative order of messages (first in, first out) and no assurances that the distribution of messages amongst workers is fair (first come, first served), which means that simple task queues can forgo those constraining assurances and therefore offer far higher scalability than brokers that need to provide them.
In distributed systems, where applications and their dependencies span many different machines and failure domains, the comfortable illusion of always-consistent outcomes of operations as promised by “classic” two-phase-commit transaction technology is no longer realistic.
In a large, distributed system, it’s impossible to keep promises about the future outcome of an operation and it’s also impossible to build and operate a transaction coordinator that is perfect and never fails.
In absence of that comfortable model, modern cloud applications that require robust execution of work often rely on queues to ensure that critical work gets done once and once only and that consequential workflow steps are only triggered once that work is done.
Transactional queue brokers provide a broad range of features built for scalable execution of transactional workflows. It allows putting transaction boundaries across the settlement of input messages and resulting output sends and it commonly allows for multiplexing many thousands of different application contexts through a single entity (message groups or sessions).
A deduplication feature helps with avoiding unwanted duplicate executions of work, it has a scheduling feature that helps with creating supervisor functions for assigned work, and it allows setting messages safely aside for later processing if process inputs arrive out the expected order.
Many of these features are common and standardized across transactional message brokers, including the transaction model itself. The AMQP 1.0 wire protocol and the Java Message Service (JMS) 2.0 API are two such standards, and those allow applications to switch from one transactional queue broker to another with relative ease.
Transactional queue brokers are commonly the foundation and main integration point of a messaging infrastructure, with other brokers playing auxiliary roles for specific use cases.
An Event Router is a broker that acts as an intermediary between parties raising events and parties that are interested in such events. The publisher can entrust events to the router and walk away, and the router deals with throttling, error conditions, and delivery retry scenarios that the producing application would otherwise have to deal with directly when raising events to external parties. Routers might straddle isolated networks to provide "Layer 7" application-level routing of data.
Event routers commonly support delivery into queues and event streams as well as delivery of events to subscribers in a “push” fashion, where the router initiates the connection. Those subscribers may be HTTP event handlers as commonly exposed by "serverless" compute infrastructures (aka "Lambdas"). With the ability to distribute events into a broad set of targets, an event router is a general-purpose publish/subscribe engine and router.
In business applications, event routers act as the glue for flexible extensibility of the application's core functionality. The core application logic reports state changes and other occurrences into the event router. Authorized parties can subscribe to those events and then extend the functionality of the application without having to change the core logic.
Event stream engines are specialized brokers that are optimized for high throughput and minimal end-to-end latencies. They accept events singly or as batches and then put them into append-only logs with minimal extra processing.
A topic, which is the data structure managed by the broker and usually holds many concurrent streams of a similar kind, is commonly split up into one or multiple partitions, each of which is a storage log. Producers send related events (a stream) to the broker and the broker associates a partition with the stream such that relative order of the events can be preserved. The event engine's partitioning model might not be visibly exposed to producers and consumers.
Consumers then pick a partition and choose an offset from where to start reading. Consumers can read and re-read events repeatedly while they are retained. Multiple consumers can read events concurrently without interfering with each other.
Those principles push all work related to tracking the state of individual messages to the consumer. Event stream engines commonly don’t provide server-side message filtering capabilities and also no deduplication support beyond protecting against duplicates caused by uncertainty about the outcome of an immediate send operation.
On the consumer side, event stream engines usually offer a built-in facility that allows a group of multiple cooperating consumers to coordinate the ownership of partitions amongst themselves and for each partition owner of that group to store “checkpoint” information which tracks how far processing has progressed against the stream but compared to a queue broker’s message-level locking and settlement, this is comparatively lightweight functionality.
As a result of these imposed constraints, an event stream engine can provide far higher throughput and lower latencies on top of the same platform resources than a queue broker because there are fewer “moving parts” inside the broker and fewer feature flags to be considered.
This relative simplicity makes it comparatively easy for event stream engines to scale out across multiple machines and to accept and distribute enormous torrents of data for a topic.
Spanning across those kinds of brokers is a set of common, supporting services that are useful for any messaging infrastructure and which are discussed in the following.
Schema-dependent serialization formats and related frameworks such as Apache Avro or Google Protobuf are popular in eventing and messaging scenarios. With most of the metadata information such as names and types of fields being held externally in schema documents, these frameworks are able to produce very compact on-wire representations of structured data carried as the payload of events and messages.
While the wire-footprint savings are often very significant, the required schema handling add complexity. All communicating parties who need to deserialize an event or message payload need access to the exact schema used to produce it.
For formats that do not depend on external schemas for serialization, schemas might still be desired to allow for consumers or inspecting intermediaries to validate the data structure's compliance with a set of rules.
A schema registry is a service focused on storing, organizing, and accessing schema documents that are used to drive serialization logic, validate data structures, or both. Messages carrying schema-bound payloads can then refer to the schema held by the schema registry by reference, and all consumers that have access to the registry can obtain the correct schema document to decode or validate the payload.
Schema registries can also be federated, allowing schema information to be shared and replicated across network isolation boundaries such that schemas can be brought near consumers that do not share a network scope with producers.
An event catalog contains groups of abstract templates of events or messages that a producer might send. Event catalogs are useful tools at design/development time because they allow explorative discovery of the kinds of events that are available for subscription from a system and to finds out which event sources raise them. An identified event/message catalog entry can then serve as the basis for generating strongly typed representations of the event/message or other code.
The event/message definition entries in the catalog contain templatized metadata information that identifies a particular type of event/message through a set of fixed attributes/properties like, for example, the "type" of CloudEvents, a wildcard "topic" of MQTT, or the "subject" of an AMQP message.
An event/message template in the catalog may then refer to an external or internal schema registry for describing the event payload and/or the structure of complex metadata fields.
Structurally and functionally, an event catalog is very similar to a schema registry, and it is reasonable for a schema registry and an event catalog to be collocated in the same service and on the same endpoint.
In order for consumers to obtain events/messages from services, they will need to understand what information is available from where. The event catalog just discussed is one such facility, focusing on providing information to developers at design/development time.
A discovery service is a more scoped capability that provides an application with programmatic runtime information about subscription services in its reach. The metadata information about event/messages will be very similar to that available from an event catalog, but a discovery service provides this information more concretely for event sources that are currently active, including endpoints, protocols and security configuration details that are not available at design/development time.
For an event-enabled cloud service that raises events for each state change, the information about the event types raised and the schemas used to convey the detail information about the event will be available through the event catalog.
The concrete tenant/account endpoints on which subscriptions can be created to obtain such events and the available choice of events, which may be a scoped down based on runtime constraints such as authorization, will be able from the discovery service.
The discovery service and an event catalog may be collocated in the same service and on the same endpoint.
A subscription service is an endpoint that enables an event delivery destination to subscribe for delivery of events to itself and manage the lifetime of that subscription.
A subscription mechanism is either inherent to the protocol of a message/event broker or requires an extra API if the protocol does not support subscriptions. With protocols like AMQP or MQTT, a subscription operation is a built-in gesture of the protocol interaction. HTTP has no such concept and an explicit subscription service API is needed to make the HTTP delivery endpoint known to the event router.
Queues, pub/sub brokers, and event stream engines are all examples of such subscription services. An event router that enables push delivery is an example where the act of subscribing is initiated by the subscriber (or a party acting on behalf of it) and deliveries are later initiated by the router.
A further capability of a subscription service may be subscription propagation and aggregation. In complex systems, the source of the desired events may be behind a network isolation boundary and unreachable for the subscriber. A subscription service may then accept the subscription request from the subscriber within its own network scope and then propagate the request to the original source such that the event is being published and then routed via an appropriate intermediary towards the desired destination.
With Avro or Protobuf, you can’t properly decode a message without a schema document, which is why there are schema registries. Very similarly, if you are using signatures and encryption, you can’t validate and decrypt a message if you don’t have the proper keys at hand.
A special issue with unidirectional data flows, and especially those that distribute copies of messages to multiple parties, is that the communicating parties cannot easily negotiate symmetric session keys for encryption as they would for a peer-to-peer conversation as with TLS. Instead, they need to lean on asymmetric encryption schemes with private/public keys end-to-end and roll those keys periodically to avoid key degradation.
Producers and consumers therefore need a key management service where they can share public or symmetric keys required to validate or en-/decrypt messages, including key identifiers and epochs for rotating keys. As with a schema registry, the key management service needs to be able to securely replicate both keys and access control rules for those keys across network isolation boundaries where needed.
Many sophisticated solutions require messages to be replicated across entities and broker boundaries in order to implement routing and federation patterns. Messages may have to flow between brokers associated with multiple, different application tenants, or across multiple, different geographical regions.
Realizing such flows requires configurable, managed replication capabilities that can copy messages and events across element of a topology. Replication is different from an event router in that replication routes typically establish relationships between a source and a target, and the replication engine actively acquires the event stream or messages from the source.
Replication often also includes of stateful, rules-based aggregation as well as stateful and stateless event/message-triggered serverless compute capabilities that act on events and messages, including their payload content.
Acting on payload content is an important distinction to event routers, which generally only act on message metadata.
An aggregation engine may have to validate and decrypt payloads and then decode structured data from those payloads with the help of schemas, then act on the structured data, and eventually encode as well as potentially encrypt and sign the aggregation result as it is being forwarded, meaning that aggregation engines build on and require all most other supporting services enumerated in this section.