Elements of Messaging and Eventing Platforms

This document provides a brief overview of the essential elements of a messaging and eventing platform and how they relate to each other.

Message and Event Broker Categories

Message and event brokers accept, store, and forward messages. For trivial applications, the differences between event and message brokers appear minor, since an event stream broker and a queue-based broker are both capable of getting a message from a producer to a consumer. Yet event streams, queueing, and pub/sub services implement very different patterns and are generally offered as distinct products and/or cloud services.

Simple Task Queues

A simple task queue helps with managing discrete jobs that need to be performed reliably.

A simple task queue broker provides focused functionality for this: Messages can be sent into a queue and then delivered under a lock and subsequently either settled by being removed from the queue or put back for another processing attempt.

Delivery attempts are tracked and exceeding a threshold moves the message to a dead-letter queue. To allow for setting processing deadlines, messages can have an assigned expiration time after which they are no longer delivered.

Many discrete-job scenarios require no assurances about the relative order of messages (first in, first out) and no assurances that messages are distributed fairly amongst workers (first come, first served). Simple task queues can therefore forgo those constraining assurances and offer far higher scalability than brokers that must provide them.
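The lock, settle/abandon, dead-letter, and expiration semantics described above can be illustrated with a small in-memory sketch. This is not any real broker's API; all names (`TaskQueue`, `receive`, `complete`, `abandon`) and the constants are illustrative assumptions, and a real broker would be durable and concurrent.

```python
import time
from collections import deque

MAX_DELIVERIES = 3     # attempts before a message is dead-lettered
LOCK_DURATION = 30.0   # seconds a delivered message stays invisible

class TaskQueue:
    """Minimal task-queue sketch: deliver under a lock, settle or retry,
    dead-letter after too many attempts. Not thread-safe; illustration only."""
    def __init__(self):
        self._queue = deque()
        self._locked = {}        # message id -> (body, lock expiry, delivery count)
        self.dead_letter = []
        self._next_id = 0

    def send(self, body, ttl=None):
        expires = time.time() + ttl if ttl else None
        self._next_id += 1
        self._queue.append((self._next_id, body, expires, 0))
        return self._next_id

    def receive(self):
        now = time.time()
        # requeue messages whose lock expired (consumer crashed or timed out)
        for mid, (body, lock_exp, count) in list(self._locked.items()):
            if lock_exp <= now:
                del self._locked[mid]
                self._requeue(mid, body, count)
        while self._queue:
            mid, body, expires, count = self._queue.popleft()
            if expires is not None and expires <= now:
                continue  # expired messages are never delivered
            self._locked[mid] = (body, now + LOCK_DURATION, count + 1)
            return mid, body
        return None

    def complete(self, mid):   # settle: remove from the queue for good
        self._locked.pop(mid)

    def abandon(self, mid):    # put back for another processing attempt
        body, _, count = self._locked.pop(mid)
        self._requeue(mid, body, count)

    def _requeue(self, mid, body, count):
        if count >= MAX_DELIVERIES:
            self.dead_letter.append((mid, body))
        else:
            self._queue.append((mid, body, None, count))
```

The delivery count travels with the message, so repeated abandonment eventually moves it to the dead-letter queue rather than retrying forever.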

Transactional Queuing and Pub/Sub

In distributed systems, where applications and their dependencies span many different machines and failure domains, the comfortable illusion of always-consistent outcomes of operations as promised by “classic” two-phase-commit transaction technology is no longer realistic.

In a large, distributed system, it’s impossible to keep promises about the future outcome of an operation and it’s also impossible to build and operate a transaction coordinator that is perfect and never fails.

In absence of that comfortable model, modern cloud applications that require robust execution of work often rely on queues to ensure that critical work gets done once and once only and that consequential workflow steps are only triggered once that work is done.

Transactional queue brokers provide a broad range of features built for scalable execution of transactional workflows. They allow putting transaction boundaries across the settlement of input messages and the resulting output sends, and they commonly allow multiplexing many thousands of different application contexts through a single entity (message groups or sessions).

A deduplication feature helps avoid unwanted duplicate executions of work, a scheduling feature helps with creating supervisor functions for assigned work, and messages can be safely set aside for later processing if process inputs arrive out of the expected order.

Many of these features are common and standardized across transactional message brokers, including the transaction model itself. The AMQP 1.0 wire protocol and the Java Message Service (JMS) 2.0 API are two such standards, and those allow applications to switch from one transactional queue broker to another with relative ease.

Transactional queue brokers are commonly the foundation and main integration point of a messaging infrastructure, with other brokers playing auxiliary roles for specific use cases.
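Two of the features above, transactions that span input settlement and output sends, and message-id deduplication, can be sketched in a few lines. All names here (`Broker`, `settle`, `commit`) are hypothetical; real brokers implement this durably and with a bounded deduplication window.

```python
class Broker:
    """Sketch of two transactional-broker features: a transaction that makes
    'settle input + send outputs' atomic, and message-id deduplication."""
    def __init__(self):
        self.queues = {}        # queue name -> list of (message id, body)
        self._seen_ids = set()  # dedup window (unbounded here for simplicity)

    def send(self, queue, mid, body):
        if mid in self._seen_ids:
            return False        # duplicate suppressed
        self._seen_ids.add(mid)
        self.queues.setdefault(queue, []).append((mid, body))
        return True

    def transaction(self):
        return _Txn(self)

class _Txn:
    def __init__(self, broker):
        self.broker, self._settles, self._sends = broker, [], []

    def settle(self, queue, mid):
        self._settles.append((queue, mid))

    def send(self, queue, mid, body):
        self._sends.append((queue, mid, body))

    def commit(self):
        # input settlements and output sends take effect together
        for queue, mid in self._settles:
            self.broker.queues[queue] = [
                m for m in self.broker.queues[queue] if m[0] != mid]
        for queue, mid, body in self._sends:
            self.broker.send(queue, mid, body)

    def abort(self):
        self._settles.clear()
        self._sends.clear()
```

If the transaction aborts, neither the settlement nor the output send happens, so the input message remains available for a retry.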

Event Router

An Event Router is a broker that acts as an intermediary between parties raising events and parties that are interested in such events. The publisher can entrust events to the router and walk away, and the router deals with throttling, error conditions, and delivery retry scenarios that the producing application would otherwise have to deal with directly when raising events to external parties. Routers might straddle isolated networks to provide "Layer 7" application-level routing of data.

Event routers commonly support delivery into queues and event streams as well as delivery of events to subscribers in a “push” fashion, where the router initiates the connection. Those subscribers may be HTTP event handlers as commonly exposed by "serverless" compute infrastructures (aka "Lambdas"). With the ability to distribute events into a broad set of targets, an event router is a general-purpose publish/subscribe engine and router.

In business applications, event routers act as the glue for flexible extensibility of the application's core functionality. The core application logic reports state changes and other occurrences into the event router. Authorized parties can subscribe to those events and then extend the functionality of the application without having to change the core logic.
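A toy sketch of this routing behavior, with hypothetical names throughout: publishers hand events to the router and walk away, and the router pushes to whichever subscribers match. A production router would additionally handle throttling, retries, and dead-lettering.

```python
class EventRouter:
    """Sketch of an event router: matches event metadata against
    subscriptions and pushes matching events to subscriber handlers."""
    def __init__(self):
        self._subscriptions = []  # (event type filter, handler callable)

    def subscribe(self, event_type, handler):
        self._subscriptions.append((event_type, handler))

    def publish(self, event):
        delivered = 0
        for event_type, handler in self._subscriptions:
            if event["type"] == event_type:
                handler(event)  # a real router would retry and throttle here
                delivered += 1
        return delivered
```

Note that routing decisions are made on metadata (the `type` attribute) only, not on the payload, a distinction the "Flows and Aggregation" section returns to.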

Event Stream Engine

Event stream engines are specialized brokers that are optimized for high throughput and minimal end-to-end latencies. They accept events singly or as batches and then put them into append-only logs with minimal extra processing.

A topic, the data structure managed by the broker, usually holds many concurrent streams of a similar kind and is commonly split into one or more partitions, each of which is a storage log. Producers send related events (a stream) to the broker, and the broker associates a partition with the stream such that the relative order of the events is preserved. The event engine's partitioning model might not be visibly exposed to producers and consumers.

Consumers then pick a partition and choose an offset from where to start reading. Consumers can read and re-read events repeatedly while they are retained. Multiple consumers can read events concurrently without interfering with each other.
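The partition-and-offset model just described can be sketched as follows. The class and method names are illustrative assumptions; real engines persist partitions durably and replicate them.

```python
import hashlib

class Topic:
    """Sketch of an event-stream topic: events with the same stream key land
    on the same partition, and consumers read from offsets they choose."""
    def __init__(self, partitions=4):
        self._partitions = [[] for _ in range(partitions)]

    def partition_for(self, stream_key):
        # stable hash so a stream always maps to the same partition
        digest = hashlib.sha256(stream_key.encode()).digest()
        return digest[0] % len(self._partitions)

    def append(self, stream_key, event):
        p = self.partition_for(stream_key)
        self._partitions[p].append(event)
        return p, len(self._partitions[p]) - 1   # (partition, offset)

    def read(self, partition, offset, max_events=100):
        # non-destructive: reading does not remove or lock anything
        return self._partitions[partition][offset:offset + max_events]
```

Because reads are just slices of an append-only log, any number of consumers can read and re-read concurrently without coordinating with each other.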

Those principles push all work related to tracking the state of individual messages to the consumer. Event stream engines commonly provide no server-side message filtering and no deduplication support beyond protecting against duplicates caused by uncertainty about the outcome of an immediate send operation.

On the consumer side, event stream engines usually offer a built-in facility that allows a group of cooperating consumers to coordinate the ownership of partitions amongst themselves, and each partition owner in that group stores "checkpoint" information that tracks how far processing has progressed against the stream. Compared to a queue broker's message-level locking and settlement, this is lightweight functionality.
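A minimal sketch of that coordination, under assumed names (`CheckpointStore`, `claim`): one owner per partition, and a single stored offset per partition instead of per-message state.

```python
class CheckpointStore:
    """Sketch of consumer-group coordination: each partition gets one owner
    in the group, and the owner records how far it has read (a checkpoint)."""
    def __init__(self):
        self._owners = {}    # partition -> consumer id
        self._offsets = {}   # partition -> next offset to read

    def claim(self, partition, consumer_id):
        # first claimant wins; a real engine would also handle lease expiry
        owner = self._owners.setdefault(partition, consumer_id)
        return owner == consumer_id

    def checkpoint(self, partition, consumer_id, next_offset):
        assert self._owners.get(partition) == consumer_id, "not the owner"
        self._offsets[partition] = next_offset

    def resume_offset(self, partition):
        return self._offsets.get(partition, 0)
```

The entire per-partition state is one integer, which is why this scales so much more cheaply than message-level locking and settlement.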

As a result of these imposed constraints, an event stream engine can provide far higher throughput and lower latencies on top of the same platform resources than a queue broker because there are fewer “moving parts” inside the broker and fewer feature flags to be considered.

This relative simplicity makes it comparatively easy for event stream engines to scale out across multiple machines and to accept and distribute enormous torrents of data for a topic.

Supporting Services

Spanning across those kinds of brokers is a set of common supporting services that are useful for any messaging infrastructure and which are discussed in the following sections.

Schema Registries

Schema-dependent serialization formats and related frameworks such as Apache Avro or Google Protobuf are popular in eventing and messaging scenarios. With most of the metadata information such as names and types of fields being held externally in schema documents, these frameworks are able to produce very compact on-wire representations of structured data carried as the payload of events and messages.

While the wire-footprint savings are often very significant, the required schema handling adds complexity. All communicating parties who need to deserialize an event or message payload need access to the exact schema used to produce it.

For formats that do not depend on external schemas for serialization, schemas might still be desired to allow consumers or inspecting intermediaries to validate the data structure's compliance with a set of rules.

A schema registry is a service focused on storing, organizing, and accessing schema documents that are used to drive serialization logic, validate data structures, or both. Messages carrying schema-bound payloads can then refer to the schema held by the schema registry by reference, and all consumers that have access to the registry can obtain the correct schema document to decode or validate the payload.
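The pass-by-reference pattern can be sketched as follows. This is not the API of any real registry product; the names are assumptions, and JSON stands in for the compact Avro/Protobuf encodings so the sketch stays dependency-free.

```python
import json

class SchemaRegistry:
    """Sketch of a schema registry: producers register a schema once and put
    only a short schema id on each message; consumers resolve the id back."""
    def __init__(self):
        self._schemas = {}
        self._next_id = 0

    def register(self, schema):
        self._next_id += 1
        self._schemas[self._next_id] = schema
        return self._next_id

    def get(self, schema_id):
        return self._schemas[schema_id]

def encode(registry, schema_id, record):
    # the message carries only the schema reference, not the schema itself
    return {"schema_id": schema_id, "payload": json.dumps(record)}

def decode(registry, message):
    schema = registry.get(message["schema_id"])
    record = json.loads(message["payload"])
    # validate the payload against the resolved schema's field names
    assert set(record) == {f["name"] for f in schema["fields"]}
    return record
```

Any consumer with access to the registry (or a federated replica of it) can decode the payload without the producer shipping the schema in every message.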

Schema registries can also be federated, allowing schema information to be shared and replicated across network isolation boundaries such that schemas can be brought near consumers that do not share a network scope with producers.

Event/Message Catalog

An event catalog contains groups of abstract templates of events or messages that a producer might send. Event catalogs are useful tools at design/development time because they allow developers to explore which kinds of events are available for subscription from a system and to find out which event sources raise them. An identified event/message catalog entry can then serve as the basis for generating strongly typed representations of the event/message or other code.

The event/message definition entries in the catalog contain templatized metadata information that identifies a particular type of event/message through a set of fixed attributes/properties like, for example, the "type" of CloudEvents, a wildcard "topic" of MQTT, or the "subject" of an AMQP message.

An event/message template in the catalog may then refer to an external or internal schema registry for describing the event payload and/or the structure of complex metadata fields.
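A sketch of such a catalog, with hypothetical names: entries pair a set of fixed identifying attributes (such as a CloudEvents "type") with a reference into a schema registry for the payload.

```python
class EventCatalog:
    """Sketch of an event/message catalog: entries are abstract templates
    identified by fixed metadata attributes and pointing at payload schemas."""
    def __init__(self):
        self._entries = []

    def add(self, attributes, schema_ref):
        self._entries.append({"attributes": attributes, "schema_ref": schema_ref})

    def find(self, **wanted):
        # design-time discovery: which entries match these fixed attributes?
        return [e for e in self._entries
                if all(e["attributes"].get(k) == v for k, v in wanted.items())]
```

A code generator could take a `find` result's `schema_ref`, resolve it in the schema registry, and emit a strongly typed class for that event.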

Structurally and functionally, an event catalog is very similar to a schema registry, and it is reasonable for a schema registry and an event catalog to be collocated in the same service and on the same endpoint.

Discovery Service

In order for consumers to obtain events/messages from services, they will need to understand what information is available from where. The event catalog just discussed is one such facility, focusing on providing information to developers at design/development time.

A discovery service is a more scoped capability that provides an application with programmatic runtime information about subscription services in its reach. The metadata information about event/messages will be very similar to that available from an event catalog, but a discovery service provides this information more concretely for event sources that are currently active, including endpoints, protocols and security configuration details that are not available at design/development time.

For an event-enabled cloud service that raises events for each state change, the information about the event types raised and the schemas used to convey the detail information about the event will be available through the event catalog.

The concrete tenant/account endpoints on which subscriptions can be created to obtain such events, and the available choice of events, which may be scoped down based on runtime constraints such as authorization, will be available from the discovery service.

The discovery service and an event catalog may be collocated in the same service and on the same endpoint.

Subscription Services

A subscription service is an endpoint through which an event delivery destination can subscribe to have events delivered to itself and manage the lifetime of that subscription.

A subscription mechanism is either inherent to the protocol of a message/event broker or requires an extra API if the protocol does not support subscriptions. With protocols like AMQP or MQTT, a subscription operation is a built-in gesture of the protocol interaction. HTTP has no such concept and an explicit subscription service API is needed to make the HTTP delivery endpoint known to the event router.

Queues, pub/sub brokers, and event stream engines are all examples of such subscription services. An event router that enables push delivery is an example where the act of subscribing is initiated by the subscriber (or a party acting on behalf of it) and deliveries are later initiated by the router.
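For protocols like HTTP that have no built-in subscribe gesture, the explicit subscription API might look roughly like this sketch. All names and the shape of the API are assumptions for illustration; a real service would authenticate callers and validate endpoints.

```python
import uuid

class SubscriptionService:
    """Sketch of an explicit subscription API: registers a delivery endpoint
    for an event type and manages the subscription's lifetime."""
    def __init__(self):
        self._subs = {}

    def create(self, event_type, endpoint_url):
        sub_id = str(uuid.uuid4())
        self._subs[sub_id] = {"type": event_type, "endpoint": endpoint_url}
        return sub_id       # the handle used to manage/delete the subscription

    def delete(self, sub_id):
        self._subs.pop(sub_id, None)

    def endpoints_for(self, event_type):
        # the router consults this when it later initiates push deliveries
        return [s["endpoint"] for s in self._subs.values()
                if s["type"] == event_type]
```

The subscriber initiates the `create` call once; from then on, deliveries are initiated by the router toward the registered endpoint.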

A further capability of a subscription service may be subscription propagation and aggregation. In complex systems, the source of the desired events may be behind a network isolation boundary and unreachable for the subscriber. A subscription service may then accept the subscription request from the subscriber within its own network scope and then propagate the request to the original source such that the event is being published and then routed via an appropriate intermediary towards the desired destination.

Encryption Key Management

With Avro or Protobuf, you can’t properly decode a message without a schema document, which is why there are schema registries. Very similarly, if you are using signatures and encryption, you can’t validate and decrypt a message if you don’t have the proper keys at hand.

A special issue with unidirectional data flows, and especially those that distribute copies of messages to multiple parties, is that the communicating parties cannot easily negotiate symmetric session keys for encryption as they would in a peer-to-peer conversation, as with TLS. Instead, they need to lean on asymmetric encryption schemes with private/public keys end-to-end and roll those keys periodically to avoid key degradation.

Producers and consumers therefore need a key management service where they can share public or symmetric keys required to validate or en-/decrypt messages, including key identifiers and epochs for rotating keys. As with a schema registry, the key management service needs to be able to securely replicate both keys and access control rules for those keys across network isolation boundaries where needed.
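The key-identifier-plus-epoch scheme can be sketched as a small registry. The names are hypothetical, and real key material would never be stored or transmitted this plainly; the point is only the lookup structure that lets a consumer resolve exactly the key a producer used.

```python
class KeyService:
    """Sketch of messaging key management: keys are shared under an id,
    rolled in epochs, and messages reference (key id, epoch) so consumers
    can resolve the exact key material the producer used."""
    def __init__(self):
        self._keys = {}     # (key id, epoch) -> key material
        self._latest = {}   # key id -> latest epoch

    def roll(self, key_id, key_material):
        # periodic rotation: a new epoch supersedes the old key
        epoch = self._latest.get(key_id, 0) + 1
        self._keys[(key_id, epoch)] = key_material
        self._latest[key_id] = epoch
        return epoch

    def current(self, key_id):          # producers sign/encrypt with this
        epoch = self._latest[key_id]
        return epoch, self._keys[(key_id, epoch)]

    def resolve(self, key_id, epoch):   # consumers look up by reference
        return self._keys[(key_id, epoch)]
```

Because old epochs stay resolvable, messages produced before a key roll remain verifiable and decryptable after it.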

Flows and Aggregation

Many sophisticated solutions require messages to be replicated across entities and broker boundaries in order to implement routing and federation patterns. Messages may have to flow between brokers associated with multiple, different application tenants, or across multiple, different geographical regions.

Realizing such flows requires configurable, managed replication capabilities that can copy messages and events across elements of a topology. Replication differs from an event router in that replication routes typically establish a relationship between a source and a target, and the replication engine actively acquires the event stream or messages from the source.

Replication often also includes stateful, rules-based aggregation as well as stateful and stateless event/message-triggered serverless compute capabilities that act on events and messages, including their payload content.

Acting on payload content is an important distinction to event routers, which generally only act on message metadata.
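The pull-based nature of a replication route, as opposed to a router being handed events, can be sketched as follows. The names and the optional payload transform are illustrative assumptions.

```python
class ReplicationRoute:
    """Sketch of a replication route: unlike a router that is handed events,
    the route actively pulls from the source log and appends to the target,
    optionally transforming payload content along the way."""
    def __init__(self, source_log, target_log, transform=None):
        self.source, self.target = source_log, target_log
        self.transform = transform or (lambda e: e)
        self._offset = 0   # how far into the source this route has replicated

    def pump(self):
        moved = 0
        while self._offset < len(self.source):
            event = self.transform(self.source[self._offset])
            self.target.append(event)
            self._offset += 1
            moved += 1
        return moved
```

The `transform` hook is where a payload-aware aggregation engine would plug in its decrypt/decode/act/encode pipeline.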

An aggregation engine may have to validate and decrypt payloads, decode structured data from those payloads with the help of schemas, act on the structured data, and eventually encode and potentially encrypt and sign the aggregation result as it is forwarded. Aggregation engines therefore build on and require most of the other supporting services enumerated in this section.
