@jsquire
Last active October 7, 2021 02:16
Event Hubs: Idempotent Producer Context and Requirements

Event Hubs: Idempotent Event Publishing

When publishing events to Event Hubs, timeouts or other transient failures may introduce ambiguity into the understanding of whether a batch of events was received by the service. Because Event Hubs has an at-least-once guarantee for delivery and consumers are strongly encouraged to be idempotent in their processing, the common approach is to resend any batch where the status of receipt was unknown.

In some specialized scenarios, producers need to avoid publishing duplicate events. To support these scenarios, the Event Hubs service is adding support for annotating events with metadata that indicates the sequence in which they were intended to be published. The service then uses that metadata to determine which events the target partition has already received. This functionality comes at a performance cost, however, and requires producers to follow a strict set of semantics so that the service can perform server-side deduplication based on the producer's intent.

It is important to note that idempotent publishing endeavors to reduce the number of duplicate events that are published, but cannot fully eliminate them. The guarantee of the Event Hubs service is not altered by this feature and remains an at-least-once guarantee.
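The following Python sketch is a minimal in-memory model of a partition that makes the deduplication idea concrete. It rests on stated assumptions and does not reproduce how the service is built. The `Partition` class, its `receive` method, and `OutOfSequenceError` are hypothetical names. The model assumes that a partition remembers the last sequence number it accepted from each producer group, silently drops events it has already seen, and rejects a batch that skips ahead.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class OutOfSequenceError(Exception):
    """Hypothetical fatal error: a batch skips ahead of the expected sequence number."""


@dataclass
class Partition:
    # Assumed state: last sequence number accepted per producer group.
    last_sequence: dict[str, int] = field(default_factory=dict)
    stored: list[str] = field(default_factory=list)

    def receive(self, producer_group: str, start_sequence: int, events: list[str]) -> int:
        """Accept a sequenced batch and return how many events were newly stored."""
        last = self.last_sequence.get(producer_group, -1)
        if start_sequence > last + 1:
            # A gap means the producer's sequencing can no longer be trusted.
            raise OutOfSequenceError(f"expected {last + 1}, got {start_sequence}")
        accepted = 0
        for offset, body in enumerate(events):
            sequence = start_sequence + offset
            if sequence <= last:
                continue  # already received: treat as a duplicate and drop it
            self.stored.append(body)
            last = sequence
            accepted += 1
        self.last_sequence[producer_group] = last
        return accepted


partition = Partition()
partition.receive("group-a", 0, ["e0", "e1"])  # stored, but the acknowledgment is lost
partition.receive("group-a", 0, ["e0", "e1"])  # producer resends the same sequenced batch
print(partition.stored)  # ['e0', 'e1']
```

In this model, replaying the same sequenced batch after an ambiguous failure stores nothing new. The model deduplicates only while it still holds the per-group state. A resend that arrives after that state has been discarded would be stored again. This is one reason the feature reduces duplicates rather than eliminating them.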

Target segment: Developers with specialized needs

These are developers working on products with specialized needs that do not fit the majority case for Event Hubs client library users. While this segment represents a much smaller addressable market, developers in it often drive a large amount of ACR.

In the context of the idempotent publishing feature, these are developers for whom duplicate events are expensive to process when consuming. To reduce duplicates when consuming, they are willing to take on additional complexity, conform to a restrictive set of semantics, and accept some performance degradation when publishing.

Why this is needed

The Event Hubs service is offering published event deduplication as a new feature. Using it requires sending metadata when establishing the connection to the Event Hubs service and reacting to information received at the transport level. For that reason, client library support is needed to take advantage of the feature.

Terminology

  • Idempotent Producer is the name used by the Event Hubs service team for the feature performing server-side deduplication.

  • Idempotent denotes that an operation may be safely performed multiple times without the end state changing after the first successful execution. In the context of this feature, idempotency is constrained to events published by a specific producer client to a specific partition. It is also limited to the period of time in which the Event Hubs service retains the associated state.

High level scenarios

Real-time data aggregation and analysis

A financial company offers a stock brokerage service to its customers, providing a fully managed experience in which brokers monitor market activity and make trades in response. Because of the volatility of the markets, access to real-time information with the highest accuracy and lowest possible latency is paramount. An extra second of delay, or too large an inaccuracy in a calculation, can make a significant difference in the valuation of a trade, sometimes swinging the amount by millions of dollars.

For the platform that compiles market data, the cost of guaranteeing idempotency when processing data is too high. A very small amount of duplicate processing is acceptable. Beyond a certain threshold, however, duplicates would degrade the margin of accuracy and produce unreliable aggregate data. Data is compiled from multiple sources, each flowing into a set of message brokers, so the platform is slightly less sensitive to timing and availability for any single source. This provides an opportunity to apply some form of duplicate prevention without unreasonable risk of impact to the downstream system.

Though there is reduced sensitivity to latency, it is still highly important to maintain consistently high throughput when publishing events to the broker. Ideally, deduplication is a function of the broker. That allows the same approach to be applied consistently across data sources and lets producers concentrate on publishing.

High level use cases

Target segment: Developers with specialized needs

  • Users should be able to create a producer for idempotent publishing.

  • Once a producer is created, users should not be able to disable idempotent publishing if it was enabled; the state of the feature is determined at the time of creation.

  • Once a producer is created, users should not be able to enable idempotent publishing if it was disabled; the state of the feature is determined at the time of creation.

  • Producers with idempotent publishing enabled should only support idempotent publishing. Users should be presented with a single, consistent set of semantics.

  • Users should not be able to publish to the Event Hubs gateway for automatic routing or with a partition key using a producer with idempotent publishing enabled, as idempotency is only supported when publishing directly to a partition.

  • Users should be able to publish to any partition of the Event Hub using the producer; each partition should function independently and should not interfere with or influence the state of other partitions.

  • Users should be able to create events and batches without consideration of idempotency. Sequence numbering should be applied by the client when the user attempts to publish an event batch or set of events.

  • Users should be able to publish an event batch to a partition without the need to manually coordinate sequencing.

  • Users should be able to publish a set of events to a partition without the need to manually coordinate sequencing.

  • If an event batch was successfully published, the batch should be updated to reflect the starting sequence number applied for publishing; using the starting sequence number and count of events in the batch, users should be able to understand the full set of sequence numbers that were published by the associated SendAsync call.

  • If a set of events was successfully published, each event should be updated to reflect the sequence number that was published; by inspecting the sequence number of each event in the set, users should be able to understand the full set of sequence numbers that were published by the associated SendAsync call.

  • If an event batch or set of events is successfully published, users should be notified of success.

  • If a publishing operation fails with a transient error, the producer should perform retries as governed by the configured retry policy. During retries, the sequence numbers assigned to events should not change; each attempt should pass the same set of events using the same sequence numbers to the Event Hubs service.

  • If a publishing operation fails with a transient error after all retry attempts have been exhausted, users should be presented with a meaningful error.

  • If a publishing operation fails with a fatal error, the producer will not attempt to retry and will consider the operation a failure immediately. Users should be presented with a meaningful error.

  • If a publishing operation fails with a fatal error related to sequencing, the producer will not attempt to retry and will consider the operation a failure immediately. Users should be presented with a meaningful error that indicates that it is not safe to continue using the producer to publish events to the associated partition; users should be encouraged to close and recreate the producer.

  • If publishing of an event batch completes with failure, the batch should not be updated to reflect sequencing. Users should be able to provide the same batch to SendAsync and trigger a new publishing operation that will sequence the batch and treat it as if no previous attempt to publish was made.

  • If publishing of a set of events completes with failure, none of the events should be updated to reflect sequencing. Users should be able to provide the same events, in this set or in different sets, to SendAsync and trigger a new publishing operation that will sequence the events and treat them as if no previous attempt to publish was made.

  • If a user cancels a publishing operation, the producer should respect the request while ensuring that it takes place at a safe and deterministic point. If cancellation is triggered and events have not been confirmed as accepted by the service, behavior will mirror the failure scenario. If events have already been accepted by the service, behavior will mirror the successful send scenario.
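The client-side semantics in the list above can be modeled in a few dozen lines. This Python sketch is a hypothetical model. It is neither the .NET client nor its `SendAsync` signature. The names `IdempotentProducer`, `EventBatch.starting_published_sequence_number`, the `transport` callable, `TransientError`, and `SequencingError` are all illustrative. The sketch covers the following behaviors:

- Sequence numbers are assigned only at send time.
- Retries reuse the same sequence numbers unchanged.
- The batch is updated only after a successful publish.
- A sequencing failure makes the partition unusable.

Cancellation is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical retriable failure, such as a timeout."""


class SequencingError(Exception):
    """Hypothetical fatal failure related to sequencing."""


@dataclass
class EventBatch:
    events: list[str]
    # Set only after a successful publish; None means "not yet sequenced".
    starting_published_sequence_number: Optional[int] = None


# Hypothetical transport: (partition_id, starting_sequence, events) -> None, or raises.
Transport = Callable[[str, int, list], None]


@dataclass
class IdempotentProducer:
    transport: Transport
    max_retries: int = 3
    # Each partition keeps its own independent sequence counter.
    _next_sequence: dict[str, int] = field(default_factory=dict)
    _unsafe_partitions: set[str] = field(default_factory=set)

    def send(self, batch: EventBatch, partition_id: Optional[str] = None,
             partition_key: Optional[str] = None) -> None:
        # Idempotency is only supported when publishing directly to a partition.
        if partition_id is None or partition_key is not None:
            raise ValueError("idempotent publishing requires a partition_id and no partition key")
        if partition_id in self._unsafe_partitions:
            raise RuntimeError(f"partition {partition_id} is unsafe; recreate the producer")

        # Sequence numbers are assigned here, never when the batch is built.
        start = self._next_sequence.get(partition_id, 0)
        for attempt in range(self.max_retries + 1):
            try:
                # Every attempt sends the same events with the same sequence numbers.
                self.transport(partition_id, start, batch.events)
                break
            except TransientError:
                if attempt == self.max_retries:
                    raise  # retries exhausted; the batch stays unsequenced
            except SequencingError:
                # Not retried; publishing to this partition is no longer safe.
                self._unsafe_partitions.add(partition_id)
                raise
            # Any other exception is treated as fatal and propagates without a retry.

        # Only a successful publish updates the batch and advances the counter.
        batch.starting_published_sequence_number = start
        self._next_sequence[partition_id] = start + len(batch.events)


producer = IdempotentProducer(transport=lambda pid, start, events: None)
batch = EventBatch(["a", "b"])
producer.send(batch, partition_id="0")
print(batch.starting_published_sequence_number)  # 0
```

The per-partition counter advances only after the transport confirms the publish. As a result, a failed batch leaves no trace in the producer and can be sent again as if no attempt had been made.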

Target segment: Advanced developers with specialized needs

  • All use cases from the “Developers with specialized needs” segment apply here as well.

  • Users should be able to query the state of a partition using the producer. The state should reflect the critical attributes for idempotent publishing, including the Producer Group Id, Owner Level, and Latest Sequence Number.

  • Users should be able to choose the sequence number at which sequencing starts for a partition and express that preference to both the producer and the Event Hubs service. For partitions for which the user does not provide explicit configuration, the sequence number specified by the Event Hubs service should be used.
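For the advanced segment, the per-partition choice of starting sequence number can be sketched as a small resolution function. The names `PartitionState` and `next_sequence_number` are hypothetical. The fallback rule is an assumption made for illustration: continue from the service's last sequence number, or start at 0 when nothing has been published. The sketch covers only the producer side and does not show how the preference is communicated to the Event Hubs service.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical view of the partition attributes listed in the use case."""
    producer_group_id: int
    owner_level: int
    last_sequence_number: Optional[int]  # assumed None if nothing has been published yet


def next_sequence_number(
    partition_id: str,
    user_starting_sequence: Mapping[str, int],
    service_state: PartitionState,
) -> int:
    """Return the sequence number the producer would assign to the next event."""
    # An explicit per-partition preference from the user takes precedence.
    if partition_id in user_starting_sequence:
        return user_starting_sequence[partition_id]
    # Otherwise defer to the service (assumed: continue after its last number).
    if service_state.last_sequence_number is None:
        return 0
    return service_state.last_sequence_number + 1


state = PartitionState(producer_group_id=7, owner_level=1, last_sequence_number=41)
print(next_sequence_number("0", {"0": 100}, state))  # 100
print(next_sequence_number("1", {"0": 100}, state))  # 42
```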

Out of scope

These items are not necessary to satisfy the use cases required for the initial release of the idempotent producer feature and/or warrant being treated as an independent feature. This categorization does not imply that they are without merit or should not be included in the client library, only that consideration and discussion should be reserved for a different context.

  • When publishing an event batch or set of events that has already been sequenced, users should be able to request that the existing sequence numbers be ignored, and a new set of sequencing be applied.

  • A streaming model for publishing events has been discussed in some detail in the context of idempotent publishing. While the streaming model is something that should be given consideration, it is large enough in scope to warrant being treated as a feature unto itself and is orthogonal to idempotent publishing.
