Skip to content

Instantly share code, notes, and snippets.

@ZachBray
Last active December 8, 2023 15:43
Show Gist options
  • Save ZachBray/36cafc8bb0ba70ee7f834c067d93cef3 to your computer and use it in GitHub Desktop.
Save ZachBray/36cafc8bb0ba70ee7f834c067d93cef3 to your computer and use it in GitHub Desktop.
Design ideas for building fault-tolerant systems of micro-services leveraging a distributed event log

Competing Transformers/Producers - WIP

Design ideas for building fault-tolerant systems of micro-services leveraging a distributed event log.

Components

A Distributed Event Log (DEL)

A DEL maintains an event-log and some computed state across several machines. Each event is tagged with an Event Source Key (ESK) that describes its origin. A DEL provides a total order of all events regardless of their ESK.

The computed state comprises:

  • A mapping from Event Source Key (ESK) to event count
  • A mapping from ESK to subscribers

Necessary API for consumers:

  • write(eventSourceKey: EventSourceKey, event: Event): Async<unit>
    • Attempts to write an event to the log.
  • existsOrWrite(eventSourceKey: EventSourceKey, eventIndex: uint64, event: Event): Async<unit>
    • Attempts to write the event to the log if eventIndex = eventCount(eventSourceKey).
  • streamEvents(eventSourceKeys): IObservable<EventSourceKey * Event>
    • Attempts to subscribe to events for the provided ESKs.

Possible implementations:

Event Source (ESo)

An ESo publishes events into the DEL. Each event published by an ESo is tagged with an ESK that identifies the stream of events. Note: the implementation may use the connection to detemine the ESK in order to reduce payloads size.

Each time an event is published, the DEL increments the event counter for the ESK.

Event Sink (ESi)

An ESi subscribes to events from the DEL. This may be for all events or a subset of events based on their ESKs.

Deterministic Event Transform (DET)

A DET is both an ESo and a ESi. For each event it receives it produces zero or more events tagged with its own ESK.

This is where the business logic of an application lives. For example, a matching engine could be implemented as a DET. It may subscribe to orderbook events and produce trade events.

It is important that a DET is deterministic. This allows multiple instances of the same DET logic to run side-by-side and race to produce the event transformation. This uses the existsOrWrite(...) mechanism.

It is possible to generate the same, unique eventIndex for each event across instances because:

  • Each instance will receive input events in the same order.
  • Each instance knows the number of events already written.

Alternatives

  • Business logic inside Raft state-machines rather than
    • Fewer network hops
    • Cannot use log as a generic messaging middleware to reduce service coupling
  • Single process
    • Lowest overhead in terms of network and disk
    • No fault tolerance

TODO

  • Describe snapshotting process
  • Describe journalling to "near-line" storage
  • Introduce streams into terminology (common interfaces to sources, e.g., two sources may produce elements for a single stream)
  • Create implementation
  • Measure latency and throughput
  • Measure latency across differing degrees of DET competition (see if GC jitter is reduced)
  • Compare to other possible implementations
  • Consider computing transformations before event is commited through consensus algorithm (what are the downsides space, forced immutability/transactions?)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment