@iftheshoefritz
Last active February 14, 2023 13:43

## Current ordering problems:

  • the most complicated parts of the system are coupled to orders being placed
    • e.g. order -> payment completed -> branch assignment -> driver assignment
    • “everything initiated by an order” is hard to avoid in standard web apps, where user input (requests) initiates all activity.
    • this chain is too long and hard to reason about
  • ordering features fight for resources
    • restaurant dashboard vs driver assignment vs order list vs driver list
      • all have different purposes with different requirements in terms of:
        • how often they need to be updated
        • what data is queried
        • how denormalised they are
      • currently all constructed from the same data
        • lots of competition
      • each working very hard to transform the normalised data for its own use case
  • reads fight with writes
    • e.g. drivers constantly updating their status affects whether a driver can be chosen for an order
  • ordering async features are only scalable by function, not by data
    • can scale PaymentProcessor and DriverAssignmentJob easily
    • not so easy to scale DriverAssignmentJob for Branch A separately from DriverAssignmentJob for Branch B
  • understanding state changes that lead to a problem is hard
    • requires current state in DB + extensive use of logs that might not be there

## Characteristics of event streaming architecture that can address the above:

  • events can drive small pieces of logic (easy to understand) that then create another event for a different piece of logic to pick up.
    • creates natural decoupling
  • normal/easy/deterministic to create different views of the same data
    • stay up to date
    • parallelisable
    • each piece of the app becomes fast because it reads from a view that is optimised for it
  • normal to have separate reads and writes
  • scale based on different data
    • e.g. branch A, B, C only have a trickle of data but D needs a dedicated consumer
  • inspect how a piece of data reached its current state by looking at all the events that apply
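
A minimal sketch of the first two points (Python; event and view names are invented for illustration): one append-only log of events feeds several independent "views", each materialised by its own small, pure function.

```python
# Sketch: one append-only event log feeding two independent views.
# Event shapes and field names are invented for illustration.
from collections import defaultdict

events = [
    {"type": "order_placed", "order_id": 1, "branch": "A"},
    {"type": "driver_status", "driver_id": 7, "status": "available"},
    {"type": "order_placed", "order_id": 2, "branch": "B"},
    {"type": "driver_status", "driver_id": 7, "status": "busy"},
]

def build_orders_per_branch(log):
    """View optimised for an order dashboard: orders grouped by branch."""
    view = defaultdict(list)
    for e in log:
        if e["type"] == "order_placed":
            view[e["branch"]].append(e["order_id"])
    return dict(view)

def build_driver_statuses(log):
    """View optimised for driver assignment: latest status per driver."""
    view = {}
    for e in log:
        if e["type"] == "driver_status":
            view[e["driver_id"]] = e["status"]  # later events win
    return view

# Each consumer reads the same log but materialises only what it needs;
# rebuilding a view is just replaying the log through its pure function.
print(build_orders_per_branch(events))  # {'A': [1], 'B': [2]}
print(build_driver_statuses(events))    # {7: 'busy'}
```

Because each view is a deterministic function of the log, the views never compete for the same normalised tables, and "inspect how data reached its current state" is just reading the relevant slice of the log.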

Notes from various sources below:

## Why async jobs aren’t an answer

  • https://3.basecamp.com/3091943/buckets/15558617/messages/3571977802
    • things that Joe says you’d re-implement badly if you tried to do a Kafka project with job queuing:
      • interwoven with explanations from https://ssudan16.medium.com/kafka-internals-47e594e3f006
      • replaying
        • replay events from the beginning for new apps
      • archival
      • partitioning
        • topics are partitioned
        • partition can be determined by producer, or it can be hash of the content
        • partitions have leaders and replicas
        • in-sync replicas is a list of replicas that are caught up with the leader
          • if the leader goes away, one of the ISRs is chosen to replace
        • leaders can remove a dodgy replica if it falls too far behind or doesn’t send heartbeats
      • message identification
        • messages uniquely identified by combination of topic, partition, and offset
        • dunno why this is important and different from DJ
          • maybe messages retried get new IDs?
      • time skew resolution
        • different concepts of time: when the event was generated, when the event was ingested, when the event was processed
        • not too sure of what this was
      • consistency
      • resiliency
        • leader/replica are tracked and can be replaced if they become dodgy
        • min in-sync replicas must be met before the message is committed from the write-ahead log
        • when a broker doesn’t send a heartbeat to ZooKeeper, ZK notifies the cluster controller to fetch all partitions for which that broker was the leader (check that) and promote another broker to be the new leader
      • aggregation
      • compaction
      • scaling
        • topics split into partitions
        • partitions can be split across multiple brokers
        • each partition has one consumer within a consumer group
        • multiple partitions can be consumed by a single consumer
          • as soon as more consumers are added the partitions are distributed between the consumers
          • max scaling within a group up to one consumer per partition
        • more apps consuming events = more consumer groups
        • long polling probably also helps scaling?
    • also higher level tools for:
      • aggregation
      • batching
      • consistency guarantees
      • archival
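
A toy model of the partitioning and scaling mechanics above (Python; names are illustrative, and real Kafka uses murmur2 hashing and a richer assignment protocol): a message is uniquely identified by (topic, partition, offset), the partition comes from a hash of the message key, and partitions are divided among the consumers in a group.

```python
# Toy model of Kafka-style partitioning and consumer-group assignment.
# All names are illustrative; not real Kafka client code.
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Deterministic hash of the key: the same key always lands on the
    # same partition, which preserves per-key ordering.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# A "topic" as a list of append-only partition logs.
topic = [[] for _ in range(NUM_PARTITIONS)]

def produce(key: str, value: str):
    p = topic[partition_for(key)]
    p.append(value)
    # The message's unique identity: (topic, partition, offset).
    return ("orders", partition_for(key), len(p) - 1)

def assign(partitions: int, consumers: list):
    """Round-robin partition assignment within one consumer group:
    each partition has exactly one consumer, one consumer may own
    several partitions, and consumers beyond the partition count idle."""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

ident = produce("order-123", "order placed")
print(ident)                    # ('orders', <some partition>, 0)
print(assign(4, ["c1", "c2"]))  # {'c1': [0, 2], 'c2': [1, 3]}
```

This is why scaling "by data" falls out naturally: adding a consumer to the group redistributes partitions, up to one consumer per partition.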

## Martin Fowler event sourcing doc

  • https://martinfowler.com/eaaDev/EventSourcing.html
    • from a processing point of view, just adding events is an unnecessary step
    • does give you history of application
      • this could be done just by logging
    • gains with events
      • complete rebuild: discard state and replay events from the beginning
      • temporal query: start from the beginning and replay up to the time you want
      • replay: reverse events back to a broken event, fix and replay
        • this can also fix events out of order
    • state storage
      • getting state by replaying from the beginning is expensive
        • can start from overnight snapshot and play events after that point in time to get to latest
        • faster recovery from crash
    • interacting with external systems
      • when doing an event replay, we don’t want to (e.g.) send reports again for items that were sold
        • gateway between you and external payment system can decide if this update is because of event replay
        • or notification to outside world can be scheduled using a different event (e.g. monthly) rather than being based on the individual items being sold
      • getting data externally based on an old event might be a problem if the external system can’t give you data as of the date of the old event
        • maybe gateway to external system can save responses so that they can be used again when replaying
    • changes to existing code
      • reprocess data using newly fixed code, but should old events have values based on code at the old time or should they be based on the new code?
    • when to use: AKA it’s difficult, so there must be some kind of return
      • complete audit log can be nice for support or debugging
        • text logs can do some of this
        • debugging by rewinding to the exact state at time X is different to debugging with logs
      • can decouple things and parallelise
      • parallel models
        • you could take the current state and then simulate a Christmas rush by playing lots of fake events
      • retroactive events
        • in a normal system, when some data is broken you have to figure out what the system did with that bad data and work out what it should have been based on the fixed data
        • with event sourcing and retroactive events you can just throw an event on the log that fixes the data and get the fixed result “for free”
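
The rebuild, temporal-query, and snapshot ideas above can be sketched as (Python; the event shape, a timestamp plus a balance delta, is invented):

```python
# Sketch of event sourcing: current state is a fold over the event log.
# The event shape (timestamp + delta) is invented for illustration.
events = [
    {"ts": 1, "delta": +100},
    {"ts": 2, "delta": -30},
    {"ts": 3, "delta": +5},
]

def replay(log, until_ts=None, snapshot=0):
    """Rebuild state by replaying events, optionally starting from a
    snapshot and/or stopping at a point in time (a 'temporal query')."""
    state = snapshot
    for e in log:
        if until_ts is not None and e["ts"] > until_ts:
            break
        state += e["delta"]
    return state

print(replay(events))              # 75 -- complete rebuild from scratch
print(replay(events, until_ts=2))  # 70 -- state as of ts=2

# Snapshot optimisation: persist the state as of some point overnight,
# then on recovery replay only the events after the snapshot.
snapshot_state = replay(events, until_ts=2)
later = [e for e in events if e["ts"] > 2]
print(replay(later, snapshot=snapshot_state))  # 75
```

A retroactive fix is then just another event appended to the log; the corrected state falls out of the same replay.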

## New York Times case study

  • NY Times publishing
  • consumers:
    • Elasticsearch cluster that can be a bit behind but must be easily re-indexable when indexing requirements change
    • list providers that need to be updated when a new piece of content matches their query or stops matching
    • live content that must be up to date ASAP with new content but only cares about the latest content
    • personalisation service that needs to reprocess content every time a personalisation algorithm changes
  • API problems:
    • every API has its own schema, same name different content, different name same content etc
    • API structures are inconsistent
    • consumers need knowledge of each different API’s quirks
    • previously published content is hard to retrieve, difficult to reprocess everything
  • DB as source of truth:
    • problem
      • hard to make big schema changes
      • hard to replace without downtime
        • snapshots immediately become outdated
        • therefore hard to create derived store like search index
        • clients of DBs end up writing to DB AND something else (e.g. search index)
          • consistency problem if one write fails and others succeed
  • Log architectures
    • the problems with the DB are a result of the DB storing only the end result of events like updates
    • in log architecture the store IS the list of events
      • use the log to create materialized custom data stores
      • each application’s data store (probably traditional RDBMS) can store what it wants from the log in the structure that makes sense for that app
    • distinction between getting all data and just getting updates goes away
      • Start at the beginning or start at latest (it doesn’t matter) then keep consuming events after that
      • therefore getting a “dump” is easy
  • kafka uses a log architecture
    • this is just an implementation detail when using Kafka as msg broker
      • other services do just as well, e.g. SQS
    • log architecture fundamental characteristic of using kafka when you need both of the following:
      • ability to recreate a data store from scratch (with all data events ever)
      • order of events matters
  • NYT has a monolog of everything published since 1851
    • recently published service only gets content from the last few hours
    • “science section” service would build up its list from the beginning of time
    • assets are normalised, e.g. an image can be in multiple articles, a tag can be on multiple articles
      • image, tag, article are each separate events
    • monolog is single partition topic
      • want to guarantee that assets referenced by an article appear before the article
  • NYT also has a denormalised log
    • when article is published, a single event containing the article and everything it references is published
      • including assets that already existed on the normalised log
    • denormalised log is built by a Denormaliser application that consumes the normalised log
      • keeps local state of latest version of every article
      • on article create, or on update of a dependent asset, the denormaliser publishes the article again to the denormalised log
    • denormalised log doesn’t need everything to be ordered completely, just different versions of the same article need to be kept in order
      • so now they can put those articles into lots of different partitions, as long as the same article will always appear on the same partition
  • advantages of Kafka at NYT:
    • software is simpler because all content is consistent (i.e. events with particular structure from that one monolog)
    • simpler deployments e.g. just replay all the content when an Elasticsearch analyzer changes instead of making in-place replacements
    • can track updates to each individual piece of content easily
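
A rough sketch of the Denormaliser idea (Python; the event shapes are invented, not NYT’s actual schema): it consumes the normalised monolog, keeps the latest version of every asset locally, and republishes a self-contained article bundle whenever the article or a referenced asset changes. Keying output by article id keeps each article’s versions in order on one partition.

```python
# Sketch of a NYT-style Denormaliser: normalised events in,
# self-contained article events out. Event shapes are invented.
class Denormaliser:
    def __init__(self):
        self.assets = {}    # latest version of every asset (images, tags)
        self.articles = {}  # latest version of every article
        self.out = []       # the denormalised log this app produces

    def consume(self, event):
        if event["kind"] == "asset":
            self.assets[event["id"]] = event
            # republish every article that references this asset
            for art in self.articles.values():
                if event["id"] in art["refs"]:
                    self.publish(art)
        elif event["kind"] == "article":
            self.articles[event["id"]] = event
            self.publish(event)

    def publish(self, article):
        # One event containing the article AND everything it references,
        # keyed by article id so all its versions land on one partition.
        self.out.append({
            "key": article["id"],
            "article": article["body"],
            "assets": {r: self.assets[r]["body"] for r in article["refs"]},
        })

d = Denormaliser()
d.consume({"kind": "asset", "id": "img1", "body": "photo-v1"})
d.consume({"kind": "article", "id": "a1", "body": "text-v1", "refs": ["img1"]})
d.consume({"kind": "asset", "id": "img1", "body": "photo-v2"})  # republish
print(len(d.out))                   # 2 versions of article a1
print(d.out[-1]["assets"]["img1"])  # 'photo-v2'
```

Consumers of the denormalised log never need to join assets themselves; they just take the latest bundle per key.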

## “Turning the database inside out with Apache Samza”

  • typically web app = stateless client + stateless backend + shared global mutable DB
    • but we try to get rid of shared global mutable state in code everywhere else
  • alternatives can be derived by looking at 4 common use cases:
    1. Replication
      • update cart set qty = 3 where customer_id = 123 and product_id = 999
      • write query to leader
      • then replicate:
        • update row 8765 old = [123, 999, 1] new = [123, 999, 3]
      • original update query was imperative
      • replication update is an immutable fact: the customer 123 changed the row in their cart for product 999 from 1->3
    2. Secondary indexes
      • CREATE INDEX ON cart(customer_id); CREATE INDEX ON cart(product_id)
      • DB creates extra data structure which is a different representation of the same data
        • usually some sort of key-value store
        • automatically updates values
          • in a transactional way so index is always consistent
        • no extra thought required from the user, the DB just knows how to do it
      • giant indexes can be done concurrently
        • CREATE INDEX CONCURRENTLY
        • DB also just knows how to do this
    3. caching
      • possible to do this in application layer
        • r = cache.get(key); if !r { r = db.get(key); cache.put(key, r) }
          • invalidation: how do you know you have to update the cache?
          • race conditions
          • cold starts where DB gets overwhelmed because nothing is cached
      • contrast this with ease of CREATE INDEX CONCURRENTLY
    4. Materialized view
      • DB view that is persisted on disk
      • differences with application level caching:
        • more flexibility with application level caching
        • much easier to create materialized view (CREATE MATERIALIZED VIEW)
        • doesn’t take all the load off the DB
  • All 4 above are derived data. Differ in how well they work:
    1. replication: mature
    2. secondary indexes: really good
    3. caching: complete mess
    4. materialized view: meh
  • can we rethink materialized view so that it works better
    • make replication stream a first class concept
    • can write to it efficiently (always just append)
    • can’t query randomly
    • build materialized state using replication stream
      • Apache Samza does this
        • can build materialized view
          • in parallel (so it’s fast)
          • compaction (save disk space)
          • consistent with the base data set
          • immune to race conditions
      • AKA Kappa architecture
  • Benefits:
    1. Better data
      • RDBMSes conflate concerns of reader and writer
      • writes are easy because log is easy to write to
      • reading can be from materialised views that are optimised for whatever reader wants to do
        • as denormalised as you like
        • denormalised data normally has the issue of how/when to update it based on its dependencies
        • having a pure function that takes stream and creates materialized view eliminates uncertainty
      • events are useful for analytics
        • not just “cart has X”, but also the user added X then removed X then added it again
      • historical point-in-time queries
      • recovery from human error
    2. Fully precomputed caches
      • no cold start with lots of missing keys
      • no invalidation
        • logic for updating view based on stream is nicely encapsulated, doesn’t infest business logic everywhere in the system
      • no race conditions
        • stream processors operate sequentially
        • still parallelisable
    3. Streams everywhere
      • instead of request/response, do subscribe/notify
      • subscribe/notify easier with streaming data
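
The "materialized view built from the replication stream" idea can be sketched as (Python; table and field names invented): the stream carries immutable facts (old and new row values), and a pure function folds them into a view, so there is no invalidation logic, no cold start, and replaying the stream always gives the same view.

```python
# Sketch: replication stream of immutable facts -> materialized cart view.
# Field names are invented for illustration.
stream = [
    # (customer_id, product_id, old_qty, new_qty) -- immutable facts,
    # like the replication events in the talk
    (123, 999, None, 1),   # customer 123 added product 999
    (123, 999, 1, 3),      # then changed the quantity 1 -> 3
    (123, 777, None, 2),
    (123, 777, 2, None),   # removed product 777 again
]

def materialize(stream):
    """Pure function from stream to view: the view is always consistent
    with the base facts, with no invalidation and no race conditions."""
    view = {}
    for customer, product, old, new in stream:
        if new is None:
            view.pop((customer, product), None)
        else:
            view[(customer, product)] = new
    return view

view = materialize(stream)
print(view)  # {(123, 999): 3} -- product 777 was added then removed

# Analytics keeps the full history: not just "cart has 999 x3", but
# also that 777 was added and then removed along the way.
removed = [(c, p) for c, p, old, new in stream if new is None]
print(removed)  # [(123, 777)]
```

The update logic lives in one fold function instead of being scattered through business logic, which is exactly the "no invalidation" benefit above.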