Skip to content

Instantly share code, notes, and snippets.

@johnpmitsch
Last active October 10, 2020 01:15
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save johnpmitsch/4d29eedb0f44e9439c3dd6a1f6dcc5da to your computer and use it in GitHub Desktop.
Save johnpmitsch/4d29eedb0f44e9439c3dd6a1f6dcc5da to your computer and use it in GitHub Desktop.
Bradfield CS Distributed Systems

Distributed Systems

Lesson 1 - Introduction

  • Systems today are generally distributed, both in-house and third-party systems
  • Stream processing is batch processing but in smaller increments, it isn't real-time, but is quick
  • Project, work as pre and post work
  • Reliability
  • Scalability
  • Maintainability

8,465.38

Lesson 2 - Background knowledge review

  • 2^10 1000
  • how much ram addressable in 32 bits: 2^32 4GB
  • 11 Base64 characters: 2^6 6 bits of information 66 bits total
  • byte: 8 bits, register: 32 or 64 bit, cache line: 32 bytes or 64 bytes, disk sector: 4kb block?
  • 1 clock cycle per instruction, L1: .7 ns, L2:
  • threads share memory and processes get their own. Thread can be more flexibly used, process is an active program
  • can't share process memory, gets its own virtual memory

Lesson 3 - Communication

  • Language specific serialization (marshall, pickle, etc...): Can be insecure, can be slow, also assumes you are serializing according to language specs
  • Need to consider backwards compatability (new version reads old schema) and forward compatability (old version reads new schema)
  • Does communication have with textual or binary format?
    • Textual: no data validation, larger size, but easier

Lesson 5 - Replication

  • single leader replication:
    • less latency
    • durable/availability/fault-tolerance
    • throughput
  • Simple solution: primary and replica db
  • Synchronous transactions: replica is written, then transaction is considered ok
  • Asynchronous transactions: leader is written, transaction has succeeded, replicas then write asynchronously
  • ship statements to replicas: This works, but breaks down with non-deterministic functions like random or time functions
  • ship logs:
    • write-ahead log: write to the log then make the transaction
    • these logs can just be shipped to the replicas and replayed on them
    • write ahead log is on disk, why write to it on disk, it writes sequentially
    • data shipping, build a logical representation of the statements
  • synchronous replication: takes a long time, increases load, when they run slowly transactions take forever.
  • semi-synchronous replication: Wait for only 1 or some of the replicas to write and then consider that successful

Lesson 6 - Partitioning

  • Partitioning spreads out data, this can be independent of replication.
  • Partition boundaries are chosen, can be key-based. This is similar to the alphabetical index of an encyclopedia
  • The partition boundaries can be fixed and manually set or rebalanced automatically
  • key range can lead to hot spots, e.g. timestamp keys could mean the most recent ones are used
  • Hash-based partitioning: Use a hash on the key and partition based on that
    • A disadvantage of this is range-based queries are more spread out
  • Keys that are hot spots can have a random number appended or prepended to it, this can alleviate some of the issues of hot spots. Usually is selectively assigned as it will spread out related data.
  • Secondary indexes: indexes on the partition itself
  • Rebalancing: a good strategy to prepare for more nodes being added later is to add more partitions to the nodes than what is needed.
  • service discovery: how does a client find which node has the data? It can be handled by adding the mapping to the actual nodes, having a routing table, or having the client have knowledge of the table.

Lesson 7 - Transactions

  • Kademlia: early P2P distributed hash table https://en.wikipedia.org/wiki/Kademlia 1 hop per bit
  • DB Transactions should be isolated from each other
  • Queries should not see in-flight data before it's commited
  • ACID: Atomicity, Consistency, Isolation, and Durability: Sometimes loose-ly defined but the guarantee by some databases
  • Write-ahead log helps acheive atomicity and durability
  • The standard is serializable isolation:
    • Fully serial is don't start one transaction until another one happens
    • Locking can help appear fully serial, the lock manager does "two-phase" locking, which tracks read and write locks. Acquire locks then distribute them.
    • Locking usually happens on granular level: lock table, row, pages (disk chunks), on change as transcation happens
    • Can be slow, either getting too many locks or a slow queue due to large amount of locks
  • Deadlocking is when two locks happen and both are waiting on each other
  • database isolation, need to retry transactions sometimes (most ORMs don't)
  • Database need to be set to serializiable
  • Read committed means that only commmited data can be read: no dirty reads or dirty writes
  • Phantom read: Brand new data was inserted into a range you were depending on. Can happened during read committed
  • "snapshot isolation": store all the data and only expose

Lesson 8 - Consistency

  • linearizability: loosely defined as the system acting as if there is only one copy of the data
  • http://jepsen.io/ finds faults in distributed systems
  • CAP theorem:
    • Consistency: clients see the same data
    • Availability: system continues when nodes are down
    • partition tolerant: system continues even in network failures
  • Can't acheive full linearizability, but may have to give up some availability
  • Making a fully linearizability will have a lot of coordination overhead and will always be limited by the slowest node
  • Eventually consistency: At some point the system will be consistent
  • Casual consistency: The system doesn't need every action to be in sequence, but some need to be consistent within a certain constraint. For instance, each user has their data on one server.
  • exercise:
  • linearizable: 2,5
  • eventual consistency home: 0,1,2,3,4,5 visitors: 0,1,2
  • consistent prefix:

Lesson 9 - Distributed Transactiions and Consensus

  • Could get split brain. Leader replication, leader fails, new leader updates, previous leader gets back up
  • two-phased commit: Distributed transactions
    • first phase: can I commit?
    • second phase: actually send message
  • leader, should we have one coordinator? Could change based on epoch/timestamp/number
  • nodes can elect a leader
  • Raft consensus algorithm:
    • All nodes start as a leader
    • They elect a new leader if they don't hear from one, they can try to be the leader

Lesson 11 - Batch processing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment