@slpsys
Created September 29, 2015 03:49

Keynote, "I See What You Mean" by Peter Alvaro

Synopsis

5/5. This is pretty academic, but it's pretty great, too. It starts with Datalog, which heavily inspired his project Dedalus; there's a way better dissection of this talk here.

"Managing Containers at Scale with CoreOS and Kubernetes" by Kelsey Hightower

Synopsis

3/5. This was basically a walkthrough of Kubernetes, which is cool, but not exactly...

Notes

  • Main topic of discussion: "How would you [re]design your infrastructure" if you could never log in
  • Containers are the MVP unit for deploying apps; VMs overkill
    • Processes, not lightweight VMs
  • Bunch of Docker demo crap
  • Tons of cute names up in this business "kubelet," etc.
  • Pods: logical applications
    • 1+ containers
    • Shared namespaces
    • Like docker compose, etc.?
    • Contains manifest, etc.
    • One-off, by itself
  • Kubernetes scheduler
    • Does obvious things
    • Pluggable
  • Replication controller
    • Used for long-running apps (like web servers)
    • Self-healing
    • Hand pod to replication controller
  • Kubernetes has a pretty sweet UI
  • Uses Raft, apparently
  • Unclear at this point in time how to avoid issues like we saw where other containers need to be up before health check for main app starts passing...
  • Dude played Tetris to explain bin-packing
  • Pretty ok resource management with cgroups, etc.
  • Fairly easy to implement canary pattern / roll new versions using service manifests
    • kubectl rolling-update --update-period=n00ms
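The Tetris bit above is about bin packing: a scheduler placing pods onto nodes with fixed capacity. A toy first-fit-decreasing sketch (all names and numbers here are mine, not from the talk):

```python
# First-fit-decreasing bin packing: sort items largest-first, then drop each
# into the first bin with room. Greedy, not optimal, but a decent heuristic —
# roughly what a scheduler does when packing pods onto nodes.
def first_fit_decreasing(items, capacity):
    bins = []  # each bin is a list of item sizes
    for size in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])  # no existing bin fits; open a new one
    return bins

print(first_fit_decreasing([5, 4, 3, 2, 2], capacity=8))  # [[5, 3], [4, 2, 2]]
```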

"Immutable Data Science with Datomic, Spark and Kafka" by Konrad Scorciapino and Mauro Lopes

Synopsis

1/5. All I got out of this talk was that they are using Datomic for something, Spark for something, and don't know anything about giving presentations.

Notes

  • Using Datomic
  • Kinda cool, I guess
  • Weak talk so far
  • Handwaved over using timestamps on separate dbs for ordering?
  • Interesting question about unifying/enriching data in microservice dbs with point-in-time dbs, though
  • I guess no distributed transaction coordination in Datomic?
  • These dudes put like zero effort into sensible, labeled graphics
    • And they just reused the same colored ball to mean different things in different slides
  • There's like nothing of value in this talk

"When "Worst" is Best (in Distributed Systems)" by Peter Bailis

Synopsis

4/5. Not the world's most technical talk, but I'm a fan of Bailis' work. Adrian Colyer does a blog that summarizes interesting CS papers every day, and he's covered a few of Bailis' papers (some co-authored with Peter Alvaro). A lot of this has to do with consistency guarantees at the margins. This one's more about software in general.

Notes

  • What if we designed systems for worst-case scenarios
  • Cluster provisioning: 7B simultaneous users!
  • Hardware: put that shit on Mars
  • Security: all developers malicious!
  • Basic takeaway: designing for the worst case penalizes the average case
  • This talk: When can designing for the worst-case improve average
  • Distsys, beyond the network, lessons
  • Networks make design hard: packets delayed, dropped, can't tell the diff
  • CAP theorem, etc.
  • Coordination-free systems: enable infinite scale-out, improve throughput, ensure low latency, guarantee always-on
  • Follow-up: Harvest & Yield paper
  • Is coordination required for read committed? Research: no! Buffer writes; order-of-magnitude (OOM) throughput increase
  • Other scenarios: failure helps devops; automated failover for upgrades, revocation, stragglers, etc.
  • Strong business case for accessibility
  • Universal design
  • Pull quote: defining "normal" defines our designs
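The write-buffering point for read committed can be sketched in a few lines. This is a toy, single-process illustration of the idea (class and method names are mine, not from the talk): uncommitted writes live in a per-transaction buffer, so no other reader ever observes them, and no coordination is needed to uphold the guarantee.

```python
# Toy coordination-free read committed: each transaction buffers its writes
# locally and installs them all at once on commit, so readers never see
# uncommitted data. Illustrative only — not how any real DB is structured.
class Store:
    def __init__(self):
        self.committed = {}

class Txn:
    def __init__(self, store):
        self.store = store
        self.buffer = {}  # uncommitted writes live here, invisible to others

    def write(self, key, value):
        self.buffer[key] = value

    def read(self, key):
        # read-your-writes from the buffer, else the last committed value
        return self.buffer.get(key, self.store.committed.get(key))

    def commit(self):
        self.store.committed.update(self.buffer)  # install atomically-ish
        self.buffer = {}

store = Store()
t1, t2 = Txn(store), Txn(store)
t1.write("x", 1)
print(t2.read("x"))  # None — t1's write is invisible until commit
t1.commit()
print(t2.read("x"))  # 1
```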

"All In With Determinism for Performance and Testing in Distributed Systems" by John Hugg

Synopsis

4/5. As the first line says, this is basically about testing and performance in VoltDB.

Notes

  • Basically a VoltDB talk
  • Active-active with some coordination service = logical log
  • Be aware of sources of nondeterminism
    • Random numbers, wall clock time
    • Record order
    • External systems
    • Non-user sources: bad memory, etc.
  • VoltDB's SQL planner understands determinism
  • 100% of DML is deterministic
  • R/W transactions slow down and do ...smart things
  • Warns you when you load a nondeterministic (ND) procedure
  • Coordinator service ensures determinism, sends statements out to all replicas, receives hashes back to ensure consistent results
  • Out-of-band checks with hashes to do consistency checks on replicas
  • ND reads allowed
  • One huge caveat: when a divergence is detected (obviously super-rare), cluster shuts down, essentially
  • VoltDB apparently allows you to partition tables on a column
  • https://github.com/VoltDB/voltdb/tree/master/tests/test_apps/txnid-selfcheck2
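The coordinator/hash-check idea above can be sketched as: apply the same ordered statement log on every "replica," hash the resulting state, and compare. A toy version (function names are mine; this is the concept, not VoltDB's implementation):

```python
# Deterministic replay + hash comparison: if two replicas apply the same
# ordered log and their state hashes differ, something nondeterministic
# snuck in (or hardware corrupted state) and divergence has been detected.
import hashlib

def apply_statements(statements):
    """A deterministic 'replica': apply (key, value) writes in order,
    return the final state and a hash of it."""
    state = {}
    for key, value in statements:
        state[key] = value
    digest = hashlib.sha256(repr(sorted(state.items())).encode()).hexdigest()
    return state, digest

log = [("a", 1), ("b", 2), ("a", 3)]
_, h1 = apply_statements(log)  # replica 1
_, h2 = apply_statements(log)  # replica 2
assert h1 == h2  # identical logs => identical hashes; a mismatch means divergence
```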

"Building Scalable Stateful Services" by Caitie McCaffrey

Synopsis

4/5. This was a really great talk about, in the era of microservices (a common theme), how you can build state into systems without necessarily giving up the sweet scalability that stateless systems enjoy.

Notes

  • Services intrinsically have state
  • Moved to the DB for consistency; state coordination leaks into services
  • Stateless services: Data shipping paradigm
  • Benefits
    • Data locality for low-latency/data intensive programs
    • Function shipping paradigm
  • Sticky connections really make a difference in this model; enable read-your-write w/ AP semantics
  • Building sticky connections
    • Persistent connections: load balancing problems, no stickiness when connection breaks
      • Need to implement backpressure for load balancing
    • Routing logic respects stickiness
      • Cluster membership / work distribution are the issues
      • Static cluster membership sux
      • Dynamic cluster membership
        • Gossip protocols
        • Consensus
        • Availability vs. consistency
      • Work distribution
        • Random placement (write anywhere / read everywhere -- Bloom filters?)
        • Consistent hashing
          • Deterministic placement (hash ring)
        • Distributed Hash Tables
  • Facebook Scuba DB
    • Static membership
    • Write to one random node, acts as broker to coordinate writes. After time period (configurable?) returns % of replicas written to, user decides if ok, etc.
    • Read paper for more
  • Uber ringpop
    • App-layer sharding for dispatching platform services
    • Swim/Gossip protocol + Consistent hashing
  • Orleans
    • Gossip protocol + Consistent Hashing + DHT
    • Actor model
    • 100ks of actors per machine
    • First round: consistent hashing to get DHT for actor location, second round: route to machine with actual actor running
  • Hot probs
    • Unbounded data structures: implicit assumptions are the killers of dist systems
    • Memory management
    • Reloading state
      • "Fast restarts at Facebook" paper
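The consistent-hashing / hash-ring bullet above, sketched minimally (class name and node labels are mine): a key always routes to the first node clockwise from its hash, so membership changes only move the keys between the new node and its predecessor.

```python
# Minimal hash ring for deterministic placement. Real systems (e.g. ringpop)
# add virtual nodes for balance; this sketch skips that for clarity.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes):
        # ring is a sorted list of (hash, node) points on the circle
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # first node clockwise from the key's hash, wrapping around
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # deterministic: same key always maps to same node
```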

"Apache Kafka and the Next 700 Stream Processing Systems" by Jay Kreps

Synopsis

3/5. This is like The Kafka Talk. If you're familiar with Kafka, it's a lot of already-tread ground. If you're not, it can get you up to speed.

Notes

  • What is stream processing?
  • Request/Response
  • Batch (data locality huge advantage)
  • Stream processing
    • Doesn't necessarily mean transient, approximate, or lossy
  • Confluent/Kafka trying to address fact that work often pushed down into batch or up into request/response
  • Stream processing: The Hard Parts
    • Partitioning/scalability
    • Semantics & fault tolerance
    • Unifying with tables
    • Time
    • Reprocessing
  • Kafka is like a commit log in a database
  • Stream processing = Logs + App code
  • Log compaction: reducing space for redundant content
  • Kafka streams (KIP-28)
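Log compaction, per the bullet above, keeps only the latest record per key so the log shrinks but still replays to the same final state. A toy sketch (function name is mine, not Kafka's API):

```python
# Log compaction: drop records that have been superseded by a later write
# to the same key, preserving the relative order of the survivors.
def compact(log):
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets win
    # emit the surviving records in original log order
    survivors = sorted(latest.items(), key=lambda kv: kv[1][0])
    return [(key, value) for key, (offset, value) in survivors]

log = [("k1", "a"), ("k2", "b"), ("k1", "c")]
print(compact(log))  # [('k2', 'b'), ('k1', 'c')]
```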

Keynote, "Hopelessness and Confidence in Distributed Systems Design" by Camille Fournier

Synopsis

3.5/5. This was pretty good, and has to do with realistic tradeoffs in designing distributed systems. I've seen Camille present on video, and this talk isn't particularly my favorite of hers. Interesting links are her preparation notes for the talk, and the ETTO Principle, with which I was unfamiliar.

"Specter: overcome your fear of nested Clojure data" by Nathan Marz

Synopsis

3/5. It was nice to see Nathan talk, but the talk itself isn't super necessary to watch. Specter seems like a really useful library, but it's fairly straightforward[ish] and small enough that I'm not worried if he abandons it like JCascalog.

Notes

  • Bank example: user->bank
  • Points out mapv vs. map (e.g. keep input vector a vector on the way out)
  • Specter example predicated on select and transform
    • Agnostic to types
  • This talk is kind of just going through these exercises:
  • A lot of the magic of Clojure is "The power of composable abstractions"
  • collect-one does some weird shit where it sets aside data per-tuple in...somewhere?

Example:

(print-results
   (transform [ALL (collect-one :b) :a even?]
              +
              [{:a 1 :b 3} {:a 2 :b -10} {:a 4 :b 10} {:a 3}]))
  • select etc. is often a bottleneck; comp-paths allows you to compile paths once, rather than have them interpreted on each call
  • 0.7.0 now allows you to comp-paths paths that are parameterized

"Transactions: myths, surprises and opportunities" by Martin Kleppmann

Synopsis

4.5/5. I don't give fives. This was a pretty great talk about how ACID meets CAP, and isolation levels in principle and practice.

Notes

  • Martin Kleppmann is a little bit of a weird dude
  • Basically chapter 4 of the Designing Data-Intensive Applications book (in the Lumos ebooks site)
  • ACID "more mnemonic than precise" - Brewer
  • Consistency "tossed in to make the acronym work" - Hellerstein
    • Definitely not C in CAP
  • Atomicity: about transactionality, not concurrency
    • Transactions = multi-object atomicity (e.g. roll back writes on abort)
    • Many failure states collapse to "Abort" (network fault, deadlock, constraint violations, etc.)
  • Isolation levels are really implementation details of System R
  • SQL Server apparently uses lots of locking for Repeatable Read
  • Most other DBs use snapshot isolation for same level (this is MVCC)
  • SI/RR prevents read skew, S prevents write skew
  • Options:
    • 2PL, lots of locks, pessimistic
    • Literally one, serial writer (VoltDB, redis?)
  • Serializable Snapshot Isolation (SSI)
    • Optimistic, compared to 2PL's pessimistic. Try your transactions, conflict detector blocks conflicting txions, aborts them, they retry
  • Serializability across services
  • Atomic Commitment (2pc, 3pc) <-> Total ordering (atomic broadcast) <-> Consensus (Paxos, raft, zab)
  • This shit is hard globally
  • Without cross-service transactions:
    • Compensating txions: try txion on one, txion on another fails, roll back first txion in app (:grimace:)
    • Apologies: detect & fix constraint violations (literally, apologize / reconcile)
  • Levels of consistency:
    • Serializable - total order
    • Causal - operations that "happen before" others are ordered correctly
    • Eventually consistent - commutative / associative operations that eventually converge
  • The latter part of this is...
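The snapshot isolation / MVCC point above can be sketched as versioned reads: each key keeps multiple timestamped versions, and a transaction sees the newest version committed at or before its start timestamp. A toy illustration (data layout and names are mine, not any real DB's):

```python
# MVCC snapshot read: writes append (commit_ts, value) versions; a reader
# with start timestamp T sees the newest version with commit_ts <= T, so
# concurrent commits after T stay invisible — no read locks needed.
versions = {"x": [(1, "old"), (5, "new")]}  # key -> [(commit_ts, value), ...]

def snapshot_read(key, start_ts):
    visible = [(ts, v) for ts, v in versions.get(key, []) if ts <= start_ts]
    return max(visible)[1] if visible else None

print(snapshot_read("x", 3))  # 'old' — the write committed at ts=5 is invisible
print(snapshot_read("x", 7))  # 'new'
```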

"Distributed, Eventually Consistent Computations" by Christopher Meiklejohn

Synopsis

5/5. Ok, I give fives. Super good talk about Meiklejohn's work on LASP; it's basically an advanced talk on stuffing your computations into CRDTs for consistency.

Notes

  • Synchronization is expensive, but enforces order, eliminates accidental nondeterminism, etc.
  • IoT makes things difficult, mobile gaming (yep)
  • Zero synchronization would be greaaaat
  • Perhaps start with zero synchronization, and only add it later
  • Goal is to get strong guarantees, but with weak synchronization
  • We can think of consistency models as analogous to programming paradigms
  • Handling physical time in applications is difficult
  • Time has three major avatars in computing
    • Mutable state in sequential systems
    • Nondeterminism in concurrent systems
    • Network latency in distributed systems
  • "Time is not friction" -- reduce as much as possible, but intrinsic to the problem, like a car engine hates friction, but is necessary to propel the car
  • Strong Eventual Consistency (SEC)
    • Idempotent
    • Commutative (and associative)
  • Basically CRDTs
  • Programming SEC: eliminate accidental nondeterminism (how do we model non-monotonicity in a monotonic way)
  • Under concurrency, CRDTs will converge
  • CRDTs operate per-object, not over composition
  • Set modification: Removes modeled with tombstone sets
  • Lattice processing (Lasp)
    • Distributed, deterministic dataflow
    • Functional / set-theoretic operations on sets
    • Metadata computation
      • Performs transformation on the internal metadata so output contains metadata input necessary to continue to be a CRDT
  • Holy god, this dude is blowing through all kinds of madness
  • Selective hearing
    • Epidemic broadcast-based runtime system
    • Basically, CRDTs + Gossip protocols
    • Called selective hearing, because you can pick and choose what to compute over
  • Advertising counter: clients go offline, still shown ads, want to charge advertisers based on convergence of impression data
  • We've built up from zero synchronization, instead of working to remove synchronization
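The tombstone-set bullet above maps onto one of the simplest CRDTs, a two-phase set: adds and removes are tracked in separate grow-only sets, so replicas merge with plain union and converge regardless of delivery order. A sketch (class name is mine; this is a standard 2P-set, one of several CRDT set designs):

```python
# Two-phase set (2P-set) CRDT: remove is modeled as a tombstone, and merge
# is set union, which is commutative, associative, and idempotent — the
# properties behind Strong Eventual Consistency. Caveat of this variant:
# a removed element can never be re-added.
class TwoPhaseSet:
    def __init__(self):
        self.added = set()
        self.removed = set()  # tombstones

    def add(self, x):
        self.added.add(x)

    def remove(self, x):
        self.removed.add(x)  # tombstone; wins over add permanently

    def value(self):
        return self.added - self.removed

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed

a, b = TwoPhaseSet(), TwoPhaseSet()
a.add("ad-1")
b.add("ad-2")
b.remove("ad-1")  # concurrent remove on another replica
a.merge(b)
print(sorted(a.value()))  # ['ad-2'] — same result whichever replica merges first
```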

"A History of Programming Languages for 2 Voices" by David Nolen and Michael Bernstein

Synopsis

6/5. In which David and Michael introduce a roomful of nerds to free jazz and blast Merzbow at them. Everyone should watch this.
