5/5. This is pretty academic, but it's pretty great, too. It starts with Datalog, which heavily inspires his project Dedalus; there's a way better dissection of this talk here.
- Main topic of discussion: "How would you [re]design your infrastructure" if you could never log in
- Containers are the MVP unit for deploying apps; VMs overkill
- Processes, not lightweight VMs
- Bunch of Docker demo crap
- Tons of cute names up in this business "kubelet," etc.
- Pods: logical applications
- 1+ containers
- Shared namespaces
- Like docker compose, etc.?
- Contains manifest, etc.
- One-off, by itself
- Kubernetes scheduler
- Does obvious things
- Pluggable
- Replication controller
- Used for long-running apps (like web servers)
- Self-healing
- Hand pod to replication controller
- Kubernetes has a p sweet UI
- Uses Raft (via etcd), apparently
- Unclear at this point how to avoid issues like we saw, where other containers need to be up before the main app's health check starts passing...
- Dude played Tetris to explain bin-packing (see the sketch after this list)
- Pretty ok resource management with cgroups, etc.
- Fairly easy to implement canary pattern / roll new versions using service manifests
- `kubectl rolling-update --update-period=n00ms`
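The scheduler and bin-packing bullets above are terse, so here's a minimal first-fit bin-packing sketch in Python. It's my illustration of the general idea, not Kubernetes' actual scheduling algorithm, and the node/pod shapes are invented for the example:

```python
# Minimal first-fit bin-packing sketch (my illustration, not Kubernetes' actual
# scheduling algorithm; node and pod shapes are invented for the example).

nodes = [
    {"name": "node-1", "cpu": 4.0, "mem": 8192},
    {"name": "node-2", "cpu": 2.0, "mem": 4096},
]

pods = [
    {"name": "web", "cpu": 1.0, "mem": 2048},
    {"name": "cache", "cpu": 2.0, "mem": 4096},
    {"name": "worker", "cpu": 1.5, "mem": 1024},
]

def schedule(pods, nodes):
    """Place each pod on the first node with enough free CPU and memory."""
    free = {n["name"]: {"cpu": n["cpu"], "mem": n["mem"]} for n in nodes}
    placements = {}
    # Placing the biggest pods first tends to pack better -- the Tetris intuition.
    for pod in sorted(pods, key=lambda p: p["cpu"], reverse=True):
        for node in nodes:
            avail = free[node["name"]]
            if avail["cpu"] >= pod["cpu"] and avail["mem"] >= pod["mem"]:
                avail["cpu"] -= pod["cpu"]
                avail["mem"] -= pod["mem"]
                placements[pod["name"]] = node["name"]
                break
        else:
            placements[pod["name"]] = None  # no node fits: unschedulable
    return placements

print(schedule(pods, nodes))
# {'cache': 'node-1', 'worker': 'node-1', 'web': 'node-2'}
```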
1/5. All I got out of this talk was that they are using Datomic for something, Spark for something, and don't know anything about giving presentations.

Notes:
- Using Datomic
- Kinda cool, I guess
- Weak talk so far
- Handwaved over using timestamps on separate dbs for ordering?
- Interesting question about unifying/enriching data in microservice dbs with point-in-time dbs, though
- I guess no distributed transaction coordination in Datomic?
- These dudes put like zero effort into sensible, labeled graphics
- And they just reused the same colored ball to mean different things in different slides
- There's like nothing of value in this talk
4/5. Not the world's most technical talk, but I'm a fan of Bailis' work. Adrian Colyer writes a blog that summarizes interesting CS papers every day, and he's covered a few of Bailis' papers (including one coauthored with Peter Alvaro). A lot of that work has to do with consistency guarantees at the margins; this talk is more about software in general.

Notes:
- What if we designed systems for worst-case scenarios
- Cluster provisioning: 7B simultaneous users!
- Hardware: put that shit on Mars
- Security: all developers malicious!
- Basic takeaway: designing for the worst case penalizes the average case
- This talk: when can designing for the worst case improve the average case?
- Distsys lessons, beyond just the network
- Networks make design hard: packets delayed, dropped, can't tell the diff
- CAP theorem, etc.
- Coordination-free systems: enable infinite scale-out, improve throughput, ensure low latency, guarantee always-on
- Follow-up: Harvest & Yield paper
- Is coordination required for read committed? Research: no! Buffer writes; order-of-magnitude increase (see the sketch after this list)
- Other scenarios: failure helps devops; automated failover for upgrades, revocation, stragglers, etc.
- Strong business case for accessibility
- Universal design
- Pull quote: defining "normal" defines our designs
- Basically a VoltDB talk
- Active-active with some coordination service = logical log
- Be aware of sources of nondeterminism
- random numbers, wall clock time
- record order
- External systems
- Non-user sources: bad memory, etc.
- VoltDB's SQL planner understands determinism
- 100% of DML deterministic
- R/W transactions slow down and do ...smart things
- Warns you when you load a nondeterministic (ND) procedure
- Coordinator service ensures determinism: sends statements out to all replicas, receives hashes back to ensure consistent results (rough sketch after this list)
- Out-of-band checks with hashes to do consistency checks on replicas
- ND reads allowed
- One huge caveat: when a divergence is detected (obviously super-rare), cluster shuts down, essentially
- VoltDB apparently allows you to partition tables on a column
- https://github.com/VoltDB/voltdb/tree/master/tests/test_apps/txnid-selfcheck2
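To unpack the read-committed bullet above: a toy sketch of why buffering writes until commit gives read committed without coordination. This is my gloss on the idea, not Bailis' actual system, and `Store`/`Txn` are invented names; the point is just that readers can never observe another transaction's uncommitted data.

```python
# Toy sketch of read committed via buffered writes (my gloss on the idea, not
# Bailis' system; Store/Txn are invented names). Writes stay in a per-transaction
# buffer until commit, so no reader ever sees uncommitted ("dirty") data and no
# coordination is needed to block reads.

class Store:
    def __init__(self):
        self.committed = {}               # key -> last committed value

    def begin(self):
        return Txn(self)

class Txn:
    def __init__(self, store):
        self.store = store
        self.buffer = {}                  # uncommitted writes, visible only here

    def write(self, key, value):
        self.buffer[key] = value          # buffered locally; nobody else sees it

    def read(self, key):
        # Read-your-writes inside the txn, otherwise the last committed value.
        if key in self.buffer:
            return self.buffer[key]
        return self.store.committed.get(key)

    def commit(self):
        self.store.committed.update(self.buffer)   # install the buffered writes
        self.buffer = {}

store = Store()
t1 = store.begin()
t1.write("x", 42)                         # not yet visible to anyone else
t2 = store.begin()
assert t2.read("x") is None               # no dirty read
t1.commit()
assert store.begin().read("x") == 42      # visible only after commit
```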
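And a rough sketch of the determinism-plus-hash-check scheme from the VoltDB notes above. This is my reconstruction from the bullets, not VoltDB's code, and `Coordinator`/`Replica` are invented names: the coordinator puts statements into one logical-log order, every replica applies them deterministically, and each returns a hash of its state so divergence can be detected.

```python
# Rough sketch of the determinism + hash-check scheme (my reconstruction, not
# VoltDB's code; Coordinator/Replica are invented names). A coordinator puts
# statements into one logical-log order, every replica applies them
# deterministically, and each reports a state hash so divergence can be detected.

import hashlib
import json

class Replica:
    def __init__(self, name):
        self.name = name
        self.state = {}

    def apply(self, stmt):
        # Statements must be deterministic: no wall-clock time, no random numbers.
        op, key, value = stmt
        if op == "set":
            self.state[key] = value
        elif op == "incr":
            self.state[key] = self.state.get(key, 0) + value
        return self.state_hash()

    def state_hash(self):
        blob = json.dumps(self.state, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []                     # the logical log: one global statement order

    def execute(self, stmt):
        self.log.append(stmt)
        hashes = {r.name: r.apply(stmt) for r in self.replicas}
        if len(set(hashes.values())) != 1:
            # Divergence detected; per the talk, the real cluster essentially shuts down here.
            raise RuntimeError(f"replica divergence after {stmt}: {hashes}")

coord = Coordinator([Replica("a"), Replica("b")])
coord.execute(("set", "balance", 100))
coord.execute(("incr", "balance", -30))
print(coord.replicas[0].state)            # {'balance': 70} on every replica
```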
4/5. This was a really great talk about how, in the era of microservices (a common theme), you can build state into systems without necessarily giving up the sweet scalability that stateless systems enjoy.

Notes:
- Services intrinsically have state
- Moved to the DB for consistency; state coordination leaking into services
- Stateless services: Data shipping paradigm
- Benefits
- Data locality for low-latency/data intensive programs
- Function shipping paradigm
- Sticky connections really make a difference in this model; enable read-your-writes w/ AP semantics
- Building sticky connections
- Persistent connections: load balancing problems, no stickiness when connection breaks
- Need to implement backpressure for load balancing
- Routing logic respects stickiness
- Cluster membership / work distribution are the issues
- Static cluster membership sux
- Dynamic cluster membership
- Gossip protocols
- Consensus
- Availability vs. consistency
- Work distribution
- Random placement (write anywhere / read everywhere -- Bloom filters?)
- Consistent hashing
- Deterministic placement (hash ring; see the sketch after this list)
- Distributed Hash Tables
- Facebook Scuba DB
- Static membership
- Write goes to one random node, which acts as a broker to coordinate the writes. After a time period (configurable?), it returns the % of replicas written to and the user decides if that's ok, etc. (rough sketch after this list)
- Read paper for more
- Uber ringpop
- App-layer sharding for dispatching platform services
- SWIM gossip protocol + consistent hashing
- Orleans
- Gossip protocol + Consistent Hashing + DHT
- Actor model
- 100ks of actors per machine
- First round: consistent hashing to find the DHT node that tracks the actor's location; second round: route to the machine actually running the actor
- Hot probs
- Unbounded data structures: implicit assumptions are the killers of dist systems
- Memory management
- Reloading state
- "Fast restarts at Facebook" paper
3/5. This is like The Kafka Talk. If you're familiar with Kafka, it's a lot of already-trodden ground. If you're not, it can get you up to speed.

Notes:
- What is stream processing?
- Request/Response
- Batch (data locality huge advantage)
- Stream processing
- Doesn't necessarily mean transient, approximate, or lossy
- Confluent/Kafka trying to address the fact that this work often gets pushed down into batch or up into request/response
- Stream processing: The Hard Parts
- Partitioning/scalability
- Semantics & fault tolerance
- Unifying with tables
- Time
- Reprocessing
- Kafka is like a commit log in a database
- Stream processing = Logs + App code
- Log compaction: reduces space by dropping redundant records, keeping the latest per key (toy sketch after this list)
- Kafka Streams (KIP-28)
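To make the log-compaction bullet concrete, a toy sketch (not Kafka's implementation): compaction keeps at least the latest record per key, with `None` standing in for a tombstone, so the compacted log still replays to the same table as the full log.

```python
# Toy sketch of log compaction (not Kafka's implementation): keep only the latest
# record per key, with None as a tombstone, so the compacted log replays to the
# same table as the full log.

log = [
    ("user:1", "alice@old.example"),
    ("user:2", "bob@example.com"),
    ("user:1", "alice@new.example"),   # later update for the same key
    ("user:3", None),                  # tombstone: delete this key
]

def compact(log):
    """Keep only the last record for each key, preserving relative order."""
    latest = {key: (offset, value) for offset, (key, value) in enumerate(log)}
    return [(key, value)
            for key, (offset, value) in sorted(latest.items(), key=lambda kv: kv[1][0])]

def replay(log):
    """Rebuild the table view; tombstones remove keys."""
    table = {}
    for key, value in log:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table

assert replay(compact(log)) == replay(log)   # same end state, less data retained
print(compact(log))
# [('user:2', 'bob@example.com'), ('user:1', 'alice@new.example'), ('user:3', None)]
```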