
Ricon 2014 Notes

Keynote: Marten Mickos (SVP and General Manager, HP Cloud): Open Source Wins

Started with RDBMSes and needed to “scale out”.

  • Why? Well, we couldn’t scale up anymore. (talking about MySQL)

Composability of simple components

Distributed (not monolithic)

Design for failure (Assume everything is always going to fail)

  • Shift in trust (read “>” as “more trusted / more reliable than”):
  • Old: Hardware > Software
  • New: Software > Hardware

Open Source Projects still need a chief architect (Torvalds, etc)

David Pick (Braintree): Building a Real-Time Data Pipeline with Clojure and Kafka

Have a data warehouse using Redshift, synced from Production database.

  • Easily get out of sync, and deletes are impossible to track (sometimes due to batch processes, etc)

Thought about reading the postgres replication log

  • Required waiting for newer postgres…

Instead adopted a trigger-based system (possibly Londiste, from SkyTools; the name wasn’t clear)

  • The trigger puts it into pgq, which holds for a small time.
    • pgq has great features:
      • ACID
      • Fast
      • Persistent
    • But pgq can’t keep events forever – around an hour – which doesn’t give you much buffer.

Enter Kafka with pgq

(Database) ----pgq----> (Kafka) ------> (Redshift)
                               \______> (Elastic Search)
                                \______> Wherever else

Postgres is sharded by merchant_id, which is also the Kafka partition key (see the sketch below)
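A rough sketch of that fan-out in Python, assuming the kafka-python client; the topic name, event shape, and broker address are illustrative, not Braintree’s actual setup:

    import json
    from kafka import KafkaProducer   # assumption: kafka-python client

    producer = KafkaProducer(
        bootstrap_servers=["kafka:9092"],           # hypothetical broker
        key_serializer=str.encode,
        value_serializer=lambda v: json.dumps(v).encode(),
    )

    def forward_pgq_event(event):
        # Keying by merchant_id makes Kafka's default partitioner line up
        # with the Postgres shards, preserving per-merchant ordering for
        # every downstream consumer (Redshift, Elasticsearch, etc.).
        producer.send("pg-changes", key=str(event["merchant_id"]), value=event)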

Advice

  • “Don’t use default configs” – they’re almost never what you want
  • “Avoid race conditions”

Christopher Meiklejohn (Basho): Eventual Consistency with CRDTs

Derflow: Distributed deterministic dataflow programming for Erlang (to appear in SIGPLAN ‘14)

Sean Cribbs (Basho): A Brief History of Time in Riak

Band recommendation: Lost in the Trees

Latency from SF -> NY = 14ms

Lamport defines the “happens before” relationship, logical time (see the sketch after this list).

  • “Time, Clocks and the Ordering of Events in a Distributed System”
  • Logical clocks, not physical
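A minimal Python sketch of a Lamport clock, just to make the “logical, not physical” point concrete (the class and method names are mine, not from the talk):

    class LamportClock:
        """Logical time: counts events, never reads a wall clock."""
        def __init__(self):
            self.time = 0

        def tick(self):              # local event
            self.time += 1
            return self.time

        def send(self):              # timestamp to attach to an outgoing message
            return self.tick()

        def recv(self, msg_time):    # on receive, jump past the sender's clock
            self.time = max(self.time, msg_time) + 1
            return self.time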

1983: Stott Parker, et al., “Detection of Mutual Inconsistency in Distributed Systems”

VERSION VECTORS ARE NOT VECTOR CLOCKS

Comparing version vectors [1 3 2] >= [1 3 1]

  • Descends A >= B
  • Dominates A > B
  • Concurrent A | B ( [2 3 2] | [2 3 1 1] )

Merging version vectors

  • A merge B = C (pairwise maximum)
  • C >= A and C >= B
  • A | B ====> C > A and C > B (sketch below)
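A small Python sketch of those comparisons and the pairwise-maximum merge, representing a version vector as an actor-to-counter dict (my encoding, not the talk’s):

    def descends(a, b):   # A >= B: A has seen everything B has
        return all(a.get(actor, 0) >= n for actor, n in b.items())

    def dominates(a, b):  # A > B
        return descends(a, b) and not descends(b, a)

    def concurrent(a, b): # A | B
        return not descends(a, b) and not descends(b, a)

    def merge(a, b):      # pairwise maximum
        return {actor: max(a.get(actor, 0), b.get(actor, 0))
                for actor in set(a) | set(b)}

    a = {"p": 2, "q": 3, "r": 2}          # [2 3 2]
    b = {"p": 2, "q": 3, "r": 1, "s": 1}  # [2 3 1 1]
    assert concurrent(a, b)
    c = merge(a, b)
    assert dominates(c, a) and dominates(c, b)   # A | B ==> C > A and C > B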

Paper: Dotted Version Vectors: Efficient Causality Tracking, 2012

“Time can’t move backwards”

Paper: Global Version Vectors (to appear)

Wes Chow (CTO, Chartbeat): A/B Testing with a Multi-Armed Bandit

Thompson Sampling

  • Always pick the best
  • Update your beliefs
  • How do you know which is the best?
  • Answer: throw darts at the distributions you currently believe and pick the biggest sample. New information updates the distributions, which changes the probability of selecting each arm, and the process eventually converges on the best one (see the sketch below).
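A toy Python sketch of Thompson sampling for Bernoulli arms, using Beta posteriors; the arm count and the “true” rates are made up for the simulation:

    import random

    arms = [{"wins": 0, "losses": 0} for _ in range(3)]
    true_rates = [0.04, 0.05, 0.06]    # hidden, for simulation only

    for _ in range(10000):
        # "Throw darts": one sample from each arm's Beta posterior,
        # then pick the biggest.
        samples = [random.betavariate(a["wins"] + 1, a["losses"] + 1)
                   for a in arms]
        i = samples.index(max(samples))
        # Update beliefs with the observed reward; the posterior narrows
        # over time, so selection converges on the best arm.
        if random.random() < true_rates[i]:
            arms[i]["wins"] += 1
        else:
            arms[i]["losses"] += 1

    print(arms)   # most pulls should have gone to the 0.06 arm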

Linear Probabilistic Counters

  • Count number of unique things without a lot of storage
  • Hash unique identifier % m, set bit
  • Unique Count = -m · ln((m - w) / m), where m = size of the bit vector and w = weight (number of 1s in the bit vector)
  • Pitfall: when w = m, the estimate blows up – useless
  • Pitfall: when w ~= m, you’re at an upper bound (e.g. w = 15, m = 16 caps the estimate at ~44; sketch below)
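A minimal Python sketch of a linear counter matching that formula (the hash choice and sizes here are arbitrary):

    import hashlib, math

    m = 1 << 16                    # bit-vector size
    bits = bytearray(m // 8)

    def add(item):
        h = int(hashlib.sha1(item.encode()).hexdigest(), 16) % m
        bits[h // 8] |= 1 << (h % 8)

    def estimate():
        w = sum(bin(byte).count("1") for byte in bits)   # weight: number of 1s
        if w == m:
            return float("inf")    # the w = m pitfall: estimate blows up
        return -m * math.log((m - w) / m)

    for i in range(40000):
        add("user-%d" % i)
    print(round(estimate()))       # close to 40000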

HyperLogLog

  • Good from roughly 100,000 up to 1 billion or so uniques; use a linear counter below that (minimal sketch below)
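For scale, a bare-bones HyperLogLog sketch in the same vein; it deliberately omits the small-range correction, which is exactly the regime where you’d fall back to a linear counter:

    import hashlib

    class HyperLogLog:
        def __init__(self, p=10):
            self.p = p
            self.m = 1 << p                  # number of registers
            self.registers = [0] * self.m

        def add(self, item):
            h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
            idx = h >> (64 - self.p)         # first p bits pick a register
            rest = h & ((1 << (64 - self.p)) - 1)
            rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
            self.registers[idx] = max(self.registers[idx], rank)

        def estimate(self):
            alpha = 0.7213 / (1 + 1.079 / self.m)   # bias term for m >= 128
            return alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)

    hll = HyperLogLog()
    for i in range(1000000):
        hll.add("user-%d" % i)
    print(round(hll.estimate()))   # within a few percent of 1,000,000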

Peter Cline (Comcast): Building Robust Distributed Systems When You’re Not Initially an Expert

Needed to build a large system requiring consensus

Read and understood Paxos, decided to implement

  • Should we be doing this?
  • “Have you considered ZooKeeper?”
  • They had considered it – and continued anyway

Lesson: Do your homework

Lesson: Shoot for “just the right” engineering at the right time (i.e. the minimum effort that doesn’t have to change)

  • The WAL was initially just “|” delimited text.
    • Made debugging easier, eventually became a bottleneck, and then they replaced it.

Lesson: You aren’t testing enough. Do more of it.

  • Things failed in ways they could have tested (and now do).

Lesson: “No exponents anywhere”

Lesson: It’s going to break in ways you can’t even imagine.

  • A single failing NIC took down the entire cluster: its throughput degraded from 10Gb to ~5Mb, so sends couldn’t complete before the timeout, and the failed acknowledgements triggered a storm of resends.
  • Be flexible operationally and ensure things don’t happen again.

Colin Hemmings (Dataloop.io): Event Processing with Riak

Anecdotal talk discussing their current architecture, etc. Not many lessons yet; the company is still small.

Neha Narula (PhD Candidate CSAIL, MIT): Multicore and Distributed Systems: Sharing Ideas to Build a Scalable Database

Phase Reconciliation, current research (http://pdos.csail.mit.edu/papers/doppel:osdi14.pdf)

Transactions: Serializable and Atomic

Applications contend

Concurrency control forces serial execution

Optimistic concurrency control

     |
     |        _________
txns |    . '
     |  /                         
     | / __________________ cores 

  • switches to a split phase. updates to contended items modify per-core state, proceeding in parallel on different cores
  • rewrites transactions such that they can be executed in phases
  • works with operations that commute; right now INCR, MULT, etc. (see the sketch after this list)
  • Reconcile in a more serial manner.
  • Works best for situations where there are few popular records (i.e. Retweet, Favorite counts on extremely popular tweets)
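A toy Python illustration of the split/reconcile idea for a commutative increment, with threads standing in for cores; this is my sketch of the principle, not Doppel itself:

    import threading

    NUM_WORKERS = 4
    local = [0] * NUM_WORKERS        # split phase: one slot per "core"

    def worker(wid, n):
        for _ in range(n):
            local[wid] += 1          # contended increment goes to the local slot

    threads = [threading.Thread(target=worker, args=(w, 100000))
               for w in range(NUM_WORKERS)]
    for t in threads: t.start()
    for t in threads: t.join()

    total = sum(local)               # reconcile phase: serial merge of per-core state
    print(total)                     # 400000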

Keynote: Peter Alvaro (PhD Candidate, UC Berkeley): Outwards From the Middle of the Maze

“The Bottom-up Ethos”

“When do we get our guarantees back?” – e.g. transactions, etc

“Composition is the last hard problem”

  • How do you compose guarantees?

Distributed systems are HARD

  • Asynchrony and partial failure
    • Think about both at the same time, or the system sucks
  • FUNDAMENTAL UNCERTAINTY

CRDTs

  • Associative, Commutative, Idempotent

Confluent Components

  • They compose!

Reordering, batching, retry/duplication – ACI merges tolerate all of these (see the sketch below)
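A tiny G-counter in Python shows why those three properties buy reorder/batch/retry tolerance (a standard CRDT example, not from the talk):

    def merge(a, b):   # associative, commutative, idempotent
        return {node: max(a.get(node, 0), b.get(node, 0))
                for node in set(a) | set(b)}

    a, b, c = {"n1": 5}, {"n2": 3}, {"n3": 7}
    left  = merge(merge(a, b), c)    # any grouping/order of deliveries...
    right = merge(c, merge(b, a))
    again = merge(left, left)        # ...including duplicate delivery
    assert left == right == again    # converges regardless
    print(sum(left.values()))        # 15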

CALM Theorem

  • Monotonicity -> confluent

Molly: lineage-driven fault injection.

  • Molly (to appear) is a proof-like tool that uses lineage and a SAT solver to test a huge number of possible fault interactions in a system, and either rejects the design (with an explanation of why it fails) or proves that it works.
  • Would have found the epic Kafka data loss bug
  • Proves 2PC has problems, others
  • Seems to actually work. No idea how hard it is to actually use, or build models.

Dedalus language

Fault tolerance

  • reasons backwards from outcomes using lineage.

Aysylu Greenberg (Google): Benchmarking: You’re Doing it Wrong! (http://www.slideshare.net/aysylu/benchmarking-youre-doing-it-wrong-ricon-2014)

You’re wrong about the machines

“Caches all the way down”

  • Are you sure you’re not testing the cache? L1, L2, L3 might hold all your data.

What’s the goal of the benchmark?

  • Do that, and test for that.

Warm up and Timing

  • Perhaps don’t measure immediately if you don’t want warmup
  • You might be interested in this though, if you’re benchmarking things related to restart, etc.

Periodic Interference: e.g. GC Pauses, etc

  • Avoid by measuring throughput, not individual latencies

Test != Prod (in most cases)

Power mode changes

  • Is the CPU scaling its frequency up or down for some reason (e.g. 1GHz to 3GHz)?

You’re wrong about the stats

Too few samples

  • Convergence of median on samples
  • Know your time scales

Mean not robust with outliers, percentiles are better.

Gaussian (not)

Percentiles in multimodal distributions are useless

  • Look at histograms (Mode detection algorithms!)

Investigate outliers

  • Possible coordinated omission (with serial execution ~1ms apart, an 80ms outlier also delays the requests you would have issued during it, so the slow period is under-sampled)
  • Outliers may indicate larger problems, even if they are outliers. Why are they so different? Figure it out.

You’re wrong about what matters

Premature optimization: ~97% of inefficiencies aren’t worth worrying about; focus on the critical 3%.

Unrepresentative workloads, timescales

Memory pressure: force a GC after benchmark setup, and again after the run, to measure memory pressure.

Load Balancing

Reproducibility of measurements

Becoming Less Wrong

User Actions Matter

  • X > Y for workload Z, with trade-offs A, B, and C

Profiling, Code instrumentation

Aggregate over logs

Traces (distributed)

Microbenchmarks

  • Narrow questions, misleading results
  • Not representative of entire program
  • “Microbenchmarks should be Microtrusted”
  • Choose your N wisely — “Caching all the way down”
  • Measure side effects
  • Clock resolution – combine and divide
  • Do constant work per iteration (see the sketch after this list)
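A Python sketch of the combine-and-divide pattern with constant work per iteration and a live side effect (the function names are mine):

    import time

    def bench(fn, iters=1_000_000):
        start = time.perf_counter()      # one clock-read pair around many
        acc = 0                          # iterations, so timer resolution
        for i in range(iters):           # doesn't dominate the measurement
            acc += fn(i)                 # constant work; accumulate a side effect
        elapsed = time.perf_counter() - start
        return elapsed / iters, acc      # return acc so the work can't be elided

    per_call, _ = bench(lambda i: i * i)
    print("%.1f ns/call" % (per_call * 1e9))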

Followup

Takeaway #1: Caching all the way down

Takeaway #2: Outliers matter

Takeaway #3: Workload
