Skip to content

Instantly share code, notes, and snippets.

@spikegronim
Created November 27, 2012 18:57
Show Gist options
  • Save spikegronim/4156253 to your computer and use it in GitHub Desktop.
Save spikegronim/4156253 to your computer and use it in GitHub Desktop.
Spike's favorite distributed systems papers
Time, Clocks, and the Ordering of Events in a Distributed System
Leslie Lamport
http://cnlab.kaist.ac.kr/~ikjun/data/Course_work/CS642-Distributed_Systems/papers/lamport1978.pdf
This is a foundational paper because it introduces the idea of event ordering without referring to wall clock time.
Bigtable: A Distributed Storage System for Structured Data
Jeffrey Dean et al
http://research.google.com/archive/bigtable.html
A real world implementation of a complex distributed system. Influenced a lot of other projects - HBase is an implementation of BigTable.
Dynamo: Amazon’s Highly Available Key-value Store
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Giuseppe DeCandia et al
Another real world implementation. This was influential because it promoted the ideas of eventual consistency and application specific conflict resolution. Cassandra, Voldemort, and others were all influenced by this paper.
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean et al
http://research.google.com/archive/mapreduce.html
Probably don't need to explain why this one is influential.
The Google File System
Sanjay Ghemawat et al
http://research.google.com/archive/gfs.html
The storage system for MapReduce. HDFS is an implementation of this.
Paxos Made Simple
http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf
Leslie Lamport
Paxos is a family of protocols for distributed consensus / transactions. Influential for two major reasons: advocates representing your system explicitly as a state machine, and proves the correctness of a family of "distributed transaction" protocols. Paxos lets you run a state machine on multiple machines such that the sequence of states seen by each machine is the same.
The Byzantine Generals Problem
Leslie Lamport et al
http://reference.kfupm.edu.sa/content/b/y/the_byzantine_generals_problem__55988.pdf
This paper proves the lower bound on the number of functioning machines required for a system to function correctly. It also defines a fault tolerance model where "correct" is a finite state machine and "faults" are ANYTHING outside the finite set of correct states. Leslie Lamport loves state machines.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment