distributed systems readings

#Distributed System Course List



  • Brown CSCI 2270 - Advanced Topics in Database Management - No notes, but a good set of readings. Does a good job of categorising the last ~15 years of distributed and parallel database research, which has moved away from shared-something RDBMSs.

##Distributed Algorithms

  • MIT 6.852 - Distributed Algorithms - Goodish lecture slides, detailed but manageable set of readings and some homework problems. Lectured by Professor Lynch at MIT, who literally wrote the book on the subject.

  • Distributed Algorithms Lecture Notes - Very readable set of lecture notes on distributed algorithms, for a course given in 1993 at the Technion in Israel (I think).

  • MIT 6.885 - Distributed Algorithms for Mobile Wireless Ad-Hoc Networks - One of the only courses on this particular niche subject. Co-taught by Jennifer Welch and Nancy Lynch. The notes are very good, the reading list is very comprehensive, and there are also some good handouts!

##Data Structures and Algorithms

  • UIUC CS573: Algorithmic Game Theory - Good scribe notes and a pointer to a massive online textbook.

##Discrete Mathematics and Probability

  • MIT 6.042J (OCW) - Elementary discrete maths, including graph theory and some combinatorics. Lecture slides are available, and good, but the real meat is in the readings.

#Paper List

##Thought Provokers

Ramblings that make you think about the way you design. Not everything can be solved with big servers, databases and transactions.

  • Harvest, Yield, and Scalable Tolerant Systems - Real-world applications of CAP from Brewer et al.

  • On Designing and Deploying Internet Scale Services - James Hamilton

  • Latency Exists, Cope! - Commentary on coping with latency and its architectural impacts

  • Latency - the new web performance bottleneck - Not at all new (see Patterson), but noteworthy

  • The Perils of Good Abstractions - Building the perfect API/interface is difficult

  • Chaotic Perspectives - Large-scale systems are everything developers dislike: unpredictable, unordered and parallel

  • Website Architecture - A collection of scalable architecture papers from several large websites

  • Data on the Outside versus Data on the Inside - Pat Helland

  • Memories, Guesses and Apologies - Pat Helland

  • SOA and Newton's Universe - Pat Helland

  • Building on Quicksand - Pat Helland

  • Why Distributed Computing? - Jim Waldo

  • A Note on Distributed Computing - Waldo, Wollrath et al.

  • Stevey's Google Platforms Rant - Yegge's SOA platform experience


Somewhat about the technology but more interesting is the culture and organization they've created to work with it.

  • A Conversation with Werner Vogels - Coverage of Amazon's transition to a service-based architecture

  • Discipline and Focus - Additional coverage of Amazon's transition to a service-based architecture

  • Vogels on Scalability


Current "rocket science" in distributed systems.

  • Chubby Lock Manager

  • Large-scale Incremental Processing Using Distributed Transactions and Notifications


Interesting they dumped most of J2EE and use a lot of db partitioning. Check out their site upgrade tool as well.

##Consistency Models

Key to building systems that suit their environments is finding the right tradeoff between consistency and availability.

  • CAP Conjecture - Consistency, Availability, Partition Tolerance cannot all be satisfied at once

  • Consistency, Availability, and Convergence - Proves the upper bound for consistency possible in a typical system

  • CAP Twelve Years Later: How the "Rules" Have Changed - Eric Brewer expands on the original tradeoff description

  • Consistency and Availability - Vogels

  • Eventual Consistency - Vogels

  • Avoiding Two-Phase Commit - Approaches for avoiding two-phase commit

  • 2PC or not 2PC, Wherefore Art Thou XA? - Two-phase commit isn't a silver bullet


Papers that describe various important elements of distributed systems design.

  • Distributed Computing Economics - Jim Gray

  • Rules of Thumb in Data Engineering - Jim Gray and Prashant Shenoy

  • Unreliable Failure Detectors for Reliable Distributed Systems - A method for handling the challenges of FLP

  • Lamport Clocks - How do you establish a global view of time when each computer's clock is independent?
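
For readers who want a feel for the Lamport Clocks entry before reading the paper, here is a minimal sketch of a logical clock. The class and method names (`LamportClock`, `tick`, `send`, `receive`) are made up for illustration and do not come from the paper; the sketch only shows the counter rules, not message transport.

```python
class LamportClock:
    """Logical clock: a counter that orders events without wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return the new timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp an outgoing message (sending counts as an event)."""
        return self.tick()

    def receive(self, message_time):
        """Jump past the sender's timestamp, counting the receive as an event."""
        self.time = max(self.time, message_time) + 1
        return self.time


# Two hypothetical processes, A and B, exchanging one message.
a, b = LamportClock(), LamportClock()
a.tick()                       # local event on A
stamp = a.send()               # A sends a message carrying its timestamp
b.tick()                       # unrelated local event on B
received = b.receive(stamp)    # B's clock moves past the sender's timestamp
```

The point of the design is that a receive is always stamped later than the matching send, so causally related events are ordered consistently even though no machine's physical clock is consulted.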

##Languages and Tools

Issues of distributed systems construction with specific technologies.


  • Principles of Robust Timing over the Internet - Managing clocks is essential for even basics such as debugging


  • Consistent Hashing and Random Trees
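
As a rough illustration of the consistent hashing idea named in the entry above, the sketch below places nodes on a hash ring and assigns each key to the next node clockwise. It is not the construction from the paper: the `HashRing` class, the `replicas` parameter (virtual points per node), and the choice of SHA-256 are all assumptions made for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical hash ring with several virtual points per node."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `replicas` virtual points for the node on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        """Drop every virtual point that belongs to the node."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Adding a node only takes keys away from existing nodes, so every key in `moved` now belongs to `node-d` and the rest stay where they were. That limited reshuffling is the property that makes the technique attractive for caches and partitioned stores.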

##Paxos Consensus

Understanding this algorithm is the challenge. I would suggest reading "Paxos Made Simple" before the other papers and again afterward.

  • Paxos Made Simple - Leslie Lamport

Other Consensus Papers

##Gossip Protocols (Epidemic Behaviours)

  • How robust are gossip-based communication protocols?


  • Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications

  • Kademlia: A Peer-to-peer Information System Based on the XOR Metric

  • Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems

  • PAST: A large-scale, persistent peer-to-peer storage utility - storage system atop Pastry

  • SCRIBE: A large-scale and decentralised application-level multicast infrastructure - wide area messaging atop Pastry