I'm putting this list together as a reading plan for myself, to learn more about cluster scheduling and utilization in general and the various ways of programming generically against clusters. The entries are direct links to PDFs, in an order that seemed sensible after skimming the papers' reference sections.

Happy to hear of any additions that might be sensible.

The Basics

  1. Google File System since everything references it and data locality is a thing.
  2. Google MapReduce because it's one of the earlier well-known functional approaches to programming against a cluster.
  3. Dryad for a more general programming model based on acyclic dataflow graphs.
  4. Quincy for a different take on scheduling.
  5. Delay Scheduling for another approach to scheduling.

Dedicated Cluster Schedulers

  1. Mesos
  2. Omega

More Programming Models

  1. DryadLINQ for a higher-level approach.
  2. Pregel for graph processing.
  3. MapReduce Online for a pipelined MapReduce that supports online aggregation and continuous queries.
  4. Distributed GraphLab for an approach that apparently embraces asynchrony.

A bit more general

  1. The Datacenter as a Computer