@j14159
Last active August 29, 2015 14:15

I'm putting this list together as a reading plan for myself, to learn more about general cluster scheduling and utilization and the various ways of programming generically against clusters. The entries are direct links to PDFs, in the order that seems sensible to me after skimming their reference sections.

Happy to hear of any additions that might be sensible.

The Basics

  1. Google File System since everything references it and data locality is a thing.
  2. Google MapReduce because it's one of the earlier well-known functional approaches to programming against a cluster.
  3. Dryad for a more general programming model based on dataflow graphs.
  4. Quincy for a different take on scheduling.
  5. Delay Scheduling for another approach to scheduling.

Dedicated Cluster Schedulers

  1. Mesos
  2. Omega

More Programming Models

  1. DryadLINQ for a higher-level approach.
  2. Pregel for graph processing.
  3. MapReduce Online for a pipelined MapReduce that supports online aggregation and continuous queries.
  4. Distributed GraphLab for an approach that apparently embraces asynchrony.

A bit more general

  1. The Datacenter as a Computer