Extreme Computing Exam Review (December 2018)

Papers

| # | Topic | Article | Type |
|---|-------|---------|------|
| 1 | Map Reduce | MapReduce: Simplified Data Processing on Large Clusters | Required |
| 2 | Pig | Pig Latin: A Not-So-Foreign Language for Data Processing | Required |
|   |     | Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience | Recommended |
| 3 | Google File System | The Google File System | Required |
| 4 | BigTable | Bigtable: A Distributed Storage System for Structured Data | Required |
|   |          | Spanner: Google’s Globally-Distributed Database | Recommended |
| 5 | Zookeeper | ZooKeeper: Wait-free coordination for Internet-scale systems | Required |
|   |           | The Chubby lock service for loosely-coupled distributed systems | Recommended |
|   |           | Zab: High-performance broadcast for primary-backup systems | Recommended |
|   |           | Wait-Free Synchronization | Recommended |
| 6 | Pregel | Pregel: A System for Large-Scale Graph Processing | Required |
|   |        | GraphX: Graph Processing in a Distributed Dataflow Framework | Recommended |
|   |        | PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs | Recommended |
| 7 | Virtualization + Containers | Xen and the Art of Virtualization | Recommended |
|   |                             | An Updated Performance Comparison of Virtual Machines and Linux Containers | Recommended |
|   |                             | kvm: the Linux Virtual Machine Monitor | Recommended |
|   |                             | Docker ecosystem – Vulnerability Analysis | Recommended |

Notes:
- Topics sourced from the review lecture; papers sourced from slides and resource list.
- Recommended papers about Spark omitted since it's not examinable (I think).
Might be worth noting here: regarding Virtualization and Containers, this Piazza post gives some pointers on what is required.