Summary of "Cloud Computing Concepts" course at Coursera.Org

This course is at an undergraduate level, roughly third or fourth year. Students should be comfortable with programming language concepts, including recursion, as well as proof techniques, including induction.

  1. Introduction: Clouds, MapReduce, Key-value stores
  2. Classical Precursors: Peer-to-peer systems, Grids
  3. Widely-used algorithms: Gossip, Membership, Paxos
  4. Classical algorithms: Time and Ordering, Snapshots, Multicast
  5. Fun: Interviews with leading managers and researchers, from both industry and academia

Week 1

Introduction

  • Cloud computing, Amazon AWS, Google Compute Engine, Microsoft Azure, etc.; public and private clouds
  • What is a cloud -- cluster, supercomputer, datastore, all of the above; massive scale, on-demand
  • HaaS (hardware), IaaS, PaaS, SaaS
  • Trends -- data-centers, time-sharing, grids, clouds
  • Economics of cloud computing

MapReduce

  • Terms are borrowed from functional programming languages
  • WordCount application as a MapReduce task (a minimal sketch follows this list)
  • Map processes a large number of records in parallel
  • Reduce merges all intermediate results
  • More applications -- Distributed Grep, Reverse Web-Link Graph, URL access frequency, Sort
  • YARN scheduler -- Global Resource Manager (RM), per-server Node Manager (NM), per-application Application Master (AM)
  • Problems -- server/application failure (fault tolerance), slow servers (stragglers), locality and data replication
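
A minimal WordCount sketch in plain Python (not the Hadoop API); the mapper/reducer names and the in-memory shuffle dictionary are illustrative assumptions.

```python
from collections import defaultdict

# Map: emit (word, 1) for every word in one input record (a line of text)
def mapper(line):
    for word in line.split():
        yield word, 1

# Reduce: merge all intermediate counts for a single key
def reducer(word, counts):
    return word, sum(counts)

def wordcount(lines):
    # Shuffle phase simulated with an in-memory dict keyed by word
    groups = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            groups[word].append(one)
    return dict(reducer(w, c) for w, c in groups.items())

print(wordcount(["the quick brown fox", "the lazy dog"]))
```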

Week 2

Gossip

  • The multicast problem -- need fault tolerance and scalability; traditional approaches are centralized or tree-based (build a spanning tree)
  • Epidemic multicast, or Gossip -- each node periodically transmits the message to b random targets, and nodes that have received the multicast then do the same (a toy simulation follows this list)
  • Push gossip, pull gossip, and hybrid variants
  • Properties -- light-weight, fault-tolerant, can be topology-aware
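
A toy, round-based simulation of push gossip, assuming a fully connected membership list and a fan-out of b random targets per round; the node count and seed are arbitrary.

```python
import random

def push_gossip_rounds(n_nodes=100, fanout=2, seed=42):
    """Simulate push gossip: each infected node sends to `fanout` random peers per round."""
    random.seed(seed)
    infected = {0}                      # node 0 starts with the multicast
    rounds = 0
    while len(infected) < n_nodes:
        new_infected = set(infected)
        for node in infected:
            for _ in range(fanout):
                new_infected.add(random.randrange(n_nodes))
        infected = new_infected
        rounds += 1
    return rounds                       # O(log N) rounds in expectation

print(push_gossip_rounds())
```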

Membership

  • How to deal with failures -- fault detectors, membership protocol to disseminate membership and failures
  • Distributed fault detectors, desired properties:
  • Completeness -- each failure is detected
  • Accuracy -- there is no mistaken detection
  • Speed -- time to first detection of failure
  • Scale -- equal load on each member, network load
  • Impossible to achieve perfect completeness and perfect accuracy at the same time, so in practice completeness is guaranteed and accuracy is only partial
  • What is the best failure detector, and what is the optimal one
  • SWIM failure detection algorithm uses delegation to ask k other nodes to ping the suspected target on its behalf (sketched after this list)
  • Dissemination options -- multicast, point-to-point, piggyback on pings, "infection" style dissemination
  • Suspect a failure before declaring it as such
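
A simplified sketch of one SWIM protocol period: ping a random member, and on timeout delegate indirect pings to k other members before marking the target as suspected. The send_ping transport callback and the member list are assumed to be provided by the caller.

```python
import random

K_INDIRECT = 3   # number of delegates asked to ping on our behalf

def probe(members, self_id, send_ping):
    """One SWIM protocol period at `self_id`.
    `send_ping(src, dst)` returns True if an ack came back in time (assumed transport)."""
    target = random.choice([m for m in members if m != self_id])
    if send_ping(self_id, target):
        return target, "alive"
    # Direct ping timed out: ask k other members to ping the target for us
    others = [m for m in members if m not in (self_id, target)]
    delegates = random.sample(others, min(K_INDIRECT, len(others)))
    if any(send_ping(d, target) for d in delegates):
        return target, "alive"
    return target, "suspected"   # suspect first, declare failed only later
```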

Grids

  • Applications and Scheduling, Two-level scheduling, inter-site and intra-site protocols
  • Grids are federated, no single entity controls the entire infrastructure
  • In clouds the focus is on failures, scale, on-demand

Week 3

P2P Systems

  • First distributed systems that seriously focused on scalability with respect to number of nodes
  • Napster stores a directory of filenames with pointers to their locations; no security, and it was held responsible for its users' copyright violations
  • Centralized server is single point of failure, congestion
  • Gnutella
  • Eliminates the servers so clients search and retrieve among themselves
  • Five message types -- ping, pong, query, query-hit and push
  • Avoids excessive traffic by remembering recently received messages and not forwarding duplicates
  • Problems -- ping-pong consumes 50% of traffic, repeated searches, free-loaders, flooding
  • FastTrack is a hybrid between Napster and Gnutella; uses supernodes to store file pointers, and any peer can become a supernode provided it has earned enough reputation
  • BitTorrent
  • Chord -- distributed hash table (DHT), uses a ring of peers to achieve O(log(N)) memory and lookup latency (lookup sketched after this list)
  • Peer pointers -- successors and finger table
  • Search under peer failure
  • Deal with dynamic changes, churn
  • Stabilization protocol and rounds
  • Pastry -- routing tables based on prefix matching, hops are short (in the underlying network)
  • Kelips -- k affinity groups, each node hashed to a group; filename hashed to a group, any node in the group may store the file
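
A local sketch of Chord-style lookup over a sorted list of peer ids on a ring of size 2^M, routing greedily through finger tables; it works on an in-memory view instead of real RPCs, and the peer ids are made up.

```python
M = 7                        # identifier space is 0 .. 2**M - 1
RING = 2 ** M

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b], handling wrap-around."""
    if a < b:
        return a < x <= b
    return x > a or x <= b   # interval wraps past 0

def successor(peers, key):
    """First peer clockwise from `key` (peers is a sorted list of ids)."""
    for p in peers:
        if p >= key:
            return p
    return peers[0]

def closest_preceding_finger(peers, node, key):
    """Closest finger of `node` that precedes `key` on the ring."""
    fingers = [successor(peers, (node + 2 ** i) % RING) for i in range(M)]
    for f in reversed(fingers):
        if in_interval(f, node, (key - 1) % RING):
            return f
    return node

def lookup(peers, start, key):
    """Greedy Chord routing; returns (owner, hop count), O(log N) hops expected."""
    node, hops = start, 0
    while True:
        succ = successor(peers, (node + 1) % RING)   # node's immediate successor
        if in_interval(key, node, succ):             # succ owns the key
            return succ, hops
        nxt = closest_preceding_finger(peers, node, key)
        if nxt == node:                              # cannot route further
            return succ, hops
        node, hops = nxt, hops + 1

peers = sorted([3, 17, 34, 56, 80, 99, 120])
print(lookup(peers, start=3, key=90))                # key 90 is owned by peer 99
```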

Week 4

Key-Value Stores

  • Distributed dictionary data structure, similar to a DHT; a sort of database
  • Today's workloads -- lots of random reads and writes, joins infrequent, foreign keys rarely needed, need to scale out not scale up
  • NoSQL => Not Only SQL, operations like get and put, may not have schema, don't always support joins or have foreign keys
  • NoSQL systems often use column-oriented storage
  • Cassandra uses a ring-based DHT but without finger tables or routing, replication strategy is SimpleStrategy or NetworkTopologyStrategy for multi-DC deployments
  • Snitches map IPs to racks and DCs -- SimpleSnitch (topology-unaware), PropertyFileSnitch (config file), RackInferringSnitch, EC2Snitch
  • Writes go to Coordinator (per-key, per-client or per-query) that uses Partitioner to send query to all replica nodes responsible for the key
  • When a replica is down, the Coordinator writes to all other replicas and keeps the write locally (hinted handoff) until the replica comes back up
  • Writes on replica nodes go to the commit log, then to a memtable -- a write-back cache as opposed to write-through; when full, memtables are flushed to an SSTable (Sorted String Table); also uses an index file and a Bloom filter; SSTables and logs need to be compacted
  • Deletes don't remove items right away; instead a tombstone is written, and the item is removed later during compaction
  • Reads are similar to writes
  • CAP Theorem -- distributed systems can satisfy at most 2 out of 3 guarantees: consistency, availability, partition tolerance
  • NoSQL databases provide eventual (weak) consistency, while traditional RDBMSs favor strong consistency over availability under a partition
  • Cassandra consistency levels -- ANY, ALL, ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM (the quorum intuition is sketched after this list)
  • Newer consistency levels -- causal consistency, per-key sequential, red-blue consistency, sequential consistency
  • HBase -- modeled on Google's BigTable; Yahoo open-sourced it
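
A small sketch of the quorum intuition behind the consistency levels above: with N replicas, write quorum W and read quorum R overlap whenever W + R > N, so a quorum read sees the latest quorum write. The replica-placement helper is a simplified SimpleStrategy-style stand-in, not Cassandra's actual code.

```python
def replicas_for_key(ring, key_token, n_replicas=3):
    """SimpleStrategy-style placement: the key's first clockwise peer plus the
    next (n_replicas - 1) peers on the ring (ring is a sorted list of tokens)."""
    start = next((i for i, t in enumerate(ring) if t >= key_token), 0)
    return [ring[(start + i) % len(ring)] for i in range(n_replicas)]

def is_strongly_consistent(n, w, r):
    """Read and write quorums overlap iff W + R > N, so reads see the latest write."""
    return w + r > n

ring = sorted([10, 40, 75, 110, 150, 200])
print(replicas_for_key(ring, key_token=80))        # [110, 150, 200]
print(is_strongly_consistent(n=3, w=2, r=2))       # QUORUM writes + QUORUM reads -> True
print(is_strongly_consistent(n=3, w=1, r=1))       # ONE + ONE -> False (eventual only)
```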

Time and Ordering

  • Time synchronization is required for correctness and fairness; processes in Internet-based systems follow an asynchronous system model
  • Clock skew -- relative difference between clock values; clock drift -- relative difference between clock frequencies
  • Need to synchronize any pair of clocks at least once every: "maximum acceptable skew" / (2 * "maximum drift rate")
  • External synchronization -- Cristian's Algorithm and NTP; a clock is allowed to increase but should never decrease its value; the error can never be zero; can we avoid synchronizing clocks altogether and still be able to order events?
  • Instead assign timestamps that are not absolute time -- as long as they obey causality that should work
  • Logical ordering proposed by Lamport in the 1970s, used in almost all distributed systems today
  • Logical relation Happens-Before defines partial ordering, not all events are related to each other
  • A process increments its counter before each send or local instruction; on receive, the counter is set to the max of the local and received values and then incremented
  • Lamport timestamps are not guaranteed to be ordered or unequal for concurrent events
  • Vector timestamps -- on a send or instruction event at process i, only the i-th element of the vector clock is incremented; on receive, take the element-wise max and then increment the i-th element (both clock rules are sketched after this list)
  • VT1 = VT2, VT1 <= VT2, VT1 < VT2 (causally related), or NOT(VT1 <= VT2) and NOT(VT2 <= VT1) (concurrent)
  • Obeys causality, uses more space but can also identify concurrent events
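
A compact sketch of both clock rules: Lamport timestamps (max-then-increment on receive) and vector timestamps (element-wise max, then increment the process's own entry); the two-process example at the end is made up.

```python
class LamportClock:
    def __init__(self):
        self.t = 0
    def tick(self):                 # local instruction or send
        self.t += 1
        return self.t
    def recv(self, msg_t):          # receive: max(local, message) then increment
        self.t = max(self.t, msg_t) + 1
        return self.t

class VectorClock:
    def __init__(self, n, i):
        self.v, self.i = [0] * n, i
    def tick(self):                 # only own entry is incremented on send/instruction
        self.v[self.i] += 1
        return list(self.v)
    def recv(self, msg_v):          # element-wise max with the message, then own entry +1
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        self.v[self.i] += 1
        return list(self.v)

def causally_before(v1, v2):
    """VT1 < VT2: every entry <= and at least one strictly <."""
    return all(a <= b for a, b in zip(v1, v2)) and v1 != v2

p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
m = p0.tick()                        # P0 sends with [1, 0]
p1.recv(m)                           # P1 receives -> [1, 1]
print(causally_before([1, 0], p1.v)) # True: send happened-before receive
```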

Week 5

Snapshots

  • Global snapshot = Global State = individual state of each process + individual state of each communication channel in the distributed system
  • Naive solution -- synchronize all clocks and ask processes to record their states at a known time
  • Time synchronization always has errors
  • Does not capture state of messages in the channels
  • Synchronization is not required, causality is enough
  • System model -- N processes, two unidirectional channels between each ordered process pair, channels are FIFO, no failures, no transmission errors (corruption or duplication)
  • Snapshots should not interfere with the normal application actions
  • Chandy-Lamport global snapshot algorithm uses "marker" messages
  • Consistent cuts
  • Cut is a time frontier at each process and at each channel
  • Events that happen before the cut are in the cut, otherwise out of the cut
  • A cut is consistent iff for each pair of events e, f: if e is in the cut and f -> e, then f is also in the cut (a vector-clock check of this condition is sketched after this list)
  • Any run of the Chandy-Lamport global snapshot algorithm creates a consistent cut
  • Correctness in distributed systems:
  • Liveness -- something good will happen, eventually
  • Safety -- something bad will never happen
  • Difficult to satisfy both liveness and safety in an asynchronous distributed system
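
A sketch of one way to check cut consistency using the vector timestamps from the previous section (this check is not part of the Chandy-Lamport algorithm itself): a cut given by one frontier vector per process is consistent iff no process has observed more events of process i than process i's own frontier records; the example vectors are made up.

```python
def is_consistent_cut(frontiers):
    """frontiers[i] is the vector timestamp of the last event of process i in the cut.
    Consistent iff for all i, j: frontiers[j][i] <= frontiers[i][i]
    (no process has observed an event of i that lies outside the cut at i)."""
    n = len(frontiers)
    return all(frontiers[j][i] <= frontiers[i][i]
               for i in range(n) for j in range(n))

# Consistent: no receive appears in the cut without its send.
print(is_consistent_cut([[2, 0, 0], [1, 3, 0], [1, 2, 2]]))   # True
# Inconsistent: process 1 has seen event 3 of process 0, but process 0's
# own frontier only covers its first 2 events.
print(is_consistent_cut([[2, 0, 0], [3, 3, 0], [1, 2, 2]]))   # False
```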

Multicast Ordering

  • Problem definition (unicast, broadcast, multicast), applications; three ordering protocols -- FIFO, causal, and total ordering
  • FIFO ordering -- multicasts from each sender are received in the order they are sent
  • Causal ordering -- multicasts whose send events are causally related must be received in the same causality-obeying order at all receivers
  • Causal ordering => FIFO ordering, but the reverse is not true
  • Total ordering -- ensures all processes receive all multicasts in the same order
  • Implementation
  • FIFO -- each receiver maintains a per-sender sequence number and holds back out-of-order messages (sketched after this list)
  • Total -- one process elected as leader or sequencer that maintains the order
  • Causal -- each receiver maintains a vector of per-sender sequence numbers
  • Reliability of multicasting is orthogonal to ordering
  • To reliably send multicasts -- sender sequentially sends a reliable unicast message to all group recipients, and each receiver also sequentially sends the message to all group processes; not efficient but reliable
  • Virtual Synchrony combines a membership protocol with a multicast protocol; it cannot implement consensus because VSync groups are susceptible to partitioning
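
A sketch of a FIFO-ordering receiver as described above: it keeps a per-sender expected sequence number and buffers out-of-order multicasts until the gap is filled; class and field names are illustrative.

```python
from collections import defaultdict

class FifoReceiver:
    def __init__(self):
        self.expected = defaultdict(lambda: 1)   # next sequence number per sender
        self.buffer = defaultdict(dict)          # held-back messages per sender

    def receive(self, sender, seq, msg):
        """Return the list of messages that can now be delivered in FIFO order."""
        delivered = []
        self.buffer[sender][seq] = msg
        # Deliver consecutively numbered messages; keep the rest buffered
        while self.expected[sender] in self.buffer[sender]:
            delivered.append(self.buffer[sender].pop(self.expected[sender]))
            self.expected[sender] += 1
        return delivered

r = FifoReceiver()
print(r.receive("P1", 2, "b"))   # []         -- buffered, waiting for seq 1
print(r.receive("P1", 1, "a"))   # ['a', 'b'] -- gap filled, both delivered
```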

Paxos

  • Problem -- a group of processes attempting to elect a leader, ensure mutually exclusive access to a resource, receive updates in the same order; coordinate and reach agreement
  • What is consensus -- N processes, each process has an input variable xp and an output variable yp; design a protocol so that at the end either all processes set yp to 0 or all set yp to 1
  • Constraints -- validity, integrity, non-triviality
  • Two models of distributed systems
  • Synchronous -- each message is delivered within bounded time, local clock drift has a known bound, each process step takes a bounded amount of time
  • Asynchronous -- no bounds for process execution, drift rate of a clock is arbitrary, no bounds on message transmission delays
  • In the synchronous system model consensus is solvable
  • For a system with at most f processes that may crash the algorithm proceeds in f+1 rounds (with timeout), using reliable communication for all members
  • In the asynchronous model consensus is impossible to solve -- a failed process cannot be distinguished from one that is merely very slow
  • Paxos does not completely solve the consensus problem, but it provides safety and eventual liveness
  • Paxos has rounds, each with a unique ballot id
  • Rounds are asynchronous, time synchronization is not required
  • Each round is broken into phases -- leader election, leader proposes a value and processes acknowledge (bill), leader multicasts the final value (law); the acceptor side is sketched below
  • What is the point of no return -- a majority of processes have accepted the leader's proposal
  • Fischer, Lynch, and Paterson (FLP) impossibility proof, 1983
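
A sketch of the acceptor side of a Paxos round as outlined above (phase 1 promise, phase 2 accept), with ballot ids as plain integers; the message plumbing and field names are assumptions, not a complete Paxos implementation.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1          # highest ballot id we have promised not to undercut
        self.accepted = None        # (ballot, value) of the last accepted proposal, if any

    def prepare(self, ballot):
        """Phase 1 (leader election): promise to ignore lower ballots.
        Return any previously accepted value so a new leader must re-propose it."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        """Phase 2 (bill): accept unless a higher ballot has already been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

# A value is chosen (the point of no return) once a majority of acceptors accepts it;
# phase 3 (law) merely distributes that decision.
acceptors = [Acceptor() for _ in range(5)]
promises = [a.prepare(ballot=1) for a in acceptors[:3]]            # majority promises
acks = [a.accept(ballot=1, value="leader=P2") for a in acceptors[:3]]
print(sum(acks) > len(acceptors) // 2)                             # True -> value is chosen
```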