Notes for "Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud" paper.

Introduction

  • GraphLab abstraction exposes asynchronous, dynamic, graph-parallel computation model in the shared-memory setting.
  • This paper extends the abstraction to the distributed setting.
  • Link to the paper.

Characteristics of MLDM (Machine Learning and Data Mining)

  • Graph Structured Computation
    • Sometimes computation requires modeling dependencies between data.
    • e.g., modeling dependencies between similar users in a recommendation use case.
  • Asynchronous Iterative Computation
    • In many cases, asynchronous procedures outperform synchronous ones.
    • e.g., linear systems, belief propagation, stochastic optimization, etc.
  • Dynamic Computation
    • Iterative computation converges asymmetrically.
    • Convergence can be accelerated by dynamic scheduling.
    • e.g., do not update parameters that have already converged.
  • Serializability
    • Ensuring that all parallel executions have an equivalent serial execution is desirable for both correctness and faster convergence.

GraphLab Abstraction

Data Graph

  • Store program state as a directed graph.
  • G = (V, E, D), where D is the user-defined data (model parameters, algorithm state, statistical data, etc.).
  • The graph data D is mutable, but the graph structure (V, E) is static and immutable (see the sketch below).
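
A minimal sketch of such a data graph, assuming a single-machine, dictionary-based representation; the class and field names are illustrative, not GraphLab's actual API:

```python
class DataGraph:
    """G = (V, E, D): static structure (V, E) plus mutable user data D."""

    def __init__(self):
        self.vertex_data = {}    # v -> mutable user data D_v
        self.edge_data = {}      # (u, v) -> mutable user data D_(u,v)
        self.out_neighbors = {}  # static structure: v -> set of successors
        self.in_neighbors = {}   # static structure: v -> set of predecessors

    def add_vertex(self, v, data):
        self.vertex_data[v] = data
        self.out_neighbors.setdefault(v, set())
        self.in_neighbors.setdefault(v, set())

    def add_edge(self, u, v, data=None):
        # The structure is fixed once the graph is loaded; only D mutates afterwards.
        self.edge_data[(u, v)] = data
        self.out_neighbors[u].add(v)
        self.in_neighbors[v].add(u)
```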

Update Function

  • Stateless procedure that modifies the data within the scope of a vertex and schedules the execution of the update function on other vertices.
  • Scope of a vertex v (S_v): the data stored in v, in its adjacent edges, and in its adjacent vertices.
  • update: f(v, S_v) -> (S_v, T), where T is the set of vertices on which the update function is scheduled to be invoked.
  • Scheduling of computation is decoupled from the movement of data, and no message passing is required between vertices (see the sketch below).
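
A hedged sketch of an update function with this signature, using PageRank as the example computation; the scope object is assumed to expose the `vertex_data`, `in_neighbors`, and `out_neighbors` fields of the data graph sketched above, not GraphLab's real interface:

```python
def pagerank_update(v, scope, damping=0.85, tolerance=1e-3):
    """f(v, S_v) -> T: mutate data inside the scope, return vertices to reschedule."""
    old_rank = scope.vertex_data[v]
    # Read adjacent vertex data available inside the scope S_v.
    new_rank = (1 - damping) + damping * sum(
        scope.vertex_data[u] / len(scope.out_neighbors[u])
        for u in scope.in_neighbors[v]
    )
    scope.vertex_data[v] = new_rank
    # Dynamic computation: only reschedule neighbors if the change was significant.
    if abs(new_rank - old_rank) > tolerance:
        return set(scope.out_neighbors[v])
    return set()
```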

Execution Model

  • Input to the model is G and T, the initial set of vertices to be updated.
  • During each step, a vertex is extracted from T, updated, and a set of vertices is added to T (for future computation).
  • Vertices in T can be executed in any order, with the only constraint that all vertices eventually be executed (see the loop sketched below).
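
A minimal sequential sketch of this loop, with a deque-based scheduler standing in for GraphLab's schedulers; any extraction order is permitted, which is what leaves room for parallel and distributed execution:

```python
from collections import deque

def execute(graph, update, initial_vertices):
    T = deque(initial_vertices)
    scheduled = set(initial_vertices)   # avoid scheduling the same vertex twice
    while T:
        v = T.popleft()                 # any extraction order is permitted
        scheduled.discard(v)
        for u in update(v, graph):      # update returns the vertices it reschedules
            if u not in scheduled:
                scheduled.add(u)
                T.append(u)
```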

Sync Operation

  • Sync operation runs in the background to maintain global aggregates concurrently.
  • These global values are read by update functions and written by the sync operation (see the sketch below).
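
A simplified sketch of a sync operation maintaining one global aggregate (here, the average vertex value) as a fold over the vertex data; in the distributed engines this runs continuously in the background rather than once, and the names below are illustrative:

```python
def sync_average(graph, globals_table):
    total = 0.0
    for v, data in graph.vertex_data.items():
        total += data                                    # fold step per vertex
    globals_table["average"] = total / max(len(graph.vertex_data), 1)  # apply step
```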

Consistency Models

  • Full consistency
    • Full read/write access in the scope.
    • Scopes of concurrently updating vertices cannot overlap.
  • Edge consistency
    • Read/write access on the vertex and the adjacent edges but only read access to adjacent vertices.
    • Scopes of concurrently updating vertices may partially overlap.
  • Vertex consistency
    • Write access to the vertex and read access to adjacent edges and vertices.
    • All vertices can run the update function simultaneously (see the sketch below).
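
An illustrative mapping from consistency model to the protection needed on each vertex of the scope S_v (R = read, W = write); this roughly mirrors how the distributed locking engine later realizes the models, with edge data protected by the locks on its endpoint vertices:

```python
def required_locks(model, v, graph):
    """Return {vertex: 'R' or 'W'} for one update of v under the given model."""
    neighbors = graph.in_neighbors[v] | graph.out_neighbors[v]
    if model == "full":
        return {v: "W", **{u: "W" for u in neighbors}}
    if model == "edge":
        return {v: "W", **{u: "R" for u in neighbors}}
    if model == "vertex":
        return {v: "W"}
    raise ValueError(f"unknown consistency model: {model}")
```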

Distributed Data Graph

  • Two-phase partitioning process for load balancing the graph on arbitrary cluster size.
  • In the first phase, partition the graph into k parts (k >> number of machines).
  • Each part, called an atom, is stored as a file of graph-generating commands.
  • Each atom also stores information about its ghosts (the set of vertices and edges adjacent to the partition boundary).
  • The atom index file stores the connectivity structure and file locations of the k atoms as a meta-graph.
  • In the second phase, this meta-graph is partitioned over the physical machines (see the sketch below).
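
A toy sketch of the two phases, using simple hashing as a stand-in for the actual graph partitioner and ignoring ghosts and the atom files themselves:

```python
def two_phase_partition(vertices, k, num_machines):
    # Phase 1: over-partition the vertices into k atoms (k >> num_machines).
    atom_of = {v: hash(v) % k for v in vertices}
    # Phase 2: place whole atoms onto machines (here, round-robin by atom id).
    machine_of_atom = {a: a % num_machines for a in range(k)}
    return {v: machine_of_atom[atom_of[v]] for v in vertices}
```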

Distributed GraphLab Engines

Chromatic Engine

  • A vertex coloring (no adjacent vertices have the same color) is constructed to serialize parallel execution of dependent tasks (in our case, vertices in the graph).
  • For the edge consistency model, execute all vertices of the same color before moving to the next color, and run the sync operation between color steps.
  • Changes to ghost vertices and edges are communicated asynchronously as they are made.
  • Vertex consistency is trivial: assign the same color to all vertices.
  • For full consistency, construct a second-order vertex coloring (no vertex shares the same color as any of its distance-two neighbors); see the sketch below.
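
A sketch of the idea behind the chromatic engine: greedily color the graph, then execute all vertices of one color (in parallel in the real engine) before advancing to the next color, with sync/barrier steps between colors; names are illustrative:

```python
def greedy_color(graph):
    color = {}
    for v in graph.vertex_data:
        taken = {color[u]
                 for u in graph.in_neighbors[v] | graph.out_neighbors[v]
                 if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c                      # no neighbor shares this color
    return color

def chromatic_sweep(graph, update, color):
    for c in sorted(set(color.values())):
        for v in [v for v in color if color[v] == c]:
            update(v, graph)              # same-color vertices are independent
        # barrier + sync operation would run here between color steps
```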

Distributed Locking Engine

  • Associate reader-writer locks on each vertex.
  • Each machine can update only the local vertices.
  • Optimisations
    • The ghosting system uses caching to eliminate waiting on remote data that has not changed.
    • Lock request and synchronization are pipelined to hide network latency.
      • Each machine maintains a pipeline of vertices for which locks have been requested but not granted.
      • A vertex is executed once lock acquisition and data synchronization are complete.
      • Nonblocking reader-writer locks, which signal completion through callback functions, are used (a simplified pipeline is sketched below).
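
A simplified, single-threaded sketch of lock pipelining: keep a bounded window of vertices with outstanding lock requests and execute a vertex only once its locks (and remote data) are ready. The callbacks `request_locks`, `locks_ready`, `release_locks`, and `run_update` are assumptions for this sketch; the real engine uses nonblocking reader-writer locks with completion callbacks rather than the polling loop shown here:

```python
from collections import deque

def pipelined_execute(vertices, request_locks, locks_ready,
                      release_locks, run_update, max_pending=64):
    waiting = deque(vertices)
    pending = deque()
    while waiting or pending:
        # Fill the pipeline: issue lock requests without blocking on them.
        while waiting and len(pending) < max_pending:
            v = waiting.popleft()
            request_locks(v)
            pending.append(v)
        # Retire every vertex whose locks and data synchronization are complete.
        still_pending = deque()
        for v in pending:
            if locks_ready(v):
                run_update(v)
                release_locks(v)
            else:
                still_pending.append(v)
        pending = still_pending
```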

Fault Tolerance

  • Distributed checkpointing via two modes:
  • Synchronous checkpointing
    • Suspend computation to save all modified data since the last checkpoint.
  • Asynchronous checkpointing based on the Chandy-Lamport snapshot algorithm.
    • The snapshot step becomes an update function in the GraphLab abstraction (see the sketch below).
    • Better than synchronous checkpointing, since computation does not have to be suspended.
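
A sketch of the asynchronous snapshot expressed as an update function: the first time a vertex is touched it saves its data and schedules its neighbors, so the snapshot spreads through the graph like any other computation. Edge/channel recording from Chandy-Lamport is omitted, and `saved` is an assumed per-snapshot table:

```python
def snapshot_update(v, graph, saved):
    if v in saved:
        return set()                       # this vertex is already captured
    saved[v] = graph.vertex_data[v]        # persist the vertex's current state
    # Schedule uncaptured neighbors so the snapshot propagates through the graph.
    return {u
            for u in graph.in_neighbors[v] | graph.out_neighbors[v]
            if u not in saved}
```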

System Design

  • One instance of GraphLab runs on each machine.
  • These processes are symmetric and communicate via RPC.
  • The first process additionally acts as the master and computes the placement of atoms based on the atom index.
  • Each process maintains a local scheduler (for its vertices) and a cache to access remote data.
  • A distributed consensus algorithm is used to decide when all the schedulers are empty (i.e., when execution has finished).

Observations

  • The biggest strength of the paper is its extensive experiments.
  • GraphLab benefits from the use of background asynchronous communication and pipelined locking but its communication layer is not as efficient as MPI's communication layer.