In this paper, the authors discuss about FrogWild, their Monte Carlo based
Approximate Distributed PageRank approach for obtaining top-k vertices in
the graph (where k ∈ [100, 1000]
). They discuss scenarios where obtaining the
top-k vertices is desired. These include finding top-k customers for a
business' customer loyalty program, or identifying top-k keywords/lines in a
given text (where each unique word is a vertex and is linked to words in its
close proximity in the text). A simple solution to obtaining top-k vertices is
to run the standard PageRank algorithm for fewer iterations, but the authors
indicate that their algorithm performs better.
Each random walker, called a frog, is initiated at uniformly random vertex
(instead of at each vertex) with a lot of mass (frog?
), and performs random
walk for a life span that follows a geometric random variable (but their
algorithm does not require waiting for the last frog to expire). Their
algorithm relies on the Bulk Synchronous execution mode of the GraphLab
PowerGraph, but is executed asynchronously (similar to HogWild, which is a
lock-free asynchronous stochastic gradient descent algorithm).