Random Walk Generation on a Directed Graph with PySpark
def generate_random_walks(page_ids, adjacency_list, num_walks=10, len_walks=20):
    """
    Convenience method to generate num_walks random walks for each page id.

    :param page_ids: an RDD of page ids for which the random walks should be generated.
    :param adjacency_list: a pair RDD with tuples of the form (page_id, [list(id)]).
    :param num_walks: optional. The number of walks to generate for each page id.
    :param len_walks: optional. The maximum length of each walk.
    :return: an RDD of random walks
    """
    # Each walker is a (key, walk) pair; the key is what gets joined against adjacency_list.
    walkers = page_ids.flatMap(lambda page_id: [(page_id, [page_id])] * num_walks)
    for _ in range(len_walks - 1):
        # random_append is a helper that is not defined in this snippet.
        walkers = walkers \
            .leftOuterJoin(adjacency_list) \
            .map(random_append)
    # Drop the join key and keep only the walk itself.
    return walkers.map(lambda x: x[1])
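The function above calls `random_append`, which is not shown in this gist. The following is only a sketch of what such a helper might look like, not the author's version. It assumes each record coming out of `leftOuterJoin` has the shape `(current_id, (walk, neighbors))`, that `neighbors` is `None` for pages without an adjacency entry, and that the walker should be re-keyed by the page it just moved to so the next join looks up that page's neighbors.

```python
import random


def random_append(record):
    """Extend one walk by a single random step (hypothetical helper).

    `record` is assumed to have the shape produced by
    walkers.leftOuterJoin(adjacency_list): (current_id, (walk, neighbors)),
    where neighbors is None when current_id has no adjacency entry.
    """
    current_id, (walk, neighbors) = record
    if not neighbors:
        # Dead end: leave the walk unchanged, still keyed by its last page.
        return current_id, walk
    next_id = random.choice(neighbors)
    # Build a new list instead of mutating `walk`: the initial walkers are
    # created with list multiplication, so they may share one list object.
    return next_id, walk + [next_id]


# Plain-Python check of a single step; no Spark session is needed here.
step = random_append((1, ([1], [2, 3])))
dead_end = random_append((4, ([1, 4], None)))
```

Under these assumptions, a walk that reaches a page with no outgoing links simply stops growing, which matches `len_walks` being described as a maximum length rather than an exact one.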