@NoahMarconi
Last active September 20, 2016 18:20
Run PageRank on Segments
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.graphframes._
// Load from shared location.
val nodes = spark.read.parquet("/mnt/hackondata/wattpadNodes.parquet")
val edges = spark.read.parquet("/mnt/hackondata/wattpadEdges.parquet")
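// Build one graph per age segment. Only the vertices are filtered; every graph shares the full edge list.
val gTotal = GraphFrame(nodes, edges)
// Despite its name, gZero keeps the users whose age is NOT 0 (age 0 presumably marks a missing age).
val gZero = GraphFrame(nodes.filter("age != 0"), edges)
val gTeens = GraphFrame(nodes.filter("age > 0 AND age < 15"), edges)
// Bounds are inclusive on the lower end so that ages 15 and 19 fall into a segment.
val gYoungAdults = GraphFrame(nodes.filter("age >= 15 AND age < 19"), edges)
val gAdults = GraphFrame(nodes.filter("age >= 19"), edges)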
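// Optional sketch, not part of the original gist: the segment graphs above filter only
// the vertices, so each one still carries edges whose endpoints were filtered out, and
// PageRank may follow links that leave the segment. If a strictly per-segment ranking is
// wanted, one possible approach is to keep only the edges with both endpoints inside the
// segment. `inducedSubgraph` and `gTeensInduced` are hypothetical names, and the helper
// assumes the usual GraphFrames columns: "id" on vertices, "src" and "dst" on edges.
import org.apache.spark.sql.DataFrame
import org.graphframes.GraphFrame

def inducedSubgraph(nodes: DataFrame, edges: DataFrame, condition: String): GraphFrame = {
  val segNodes = nodes.filter(condition)
  val ids = segNodes.select("id")
  // Left semi joins keep an edge only if its src and its dst both appear in the segment.
  val segEdges = edges
    .join(ids.withColumnRenamed("id", "src"), Seq("src"), "leftsemi")
    .join(ids.withColumnRenamed("id", "dst"), Seq("dst"), "leftsemi")
  GraphFrame(segNodes, segEdges)
}

// Example: teens linked only to other teens.
val gTeensInduced = inducedSubgraph(nodes, edges, "age > 0 AND age < 15")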
// Run PageRank on each graph (20 iterations, reset probability 0.01).
val totalResults = gTotal.pageRank.resetProbability(0.01).maxIter(20).run()
val zeroResults = gZero.pageRank.resetProbability(0.01).maxIter(20).run()
val teensResults = gTeens.pageRank.resetProbability(0.01).maxIter(20).run()
val youngAdultsResults = gYoungAdults.pageRank.resetProbability(0.01).maxIter(20).run()
val adultsResults = gAdults.pageRank.resetProbability(0.01).maxIter(20).run()
// In a Databricks notebook, display the vertices ranked by PageRank with:
display(totalResults.vertices.select("id", "pagerank").sort(desc("pagerank")))