@nburoojy
Created July 6, 2016 03:57
Spark Tour
// Bring the shared SparkContext (sc) and SQLContext (sqlContext) into scope
import com.civitaslearning.test.SharedSparkContext._
// Provides sqlContext.artifactDataFrame via an implicit conversion
import com.civitaslearning.slate.SlateRedshiftInputFormat._
// Fetch a Slate artifact by ID and load it as a DataFrame
val slate = new com.civitaslearning.slate.Slate()
val slateArtifactId = "37607984"
val df = sqlContext.artifactDataFrame(slate.fetchArtifactById(slateArtifactId).obj)
df.show  // prints the first 20 rows
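// cache is lazy: the data is kept in memory once the first action (count) runs
val c = df.cache
c.count  // res9: Long = 5151560
// Mean prediction per segment
c.groupBy("seg_id").avg("prediction").show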
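For readers without a cluster at hand, the aggregation above can be approximated in plain Scala. This is a sketch on hypothetical in-memory rows, not Spark's implementation: `GroupByAvgSketch`, `rows`, and `avgBySegment` are illustrative names, and Spark's handling of nulls in `avg` (they are skipped) is not modeled.

```scala
object GroupByAvgSketch {
  // Hypothetical (seg_id, prediction) rows standing in for the DataFrame
  val rows: Seq[(Int, Double)] = Seq((1, 0.25), (1, 0.75), (2, 0.9))

  // Plain-Scala analogue of groupBy("seg_id").avg("prediction"):
  // bucket rows by segment, then average each bucket's predictions
  def avgBySegment(rs: Seq[(Int, Double)]): Map[Int, Double] =
    rs.groupBy(_._1).map { case (seg, grp) =>
      seg -> grp.map(_._2).sum / grp.size
    }

  def main(args: Array[String]): Unit =
    avgBySegment(rows).toSeq.sortBy(_._1).foreach { case (seg, avg) =>
      println(s"seg_id=$seg avg(prediction)=$avg")
    }
}
```

Spark does the same grouping, but partially aggregates (sum and count) on each partition before combining, so the full set of rows never has to sit in one place.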
import org.apache.spark.sql.functions._
// Smallest and largest segment ID
c.select(min("seg_id"), max("seg_id")).show
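The `min`/`max` selection returns a single row holding the smallest and largest `seg_id`. A plain-Scala sketch of the same reduction, with hypothetical names and data, folds over the values once and skips missing entries (modeled as `None`), mirroring how SQL aggregates ignore NULLs:

```scala
object MinMaxSketch {
  // Hypothetical seg_id column; None stands in for SQL NULL
  val segIds: Seq[Option[Int]] = Seq(Some(7), None, Some(3), Some(12))

  // One fold over the non-null values, tracking (min, max);
  // None when there are no non-null values (Spark would return NULLs)
  def minMax(xs: Seq[Option[Int]]): Option[(Int, Int)] =
    xs.flatten.foldLeft(Option.empty[(Int, Int)]) {
      case (None, x)           => Some((x, x))
      case (Some((lo, hi)), x) => Some((math.min(lo, x), math.max(hi, x)))
    }

  def main(args: Array[String]): Unit =
    println(minMax(segIds))
}
```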
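// Expose the DataFrame to SQL as table "c" (Spark 1.x API; Spark 2.x uses createOrReplaceTempView)
c.registerTempTable("c")
sqlContext.sql("select count(*) as count from c").show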
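Because the query uses `count(*)`, it counts every row and should agree with the earlier `c.count` result. `count(column)` behaves differently: it skips NULLs. A small plain-Scala sketch of that distinction, using a hypothetical column where `None` stands in for NULL (`CountSketch`, `countStar`, and `countCol` are illustrative names):

```scala
object CountSketch {
  // Hypothetical "prediction" column; None stands in for SQL NULL
  val prediction: Seq[Option[Double]] = Seq(Some(0.1), None, Some(0.7))

  // count(*): every row counts, NULLs included
  def countStar[A](col: Seq[A]): Long = col.size.toLong

  // count(prediction): only non-NULL values count
  def countCol[A](col: Seq[Option[A]]): Long = col.count(_.isDefined).toLong

  def main(args: Array[String]): Unit =
    println(s"count(*)=${countStar(prediction)} count(prediction)=${countCol(prediction)}")
}
```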
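// Drop down to the underlying RDD[Row] and read one field by name
val r = c.rdd
val row = r.first()  // equivalent to r.take(1).head
val p = row.getDouble(row.fieldIndex("prediction"))  // or row.getAs[Double]("prediction")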
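`fieldIndex` resolves a column name to its position in the row's schema, and `getDouble` then reads that position as a primitive double. The sketch below mimics that two-step lookup with a hypothetical `MiniRow` class; it is not Spark's `Row`, only a plain-Scala illustration of name-to-index access, and the field names and values are made up.

```scala
object RowLookupSketch {
  // Hypothetical stand-in for org.apache.spark.sql.Row: field names plus values
  final case class MiniRow(fieldNames: Vector[String], values: Vector[Any]) {
    // Resolve a column name to its position; throws if the name is unknown
    def fieldIndex(name: String): Int = {
      val i = fieldNames.indexOf(name)
      if (i < 0) throw new IllegalArgumentException(s"Field $name does not exist")
      i
    }

    // Positional, typed read; fails with ClassCastException if the value is not a Double
    def getDouble(i: Int): Double = values(i).asInstanceOf[Double]
  }

  val row = MiniRow(Vector("seg_id", "prediction"), Vector[Any](4L, 0.83))
  val p: Double = row.getDouble(row.fieldIndex("prediction"))

  def main(args: Array[String]): Unit = println(p)
}
```

Resolving the index once and reusing it is cheaper than a name lookup per row when the same field is read across many rows.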