@MechCoder
Last active August 29, 2015 14:24
SparkSQL stuff
import org.apache.spark.SparkContext
import org.apache.spark.sql.{Row, SQLContext}
val a = Array(1, 2, 3, 4)
val b = Array("one", "two", "three", "four")
val sc = SparkContext.getOrCreate()
val sql = SQLContext.getOrCreate(sc)
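// toDF on an RDD needs the SQLContext implicits (the spark-shell imports these automatically)
import sql.implicits._
val df = sc.parallelize(a.zip(b)).toDF("number", "word")
// Collects the DataFrame to the driver as an Array[Row]
df.collect()
// Convert back to an RDD[(Int, String)] (going through df.rdd keeps this an RDD on Spark 2.x, where df.map returns a Dataset)
df.rdd.map { case Row(i: Int, w: String) => (i, w) }
// Inspect the column names and types
df.schema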
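To run the same round trip outside the spark-shell, here is a minimal self-contained sketch. It assumes Spark 1.5 or later (for `SQLContext.getOrCreate`) on the classpath and a local master; the object name `RowRoundTrip` and its `run` method are hypothetical. It swaps the `case Row(...)` pattern for the positional accessors `getInt` and `getString`. This is an alternative, not the gist's own approach: the pattern match throws a `MatchError` on any row whose values do not have exactly those types, while the accessors spell out which column position holds which type.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical wrapper object; assumes Spark 1.5+ on the classpath.
object RowRoundTrip {
  // Returns the DataFrame's column names and its rows converted back to (Int, String) pairs.
  def run(): (Seq[String], Seq[(Int, String)]) = {
    // Local master with two threads, so no cluster is needed
    val conf = new SparkConf().setMaster("local[2]").setAppName("RowRoundTrip")
    val sc = SparkContext.getOrCreate(conf)
    try {
      val sqlContext = SQLContext.getOrCreate(sc)
      // Brings rdd.toDF(...) into scope
      import sqlContext.implicits._

      val a = Array(1, 2, 3, 4)
      val b = Array("one", "two", "three", "four")
      // zip pairs elements by position: (1, "one"), (2, "two"), ...
      val df = sc.parallelize(a.zip(b)).toDF("number", "word")

      // Positional accessors: column 0 holds an Int, column 1 a String
      val pairs = df.rdd.map(r => (r.getInt(0), r.getString(1))).collect().toSeq
      (df.schema.fieldNames.toSeq, pairs)
    } finally {
      // Release the local context once the result has been collected
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (columns, pairs) = run()
    println(columns.mkString(", "))
    pairs.foreach { case (n, w) => println(s"$n -> $w") }
  }
}
```

This is meant for a standalone application. Inside the spark-shell, `SparkContext.getOrCreate` returns the shell's existing context, which `run` would then stop.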