@liancheng
Created June 8, 2015 12:15
case class HiveSampleData(
  ClientID: String, QueryTime: String, Market: String,
  DevicePlatform: String, DeviceMake: String, DeviceModel: String,
  State: String, Country: String,
  SessionId: Long, SessionPageViewOrder: Long)
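// Load the raw CSV and count its lines to confirm the file is readable
val mobiletxt = sc.textFile("file:///tmp/a.csv")
mobiletxt.count()
// Required for .toDF(); spark-shell (Spark 1.3+) normally imports this automatically
import sqlContext.implicits._
// Parse each line into a HiveSampleData record and convert the RDD to a DataFrame via .toDF()
val mobile = mobiletxt.map(_.split(",")).map(m =>
  HiveSampleData(m(0), m(1), m(2), m(3), m(4), m(5), m(6), m(7), m(8).toLong, m(9).toLong)).toDF()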
// Register the DataFrame as a temporary table so it can be queried with SQL
mobile.registerTempTable("mobile")
// Query that results in a PermGen error (the aggregate returns a single row)
sqlContext.sql("SELECT COUNT(DISTINCT ClientID) FROM mobile").collect().foreach(println)