@tmcgrath
Created January 6, 2016 17:08
Spark SQL with Scala: example using a valid JSON input source in the Spark console
todd-mcgraths-macbook-pro:spark-1.4.1-bin-hadoop2.4 toddmcgrath$ bin/spark-shell
2016-01-06 11:05:57.362 java[30505:1203] Unable to load realm info from SCDynamicStore
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
scala> val jsonRDD = sc.wholeTextFiles("baby_names.json").map(x => x._2)
jsonRDD: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at map at <console>:21
scala> val namesJson = sqlContext.read.json(jsonRDD)
2016-01-06 11:06:40.197 java[30505:1203] Unable to load realm info from SCDynamicStore
namesJson: org.apache.spark.sql.DataFrame = [Count: string, County: string, First Name: string, Sex: string, Year: string]
scala> namesJson.registerTempTable("names")
scala> sqlContext.sql("select * from names").collect.foreach(println)
[272,KINGS,DAVID,M,2013]
[268,KINGS,JAYDEN,M,2013]
[219,QUEENS,JAYDEN,M,2013]
[219,KINGS,MOSHE,M,2013]
[216,QUEENS,ETHAN,M,2013]
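
A note on the session above: the file is read with sc.wholeTextFiles, presumably because baby_names.json is a single multi-line JSON document, whereas sqlContext.read.json in Spark 1.4 otherwise expects one JSON object per line. The inferred schema also shows every column as a string, so sorting or comparing Count needs an explicit cast. The following is a minimal sketch of that cast, assuming it runs in the same spark-shell session after namesJson is defined; the names byCount and total are illustrative and do not appear in the original session.

```scala
// Run in the same spark-shell session (Spark 1.4.x), after namesJson is defined.
import org.apache.spark.sql.functions.desc

// Count was inferred as a string; cast it so the sort is numeric, not lexical.
// "byCount" and "total" are illustrative names, not part of the original session.
val byCount = namesJson
  .select(
    namesJson("First Name"),
    namesJson("County"),
    namesJson("Count").cast("int").as("total"))
  .orderBy(desc("total"))

// Print the result as a table on the console.
byCount.show()
```

The DataFrame API is used here rather than a SQL string so that the column name containing a space ("First Name") needs no special quoting.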