@fahadsiddiqui
Created April 8, 2017 15:55
// Read the file as an RDD of lines; the JSON document spans several lines.
scala> val thisfile = sc.textFile("/home/fahad/e.json")
thisfile: org.apache.spark.rdd.RDD[String] = /home/fahad/e.json MapPartitionsRDD[102] at textFile at <console>:27
// Join the lines into one string, then split it so that each top-level entry becomes its own single-line JSON document.
scala> val rdd = sc.parallelize((thisfile.collect().mkString.replace("},", "}}\n{").dropRight(1) + "}").split("\n"))
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[103] at parallelize at <console>:29
// Parse the RDD (one JSON document per element) into a DataFrame.
scala> val xy = sqlContext.read.json(rdd)
xy: org.apache.spark.sql.DataFrame = [emp-1: struct<age:bigint,name:string,sex:string>, emp-2: struct<age:bigint,name:string,sex:string>]
@fahadsiddiqui
Author

Here, e.json is:

{
 "emp-1": {
  "name": "f",
  "age": 1,
  "sex": "male"
 },
 "emp-2": {
  "name": "s",
  "age": 2,
  "sex": "female"
 }
}
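
The string reshaping in the session can be tried without Spark. The following is a minimal plain-Scala sketch, not part of the original gist: the object name `SplitTopLevelObjects`, the `sample` value, and the `reshape` helper are hypothetical names introduced for illustration, and `lines` stands in for the result of `thisfile.collect()`. It applies the same `mkString` / `replace` / `split` expression to the e.json content shown above.

```scala
object SplitTopLevelObjects {
  // The e.json content from the comment above, as one multi-line string.
  val sample: String =
    """{
      | "emp-1": {
      |  "name": "f",
      |  "age": 1,
      |  "sex": "male"
      | },
      | "emp-2": {
      |  "name": "s",
      |  "age": 2,
      |  "sex": "female"
      | }
      |}""".stripMargin

  // Stand-in for thisfile.collect(): one array element per line of the file.
  val lines: Array[String] = sample.split("\n")

  // Same expression as in the REPL session: join the lines, then turn every
  // "}," into "}}" + newline + "{" so each top-level entry is closed and a
  // new outer object is opened for the next one.
  def reshape(lines: Array[String]): Array[String] =
    (lines.mkString.replace("},", "}}\n{").dropRight(1) + "}").split("\n")

  def main(args: Array[String]): Unit =
    reshape(lines).foreach(println) // one JSON document per printed line
}
```

For this input, the `dropRight(1) + "}"` pair removes and re-adds the final closing brace, so it leaves the string unchanged. The approach relies on `},` appearing only between top-level entries; deeper nesting or a string value containing `},` would be split in the wrong place.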
