@takezoe
Last active November 19, 2018 14:03

Test data:

{"maker":"Apple", "products":[{"name": "iPhobe", "price": 100000}, {"name": "iPad", "price": 120000}]}
{"maker":"ASUS", "products":[{"name": "Zenfone", "price": 20000}]}

DataFrame:

scala> val ds = spark.read.json("/Users/takezoe/test.json")
scala> val df = ds.select(ds("maker"), explode(ds("products")).as("p"))
scala> df.select("maker", "p.name", "p.price").show
+-----+-------+------+
|maker|   name| price|
+-----+-------+------+
|Apple| iPhobe|100000|
|Apple|   iPad|120000|
| ASUS|Zenfone| 20000|
+-----+-------+------+

SQL:

scala> val ds = spark.read.json("/Users/takezoe/test.json")
scala> ds.createOrReplaceTempView("test")
scala> val df = spark.sql("SELECT maker, p.name, p.price FROM (SELECT maker, explode(products) as p FROM test)")
scala> df.show
+-----+-------+------+
|maker|   name| price|
+-----+-------+------+
|Apple| iPhobe|100000|
|Apple|   iPad|120000|
| ASUS|Zenfone| 20000|
+-----+-------+------+
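
What explode does, sketched with plain Scala collections (no Spark needed). This is only an illustration of the row-per-element behaviour, not how Spark implements it; the case classes `Item` and `Maker` are hypothetical names that mirror the test data above.

```scala
// Hypothetical case classes mirroring the JSON test data (not part of Spark).
case class Item(name: String, price: Long)
case class Maker(maker: String, products: Seq[Item])

val makers = Seq(
  Maker("Apple", Seq(Item("iPhobe", 100000L), Item("iPad", 120000L))),
  Maker("ASUS", Seq(Item("Zenfone", 20000L)))
)

// flatMap emits one (maker, name, price) row per array element,
// which is the same shape explode(products) gives in the queries above.
// A maker with an empty products list yields no rows at all.
val rows: Seq[(String, String, Long)] =
  makers.flatMap(m => m.products.map(p => (m.maker, p.name, p.price)))

rows.foreach(println)
```

As with this flatMap, explode produces no output row for a record whose array is empty or null.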