@skipperkongen
Last active August 20, 2017 13:10
Load and query data from S3 in Zeppelin (Spark)
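// `spark` is the SparkSession provided by Zeppelin's Spark interpreter.
// Load two data paths (placeholders; for S3, use a URI such as s3a://bucket/prefix).
// spark.read.load uses the default data source, which is Parquet unless
// spark.sql.sources.default has been changed.
val df1 = spark.read.load("/path/to/data1")
val df2 = spark.read.load("/path/to/data2")
// Register each DataFrame as a temporary view so it can be queried with SQL
df1.createOrReplaceTempView("data1")
df2.createOrReplaceTempView("data2")
// Union into a single DataFrame. UNION ALL keeps duplicate rows and matches
// columns by position, so both inputs need the same column order and compatible types.
val df = spark.sql("""
SELECT * FROM data1
UNION ALL
SELECT * FROM data2
""")
df.createOrReplaceTempView("data")
// Run a query
val query = spark.sql("SELECT count(*) FROM data")
query.show()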
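The same load, union, and count flow can also be written with the DataFrame API instead of temporary views and SQL. The following is a minimal sketch, assuming Spark 2.3 or later (for `unionByName`) and Parquet input. The helper name `loadUnion` and the temp-directory paths are hypothetical stand-ins. Against S3 you would pass bucket URIs instead, which also requires the cluster's S3 connector and credentials to be configured.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionAndCount {
  // Hypothetical helper: read every path as Parquet and union the results,
  // matching columns by name rather than by position. Expects at least one path.
  def loadUnion(spark: SparkSession, paths: Seq[String]): DataFrame =
    paths.map(p => spark.read.parquet(p)).reduce((a, b) => a.unionByName(b))

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in Zeppelin, `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("union-demo").getOrCreate()
    import spark.implicits._

    // Write two small Parquet datasets to a temp directory as stand-ins for the
    // real paths. The second one deliberately lists its columns in a different order.
    val base = java.nio.file.Files.createTempDirectory("union-demo").toString
    Seq((1, "a"), (2, "b")).toDF("id", "label").write.parquet(s"$base/data1")
    Seq(("c", 3)).toDF("label", "id").write.parquet(s"$base/data2")

    val df = loadUnion(spark, Seq(s"$base/data1", s"$base/data2"))
    // count() returns the row count as a Long, so no SELECT count(*) view is needed.
    println(df.count()) // 3
    spark.stop()
  }
}
```

Matching by name with `unionByName` avoids the column misalignment that a positional `UNION ALL` can silently produce when two datasets store compatible columns in different orders. The result keeps the column order of the first input.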