@jmrr
Last active June 23, 2022 20:04
MySQL tables to parquet files on the Spark shell
val sqlContext = new org.apache.spark.sql.SQLContext(sc) // optional: the Spark 1.x shell already defines sqlContext
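// Read the MySQL table over JDBC (the MySQL Connector/J jar must be on the classpath, e.g. via --jars)
val df = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:mysql://<ip.address.your.db>/<database>?user=<username>&password=<pwd>",
  "dbtable" -> "<tablename>")).load()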
// Save only the selected columns as Parquet
df.select("<col1>","<col2>","<col3>").write.parquet("</path/to/parquet/file.parquet>")
// Alternatively, save all the columns (Spark writes a directory of part files at this path):
df.write.parquet("</path/to/parquet/file.parquet>")
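The JDBC URL in the snippet puts the credentials in the query string. Spark's JDBC data source also accepts `user` and `password` as separate options, which keeps them out of the URL. The following is a minimal sketch of building that option map. The helper name `MySqlJdbcOptions` and the example host, database and credentials are hypothetical.

```scala
// Hypothetical helper: builds the option map for Spark's JDBC data source,
// passing credentials as separate "user"/"password" options instead of
// embedding them in the URL query string.
object MySqlJdbcOptions {
  def apply(host: String, database: String, table: String,
            user: String, password: String): Map[String, String] =
    Map(
      "url"      -> s"jdbc:mysql://$host/$database", // host may include a port, e.g. "db:3306"; the path is the database
      "dbtable"  -> table,                            // table name (or a parenthesized subquery with an alias)
      "user"     -> user,
      "password" -> password
    )
}

// Usage in the Spark shell (assumes the MySQL Connector/J jar is on the classpath):
// val df = sqlContext.read.format("jdbc")
//   .options(MySqlJdbcOptions("10.0.0.5:3306", "shop", "orders", "reader", "secret"))
//   .load()
```

The resulting map can be passed to `sqlContext.read.format("jdbc").options(...)` exactly as the inline `Map` in the snippet is.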