@jaceklaskowski
Last active January 9, 2018 19:19
Exercise: Creating Custom Format for DataFrameReader in Apache Spark
  1. Create a Scala/sbt project
     • Use IntelliJ IDEA
  2. Add libraryDependencies for Spark 2.0.0 (RC2)
  3. Create class mf.DefaultSource (or similar)
  4. publishLocal (or similar)
  5. ./bin/spark-shell --packages organization:spark-mf-format_2.11:1.0.0
  6. spark.read.format("mf").load("mojFormat.mf")
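
The mf.DefaultSource class from the steps above could be sketched roughly as follows. This is a minimal sketch, not the exercise's reference solution: it assumes the (hypothetical) "mf" format is plain text with one record per line, exposed as a single string column named line. MfRelation is an illustrative helper name, not something Spark or the exercise prescribes. It relies on Spark falling back to the class <format>.DefaultSource when no short name is registered, which is why the package is named mf.

```scala
package mf

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Entry point that spark.read.format("mf") resolves to (mf.DefaultSource).
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    // load(path) hands the path over as the "path" option.
    val path = parameters.getOrElse(
      "path", sys.error("'path' must be specified, e.g. via load(path)"))
    new MfRelation(path)(sqlContext)
  }
}

// Hypothetical relation: assumes an .mf file is plain text, one record per line.
class MfRelation(path: String)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  // A single nullable string column; a real format would define its own schema.
  override def schema: StructType =
    StructType(StructField("line", StringType, nullable = true) :: Nil)

  // Full scan: every line of the file becomes one Row.
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.textFile(path).map(Row(_))
}
```

To compile it, the project would need spark-sql 2.0.0 in libraryDependencies (typically with "provided" scope, since spark-shell supplies Spark at runtime). After publishLocal and starting spark-shell with --packages, spark.read.format("mf").load("mojFormat.mf").show should list the file's lines, assuming mojFormat.mf exists in the working directory.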

For the bravest:

  1. Publish to GitHub
  2. Register on spark-packages.org