@ppillay
Last active June 16, 2017 04:23
Setup
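import org.apache.spark.sql.SparkSession

// Local SparkSession for the flight data app
val spark = SparkSession.builder.master("local").appName("flightDataApp").getOrCreate()
import spark.implicits._

// Load the on-time CSV and default missing delays to zero
val df = spark.read
  .option("header", "true")        // first row holds the column names
  .option("mode", "DROPMALFORMED") // drop rows that fail to parse
  .option("inferSchema", "true")   // infer column types (costs an extra pass over the file)
  .csv("flight-data/16157900_T_ONTIME.csv")
  .na.fill(Map("delay" -> 0.0))    // replace nulls in the "delay" column with 0.0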