@pedroduartecosta
Created January 29, 2019 14:50
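import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.col

// Create the Spark context and the SQL context used to read the data.
val conf = new SparkConf().setAppName("predictor")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Load the CSV at dataPath (not defined in this snippet; assumed to be set elsewhere)
// and add double-typed copies of the delay and taxi-out columns.
val rawData = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(dataPath)
  .withColumn("DelayOutputVar", col("ArrDelay").cast("double"))
  .withColumn("DepDelayDouble", col("DepDelay").cast("double"))
  .withColumn("TaxiOutDouble", col("TaxiOut").cast("double"))
  .cache() // keep the DataFrame in memory for reuse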