@josep2
Created July 9, 2018 18:17
Example of mapping a CSV import onto a named case class
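import org.apache.spark.sql.SparkSession

case class Panel(user_id: String, date_joined: String, zip_shipping: String, date_newest_receipt: String,
                 date_oldest_receipt: String, prop_30d_syncable: String, date_last_sync: String, isp: String,
                 syncable: Int)

object Demo {
  // Spark recommends defining main() rather than extending scala.App.
  def main(args: Array[String]): Unit = {
    // `sparkSession` was used but never defined; build (or reuse) one here.
    val sparkSession = SparkSession.builder().appName("Demo").getOrCreate()
    import sparkSession.implicits._
    val panels = sparkSession.sparkContext.textFile("PATH_TO_FILE") // Load the file as lines of text
      .map(_.split(",", -1)) // Split on commas for a CSV; -1 keeps trailing empty fields
      // Map each column to its case-class field; column 2 is skipped, so rows need at least 10 columns
      .map(line => Panel(line(0), line(1), line(3), line(4), line(5), line(6), line(7), line(8), line(9).toInt))
      .toDS() // Convert RDD[Panel] to a typed Dataset[Panel] using the implicits above
    panels.show() // Action: triggers the read and prints the first 20 rows
    sparkSession.stop()
  }
}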
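The mapping above throws on any row it cannot convert: a header line makes `line(9).toInt` fail with a `NumberFormatException`, and a short row raises an `ArrayIndexOutOfBoundsException`, either of which fails the Spark job. Below is a minimal sketch of a more defensive parser in plain Scala. It assumes unquoted comma-separated values (no commas inside fields) and the same 10-column layout as the snippet, with column 2 unused. `PanelParser` and `parse` are hypothetical names, not part of the original gist.

```scala
import scala.util.Try

// Same shape as the Panel case class in the snippet above.
case class Panel(user_id: String, date_joined: String, zip_shipping: String, date_newest_receipt: String,
                 date_oldest_receipt: String, prop_30d_syncable: String, date_last_sync: String, isp: String,
                 syncable: Int)

// Hypothetical helper: assumes plain comma-separated values with no quoted commas
// and a 10-column layout where column 2 is not mapped.
object PanelParser {
  // Returns None for rows that are too short or whose last column is not an
  // integer (for example a header row), instead of throwing.
  def parse(raw: String): Option[Panel] = {
    val line = raw.split(",", -1) // -1 keeps trailing empty fields
    if (line.length < 10) None
    else Try(line(9).trim.toInt).toOption.map { syncable =>
      Panel(line(0), line(1), line(3), line(4), line(5), line(6), line(7), line(8), syncable)
    }
  }
}
```

In the Spark job, the two `map` calls could then be replaced with `.flatMap(raw => PanelParser.parse(raw).toList)`. This keeps the job running but silently drops bad rows, so you may want to count or log the rows that return `None`.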