@dalazx
Last active February 19, 2018 19:15
CSV to Parquet using Spark
// sbt console
import org.apache.spark.sql.SparkSession

// Spark 2.x: SparkSession replaces the SparkConf/SparkContext/SQLContext boilerplate
// and ships a native CSV reader, so the external com.databricks.spark.csv package
// is no longer needed.
val spark = SparkSession.builder
  .appName("test")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("charset", "UTF-8")
  .csv("path/to/file.csv")

// The Parquet writer's codec option is "compression" (e.g. "gzip", "snappy");
// "compressionCodec" is not recognized by the Parquet data source.
df.write.option("compression", "gzip").parquet("path/to/file.parquet")