// Gist by @dgadiraju (last active November 13, 2017 00:19)
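// Copy the NYSE data set from the local file system to HDFS:
// hadoop fs -copyFromLocal /data/nyse /user/dgadiraju/nyse
/*
Launch spark-shell on YARN with 4 executors:
spark-shell --master yarn \
  --conf spark.ui.port=12345 \
  --num-executors 4
*/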
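// Read the comma-separated text files, reduce them to 4 partitions,
// convert each record into a typed tuple, then name the columns.
// toDF relies on sqlContext.implicits._, which spark-shell 1.x imports automatically.
val nyse = sc.textFile("/user/dgadiraju/nyse").
  coalesce(4).
  map(stock => {
    val s = stock.split(",")
    // (ticker, date, open, high, low, close, volume)
    (s(0), s(1), s(2).toFloat, s(3).toFloat, s(4).toFloat, s(5).toFloat, s(6).toInt)
  }).
  toDF("stockticker", "transactiondate", "openprice", "highprice", "lowprice", "closeprice", "volume")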
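// Alternative sketch (not part of the original gist): the same parsing logic
// factored into a case class, so column names and types are declared once.
// StockEod and parseStock are hypothetical names; this assumes every line has
// exactly 7 comma-separated fields and no header row.
case class StockEod(stockticker: String, transactiondate: String,
  openprice: Float, highprice: Float, lowprice: Float, closeprice: Float,
  volume: Int)

// Split one CSV line and convert each field to its column type
def parseStock(line: String): StockEod = {
  val s = line.split(",")
  StockEod(s(0), s(1), s(2).toFloat, s(3).toFloat, s(4).toFloat, s(5).toFloat, s(6).toInt)
}

// Usage in spark-shell (toDF() takes the column names from the case class fields):
// val nyseTyped = sc.textFile("/user/dgadiraju/nyse").coalesce(4).map(parseStock).toDF()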
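// Use 4 shuffle partitions instead of the default 200; this affects later
// DataFrame operations that shuffle, such as joins and aggregations.
sqlContext.setConf("spark.sql.shuffle.partitions", "4")
// Write the DataFrame as Parquet files. The original gist used
// nyse.save("/user/dgadiraju/nyse_parquet", "parquet"), which is deprecated
// since Spark 1.4 (and removed in 2.0) in favor of the DataFrameWriter API.
nyse.write.parquet("/user/dgadiraju/nyse_parquet")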
//nyse.write.parquet("spark-scala-")
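// Verification sketch (not part of the original gist): read the Parquet files
// back and inspect the schema and a few rows. Assumes the write above succeeded
// and Spark 1.4 or later, where sqlContext.read.parquet is available.
// nyseParquet is a hypothetical name.
val nyseParquet = sqlContext.read.parquet("/user/dgadiraju/nyse_parquet")
nyseParquet.printSchema()
nyseParquet.show(5)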