Last active: June 27, 2018 01:45
Gist: vikas-gonti/1b70dfa509f1549346f9846f7fbf34dc
Convert nyse data to parquet
// Convert NYSE data (multiple input files) to parquet (4 output files)
/* Data is available in the local file system under /data/nyse (ls -ltr /data/nyse)
   Fields: (stockticker:string, transactiondate:string, openprice:float, highprice:float,
            lowprice:float, closeprice:float, volume:bigint)
   Convert the file format to parquet
   Save it under /user/<YOUR_USER_ID>/nyse_parquet */
/* Launch the shell:
   spark-shell --master yarn \
     --conf spark.ui.port=12456 \
     --num-executors 4 */
// Data is on the local server under /data/nyse/
/* Move it to HDFS (paths need a leading slash to be absolute):
   hadoop fs -mkdir /user/gontiv/nysedata
   hadoop fs -put /data/nyse/* /user/gontiv/nysedata/ */
val nyseRDD = sc.textFile("/user/gontiv/nysedata/").coalesce(4).
  map(rec => {
    val t = rec.split(",")
    // volume is bigint in the spec, so parse it as Long rather than Int
    (t(0), t(1), t(2).toFloat, t(3).toFloat, t(4).toFloat, t(5).toFloat, t(6).toLong)
  })
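The per-record parsing inside the `map()` above can be sketched as a standalone function and checked without a cluster. This is a hypothetical helper (not part of the gist), and the sample line uses made-up values in the documented field order:

```scala
// Hypothetical helper mirroring the map() step: parse one NYSE CSV line.
// volume is bigint in the spec, hence Long rather than Int.
def parseNyseRecord(rec: String): (String, String, Float, Float, Float, Float, Long) = {
  val t = rec.split(",")
  (t(0), t(1), t(2).toFloat, t(3).toFloat, t(4).toFloat, t(5).toFloat, t(6).toLong)
}

// Made-up sample line: stockticker,transactiondate,open,high,low,close,volume
val sample = parseNyseRecord("AAPL,2017-01-03,115.80,116.33,114.76,116.15,28781865")
```

A value above `Int.MaxValue` in the volume column would overflow `.toInt`, which is why `.toLong` matters for heavily traded tickers.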
val nyseDF = nyseRDD.toDF("stockticker", "transactiondate", "openprice", "highprice", "lowprice", "closeprice", "volume")
sqlContext.setConf("spark.sql.shuffle.partitions", "4")
// Note: this write triggers no shuffle, so the coalesce(4) above is what
// actually fixes the number of output files at 4.
nyseDF.write.parquet("/user/gontiv/nyse_parquet")
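A quick read-back check, not part of the original gist, can confirm the output. This sketch assumes the same spark-shell session and the output path used above:

```scala
// Hypothetical verification step: re-read the parquet directory written above
// and eyeball the schema, row count, and partition count.
val check = sqlContext.read.parquet("/user/gontiv/nyse_parquet")
check.printSchema()                   // expect the seven NYSE columns
println(check.count())                // should equal the number of input CSV lines
println(check.rdd.getNumPartitions)   // typically 4, matching coalesce(4)
```

Alternatively, `hadoop fs -ls /user/gontiv/nyse_parquet` should list four part-files plus a `_SUCCESS` marker.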