@pm-hwks
Created January 9, 2020 03:17
[Spark/Scala - Access S3 data] Access S3 data from Spark #spark #scala #s3
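// Set up AWS credentials for the s3a:// connector (keys masked here).
// `sc` is the SparkContext that spark-shell creates automatically.
sc.hadoopConfiguration.set("fs.s3a.access.key", "AKI*****************")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "kd8***********************************")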
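// Read an S3 text file and count word occurrences
def wordcount(): Unit = {
  val abc_file = sc.textFile("s3a://prms-s3/data/abc.txt")
  val counts = abc_file
    .flatMap(line => line.split("\\s+"))  // split each line on runs of whitespace
    .filter(_.nonEmpty)                   // drop empty tokens left by leading whitespace
    .map(word => (word, 1))               // pair each word with a count of 1
    .reduceByKey(_ + _)                   // sum the counts per word
  // counts.saveAsTextFile("s3a://s3-to-ec2/output")
  // toDF relies on spark.implicits._, which spark-shell imports automatically
  counts.toDF("word", "count").show()
}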
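// Read an S3 CSV file by splitting each line on commas
def s3_csv(): Unit = {
  val s1_csv_rdd = sc.textFile("s3a://prms-s3/data/s1.csv")
    .map(line => line.split(","))
  // This yields a single array<string> column per row; spark.read.csv(...)
  // would parse the fields into separate columns and handle quoted values.
  s1_csv_rdd.toDF().show()
}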
// Run the word count
wordcount()
// Load and display the CSV file
s3_csv()
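The word-count logic in `wordcount()` can be checked locally without Spark or S3. The sketch below is a hypothetical, Spark-free analogue (the name `WordCountSketch` is made up for illustration) that mirrors the flatMap, map and reduceByKey steps with plain Scala collections. It splits on runs of whitespace and drops empty tokens; that tokenization is a choice of this sketch, not something Spark requires.

```scala
// Hypothetical local analogue of the Spark word count (no Spark dependency).
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // flatMap: split each line into words
      .filter(_.nonEmpty)         // drop empty tokens from leading whitespace
      .map(word => (word, 1))     // map: pair each word with a count of 1
      .foldLeft(Map.empty[String, Int]) { case (acc, (word, n)) =>
        // same effect as reduceByKey(_ + _): sum the counts per word
        acc.updated(word, acc.getOrElse(word, 0) + n)
      }

  def main(args: Array[String]): Unit = {
    val sample = Seq("to be or not", "to be")
    // Print each (word, count) pair, sorted by word
    wordCount(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

On a cluster, `reduceByKey` computes the same per-key sum, but it first combines partial counts within each partition before shuffling, which keeps the shuffle small.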
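Splitting CSV lines with `split(",")`, as `s3_csv()` does, has two pitfalls: Java's `String.split` discards trailing empty fields, and it does not understand quoted commas. The hypothetical helper below (`CsvSplitSketch` is an illustrative name, not part of the gist) shows the first issue and how passing a negative limit keeps trailing empty fields. Quoted fields still need a real CSV parser, such as Spark's `spark.read.csv`.

```scala
// Hypothetical helper contrasting two ways of splitting a CSV line.
object CsvSplitSketch {
  // Naive split: trailing empty fields are silently dropped.
  def naiveFields(line: String): Array[String] = line.split(",")

  // A negative limit keeps trailing empty fields, so every row keeps its column count.
  def allFields(line: String): Array[String] = line.split(",", -1)

  def main(args: Array[String]): Unit = {
    val row = "a,b,,"
    println(naiveFields(row).length) // number of fields without trailing empties
    println(allFields(row).length)   // number of fields including trailing empties
  }
}
```

Neither variant handles a field such as `"x,y"` correctly; both split inside the quotes.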