nag9s

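import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val path = "/public/yelp-dataset/yelp_review.csv"
// Work on a copy so the custom delimiter does not leak into other reads that share sc.hadoopConfiguration.
val conf = new Configuration(sc.hadoopConfiguration)
// Split records on carriage returns ("\r") instead of the default line endings.
conf.set("textinputformat.record.delimiter", "\r")
// Yields an RDD of (byte offset, record) pairs. Hadoop reuses the Text objects, so map to String before caching or collecting.
val yelpReview = sc.newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
// Number of "\r"-delimited records in the file.
yelpReview.count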