# Java and Scala applications are submitted with --class and the application jar, as below.
# For Python applications, pass the .py script in place of --class and the jar.
spark-submit --master yarn \
--conf spark.ui.port=12901 \
--class org.apache.spark.examples.JavaWordCount \
/usr/lib/spark/lib/spark-examples.jar \
/user/itversity/randomtextwriter/part-m-00000
// Scala word count, run inside spark-shell (sc is the SparkContext the shell provides)
sc.textFile("/user/itversity/randomtextwriter/part-m-00000").
flatMap(_.split(" ")).
map((_, 1)).
reduceByKey(_ + _).
saveAsTextFile("/user/itversity/spark-shell/wordcount")
# PySpark word count, run inside the pyspark shell (sc is the SparkContext the shell provides)
(sc.textFile("/user/itversity/randomtextwriter/part-m-00000")
    .flatMap(lambda line: line.split(" "))
    .map(lambda word: (word, 1))
    .reduceByKey(lambda x, y: x + y)
    .saveAsTextFile("/user/itversity/pyspark/wordcount"))
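# A minimal sketch, not part of the original snippets: the same word count logic
# in plain Python with no Spark dependency, which may help when checking the
# expected counts on a small sample before running on the cluster. The function
# name word_count and the sample lines are hypothetical, chosen for illustration.

```python
from collections import Counter


def word_count(lines):
    """Count space-separated tokens, mirroring the flatMap/map/reduceByKey chain."""
    counts = Counter()
    for line in lines:
        # split(" ") matches the Spark snippets: consecutive spaces produce
        # empty-string tokens, which are counted like any other word.
        counts.update(line.split(" "))
    return dict(counts)


# Hypothetical sample input standing in for lines of the text file.
sample = ["to be or", "not to be"]
print(word_count(sample))
```

# Unlike the Spark versions, this runs on a single machine and holds all counts
# in memory, so it is only suitable for small samples.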