@dgadiraju
Last active August 6, 2019 16:31
# Java and Scala applications can be submitted with a command of this form.
# For Python applications the command is slightly different: pass the .py file
# in place of the jar and omit --class.
spark-submit --master yarn \
--conf spark.ui.port=12901 \
--class org.apache.spark.examples.JavaWordCount \
/usr/lib/spark/lib/spark-examples.jar \
/user/itversity/randomtextwriter/part-m-00000

# Launch the Spark 2 Scala shell on YARN with a fixed pool of 4 executors
spark2-shell \
--master yarn \
--conf spark.ui.port=12901 \
--conf spark.dynamicAllocation.enabled=false \
--num-executors 4

// Word count, entered at the Scala shell prompt (the output directory must not already exist)
sc.textFile("/user/itversity/randomtextwriter/part-m-00000").
  flatMap(_.split(" ")).
  map((_, 1)).
  reduceByKey(_ + _).
  saveAsTextFile("/user/itversity/spark-shell/wordcount")

# Launch the Spark 2 Python shell on YARN with a fixed pool of 4 executors
pyspark2 \
--master yarn \
--conf spark.ui.port=12901 \
--conf spark.dynamicAllocation.enabled=false \
--num-executors 4
sc.textFile("/user/itversity/randomtextwriter/part-m-00000") \
    .flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda x, y: x + y) \
    .saveAsTextFile("/user/itversity/pyspark/wordcount")
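
The word count pipeline above can be hard to follow without a cluster at hand. The following is a minimal local sketch in plain Python (no Spark involved) of what the flatMap, map and reduceByKey stages compute. The function name `word_count` and the sample lines are hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import chain


def word_count(lines):
    """Local stand-in for the Spark word count: returns {word: count}."""
    # flatMap step: split each line on a single space and flatten the
    # per-line word lists into one stream of words.
    words = chain.from_iterable(line.split(" ") for line in lines)
    # map step: pair every word with the count 1.
    pairs = ((word, 1) for word in words)
    # reduceByKey step: add up the counts that share the same word.
    counts = Counter()
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample input standing in for the HDFS file.
    sample = ["spark makes word count easy", "word count with spark"]
    print(word_count(sample))
```

As in the Spark versions, splitting on a single space means consecutive spaces produce empty strings, and those are counted as a word too.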