@dgadiraju
Created November 13, 2017 01:04
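/*
Launch the shell with the spark-avro package on the classpath:

spark-shell --master yarn \
  --conf spark.ui.port=12456 \
  --num-executors 10 \
  --executor-memory 3G \
  --executor-cores 2 \
  --packages com.databricks:spark-avro_2.10:2.0.1
*/
// Adds the .avro(...) method to DataFrameWriter
import com.databricks.spark.avro._

// Read the input as an RDD of lines
val lines = sc.textFile("/public/randomtextwriter")
// Split each line into words on single spaces
val words = lines.flatMap(line => line.split(" "))
// Pair each word with an initial count of 1
val tuples = words.map(word => (word, 1))
// Sum the counts per word, using 8 partitions for the result
val wordCount = tuples.reduceByKey((total, value) => total + value, 8)
// Convert to a DataFrame with named columns (spark-shell pre-imports the required implicits)
val wordCountDF = wordCount.toDF("word", "count")
// Write the result in Avro format
wordCountDF.write.avro("/user/dgadiraju/solutions/solution05/wordcount")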
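
// Optional sanity check, not part of the original solution: a minimal sketch
// that reads the Avro output back and inspects it. It assumes the same
// spark-shell session as above (Spark 1.x, where `sqlContext` is predefined)
// and that the write has completed. The name `savedDF` is illustrative.
val savedDF = sqlContext.read.
  format("com.databricks.spark.avro").
  load("/user/dgadiraju/solutions/solution05/wordcount")
// Expect two columns: word and count
savedDF.printSchema()
// Preview a few rows
savedDF.show(10)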