@sepulworld
Created November 22, 2017 22:06
Word Count Spark
from __future__ import print_function

import sys

from pyspark import SparkContext

if __name__ == "__main__":
    # Expect exactly two arguments: the input path and the output directory.
    if len(sys.argv) != 3:
        print("Usage: wordcount <input> <output>", file=sys.stderr)
        sys.exit(-1)

    sc = SparkContext(appName="WordCount")
    text_file = sc.textFile(sys.argv[1])
    # Split each line into words, pair each word with 1, then sum the counts per word.
    counts = (text_file.flatMap(lambda line: line.split(" "))
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))
    counts.saveAsTextFile(sys.argv[2])
    sc.stop()
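
To see what the flatMap / map / reduceByKey chain computes without a Spark cluster, here is a minimal plain-Python sketch of the same three steps. The function name `word_count` and the sample lines are hypothetical, chosen for illustration only; this mirrors the logic of the job and is not how Spark executes it.

```python
from collections import defaultdict


def word_count(lines):
    """Plain-Python mirror of flatMap -> map -> reduceByKey (hypothetical helper)."""
    # flatMap step: split every line on single spaces and flatten into one word list.
    words = [word for line in lines for word in line.split(" ")]
    # map step: pair each word with an initial count of 1.
    pairs = [(word, 1) for word in words]
    # reduceByKey step: add up the counts that share the same word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


if __name__ == "__main__":
    print(word_count(["to be or", "not to be"]))
```

Note that `line.split(" ")` produces empty strings when a line contains consecutive spaces, so the empty string can show up as a counted "word"; `line.split()` with no argument would skip such runs of whitespace.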