@corneliouzbett
Created March 15, 2019 14:00
A simple Python script that uses Apache Spark to count the words in a text file.
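from pyspark import SparkContext, SparkConf


def display_words(words):
    # Print each word with its count, one pair per line.
    for word, count in words.items():
        print("{} : {}".format(word, count))


if __name__ == "__main__":
    # Run locally on two worker threads.
    conf = SparkConf().setAppName("word count").setMaster("local[2]")
    sc = SparkContext(conf=conf)
    lines = sc.textFile("in/word_count.text")
    # Total number of characters across all lines (computed but not used below).
    total_lengths = lines.map(lambda s: len(s)).reduce(lambda a, b: a + b)
    # Split each line on single spaces and flatten into one RDD of words.
    words = lines.flatMap(lambda line: line.split(" "))
    # countByValue() returns the per-word counts to the driver as a dictionary.
    wordCounts = words.countByValue()
    display_words(wordCounts)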
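
To see what the flatMap and countByValue steps compute without starting Spark, here is a minimal plain-Python sketch of the same idea. The function name `count_words` and the sample lines are hypothetical, not part of the gist. It also splits on any whitespace with `split()` rather than on single spaces with `split(" ")`, so it skips the empty strings that the script above would count when a line contains consecutive spaces.

```python
from collections import Counter
from itertools import chain


def count_words(lines):
    """Count words across lines, mirroring flatMap(split) + countByValue()."""
    # flatMap step: split every line on whitespace and flatten into one stream.
    words = chain.from_iterable(line.split() for line in lines)
    # countByValue step: tally how often each distinct word occurs.
    return Counter(words)


if __name__ == "__main__":
    sample = ["to be or not", "to be"]  # hypothetical sample input
    for word, count in count_words(sample).most_common():
        print("{} : {}".format(word, count))
```

Running the same input through both versions is a quick way to check the Spark job on a small file, as long as the difference in splitting is kept in mind.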