Gist by @darkjh, last active August 29, 2015 14:05
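import dpark

# Read the input file as an RDD of lines, using 4 splits (partitions)
lines = dpark.textFile('./bible.txt', numSplits=4)
# Trim leading/trailing whitespace from every line
lines = lines.map(lambda line: line.strip())
# Split each line on whitespace and emit a (word, 1) pair per token
words = lines.flatMap(lambda line: line.split()).map(lambda x: (x, 1))
# Sum the 1s for each distinct word
wc = words.reduceByKey(lambda x, y: x + y)
# Write the (word, count) pairs as compressed, tab-separated rows under /tmp/dpark_result/
wc.saveAsCSVFile('/tmp/dpark_result/', dialect='excel-tab', compress=True)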
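To check results on a small sample without a dpark setup, the same pipeline can be approximated in plain Python. This is a minimal sketch, not dpark's API: `word_count` and `to_tsv` are hypothetical helper names. It assumes `saveAsCSVFile` writes each `(word, count)` pair as a row through Python's `csv` module using the given dialect, and it does not model partitioning, compression, or how dpark splits output into files.

```python
import csv
import io
from collections import Counter
from itertools import chain


def word_count(lines):
    """Local equivalent of the dpark pipeline: strip, split, then sum per word."""
    # map(strip) followed by flatMap(split) yields one flat stream of tokens
    words = chain.from_iterable(line.strip().split() for line in lines)
    # map(lambda x: (x, 1)) plus reduceByKey(x + y) is a per-key sum,
    # which Counter computes directly
    return Counter(words)


def to_tsv(counts):
    """Render (word, count) pairs as rows using the csv module's 'excel-tab' dialect."""
    buf = io.StringIO()
    writer = csv.writer(buf, dialect='excel-tab')
    # Sorted only to make the output deterministic; reduceByKey output is unordered
    writer.writerows(sorted(counts.items()))
    return buf.getvalue()


if __name__ == '__main__':
    sample = ["In the beginning God created", "the heaven and the earth."]
    counts = word_count(sample)
    print(counts['the'])  # 3
    print(to_tsv(counts))
```

Tokens keep their case and any attached punctuation (e.g. `earth.`), exactly as `str.split()` does in the original pipeline; the `excel-tab` dialect separates fields with a tab and ends rows with `\r\n`.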