@milindjagre
Created December 2, 2016 04:22
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("Filter")
sc = SparkContext(conf=conf)

# Example of loading a text file from HDFS (not used in the comparison below).
lines = sc.textFile("hdfs://localhost:54310/numeric_input.txt")

# A small in-memory RDD of two strings used to compare map() and flatMap().
input_strings = sc.parallelize(["Hello World", "Hi"])

# map() produces exactly one output element per input element,
# so each line becomes a list of its words.
mapped_strings = input_strings.map(lambda line: line.split(" ")).collect()
for element in mapped_strings:
    print("----------")
    print(element)
    print("----------")

# flatMap() flattens the per-line word lists into a single sequence of words.
flat_mapped_strings = input_strings.flatMap(lambda line: line.split(" ")).collect()
for element in flat_mapped_strings:
    print("----------")
    print(element)
    print("----------")
@milindjagre (Author)

This Python file demonstrates the difference between map() and flatMap() in PySpark: map() returns exactly one output element per input element (here, a list of words for each line), while flatMap() flattens those per-line lists into a single stream of words.
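
For reference, a plain-Python sketch of the same idea (an illustrative analogy, not part of the original gist): a list comprehension that keeps one list per line behaves like map(), and a nested comprehension that flattens the words behaves like flatMap().

# Plain-Python analogy for the two transformations above (illustrative sketch only)
lines = ["Hello World", "Hi"]

# map()-like: one output element (a list of words) per input line
mapped = [line.split(" ") for line in lines]
print(mapped)       # [['Hello', 'World'], ['Hi']]

# flatMap()-like: the same split, but flattened into one list of words
flat_mapped = [word for line in lines for word in line.split(" ")]
print(flat_mapped)  # ['Hello', 'World', 'Hi']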
