@milindjagre
Last active November 15, 2016 12:39
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("Filter")
sc = SparkContext(conf=conf)

# Named (user-defined) predicate function
def containsMilind(s):
    return "Milind" in s

lines = sc.textFile("hdfs://localhost:54310/input.txt")

# Way 1: filter the RDD with an inline lambda
lambda_function = lines.filter(lambda x: "Milind" in x)
linecount = lambda_function.count()
i = 1
for line in lambda_function.take(linecount):
    print "-------"
    print "LAMBDA FUNCTION LINE", i, " " + line
    i = i + 1
print "-------"

# Way 2: filter the same RDD with the named function
user_function = lines.filter(containsMilind)
linecount = user_function.count()
i = 1
for line in user_function.take(linecount):
    print "-------"
    print "USER FUNCTION LINE", i, " " + line
    i = i + 1
print "-------"

sc.stop()
@milindjagre
This Python file demonstrates the two ways of defining a function in the Spark Python API (an inline lambda and a named function) and how to apply each one independently to the same input RDD.
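The same two styles can be sketched without a Spark cluster using Python's built-in `filter()`, which takes a predicate just like `RDD.filter`. The sample lines below are made up for illustration; only the "Milind" substring test mirrors the gist.

```python
# Named predicate, analogous to containsMilind in the gist
def contains_milind(s):
    return "Milind" in s

# Stand-in for the lines read from HDFS (illustrative data)
lines = ["Milind wrote this", "another line", "hello Milind"]

# Way 1: inline lambda predicate
with_lambda = list(filter(lambda x: "Milind" in x, lines))

# Way 2: named predicate function
with_named = list(filter(contains_milind, lines))

print(with_lambda)  # ['Milind wrote this', 'hello Milind']
assert with_lambda == with_named
```

Both calls produce the same result; the lambda is convenient for one-off predicates, while the named function can be reused and tested on its own.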
