Last active
November 15, 2016 12:39
from pyspark import SparkConf, SparkContext

# Run Spark locally under the application name "Filter".
conf = SparkConf().setMaster("local").setAppName("Filter")
sc = SparkContext(conf=conf)


def containsMilind(s):
    # Named predicate: True when the line contains "Milind".
    return "Milind" in s


lines = sc.textFile("hdfs://localhost:54310/input.txt")

# Way 1: pass an inline lambda to filter().
lambda_function = lines.filter(lambda x: "Milind" in x)
# collect() returns all matching lines, replacing count() followed by take().
for i, line in enumerate(lambda_function.collect(), 1):
    print("-------")
    print("LAMBDA FUNCTION LINE {} {}".format(i, line))
print("-------")

# Way 2: pass the named function to filter() on the same RDD.
user_function = lines.filter(containsMilind)
for i, line in enumerate(user_function.collect(), 1):
    print("-------")
    print("USER FUNCTION LINE {} {}".format(i, line))
print("-------")

sc.stop()
This Python file demonstrates two ways to define a filter function in the Spark Python API (PySpark), an inline lambda and a named function, and applies each one independently to the same input RDD.
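To see that the two styles are interchangeable without a running cluster, here is a minimal sketch that uses Python's built-in filter() on a plain list in place of RDD.filter(). The sample lines and the name contains_milind are made up for illustration; the predicate logic mirrors the one in the script.

```python
# Hypothetical sample data standing in for the contents of input.txt.
sample_lines = [
    "Milind wrote this gist",
    "A line without the name",
    "Thanks, Milind",
]


def contains_milind(s):
    """Named predicate with the same logic as containsMilind in the script."""
    return "Milind" in s


# Built-in filter() stands in for RDD.filter(); both accept any callable.
lambda_result = list(filter(lambda x: "Milind" in x, sample_lines))
named_result = list(filter(contains_milind, sample_lines))

# Both styles select exactly the same lines.
print(lambda_result == named_result)
```

The choice between the two is mostly about readability: a lambda suits a short one-off condition, while a named function is easier to reuse and test once the condition grows.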