@milindjagre
Created November 7, 2016 01:32
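from pyspark import SparkConf, SparkContext

# Run Spark locally under the application name "Filter"
conf = SparkConf().setMaster("local").setAppName("Filter")
sc = SparkContext(conf=conf)

# Read the input file from HDFS and keep only the lines containing "Milind"
lines = sc.textFile("hdfs://localhost:54310/input.txt")
filter_lines = lines.filter(lambda x: "Milind" in x)

# Bring every matching line to the driver and print it, numbered from 1
linecount = filter_lines.count()
for i, line in enumerate(filter_lines.take(linecount), 1):
    print("-------")
    print("LINE %d %s" % (i, line))
print("-------")

sc.stop()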
@milindjagre
Author

This Python code filters the file input.txt, keeping only the lines that contain the keyword "Milind".
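
To check the filter logic without a Spark cluster or HDFS, the same keyword match can be sketched in plain Python. This is only an illustration: the helper names `filter_lines` and `format_numbered` and the sample data are hypothetical and not part of the gist, and the match is assumed to be a case-sensitive substring test, as in the `lambda` above.

```python
def filter_lines(lines, keyword="Milind"):
    """Return the lines containing `keyword` (case-sensitive substring match)."""
    return [line for line in lines if keyword in line]


def format_numbered(lines):
    """Prefix each line with its 1-based position, as the gist's loop does."""
    return ["LINE %d %s" % (i, line) for i, line in enumerate(lines, 1)]


# Hypothetical sample standing in for the contents of input.txt
sample = [
    "Milind wrote this gist",
    "a line without the keyword",
    "milind in lowercase does not match",
    "another line mentioning Milind",
]

for row in format_numbered(filter_lines(sample)):
    print("-------")
    print(row)
print("-------")
```

Only the first and last sample lines pass the filter, because `in` on strings is case-sensitive.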
