@eloipuertas
Last active May 13, 2020 11:37
findWords.py
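from pyspark import SparkContext

# Run Spark locally on all available cores; 'pyspark' is the application name.
sc = SparkContext('local[*]', 'pyspark')

# Load the text file as an RDD with one element per line.
src = "pride_and_prejudice.txt"
lines = sc.textFile(src)

# Keep lines containing the substring "no", ignoring case (this also matches "not", "know", etc.).
no_lines = lines.filter(lambda line: "no" in line.lower())
print(no_lines.count())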
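
The filter above matches "no" as a substring, so lines containing "not" or "know" are counted as well. If the intent is to count only lines where "no" appears as a separate word (an assumption; the gist does not say), a predicate along these lines could be used instead. This is a minimal sketch in plain Python; `contains_word` and `samples` are hypothetical names introduced here for illustration and are not part of the original script.

```python
import re

def contains_word(line, word):
    """Return True if `word` occurs in `line` as a whole word, ignoring case."""
    # \b marks a word boundary, so "no" does not match inside "not" or "know".
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, line, flags=re.IGNORECASE) is not None

# Small in-memory sample standing in for lines of the text file.
samples = ["No, I cannot.", "I do not know.", "There is no hope."]
matches = [s for s in samples if contains_word(s, "no")]
print(matches)
```

With the RDD from the script, the same predicate could be passed to the filter as `lines.filter(lambda line: contains_word(line, "no"))`, provided the function is defined in the driver script so Spark can serialize it to the workers.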