@wfaria
Created August 23, 2018 17:24
PySpark test code
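from pyspark import SparkContext

# Input file; a local path like this must be readable at the same location on every worker node.
dataFile = "./sbin/start-master.sh"
# Connect to the standalone Spark master (host redacted) and name the application.
sc = SparkContext("spark://ip-XXX-XX-X-XX.sa-east-1.compute.internal:7077", "Simple App")
try:
    textRdd = sc.textFile(dataFile)
    # Each count() is an action that triggers a job on the cluster.
    print("Number of lines:", textRdd.count())
    print("Number of lines with 8080:", textRdd.filter(lambda line: "8080" in line).count())
finally:
    # Release cluster resources even if a job fails.
    sc.stop()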