Waldecir Faria wfaria

wfaria / PySparkTest.py
Created August 23, 2018 17:24
PySpark test code
from pyspark import SparkContext
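# Sample input: a script shipped with Spark; the relative path assumes the
# script is run from the Spark installation directory.
dataFile = "./sbin/start-master.sh"
# Connect to a standalone Spark master (7077 is its default port).
sc = SparkContext("spark://ip-XXX-XX-X-XX.sa-east-1.compute.internal:7077", "Simple App")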
textRdd = sc.textFile(dataFile)
print("Number of lines:", textRdd.count())
print("Number of lines with 8080:", textRdd.filter(lambda x: '8080' in x).count())
sc.stop()