To run this example, first install PySpark, either globally or in a virtualenv:
pip install pyspark
You'll also need Java 8 installed, and you may need to set the JAVA_HOME
environment variable to point at that installation.
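The prerequisites above can be checked before launching Spark. The following is a minimal sketch using only the Python standard library; the function name check_spark_prerequisites and its messages are illustrative and not part of PySpark. It confirms only that the pyspark package can be found and that a Java runtime is locatable through JAVA_HOME or PATH. It does not check the Java version.

```python
import importlib.util
import os
import shutil


def check_spark_prerequisites(environ=None):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `environ` is a hypothetical hook for testing; it defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    problems = []

    # pyspark must be discoverable by the interpreter that runs this script.
    if importlib.util.find_spec("pyspark") is None:
        problems.append("pyspark is not installed (try: pip install pyspark)")

    # Spark needs a Java runtime: JAVA_HOME should point at one,
    # or a java executable should be on PATH.
    java_home = env.get("JAVA_HOME")
    if java_home:
        if not os.path.isdir(java_home):
            problems.append("JAVA_HOME is set but is not a directory: " + java_home)
    elif shutil.which("java", path=env.get("PATH", "")) is None:
        problems.append("no java executable on PATH and JAVA_HOME is not set")

    return problems


if __name__ == "__main__":
    for problem in check_spark_prerequisites():
        print("WARNING:", problem)
```

Running the script prints one WARNING line per problem found and prints nothing when both checks pass.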
Once that's done, save myfile.csv to your /tmp directory, then run
spark-submit query_csv.py
to launch the Spark application and print the results.