@jamiekt
Last active January 5, 2017 15:08
PySpark starter for ten
# Write a minimal PySpark job to sparking.py (a quoted heredoc avoids shell quoting issues)
cat > sparking.py <<'EOF'
from pyspark import SparkContext, HiveContext, SparkConf
conf = SparkConf().setAppName('sparking')
# Read Parquet binary columns as strings
conf.set("spark.sql.parquet.binaryAsString", "true")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)  # not used below; available for SQL queries
l = [('Alice', 1)]
rdd = sc.parallelize(l)
for x in rdd.take(10):
    print(x)  # print() works under both Python 2 and Python 3
EOF
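# Submit to YARN in cluster mode. --supervise is dropped because it applies only to
# Spark standalone or Mesos cluster mode. In cluster mode the driver's printed output
# goes to the YARN container logs, not to this terminal.
spark-submit --master yarn --deploy-mode cluster --name "sparking" sparking.py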