PySpark starter for ten
echo "from pyspark import SparkContext, HiveContext, SparkConf" > sparking.py
echo "conf = SparkConf().setAppName('sparking')" >> sparking.py
echo 'conf.set("spark.sql.parquet.binaryAsString", "true")' >> sparking.py
echo "sc = SparkContext(conf=conf)" >> sparking.py
echo "sqlContext = HiveContext(sc)" >> sparking.py
echo "l = [('Alice', 1)]" >> sparking.py
echo "rdd = sc.parallelize(l)" >> sparking.py
echo "for x in rdd.take(10):" >> sparking.py
echo " print x" >> sparking.py
spark-submit --master yarn --deploy-mode cluster --name "sparking" sparking.py
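
The starter wires up a HiveContext but never uses it. As a minimal sketch of a next step (the column names and the people temp table are illustrative, assuming Spark 1.3+ where createDataFrame and registerTempTable are available), you could append something like this to sparking.py:

# Turn the RDD of tuples into a DataFrame (column names are illustrative)
df = sqlContext.createDataFrame(rdd, ["name", "num"])
df.show()

# Register it as a temporary table so it can be queried with SQL
df.registerTempTable("people")
sqlContext.sql("SELECT name, num FROM people").show()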