PySpark starter for ten
# Build a minimal PySpark script line by line (the original redirect target is
# missing; sparking.py is assumed here). HiveContext lives in pyspark.sql, not
# the top-level pyspark package.
echo "from pyspark import SparkContext, SparkConf" > sparking.py
echo "from pyspark.sql import HiveContext" >> sparking.py
echo "conf = SparkConf().setAppName('sparking')" >> sparking.py
echo 'conf.set("spark.sql.parquet.binaryAsString", "true")' >> sparking.py
echo "sc = SparkContext(conf=conf)" >> sparking.py
echo "sqlContext = HiveContext(sc)" >> sparking.py
echo "l = [('Alice', 1)]" >> sparking.py
echo "rdd = sc.parallelize(l)" >> sparking.py
echo "for x in rdd.take(10):" >> sparking.py
echo "    print(x)" >> sparking.py
# Submit the generated script; --supervise is dropped because it applies only
# to Spark standalone/Mesos cluster mode, not YARN.
spark-submit --master yarn --deploy-mode cluster --name "sparking" sparking.py
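The repeated echo redirections above can also be collapsed into a single heredoc, which is easier to read and edit. A sketch, assuming the script file is named sparking.py (the original gist omits the redirect target):

```shell
# Write the whole starter script in one shot with a quoted heredoc
# (<<'EOF' prevents the shell from expanding anything inside).
# sparking.py is an assumed filename, not taken from the original gist.
cat > sparking.py <<'EOF'
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf().setAppName('sparking')
conf.set("spark.sql.parquet.binaryAsString", "true")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

l = [('Alice', 1)]
rdd = sc.parallelize(l)
for x in rdd.take(10):
    print(x)
EOF
```

The quoted delimiter matters: with a plain `<<EOF`, the shell would try to expand `$`-variables inside the Python source.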