
@nithyadurai87
Created July 28, 2018 15:18
spark-notes.txt: notes for a Spark tutorial
wget redrockdigimark.com/apachemirror/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
tar -xzvf spark-2.3.1-bin-hadoop2.7.tgz
sudo mkdir -p /usr/local/bigdata
sudo mv spark-2.3.1-bin-hadoop2.7 /usr/local/bigdata
cd /usr/local/bigdata
sudo mv spark-2.3.1-bin-hadoop2.7 spark
export SPARK_HOME=/usr/local/bigdata/spark
export PATH=$PATH:/usr/local/bigdata/spark/bin
pyspark
rdd1 = sc.textFile("file:///home/shrini/women.txt")
rdd1.collect()
rdd1.count()
rdd1.distinct().count()
rdd1.countByValue()
rdd1.first()
rdd1.take(3)
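The actions above can be imitated on an ordinary Python list, which makes it easier to see what each one returns. This is a plain-Python sketch, not Spark. The three sample lines are hypothetical stand-ins for women.txt. On a real RDD, collect() and countByValue() return their whole result to the driver, so they suit small data only.

```python
from collections import Counter

# Hypothetical sample lines standing in for women.txt (not the real file).
lines = [
    "Chennai is a city",
    "Madurai is a city",
    "Chennai is a city",
]

collected = list(lines)           # like rdd1.collect(): every element, on the driver
total = len(lines)                # like rdd1.count()
distinct_total = len(set(lines))  # like rdd1.distinct().count()
by_value = dict(Counter(lines))   # like rdd1.countByValue(): element -> occurrences
first = lines[0]                  # like rdd1.first()
first_three = lines[:3]           # like rdd1.take(3)

print(total, distinct_total)
print(by_value)
```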
rdd1.takeSample(False, 4, 1)
rdd1.sample(True, 0.5, 3).collect()
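takeSample(withReplacement, num, seed) returns a Python list of exactly num elements. Without replacement, it returns fewer only if the RDD itself is smaller. sample(withReplacement, fraction, seed) instead returns a new RDD of random size, which is why it needs .collect(). The meaning of fraction depends on the mode:

- With replacement, it is the expected number of times each element is picked.
- Without replacement, it is the probability that each element is kept.

The sketch below imitates these semantics on a list. The helper names are hypothetical. It is not Spark's actual sampler, so the same seed will not reproduce Spark's output.

```python
import math
import random


def take_sample(items, with_replacement, num, seed):
    """Rough analogue of RDD.takeSample: a list of a fixed size."""
    rng = random.Random(seed)
    if with_replacement:
        return [rng.choice(items) for _ in range(num)]
    return rng.sample(items, min(num, len(items)))


def poisson(rng, lam):
    """Draw from a Poisson(lam) distribution using Knuth's method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample(items, with_replacement, fraction, seed):
    """Rough analogue of RDD.sample: only the expected size is fraction * len(items)."""
    rng = random.Random(seed)
    out = []
    for item in items:
        if with_replacement:
            # Each element may appear 0, 1, 2, ... times.
            out.extend([item] * poisson(rng, fraction))
        elif rng.random() < fraction:
            # Without replacement: keep each element independently.
            out.append(item)
    return out


data = list(range(20))
print(take_sample(data, False, 4, 1))
print(sample(data, True, 0.5, 3))
```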
rdd1.map(lambda i: i.split(" ")).take(3)
rdd1.flatMap(lambda i: i.split(" ")).take(3)
rdd1.filter(lambda i: "Chennai" in i).count()
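These three transformations differ in how many outputs each input produces:

- map produces exactly one output element per input line. Here, each element is a list of words.
- flatMap flattens those lists, so each word becomes its own element.
- filter keeps only the elements for which the function returns True.

Here is a plain-Python sketch of the difference, using hypothetical sample lines:

```python
from itertools import chain

# Hypothetical sample lines (not the real women.txt).
lines = ["Chennai is big", "Madurai is old", "Chennai has a beach"]

# Like rdd1.map(...): one list of words per line.
mapped = [line.split(" ") for line in lines]
# Like rdd1.flatMap(...): the lists are flattened into individual words.
flat = list(chain.from_iterable(line.split(" ") for line in lines))
# Like rdd1.filter(...): only lines containing "Chennai" survive.
chennai_lines = [line for line in lines if "Chennai" in line]

print(mapped[:3])
print(flat[:3])
print(len(chennai_lines))
```

Note that split(" ") yields empty strings when words are separated by several spaces. split() with no argument splits on any run of whitespace and avoids them.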
rdd1.getNumPartitions()
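getNumPartitions() reports how many chunks the RDD is split into. Each partition is processed by its own task. For textFile, Spark derives the number from the input splits. A minimum can be requested with sc.textFile(path, minPartitions), and rdd1.glom().collect() shows the elements held in each partition.

PySpark's count() counts each partition separately and then sums the results. The sketch below imitates that on hand-made, hypothetical partitions.

```python
# Hypothetical data split into 3 "partitions" by hand; for textFile, Spark
# decides the real split from the input file.
partitions = [
    ["line 1", "line 2"],
    ["line 3"],
    ["line 4", "line 5", "line 6"],
]


def count_partition(rows):
    """Work done independently per partition (one task each in Spark)."""
    return sum(1 for _ in rows)


per_partition = [count_partition(p) for p in partitions]  # parallel on a cluster
total = sum(per_partition)                               # combined on the driver

print(len(partitions), per_partition, total)  # 3 [2, 1, 3] 6
```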
cd /usr/local/bigdata/spark/conf
sudo mv log4j.properties.template log4j.properties
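A common reason to activate log4j.properties is to turn down the console logging of pyspark and spark-submit. This assumes the Spark 2.x template, whose root logger line reads `log4j.rootCategory=INFO, console`. Changing INFO to WARN in conf/log4j.properties hides the INFO messages. Inside a running shell, sc.setLogLevel("WARN") has a similar effect for that session only.

```properties
# conf/log4j.properties (edited copy of the template): show warnings and errors only
log4j.rootCategory=WARN, console
```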
/usr/local/bigdata/spark/bin/spark-submit ~/counting.py
/usr/local/bigdata/spark/bin/spark-submit ~/populations.py
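The contents of counting.py and populations.py are not included in these notes. The following is a hedged sketch of what a script run through spark-submit needs. Unlike the pyspark shell, a submitted script has no predefined sc, so it creates a SparkSession itself. The file name wordcount_sketch.py, the tokenize and count_locally helpers, and the input-path argument are all hypothetical, chosen for illustration.

```python
import sys
from collections import Counter


def tokenize(line):
    """Split a line on whitespace (hypothetical helper)."""
    return line.split()


def count_locally(lines):
    """Same word count as the Spark pipeline below, on a plain list; handy for testing."""
    return Counter(word for line in lines for word in tokenize(line))


def main(path):
    # Imported here so the helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
    sc = spark.sparkContext  # the shell provides `sc`; a submitted script must create it

    counts = (
        sc.textFile(path)
        .flatMap(tokenize)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    # Print the ten most frequent words, highest count first.
    for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
        print(word, n)

    spark.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: spark-submit wordcount_sketch.py <input-path>", file=sys.stderr)
    else:
        main(sys.argv[1])
```

Under these assumptions it would be run like the scripts above, for example: /usr/local/bigdata/spark/bin/spark-submit ~/wordcount_sketch.py file:///home/shrini/women.txt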