@mwinkle
Created January 5, 2016 15:14
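# Switch to the hdfs superuser to create and open up the HDFS home directories
su hdfs
hadoop fs -mkdir /user/root
hadoop fs -chmod 777 /user/root
hadoop fs -chmod 777 /user/guest
# Leave the hdfs user's shell
exit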
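# Download a sample text from Project Gutenberg and copy it into HDFS
wget http://www.gutenberg.org/files/50831/50831-0.txt
hadoop fs -copyFromLocal ./50831-0.txt /user/guest/
# Upload install.log (assumed to exist in the current directory); the PySpark job reads this file
hadoop fs -put ./install.log /user/guest/install.log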
pyspark...
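# sc is the SparkContext that the pyspark shell creates on startup
text_file = sc.textFile("hdfs://sandbox.hortonworks.com/user/guest/install.log")
# Split each line on single spaces, pair every word with 1, then sum the counts per word
counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
# saveAsTextFile writes a directory of part files, so output1.txt is a directory, not a single file
counts.saveAsTextFile("hdfs://sandbox.hortonworks.com/user/guest/output1.txt")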
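
To check what the job should produce without a cluster, the same word count can be sketched in plain Python. This is only an illustration, not part of the original gist: the `word_count` helper and the sample lines are hypothetical, and the sketch runs locally instead of on Spark.

```python
from collections import Counter


def word_count(lines):
    """Count words by splitting each line on single spaces, as the Spark job does."""
    counts = Counter()
    for line in lines:
        # Stands in for flatMap(split) + map((word, 1)) + reduceByKey(add)
        counts.update(line.split(" "))
    return dict(counts)


# Hypothetical sample input standing in for the lines of install.log
sample = ["to be or not", "to be"]
result = word_count(sample)
print(sorted(result.items()))
```

Like the Spark version, splitting on a single space keeps empty strings as "words" whenever a line contains consecutive spaces.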