@robsondepaula
Created August 1, 2018 17:15
[ ] Log in to the training machine over SSH (port 2222 on localhost)
ssh -p 2222 training@localhost
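[ ] Test with a small sample of the data
- create the sample (first 50 lines of the input):
head -50 ../data/purchases.txt > testfile
- pass it to the mapper script:
cat testfile | ./mapper.py
- run the whole map, sort, reduce pipeline locally in one command (sort stands in for Hadoop's shuffle step):
cat testfile | ./mapper.py | sort | ./reducer.py
- run the full job on the cluster (hs looks like a shortcut for a Hadoop streaming job; arguments are mapper, reducer, input dir, output dir):
hs mapper.py reducer.py myinput output2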
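
The notes above do not show mapper.py or reducer.py, so here is a minimal sketch of what such a pair might do, written as plain functions so the local pipeline can be checked without Hadoop. Everything about the data is an assumption: the six tab-separated fields (date, time, store, item, cost, payment) and the choice to sum cost per store are illustrative only, and the function names mapper, reducer and run_local are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Yield 'store<TAB>cost' records, skipping malformed lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout (not taken from the notes):
        # date, time, store, item, cost, payment
        if len(fields) != 6:
            continue
        store, cost = fields[2], fields[4]
        yield f"{store}\t{cost}"


def reducer(sorted_records):
    """Sum the values for each key; input must already be sorted by key."""
    pairs = (record.split("\t") for record in sorted_records)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(float(value) for _, value in group)
        yield f"{key}\t{total:.2f}"


def run_local(lines):
    """Rough equivalent of: cat testfile | ./mapper.py | sort | ./reducer.py"""
    return list(reducer(sorted(mapper(lines))))
```

To use this with the commands above, the two functions would need to be split into separate executable scripts that read sys.stdin and print each yielded record. The reducer relies on its input being sorted, which is why the local pipeline needs the sort step between the two scripts.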
[ ] Check job progress in the Hadoop JobTracker web UI
http://localhost:50030
[ ] List the MapReduce output files and print the result
hadoop fs -ls output2
hadoop fs -cat output2/part-00000