@greeness
Created November 16, 2011 04:36
Running dumbo from the command line with a cache file in HDFS
dumbo start demo_dumbo.py -hadoop /usr/lib/hadoop -input shares -output video_demos -outputformat text -files hdfs://ec2-xxx-xx-xx-xx.compute-1.amazonaws.com:8020/user/ubuntu/users/part-m-00000
### piece of code in demo_dumbo.py
# the file passed with -files is read by its base name from the task's working directory
for line in open('part-m-00000'):
    print line
# ----------------
dumbo start demo_dumbo.py -hadoop /usr/lib/hadoop -input shares -output video_demos -outputformat text -files hdfs://ec2-xxx-xx-xx-xx.compute-1.amazonaws.com:8020/user/ubuntu/users
### piece of code in demo_dumbo.py
import glob
# here -files pointed at the whole users directory, so read each part file inside it
for filename in glob.glob('users/part*'):
    for line in open(filename):
        print line
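
### optional helper (sketch, not part of the original demo_dumbo.py)
The two snippets above differ only in whether `-files` pointed at a single part file or at a whole directory. A minimal sketch that handles both cases is shown below. The name `iter_cached_lines` and its `pattern` parameter are hypothetical, and it assumes the cached path shows up in the task's working directory under its base name, as in the snippets above.

```python
import glob
import os


def iter_cached_lines(name, pattern='part*'):
    """Yield lines (trailing newline stripped) from a cached file,
    or from every matching part file when `name` is a directory.

    `iter_cached_lines` is a hypothetical helper, not a dumbo API.
    """
    if os.path.isdir(name):
        # Sort so part-m-00000, part-m-00001, ... are read in order.
        paths = sorted(glob.glob(os.path.join(name, pattern)))
    else:
        paths = [name]
    for path in paths:
        with open(path) as handle:
            for line in handle:
                yield line.rstrip('\n')
```

Inside demo_dumbo.py this could be called as `iter_cached_lines('part-m-00000')` for the first command or `iter_cached_lines('users')` for the second.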