@TylerBrock
Created June 6, 2012 20:38
Twitter MapReduce with MongoDB+Hadoop Streaming Adapter

# Sample the public stream and import it into twitter.in, the job's input collection
curl https://stream.twitter.com/1/statuses/sample.json -u <login>:<password> | mongoimport -d twitter -c in
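
mapper.rb
#!/usr/bin/env ruby
require 'mongo-hadoop'

# Emit one (time_zone, 1) record per document. Documents without a 'user'
# field (e.g. deletion notices) are grouped under a nil time zone.
MongoHadoop.map do |document|
  user = document['user'] || {}
  { :_id => user['time_zone'], :count => 1 }
end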
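
To check what the mapper emits without a Hadoop cluster, the block body can be pulled into a plain Ruby method and run on hand-written documents. This is a minimal sketch, assuming documents behave like ordinary string-keyed Ruby hashes; `map_tweet` and the sample documents are hypothetical and the `mongo-hadoop` gem is not used. It includes a guard so a document with no `user` field maps to a nil key instead of raising `NoMethodError`.

```ruby
# Hypothetical stand-in for the body of the MongoHadoop.map block, run
# locally without Hadoop or the mongo-hadoop gem.
def map_tweet(document)
  # Messages without a 'user' field fall back to an empty hash,
  # so they are emitted with a nil time zone instead of raising.
  user = document['user'] || {}
  { :_id => user['time_zone'], :count => 1 }
end

# Hand-written documents shaped loosely like sample-stream messages.
docs = [
  { 'text' => 'hi', 'user' => { 'time_zone' => 'London' } },
  { 'text' => 'hola', 'user' => { 'time_zone' => nil } },
  { 'delete' => { 'status' => { 'id' => 1 } } }
]

docs.each { |doc| p map_tweet(doc) }
```

Every document yields exactly one record with a count of 1; the reducer's job is to total those per time zone.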
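
reducer.rb
#!/usr/bin/env ruby
require 'mongo-hadoop'

# Total the per-document counts emitted by the mapper for each time zone.
# Values come back from BSON with string keys, hence value['count'].
MongoHadoop.reduce do |key, values|
  count = 0
  values.each do |value|
    count += value['count']
  end
  { :_id => key, :count => count }
end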
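
A reducer that matches the mapper's output totals the `count` field of each value for a key. A minimal local sketch under that assumption; `reduce_counts` and the sample values are hypothetical, and the values are modeled as string-keyed hashes like the ones the reducer reads.

```ruby
# Hypothetical stand-in for a reducer body that totals the mapper's counts.
# Values are string-keyed hashes, as records read back from BSON are.
def reduce_counts(key, values)
  total = values.sum { |value| value['count'] }
  { :_id => key, :count => total }
end

values = [{ 'count' => 1 }, { 'count' => 1 }, { 'count' => 1 }]
full = reduce_counts('London', values)
p full

# Summing counts (rather than counting records) means partial totals can
# be reduced again and still give the same answer.
partials = [reduce_counts('London', values.first(2)),
            reduce_counts('London', values.last(1))]
regrouped = partials.map { |h| { 'count' => h[:count] } }
p reduce_counts('London', regrouped)
```

The second call shows why summing is the safer choice: if the reducer ever received already-aggregated values, it would still produce the correct total, whereas incrementing once per record would not.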

# Run the job; results are written to twitter.out
hadoop jar mongo-hadoop-streaming-assembly*.jar \
  -mapper mapper.rb \
  -reducer reducer.rb \
  -inputURI mongodb://127.0.0.1/twitter.in \
  -outputURI mongodb://127.0.0.1/twitter.out
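
Between the mapper and the reducer, Hadoop groups the mapper's records by `_id`, so each reduce call sees every value for one time zone. The job's data flow can be sanity-checked in memory before running it on a cluster. This is a sketch under stated assumptions: `map_tweet`, `reduce_counts`, `run_job`, and the sample tweets are hypothetical, and `group_by` stands in for Hadoop's distributed shuffle and sort.

```ruby
# Hypothetical in-memory simulation of the job: map, shuffle, reduce.
# String keys are used throughout, as records read back from BSON have them.
def map_tweet(document)
  user = document['user'] || {}
  { '_id' => user['time_zone'], 'count' => 1 }
end

def reduce_counts(key, values)
  { '_id' => key, 'count' => values.sum { |value| value['count'] } }
end

def run_job(documents)
  mapped = documents.map { |doc| map_tweet(doc) }
  # Shuffle: Hadoop routes every record with the same key to one reduce
  # call; group_by does the same thing in memory.
  grouped = mapped.group_by { |pair| pair['_id'] }
  grouped.map { |key, pairs| reduce_counts(key, pairs) }
end

tweets = [
  { 'user' => { 'time_zone' => 'London' } },
  { 'user' => { 'time_zone' => 'Tokyo' } },
  { 'user' => { 'time_zone' => 'London' } },
  { 'delete' => { 'status' => { 'id' => 1 } } }
]

results = run_job(tweets)
results.each { |row| p row }
```

Each row corresponds to one document the job would write to `twitter.out`, keyed by time zone.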