@neilfws
Created January 6, 2010 11:59
Archive a FriendFeed feed in MongoDB
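# Rakefile: archive a FriendFeed feed in MongoDB.
# Usage: rake db:seed feed=FEED_ID
require "mongo"
require "json/pure"
require "open-uri"

namespace :db do
  feed = ENV['feed']
  # Mongo::Connection is the connection class of the legacy 1.x Ruby driver.
  db  = Mongo::Connection.new.db('friendfeed')
  col = db.collection('entries')

  desc "Seed database with feed entries"
  task :seed do
    abort "Usage: rake db:seed feed=FEED_ID" if feed.to_s.empty?

    # Request pages of 100 entries at offsets 0, 100, ..., 9900.
    0.step(9900, 100) do |n|
      j = JSON.parse(open("http://friendfeed-api.com/v2/feed/#{feed}?start=#{n}&num=100").read)

      # Split the entries from the feed metadata; stop when a page is empty.
      entries = j.delete('entries')
      break if entries.nil? || entries.empty?
      j['updated_at'] = Time.now

      entries.each do |entry|
        # The entry id alone is not unique across feeds, so prefix the feed's sup_id.
        entry['_id'] = "#{j['sup_id']}/#{entry['id']}"
        entry.delete('id')
        # Embed the feed metadata in every stored entry.
        entry['feed'] = j
        col.save(entry)
      end

      puts "Processed entries #{n} - #{n + 99}"
      # Pause between requests.
      sleep(3)
    end

    puts "Done: database contains #{col.count} entries."
  end
end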

neilfws commented Aug 13, 2010

  1. Rewritten as a rake task; save as "Rakefile" and run as "rake db:seed feed=FEED_ID".
  2. The entry ID alone is not sufficient as a unique key (the same entry may appear in several feeds), so the feed's sup_id is prepended.
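
A minimal sketch of the key scheme from point 2, for illustration only. The helper name `composite_id` and the sample values are hypothetical and are not part of the gist or of the FriendFeed API; the sketch only shows why prefixing the feed's sup_id keeps the same entry ID distinct across feeds.

```ruby
# Hypothetical helper: join the feed's sup_id and the entry id with "/",
# mirroring the "#{j['sup_id']}/#{entry['id']}" expression in the rake task.
def composite_id(sup_id, entry_id)
  "#{sup_id}/#{entry_id}"
end

# Made-up sample values: one entry id seen in two different feeds.
entry_id = "e/abc123"
id_a = composite_id("feed-a", entry_id)
id_b = composite_id("feed-b", entry_id)

puts id_a
puts id_b
puts id_a == id_b # the two keys differ, so neither document overwrites the other
```

Because `col.save` replaces any document that has the same `_id`, a key built from the entry ID alone would let one feed's copy overwrite another's.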

neilfws commented Dec 19, 2010

Changed step back to 9900; don't think anything above this returns more results.
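
As a quick check of what that step range requests, the sketch below lists the `start` offsets produced by `0.step(9900, 100)`. It only reproduces the loop bounds from the rake task. Whether the API returns anything beyond offset 9900 is the author's observation and is not tested here.

```ruby
# Offsets passed as the "start" parameter, one per request of 100 entries.
offsets = 0.step(9900, 100).to_a

puts offsets.first      # first offset requested
puts offsets.last       # last offset requested
puts offsets.size       # number of requests if no page comes back empty
puts offsets.size * 100 # upper bound on entries fetched in one run
```

With these bounds a full run makes at most 100 requests and so archives at most 10,000 entries per feed.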

neilfws commented Feb 1, 2011

Added a sleep() to this version of code.
