@neilfws
Created January 6, 2010 11:59
Archive a FriendFeed feed in MongoDB
#!/usr/bin/ruby
require "rubygems"
require "mongo"
require "json/pure"
require "open-uri"

# db config
# Note: Mongo::Connection is the legacy (1.x) Ruby driver API.
db  = Mongo::Connection.new.db('friendfeed')
col = db.collection('lifesci')

# fetch json, 100 entries per request, for start offsets 0..9900
0.step(9900, 100) do |n|
  f = open("http://friendfeed-api.com/v2/feed/the-life-scientists?start=#{n}&num=100").read
  j = JSON.parse(f)

  # an empty page means there is nothing left to fetch
  break if j['entries'].count == 0

  j['entries'].each do |entry|
    # only save entries that are not already in the collection
    if col.find({:_id => entry['id']}).count == 0
      # use the FriendFeed entry id as the MongoDB _id
      entry[:_id] = entry['id']
      entry.delete('id')
      col.save(entry)
    end
  end

  puts "Processed entries #{n} - #{n + 99}", "Database contains #{col.count} documents."
end

puts "No more entries to process. Database contains #{col.count} documents."
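
The paging and de-duplication logic in the script can be tried without a network connection or a MongoDB server. The following is a minimal sketch under stated assumptions, not the author's code: `archive_feed`, `fetch_page` and `store` are hypothetical names, a plain Hash stands in for the collection, and a lambda over a made-up 250-entry array stands in for the FriendFeed API.

```ruby
# Sketch only: archive_feed, fetch_page and store are hypothetical names,
# not part of the gist or of any library.
def archive_feed(fetch_page, store, page_size: 100, max_start: 9900)
  0.step(max_start, page_size) do |start|
    entries = fetch_page.call(start, page_size)
    break if entries.empty?              # empty page: feed is exhausted

    entries.each do |entry|
      id = entry['id']
      next if store.key?(id)             # already archived, skip it
      doc = entry.reject { |k, _| k == 'id' }
      store[id] = doc.merge('_id' => id) # move 'id' to '_id', as the script does
    end
  end
  store
end

# Made-up feed of 250 entries standing in for the HTTP API.
feed = (0...250).map { |i| { 'id' => "e/#{i}", 'body' => "entry #{i}" } }
fetch_page = ->(start, num) { feed[start, num] || [] }

store = archive_feed(fetch_page, {})
archive_feed(fetch_page, store)          # a second run adds nothing new
puts store.size
```

Running the function twice against the same store shows why the existence check matters: the script can be re-run periodically and only entries that are not yet archived are saved.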

neilfws commented Aug 13, 2010

  1. Rewritten as a rake task: save as "Rakefile" and run with "rake db:seed feed=FEED_ID".
  2. The entry ID alone is not sufficient as a unique key (an entry may appear in several feeds), so the feed sup_id is now prepended to it.
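
The second point can be illustrated with a short sketch. The rake-task version is not shown here, so everything below is an assumption: `document_id` is a hypothetical helper, the `:` separator is an arbitrary choice, and the sup_id and entry ID values are made up.

```ruby
# Sketch only: document_id is a hypothetical helper and the ':' separator
# is an assumption; the sample values below are invented.
def document_id(sup_id, entry_id)
  "#{sup_id}:#{entry_id}"
end

entry_id = 'e/abc123'                      # same entry seen in two feeds
id_in_feed_a = document_id('feed-a-sup', entry_id)
id_in_feed_b = document_id('feed-b-sup', entry_id)

puts id_in_feed_a
puts id_in_feed_b
puts id_in_feed_a == id_in_feed_b          # distinct keys for distinct feeds
```

With the feed identifier in the key, the same entry can be stored once per feed instead of being skipped as a duplicate.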


neilfws commented Dec 19, 2010

Changed the step limit back to 9900; I don't think any start value above this returns more results.


neilfws commented Feb 1, 2011

Added a sleep() to this version of code.
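
One plausible place for such a pause is between page requests, so that the API is not hit in a tight loop. This is only a sketch: the comment does not say where the `sleep()` call was placed or how long it waits, and `each_offset_with_pause` is a hypothetical helper.

```ruby
# Sketch only: each_offset_with_pause is a hypothetical helper; the
# position and length of the pause are assumptions.
def each_offset_with_pause(max_start, page_size, pause_seconds)
  0.step(max_start, page_size) do |start|
    yield start                  # fetch and store one page here
    sleep(pause_seconds)         # wait before requesting the next page
  end
end

offsets = []
each_offset_with_pause(300, 100, 0.01) { |n| offsets << n }
puts offsets.inspect
```

A `break` inside the block still ends the loop early, so the empty-page check from the original script works unchanged.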
