@kitplummer
Created January 9, 2013 17:53
Fetch from githubarchive.org and drop into MongoDB...
require 'open-uri'
require 'zlib'
require 'yajl'
require 'mongo'
require 'date'
require 'iconv'
include Mongo
@client = MongoClient.new('localhost', 27017)
@db = @client['gh_ark']
@coll = @db['events']
# Start from an empty collection: remove with no selector deletes every document
@coll.remove
sd = Date.parse('2012-01-03')
ed = Date.parse('2012-01-04')
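# Loop over the days; only the hour-23 archive is fetched for each date
sd.upto(ed) do |date|
  url = "http://data.githubarchive.org/#{date}-23.json.gz"
  puts "Getting #{url}"
  gz = open(url)
  js = Zlib::GzipReader.new(gz).read
  # Drop invalid UTF-8 byte sequences; the padding space lets Iconv also drop a bad final byte
  ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
  js = ic.iconv(js + ' ')[0..-2]
  puts "inserting events for #{date}..."
  # The block is called once per JSON object parsed from the file
  Yajl::Parser.parse(js) do |event|
    puts event
    @coll.insert(event)
  end
end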
puts "There were #{@coll.count} events."
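
The script uses Iconv to drop invalid UTF-8 bytes before parsing. Iconv is no longer part of the standard library from Ruby 2.0 onward, so on a newer Ruby the same cleanup could be sketched with `String#scrub` (available from Ruby 2.1). The helper name `drop_invalid_utf8` is hypothetical and not part of the original script; this is a minimal sketch under that assumption, not a drop-in replacement that has been tested against the archive files.

```ruby
# Hypothetical helper (not in the original script): removes byte sequences
# that are not valid UTF-8, similar in intent to Iconv's 'UTF-8//IGNORE'.
# Assumes Ruby 2.1 or newer, where String#scrub exists.
def drop_invalid_utf8(raw)
  # Work on a copy so the caller's string is not modified,
  # tag it as UTF-8, then replace each invalid sequence with ''.
  raw.dup.force_encoding(Encoding::UTF_8).scrub('')
end

# Example: a binary string with one stray 0xFF byte in the middle.
dirty = "caf\xC3\xA9 \xFF ok".b
clean = drop_invalid_utf8(dirty)
puts clean.valid_encoding?
```

In the script above, this would take the place of the two `ic` lines, for example `js = drop_invalid_utf8(js)`, and the `require 'iconv'` line would no longer be needed.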