Skip to content
Create a gist now

Instantly share code, notes, and snippets.

load githubarchive data into sqlite3 database
# $> ruby load.rb
require 'yajl'
require 'zlib'
require 'sqlite3'
require 'open-uri'
input = ARGV.shift
# create the SQLite table schema
@schema = open('')
@schema = Yajl::Parser.parse(
@keys = {|r| r['name']}
# map GitHub JSON schema to flat CSV space based
# on provided Big Query column schema
def flatmap(h, e, prefix = '')
e.each do |k,v|
if v.is_a?(Hash)
flatmap(h, v, prefix+k+"_")
key = prefix+k
next if !@keys.include? key
case v
when TrueClass then h[key] = 1
when FalseClass then h[key] = 0
next if v.nil?
h[key] = v unless v.is_a? Array
# Create table schema
create_table = "create table if not exists events ( \n"
@schema.each do |column|
create_table += case column['type']
"#{column['name']} integer, \n"
when 'STRING'
"#{column['name']} text, \n"
create_table = create_table.chomp(", \n") + ");"
# load the data
db ="github.sqlite")
gz = open(input)
js =
Yajl::Parser.parse(js) do |event|
row = flatmap({}, event)
keys, values = row.keys, row.values
db.execute "INSERT INTO events(#{keys.join(',')}) VALUES (#{(['?'] * keys.size).join(',')})", *values
# Run a query...
p db.execute("select count(*) from events")
bagrow commented Dec 6, 2012

Does this script still work for the latest json schema? I think it's broken/outdated. I've been trying to figure out what's going on, but I don't know ruby. Wish py-yajl supported streams… :(

bagrow commented Dec 6, 2012

Ahh there are newlines between the stream elements now, this is new and great…!


How can I construct a mysql insert statement for this:

db.execute "INSERT INTO events(#{keys.join(',')}) VALUES (#{(['?'] * keys.size).join(',')})", *values

the mysql-ruby module seems cannot do the same.


i run this script but it doesn't load the payload_commit info of pushevent. any suggestion?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.