@allenwlee
Last active August 29, 2015 13:57

GitHub Archive leaderboard challenge

GitHub Archive provides stats for projects on GitHub via a RESTful HTTP service.

Create a command-line program that lists the most active repositories for a given time range. It should support the following interface:

  gh_repo_stats [--after DATETIME] [--before DATETIME] [--event EVENT_NAME] [-n|--count COUNT]

Example invocation and output:

  gh_repo_stats --after 2012-11-01T13:00:00Z --before 2012-11-02T03:12:14-03:00 --event PushEvent --count 42

sakai-mirror/melete - 168 events
runningforworldpeace/feeds - 103 events
chapuni/llvm-project-submodule - 98 events
chapuni/llvm-project - 98 events
Frameset91/untitled0815 - 94 events
josmera01/juanrueda-internation - 78 events
artmig/artmig.github.com - 76 events
mozilla/mozilla-central - 69 events
bcomdlc/bcom-homepage-archive - 68 events
sakai-mirror/ambrosia - 68 events
sakai-mirror/test-center - 65 events
sakai-mirror/mneme - 65 events
klange/tales-of-darwinia - 63 events
esc/bottlepaste - 62 events
ehsan/mozilla-history-tools - 58 events
all4senses/Gv - 54 events
herry13/nuri - 51 events
SeqAlignViz/seqalignviz.github.com - 44 events
illcreative/Get-Home-Safe-NYC - 44 events
incxnt/incxnt.github.com - 41 events
daknok/Filmpje - 40 events
ChildOfWar/MIU - 40 events
aleontiev/Turntable.FM-Squared - 40 events
DigiZeit/dwa - 37 events
DigiZeit/dwa-debug - 37 events
danielcooper/radarsite - 37 events
navanjr/kts - 36 events
githubtrainer/poems - 35 events
wikimedia/mediawiki-extensions - 35 events
DigiZeit/dwa-pro - 35 events
gerritwm/MediaWiki - 34 events
cloudweekhec/stack - 34 events
andyisimprovised/goatmachine - 34 events
GreenplumChorus/chorus - 33 events
ros-gbp/nodelet_core-release - 33 events
honovation/veil - 33 events
McGill-CSB/PHYLO - 32 events
ceph/ceph - 31 events
aaronlbloom/dwa - 31 events
dotCMS/dotCMS - 31 events
RC5Group6/research-camp-5 - 30 events
ShreyaPandita/oftest - 30 events
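
The interface above maps naturally onto Ruby's standard optparse library. The following is a minimal sketch, not part of the original challenge or gist: parse_cli is a hypothetical helper name, the default count of 10 is an assumption (the challenge does not state one), and DATETIME values are assumed to be ISO 8601 as in the example invocation.

```ruby
require 'optparse'
require 'time'

# Hypothetical helper: turns gh_repo_stats-style arguments into an options hash.
def parse_cli(argv)
  options = { count: 10 } # assumed default; the challenge does not specify one

  parser = OptionParser.new do |o|
    o.banner = 'Usage: gh_repo_stats [--after DATETIME] [--before DATETIME] ' \
               '[--event EVENT_NAME] [-n COUNT]'
    # Timestamps are normalised to UTC so offsets such as -03:00 compare correctly.
    o.on('--after DATETIME', 'Lower time bound') do |v|
      options[:after] = Time.iso8601(v).utc
    end
    o.on('--before DATETIME', 'Upper time bound') do |v|
      options[:before] = Time.iso8601(v).utc
    end
    o.on('--event EVENT_NAME', 'Event type to count, e.g. PushEvent') do |v|
      options[:event] = v
    end
    # Integer makes optparse convert the argument and reject non-numeric input.
    o.on('-n', '--count COUNT', Integer, 'Number of repositories to list') do |v|
      options[:count] = v
    end
  end

  parser.parse!(argv.dup) # dup so the caller's array is left untouched
  options
end
```

Calling parse_cli(ARGV) from the executable would then give the rest of the program a single hash to work from, whichever of -n or --count the user typed.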

Going further

  • There are 18 published Event Types. How would you manage them? What would you do if GitHub added more Event Types?
  • What factors impact performance? What would you do to improve them?
  • The example shows one type of output report. How would you add additional reporting formats?
  • If you had to implement this using only one gem, which would it be? Why?
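
One possible answer to the reporting-formats question is a small registry of formatter lambdas keyed by format name, so that adding a format means adding one entry. This is a sketch only: FORMATTERS, render, and the text/tsv/json format names are hypothetical, and rows are assumed to be [repository, event_count] pairs.

```ruby
require 'json'

# Hypothetical registry: format name => lambda that renders the rows as a string.
FORMATTERS = {
  # Matches the "owner/repo - N events" layout used in the example output.
  'text' => ->(rows) { rows.map { |name, count| "#{name} - #{count} events" }.join("\n") },
  'tsv'  => ->(rows) { rows.map { |name, count| "#{name}\t#{count}" }.join("\n") },
  'json' => lambda do |rows|
    JSON.generate(rows.map { |name, count| { 'repo' => name, 'events' => count } })
  end
}.freeze

# Looks up the formatter by name and fails loudly on an unknown format.
def render(rows, format = 'text')
  formatter = FORMATTERS.fetch(format) do
    raise ArgumentError, "unknown format: #{format}"
  end
  formatter.call(rows)
end
```

The same registry idea could be applied to event types: keep the known names in one data structure so that a new type is a one-line change rather than a code change.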

# github_archive.rb
require 'google/api_client'
require 'google/api_client/client_secrets'
require 'google/api_client/auth/installed_app'
require 'bigquery'
require 'json'

# Prompts for a time range, event type, and count, then prints the most
# active repositories found in the GitHub Archive BigQuery dataset.
class Archiver
  def initialize
    print_line
    puts "Welcome to the Github Archive Leaderboard"
    puts "Find the most active repositories for a given time range"
    puts "Type in the following (example):"
    puts "2014-03-16 13:00:00,2014-03-18 15:12:14,PushEvent,10"
    print_line
    parse_input
  end

  # Reads one comma-separated line: after,before,event_name,count
  def parse_input
    print_prompt
    input = gets.chomp.split(',').map(&:strip)
    @after = input[0]
    @before = input[1]
    @event_name = input[2]
    @count = input[3].to_i # LIMIT needs an integer
    get_data
  end

  # Runs the leaderboard query against the GitHub Archive timeline table.
  def get_data
    opts = {}
    opts['client_id'] = '992331757661-9t3f55b02btia14okppskbi35bgn91eb.apps.googleusercontent.com'
    opts['service_email'] = '992331757661-9t3f55b02btia14okppskbi35bgn91eb@developer.gserviceaccount.com'
    opts['key'] = '39ebfef9a32360852aa95a4b900bf9fc59303ebb-privatekey.p12'
    opts['project_id'] = '992331757661'
    bq = BigQuery.new(opts)
    # Note: the query only counts repositories whose language is Ruby.
    @hash = bq.query(
      "SELECT repository_name, count(repository_name) as pushes, repository_description, repository_url
      FROM [githubarchive:github.timeline]
      WHERE type='#{@event_name}'
      AND repository_language='Ruby'
      AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('#{@after}')
      AND PARSE_UTC_USEC(created_at) < PARSE_UTC_USEC('#{@before}')
      GROUP BY repository_name, repository_description, repository_url
      ORDER BY pushes DESC
      LIMIT #{@count}"
    )
    output
  end

  # Prints one "owner/repo - N events" line per result row.
  def output
    repos = @hash['rows'] || [] # no 'rows' key when the query matches nothing
    repos.each do |repo|
      # Column 3 is repository_url; [19..-1] drops the leading "https://github.com/".
      # Column 1 is the event count (aliased as pushes).
      puts "#{repo['f'][3]['v'][19..-1]} - #{repo['f'][1]['v']} events"
    end
  end

  def print_prompt
    puts "Type here:"
  end

  def print_line
    puts "----------------------------------------"
  end
end

Archiver.new
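
The script above interpolates whatever the user types straight into the query string. A hedged sketch of validating the values first follows. build_query and bigquery_time are hypothetical helpers, the event-name pattern is an assumption about how event types are named, and timestamps are assumed to arrive as ISO 8601 (as in the challenge interface) rather than in the "YYYY-MM-DD HH:MM:SS" form the prompt asks for.

```ruby
require 'time'

# Assumed naming pattern for event types: letters only, ending in "Event".
EVENT_NAME = /\A[A-Za-z]+Event\z/

# Converts an ISO 8601 timestamp to the UTC "YYYY-MM-DD HH:MM:SS" form
# used in the script's PARSE_UTC_USEC calls. Raises ArgumentError on bad input.
def bigquery_time(value)
  Time.iso8601(value).utc.strftime('%Y-%m-%d %H:%M:%S')
end

# Builds the same query as the script, but only from validated values.
def build_query(after:, before:, event:, count:)
  raise ArgumentError, "invalid event name: #{event}" unless event =~ EVENT_NAME

  limit = Integer(count.to_s, 10) # rejects anything that is not a base-10 integer
  raise ArgumentError, 'count must be positive' unless limit.positive?

  <<~SQL
    SELECT repository_name, count(repository_name) as pushes, repository_description, repository_url
    FROM [githubarchive:github.timeline]
    WHERE type='#{event}'
    AND repository_language='Ruby'
    AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('#{bigquery_time(after)}')
    AND PARSE_UTC_USEC(created_at) < PARSE_UTC_USEC('#{bigquery_time(before)}')
    GROUP BY repository_name, repository_description, repository_url
    ORDER BY pushes DESC
    LIMIT #{limit}
  SQL
end
```

Because every interpolated value has passed through a strict parser or pattern, a stray quote in the input raises ArgumentError instead of changing the query.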

Run:

$ ruby github_archive.rb 

Prompt:

----------------------------------------
Welcome to the Github Archive Leaderboard
Find the most active repositories for a given time range
Type in the following (example):
2014-03-16 13:00:00,2014-03-18 15:12:14,PushEvent,10
----------------------------------------
Type here:

User types in:

2014-03-16 13:00:00,2014-03-18 15:12:14,PushEvent,10

Output:

W, [2014-03-18T23:33:37.405502 #20264]  WARN -- : Google::APIClient - Please provide :application_name and :application_version when initializing the client
lingohub/github-hooks-test - 119 events
Ortask/mutator - 113 events
CocoaPods/Specs - 109 events
Homebrew/homebrew - 85 events
BrewTestBot/homebrew - 61 events
YusukeAoki/webap_materials - 56 events
zunda/emoticommits - 45 events
moneyadviceservice/frontend - 43 events
avondohren/project-git - 41 events
opf/openproject - 41 events
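
Each output line above is built by reading result cells by position (repo['f'][3]['v'] and repo['f'][1]['v']). A sketch of decoding rows by column name instead is given here. It assumes the response hash carries the schema/fields and rows/f/v layout of BigQuery's query response; rows_to_hashes and report_lines are hypothetical names, not part of the gist.

```ruby
# Hypothetical helper: pairs each cell value with its column name from the schema.
def rows_to_hashes(response)
  names = response.fetch('schema').fetch('fields').map { |field| field['name'] }
  (response['rows'] || []).map do |row|
    values = row['f'].map { |cell| cell['v'] }
    names.zip(values).to_h
  end
end

# Renders the leaderboard lines using column names rather than positions.
def report_lines(response)
  rows_to_hashes(response).map do |row|
    # Strip the URL prefix to leave "owner/repo".
    slug = row['repository_url'].sub(%r{\Ahttps://github\.com/}, '')
    "#{slug} - #{row['pushes']} events"
  end
end
```

With this shape, reordering or adding columns in the SELECT clause would not silently change which value is printed.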