@scottburton11
Forked from tsnow/README.md
Last active December 14, 2015 04:28

Spend 5 literal minutes thinking:

  • How much time do you have to devote to this exercise?
  • How can you verify that your solution is correct?

Then:

  1. Fork this gist and clone it locally.
  2. Send tsnowhite@taximagic.com and aharris@taximagic.com a link to your forked gist with "[gh_repo_stats]" in the subject. Provide an estimate of how long you'll be working on your solution in man-hours and an estimate of when you will be finished. Feel free to ask any questions about the assignment, as well.
  3. Complete the exercise, pushing your changes to the gist repo as you go.
  4. Send tsnowhite@taximagic.com and aharris@taximagic.com a link to your forked gist, along with the actual amount of time you spent on the problem and a discussion of your solution. If you use external dependencies besides Ruby, please provide a Rakefile so that running rake install installs them.
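
The Rakefile asked for in step 4 could look like the minimal sketch below. It assumes Bundler is already installed and that a Gemfile sits next to the Rakefile; the task body is an illustration, not something prescribed by the exercise.

```ruby
# Rakefile (sketch). The explicit require is only needed when this file is
# loaded outside the `rake` command; it is harmless otherwise.
require 'rake'

desc "Install the gems listed in the Gemfile (assumes Bundler is available)"
task :install do
  # Delegates dependency resolution to Bundler.
  sh "bundle install"
end
```

With this in place, `rake install` runs `bundle install` in the current directory.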
source :rubygems
gem 'yajl-ruby'
GEM
  remote: http://rubygems.org/
  specs:
    yajl-ruby (1.1.0)

PLATFORMS
  ruby

DEPENDENCIES
  yajl-ruby
#!/usr/bin/env ruby
require 'bundler/setup'
require 'yajl'
require 'zlib'
require 'uri'
require 'time'
require 'open-uri'
require 'optparse'
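class DateRange
  include Enumerable

  attr_reader :from, :to

  def initialize(from, to)
    @from, @to = from, to
  end

  # Keys for every hourly archive file that overlaps [from, to). Starts at the
  # top of the hour containing `from`, so a range such as 10:30-11:30 yields
  # both the 10 and 11 o'clock blocks.
  def hour_blocks
    @hour_blocks ||= begin
      time = from - (from.min * 60 + from.sec + from.subsec)
      blocks = [hour_block(time)]
      blocks << hour_block(time) while (time += 3600) < to
      blocks
    end
  end

  def seconds_between
    to - from
  end

  # Formats a key such as "2012-10-12-9" (the hour is not zero-padded).
  def hour_block(time)
    time.strftime("%Y-%m-%d-") + time.hour.to_s
  end

  def each
    hour_blocks.each { |block| yield block }
  end
end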
module Github
  module Events
    class Event
      attr_reader :attrs

      def initialize(attrs)
        @attrs = attrs
      end

      def event_name
        attrs['type']
      end

      def created_at
        Time.parse(attrs['created_at'])
      end
    end

    module Repository
      def repository
        attrs['repository']
      end

      def owner_name
        repository['owner']
      end

      def repo_name
        repository['name']
      end

      def key
        "#{owner_name}/#{repo_name}"
      end
    end

    class PushEvent < Event
      include Repository
    end

    class PullRequestEvent < Event
      include Repository
    end

    # Had I time, or were so inclined, I would define an Event subclass for each
    # type at http://developer.github.com/v3/activity/events/types that responds
    # to #key.
    # Why are we doing this? Because, regardless of what Github claims, the JSON
    # schema for each event type is not all that similar. Since we only ever filter
    # by one event type, it's reasonable to expect different result types.
    class Index < Hash
      def <<(event)
        if self[event.key]
          self[event.key][:count] += 1
          self[event.key][:events] << event
        else
          self[event.key] = {
            :count => 1,
            :events => [event]
          }
        end
      end

      def sort
        to_a.sort { |a,b| a[1][:count] <=> b[1][:count] }
      end
    end
  end
end
class Github::Stats
  # WAT - http://data.githubarchive.org expects keys in Mountain time zone,
  # and returns events in it as well.
  GITHUB_ARCHIVE_TOTALLY_ARBITRARY_TIMEZONE_WAT = "-07:00"

  attr_reader :options, :errors

  def initialize(options)
    @options = options
    @errors = []
  end

  def to
    Time.parse(options[:to]).localtime(GITHUB_ARCHIVE_TOTALLY_ARBITRARY_TIMEZONE_WAT)
  end

  def from
    Time.parse(options[:from]).localtime(GITHUB_ARCHIVE_TOTALLY_ARBITRARY_TIMEZONE_WAT)
  end

  def limit
    options[:limit].to_i
  end

  def base_uri
    URI.parse("http://data.githubarchive.org/")
  end

  def date_range
    DateRange.new(from, to)
  end

  def event_index
    @event_index ||= Github::Events::Index.new
  end

  # Downloads each hourly archive in the range and indexes the matching events.
  def gather
    date_range.each do |block|
      uri = base_uri
      uri.path = "/#{block}.json.gz"
      # URI#open comes from open-uri; Kernel#open no longer accepts URLs as of Ruby 3.0.
      json = Zlib::GzipReader.new(uri.open).read
      # Each file holds a stream of JSON objects; the block runs once per object.
      Yajl::Parser.parse(json) do |hash|
        if hash['type'] == options[:event_type]
          event = event_class.new(hash)
          event_index << event if (from <= event.created_at && event.created_at < to)
        end
      end
    end
  end

  def report
    event_index.sort.reverse.take(limit).each do |event|
      reporter.puts "#{event[0].to_s} - #{event[1][:count]} events"
    end
  end

  attr_writer :reporter

  def reporter
    @reporter ||= STDOUT
  end

  # Let's just say every input is valid, and if the program blows up, it's user error.
  def valid?
    true
  end

  def event_class
    @event_class ||= Github::Events::const_get(options[:event_type])
  end
end
options = {}
OptionParser.new do |opts|
  opts.banner = "Usage: gh_repo_stats --after 2012-11-01T13:00:00Z --before 2012-11-02T03:12:14-03:00 --event PushEvent --count 42"
  opts.on("--after DATE", "Query dates on or after DATE, an ISO-8601 formatted date string") do |a|
    options[:from] = a
  end
  opts.on("--before DATE", "Query dates before DATE, an ISO-8601 formatted date string") do |b|
    options[:to] = b
  end
  opts.on("--event TYPE", "Filter event type; see http://developer.github.com/v3/activity/events/types/#gistevent for details") do |e|
    options[:event_type] = e
  end
  # -n is the short form used in the exercise's example invocations.
  opts.on("-n", "--count N", "Report the top N results") do |c|
    options[:limit] = c
  end
end.parse!

stats = Github::Stats.new(options)
if stats.valid?
  stats.gather
  stats.report
else
  stats.errors.each do |error|
    puts error
  end
end
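
The script has to decide which hourly archive files cover a requested time range. The sketch below isolates that step. `hour_keys` is a hypothetical helper name, and the half-open range [from, to) is an assumption that mirrors the "on or after" / "before" wording of the options.

```ruby
require 'time'

# Hypothetical helper: list the hourly keys ("YYYY-MM-DD-H") of every archive
# file that overlaps the half-open range [from, to).
def hour_keys(from, to)
  # Round down to the top of the hour that contains `from`.
  hour_start = from - (from.min * 60 + from.sec + from.subsec)
  keys = []
  loop do
    keys << "#{hour_start.strftime('%Y-%m-%d')}-#{hour_start.hour}"
    hour_start += 3600
    break unless hour_start < to
  end
  keys
end

from = Time.parse("2012-10-12T10:30:00-07:00").localtime("-07:00")
to   = Time.parse("2012-10-12T11:30:00-07:00").localtime("-07:00")
p hour_keys(from, to)
```

A range that starts or ends mid-hour touches two files even though it spans only 3600 seconds, so the loop steps hour by hour from the rounded-down start instead of dividing the duration by 3600.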

Create a Ruby script that uses http://www.githubarchive.org/ and has the following options and output:

   > ./gh_repo_stats -h
   gh_repo_stats [--after DATETIME] [--before DATETIME] [--event EVENT_NAME] [-n COUNT]

Here are some example interactions:

   > ./gh_repo_stats --after 2012-10-12T10:00:00-08:00 --before 2012-10-12T11:00:00-08:00 \
                     --event PushEvent -n 20
   hughht5/Guido - 30 events
   gordonbrander/French-Toast-Assets - 19 events
   iblanky/iblanky.github.com - 18 events
   honielui/hackathon_hnb - 17 events
   MilkZoft/codejobs - 16 events
   danberindei/infinispan - 12 events
   gilgomesp/site - 11 events
   HKCodeCamp/bartr - 11 events
   josephwilk/tlearn-rb - 10 events
   soapboxCommunications/Young-Final - 10 events
   sakai-mirror/melete - 10 events
   fabiantheblind/auto-typo-adbe-id - 9 events
   demobox/jclouds-maven-site-1.5.2 - 9 events
   davecoa/opendataday_workshop - 9 events
   freebsd/freebsd-ports - 9 events
   joonasrouhiainen/studio4-election - 9 events
   bunuelo/funk2 - 9 events
   Certainist/To_aru_Library - 9 events
   lanticezdd/uni - 8 events
   pfinette/finettedotcom - 8 events
   >
   > ./gh_repo_stats --after 2013-01-12T10:00:01Z --before 2013-01-12T11:00:00Z \
                     --event WatchEvent -n 2
   airbnb/javascript - 19 events
   piranha/gostatic - 4 events
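
The output above is a ranking of repositories by event count. A minimal sketch of that counting and ranking step follows; `top_repos` is a hypothetical name, the sample keys are made up, and breaking ties by name is an assumption, since the exercise does not say how ties are ordered.

```ruby
# Hypothetical helper: count events per "owner/repo" key and return the
# `limit` most active repositories as [key, count] pairs.
def top_repos(keys, limit)
  counts = Hash.new(0)
  keys.each { |key| counts[key] += 1 }
  # Descending count; ties broken by name so the order is deterministic
  # (the tie-break rule is an assumption, not part of the exercise).
  counts.sort_by { |key, count| [-count, key] }.first(limit)
end

# Made-up sample data: one key per matching event.
sample = %w[
  example-owner/alpha example-owner/beta example-owner/alpha
  example-owner/gamma example-owner/alpha example-owner/beta
]

top_repos(sample, 2).each do |key, count|
  puts "#{key} - #{count} events"
end
```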