Jeremy B. Merrill jeremybmerrill

@jeremybmerrill
jeremybmerrill / PhotostreamJob.rb
Created Aug 10, 2012
PhotostreamJob demonstrates inheritance as a way of dynamically assigning a Resque job to a queue
class PhotostreamJob < BaseJob
  @queue = :photostreamphotos
end
jeremybmerrill / Resque.rake
Created Aug 10, 2012
Sample resque.rake for workers on multiple servers
require 'yaml'
require 'resque/tasks'
require 'resque_scheduler'
require 'resque_scheduler/tasks'
require 'resque_scheduler/server'
rails_root = ENV['RAILS_ROOT'] || File.dirname(__FILE__) + '/../..'
rails_env = ENV['RAILS_ENV'] || 'development'
# resque.yml contains Redis's location on the network for each Rails environment
resque_config = YAML.load_file(rails_root + '/config/resque.yml')
Resque.redis = resque_config[rails_env]
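The lookup above can be sketched in isolation. The gist does not show `config/resque.yml`, so the contents below are hypothetical: the sketch assumes the file maps each Rails environment name to a `host:port` string, which is one of the forms `Resque.redis=` accepts. The helper name `redis_location_for` is also an assumption, added only to make a missing environment fail loudly instead of silently passing `nil` to Resque.

```ruby
require 'yaml'

# Hypothetical contents of config/resque.yml: one Redis location per Rails environment.
SAMPLE_RESQUE_YML = <<~CONFIG
  development: "localhost:6379"
  production: "redis.example.internal:6379"
CONFIG

# Hypothetical helper: return the Redis location for an environment,
# raising KeyError when the environment is not configured.
def redis_location_for(config, rails_env)
  config.fetch(rails_env) do
    raise KeyError, "no Redis location configured for #{rails_env.inspect}"
  end
end

resque_config = YAML.safe_load(SAMPLE_RESQUE_YML)
puts redis_location_for(resque_config, 'development')
```

With a file shaped like this, workers on several servers can share one rake file and differ only in `RAILS_ENV`.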
jeremybmerrill / BaseJob.rb
Created Aug 10, 2012
BaseJob: the parent class inherited by jobs that run on different queues
class BaseJob
  def self.perform
    # Do the job.
  end
end
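A minimal, Resque-free sketch of how BaseJob and PhotostreamJob fit together. `ThumbnailJob` and the `queue_for` helper are hypothetical and exist only for illustration; `queue_for` imitates a queue lookup by reading the class-level instance variable `@queue`, which is an assumption about how the queue name is consumed rather than a copy of Resque's code. The body of `perform` is a stand-in, since the gist leaves it as a comment.

```ruby
# Shared behaviour lives in the parent class; it declares no queue of its own.
class BaseJob
  def self.perform(*args)
    # Stand-in for the real work: report which class ran and with what arguments.
    "#{name} performed with #{args.inspect}"
  end
end

# Each subclass only names its queue; perform is inherited from BaseJob.
class PhotostreamJob < BaseJob
  @queue = :photostreamphotos
end

# Hypothetical second subclass, added here only for illustration.
class ThumbnailJob < BaseJob
  @queue = :thumbnails
end

# Hypothetical helper: read the class-level instance variable @queue.
def queue_for(job_class)
  job_class.instance_variable_get(:@queue)
end

[PhotostreamJob, ThumbnailJob].each do |job_class|
  puts "#{queue_for(job_class)}: #{job_class.perform(42)}"
end
```

Class-level instance variables are not inherited in Ruby, so every subclass has to set its own `@queue`; `BaseJob` itself has none.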
jeremybmerrill / astrazeneca.rb
Last active Dec 20, 2015
How to scrape AstraZeneca's ASP.net disclosure page with Upton
require 'upton'
class AstraZenecaScraper < Upton::Scraper
  ROWS_PER_PAGE = 50
  def initialize(index_url_array, site_meta)
    @sleep_time_between_requests = 15
    @site_meta = site_meta
    @total_pages = @site_meta[:total_pages]
    @az_time_period_identifier = @site_meta[:az_time_period_identifier]
jeremybmerrill / count_scraper.rb
Created Sep 8, 2013
Scrape the Los Angeles Review of Books for contributors and the authors of reviewed books, then classify them by gender from the pronouns in their biographies (or by statistical probability, if that is clear)
require 'upton'
require 'date'
require 'guess'
GLOBAL_VERBOSE = true
# - any lowercased pronoun is okay
# - capitalized pronouns are okay unless they're in a book title, which is a series of capitalized words;
# that is, capitalized pronouns are okay if there are zero alphabetic characters between them and a sentence-final punct
FEMALE_REGEXES = [/ she[\.,\s!?\' ]/, / her[\.,\s!?\' ]/,
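The pronoun rule in the comments above can be tried out with just the two lowercase patterns visible in this preview; the rest of the list is cut off and is not reconstructed here. `VISIBLE_FEMALE_REGEXES`, `pronoun_hits`, and the sample biography are hypothetical names and data for illustration, not part of the gist.

```ruby
# The two lowercase patterns visible in the gist preview.
VISIBLE_FEMALE_REGEXES = [/ she[\.,\s!?\' ]/, / her[\.,\s!?\' ]/]

# Hypothetical helper: count how often any of the patterns occurs in a biography.
def pronoun_hits(bio, regexes)
  regexes.sum { |regex| bio.scan(regex).length }
end

# Made-up biography used only to exercise the patterns.
bio = "Jane Doe is a novelist. In 2012 she published her first book."
puts pronoun_hits(bio, VISIBLE_FEMALE_REGEXES)
```

Each pattern requires a leading space and a trailing punctuation or whitespace character, so a capitalized "She" in a book title is not counted by these two patterns, and neither is a pronoun that is the very last word of the string.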
jeremybmerrill / gender.rb
Last active Dec 24, 2015
First pass at a Ruby version of global name data
require 'csv'
require 'set'
class Gender
  def initialize(options={})
    countries = Set.new([:us, :uk])
    @threshold = options[:threshold] || 0.99
    @names_counts = {}
jeremybmerrill / tabula_basic.rb
Created Jan 18, 2014
A snippet to extract spreadsheet data from a PDF using Tabula's tabula-extractor
require 'tabula'
pdf_file_path = "czechmaybe.pdf"
outfilename = "czechmaybe.csv"
out = open(outfilename, 'w')
extractor = Tabula::Extraction::ObjectExtractor.new(pdf_file_path, [5] ) #:all ) # 1..2643
extractor.extract.each do |pdf_page|
  pdf_page.spreadsheets.each do |spreadsheet|
jeremybmerrill / compstat.rb
Last active Jan 3, 2016
Scrape a folder of NYPD CompStat PDFs into CSVs.
require 'tabula'
require 'fileutils'
folder_name = "compstat"
output_folder_name = "compstat_csvs"
#########################################################################
#########################################################################
FileUtils.mkdir_p(output_folder_name + "/")
jeremybmerrill / edc.rb
Last active Jan 3, 2016
Script to output the four tables from page 1 and page 3 of an NYC EDC report using Tabula.
require 'tabula'
require 'fileutils'
folder_name = "EDC"
output_folder_name = "EDCcsvs"
#########################################################################
#########################################################################
FileUtils.mkdir_p(output_folder_name + "/")
View keybase.md

Keybase proof

I hereby claim:

  • I am jeremybmerrill on github.
  • I am jeremybmerrill (https://keybase.io/jeremybmerrill) on keybase.
  • I have a public key whose fingerprint is 441A 05CC B462 AF95 45FA 95B5 CDF7 BBEF F5A7 B374

To claim this, I am signing this object:
