Jeremy B. Merrill jeremybmerrill

@jeremybmerrill
jeremybmerrill / astrazeneca.rb
Last active December 20, 2015 17:19
How to scrape AstraZeneca's ASP.net disclosure page with Upton
require 'upton'
class AstraZenecaScraper < Upton::Scraper
  ROWS_PER_PAGE = 50
  def initialize(index_url_array, site_meta)
    @sleep_time_between_requests = 15
    @site_meta = site_meta
    @total_pages = @site_meta[:total_pages]
    @az_time_period_identifier = @site_meta[:az_time_period_identifier]
@jeremybmerrill
jeremybmerrill / BaseJob.rb
Created August 10, 2012 22:41
BaseJob: the base class inherited by jobs that run on different queues
class BaseJob
  def self.perform
    # Do the job.
  end
end
@jeremybmerrill
jeremybmerrill / PhotostreamJob.rb
Created August 10, 2012 22:39
PhotostreamJob demonstrates inheritance as a way of dynamically assigning a Resque job to a queue
class PhotostreamJob < BaseJob
  @queue = :photostreamphotos
end
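The two previews above can be combined into a minimal, Resque-free sketch of the pattern. It relies on the fact that a class-level instance variable such as `@queue` belongs to the class that sets it and is not inherited, so each subclass names its own queue while sharing `perform` from the base class. The `queue_name` helper and the `ThumbnailJob` class are hypothetical names added for illustration; they are not part of the original gists or of Resque.

```ruby
# Simplified stand-in for the pattern; it does not require or call Resque.
class BaseJob
  # Shared job body, inherited by every subclass.
  def self.perform(*args)
    "#{name} ran on #{queue_name} with #{args.inspect}"
  end

  # Hypothetical helper: reads the @queue set on the receiving class itself.
  def self.queue_name
    instance_variable_get(:@queue)
  end
end

class PhotostreamJob < BaseJob
  @queue = :photostreamphotos
end

# Hypothetical second job, added only to show a different queue.
class ThumbnailJob < BaseJob
  @queue = :thumbnails
end

puts PhotostreamJob.perform(42)
puts ThumbnailJob.queue_name.inspect
puts BaseJob.queue_name.inspect # the base class has no queue of its own
```

Because the base class never sets `@queue`, only the subclasses are usable as queueable jobs in this sketch.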
@jeremybmerrill
jeremybmerrill / Resque.rake
Created August 10, 2012 22:40
Sample resque.rake for workers on multiple servers
require 'resque/tasks'
require 'resque_scheduler'
require 'resque_scheduler/tasks'
require 'resque_scheduler/server'
rails_root = ENV['RAILS_ROOT'] || File.dirname(__FILE__) + '/../..'
rails_env = ENV['RAILS_ENV'] || 'development'
resque_config = YAML.load_file(rails_root + '/config/resque.yml') #contains Redis's location on the network for different Rails environments
Resque.redis = resque_config[rails_env]
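The rake file above looks up a Redis location by Rails environment. The actual `config/resque.yml` is not shown, so the following is only a sketch of one plausible shape: a flat mapping from environment name to a `host:port` string. The hostnames and the `redis_location_for` helper are assumptions made for illustration.

```ruby
require 'yaml'

# Hypothetical contents of config/resque.yml (not taken from the original gist).
EXAMPLE_RESQUE_YML = <<~YAML
  development: localhost:6379
  production: redis.example.internal:6379
YAML

# Hypothetical helper: returns the Redis location for one environment,
# failing loudly when the environment has no entry.
def redis_location_for(env, yaml_text = EXAMPLE_RESQUE_YML)
  config = YAML.safe_load(yaml_text)
  config.fetch(env) { raise KeyError, "no Redis location configured for #{env}" }
end

puts redis_location_for('development')
```

Using `fetch` instead of `[]` means a worker started with an unknown environment fails at boot rather than ending up with a `nil` Redis location.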
@jeremybmerrill
jeremybmerrill / ocr_pdf.rb
Created September 24, 2015 20:09
OCR a PDF
#! /usr/bin/env ruby
require 'pdfshaver'
# brew install ghostscript imagemagick #yikes
# brew install tesseract --HEAD # needs >=3.04
ARGV.each do |pdf|
  puts pdf
  pdf_basename = pdf.sub(/\.pdf\z/, '') # strip only the trailing extension
@jeremybmerrill
jeremybmerrill / bpl-locations.geojson
Created January 2, 2015 22:37
Brooklyn Public Library locations in GeoJSON format
@jeremybmerrill
jeremybmerrill / cyberlist.yml
Created December 23, 2014 02:51
Most common `cyber X` word-pairs in certain New York Times, ProPublica, Washington Post, Politico articles and congress speeches since sometime in 2013
---
cyber security: 384
cyber threats: 87
cyber attacks: 826
cyber businesses: 1
cyber protection: 1
cyber espionage: 10
cyber surveillance: 1
cyber theft: 2
cyber area: 1
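As a rough illustration of how a file like this might be used, the sketch below parses the entries visible in the preview and ranks them by count. It embeds only the lines shown above, and the `top_pairs` helper is a hypothetical name, not something from the original gist.

```ruby
require 'yaml'

# Only the entries visible in the preview above.
CYBER_COUNTS = YAML.safe_load(<<~YAML)
  cyber security: 384
  cyber threats: 87
  cyber attacks: 826
  cyber businesses: 1
  cyber protection: 1
  cyber espionage: 10
  cyber surveillance: 1
  cyber theft: 2
  cyber area: 1
YAML

# Hypothetical helper: the n most frequent word pairs, highest count first.
def top_pairs(counts, n)
  counts.sort_by { |_pair, count| -count }.first(n)
end

top_pairs(CYBER_COUNTS, 3).each do |pair, count|
  puts "#{pair}: #{count}"
end
```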
@jeremybmerrill
jeremybmerrill / tabula-win-tester.rb
Last active August 29, 2015 14:09
Tracking down the commit that broke Tabula 0.9.6 for Windows (issue #203)
#!/usr/bin/env ruby
# Tabula 0.9.6 for Windows fails: Unknown Error (206) - __UNKNOWN_CONSTANT__
# for more discussion see https://github.com/tabulapdf/tabula/issues/203
require 'open3'
revisions = ["3286e6ce0a2ee9eee617b0b1b23c3ada1effcbac",
"41307d4c0085a8b3bad2797549bf6b94a8a765c3",
"45404fa07fb3ddaf2280ec03b7d2f84e20396e88", #fails UNKNOWN_CONSTANT
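The preview is cut off partway through the revision list, so the rest of the script is not visible. The general idea of walking candidate revisions in order until one fails can be sketched as follows. Both helpers are hypothetical, and the Git checkout plus test command is an assumption about how each revision might be checked, not the original script's method.

```ruby
require 'open3'

# Hypothetical helper: returns the first revision for which the block
# reports failure (a falsy value), or nil if every revision passes.
def first_failing_revision(revisions)
  revisions.find { |rev| !yield(rev) }
end

# Hypothetical check: check out the revision, then run a test command
# (given as an array of arguments). Returns true only if both succeed.
def revision_passes?(rev, test_command)
  _output, status = Open3.capture2e('git', 'checkout', '--quiet', rev)
  return false unless status.success?

  _output, status = Open3.capture2e(*test_command)
  status.success?
end

# Usage with a stand-in check that involves no Git repository:
culprit = first_failing_revision(%w[aaa111 bbb222 ccc333]) { |rev| rev != 'bbb222' }
puts culprit
```

In a real run, the block would call something like `revision_passes?(rev, ['rake', 'test'])`, where the test command is again an assumption.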
@jeremybmerrill
jeremybmerrill / keybase.md
Created October 16, 2014 02:21
keybase verification

Keybase proof

I hereby claim:

  • I am jeremybmerrill on github.
  • I am jeremybmerrill (https://keybase.io/jeremybmerrill) on keybase.
  • I have a public key whose fingerprint is 1B27 B244 205A C0B3 AAED 0F02 7780 C469 4F62 1BA0

To claim this, I am signing this object:

@jeremybmerrill
jeremybmerrill / gist:753a4dea25170b1513fb
Created August 18, 2014 22:04
house points for hubot
# Description:
# Give, Take and List User Points
#
# Dependencies:
# None
#
# Configuration:
# None
#
# Commands: