Jeremy B. Merrill jeremybmerrill

@jeremybmerrill
jeremybmerrill / atlantacrime2012until2017.small.csv
Last active Feb 28, 2018
resources for an observable notebook
View atlantacrime2012until2017.small.csv
offense_id,occur_date,UC2 Literal,neighborhood,npu
110171050,01/14/2012,LARCENY-NON VEHICLE,Sweet Auburn,M
110181057,08/22/2011,LARCENY-NON VEHICLE,Glenrose Heights,Z
112032439,07/22/2011,AUTO THEFT,Downtown,M
112152334,08/03/2011,AUTO THEFT,Perkerson,X
113491709,12/07/2011,LARCENY-FROM VEHICLE,Hills Park,D
120010023,01/01/2012,AGG ASSAULT,The Villages at Carver,Y
120010069,12/31/2011,LARCENY-FROM VEHICLE,Old Fourth Ward,M
120010072,12/31/2011,LARCENY-FROM VEHICLE,English Avenue,L
120010086,01/01/2012,LARCENY-FROM VEHICLE,Morningside/Lenox Park,F
@jeremybmerrill
jeremybmerrill / demo.html
Created Oct 3, 2016
a demonstration of the shenanigans caused by the isTrusted attribute
View demo.html
<!DOCTYPE html>
<html>
<head>
<title>How To Cause Trouble With Events' isTrusted Attribute</title>
<meta charset="UTF-8">
<script
src="https://code.jquery.com/jquery-2.2.4.min.js"
integrity="sha256-BbhdlvQf/xTY9gja0Dq3HiwQF8LaCRTXxZKRutelT44="
crossorigin="anonymous"></script>
</head>
@jeremybmerrill
jeremybmerrill / airplanes.sql
Created Mar 13, 2016
color lines by gradient
View airplanes.sql
create table flight_segments as
SELECT hexid,start_time,end_time,callsign,point,
  -- take a substring if the length remaining in the segment is greater than 5280 feet (1609.34 m)
  -- otherwise take the remainder
  ST_LineSubstring(geom, 1609.34*n/length,
    CASE
      WHEN 1609.34*(n+1) < length THEN 1609.34*(n+1)/length
      ELSE 1
    END) as geom
FROM
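The fraction arithmetic passed to ST_LineSubstring above can be sketched outside the database. This is only an illustration: the preview cuts off before showing where `n` comes from, so the sketch assumes `n` runs from 0 up to the number of mile-long pieces needed to cover the line. The method name `segment_fractions` is hypothetical.

```ruby
MILE_IN_METERS = 1609.34

# Returns [start_fraction, end_fraction] pairs covering a line of the given
# length (in meters), one pair per mile-long piece. The last piece is clamped
# to 1.0, mirroring the CASE ... ELSE 1 branch in the SQL.
# Assumption: n starts at 0 and stops once the whole line is covered.
def segment_fractions(length, step = MILE_IN_METERS)
  return [] unless length.positive?

  segment_count = (length / step).ceil
  (0...segment_count).map do |n|
    start_fraction = step * n / length
    end_fraction = step * (n + 1) < length ? step * (n + 1) / length : 1.0
    [start_fraction, end_fraction]
  end
end

segment_fractions(4000.0).each do |start_fraction, end_fraction|
  puts format('%.4f to %.4f', start_fraction, end_fraction)
end
```

For a 4000 m line this yields three pieces, the last one shorter than a mile.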
@jeremybmerrill
jeremybmerrill / code_tos.rb
Created Jan 20, 2016
quick-and-dirty code things, CSV backend
View code_tos.rb
require 'sinatra'
require 'csv'
$csv_read_path = "my_thing.uncoded.csv"
$csv_write_path = "my_thing.coded.csv"
$data = CSV.read($csv_read_path, headers: true)
def write_csv!
  CSV.open($csv_write_path, 'wb') do |csv|
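The preview stops before the body of the write step. A minimal sketch of what writing a headers-enabled table back out could look like with Ruby's standard CSV library follows. The helper name `write_table` is hypothetical and is not taken from the gist.

```ruby
require 'csv'

# Hypothetical helper: write a CSV::Table to disk, header row first,
# then one line per data row.
def write_table(table, path)
  CSV.open(path, 'wb') do |csv|
    csv << table.headers
    table.each { |row| csv << row.fields }
  end
end
```

Writing the header row explicitly keeps the output readable by the same `CSV.read(path, headers: true)` call used for the input.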
@jeremybmerrill
jeremybmerrill / edc.rb
Last active Jan 3, 2016
Script to output the four tables from page 1 and page 3 of an NYC EDC report using Tabula.
View edc.rb
require 'tabula'
require 'fileutils'
folder_name = "EDC"
output_folder_name = "EDCcsvs"
#########################################################################
#########################################################################
FileUtils.mkdir_p(output_folder_name + "/")
@jeremybmerrill
jeremybmerrill / compstat.rb
Last active Jan 3, 2016
scrape a folder of NYPD CompStat PDFs to CSVs.
View compstat.rb
require 'tabula'
require 'fileutils'
folder_name = "compstat"
output_folder_name = "compstat_csvs"
#########################################################################
#########################################################################
FileUtils.mkdir_p(output_folder_name + "/")
@jeremybmerrill
jeremybmerrill / tabula_basic.rb
Created Jan 18, 2014
A snippet to extract spreadsheet data from a PDF using Tabula's tabula-extractor
View tabula_basic.rb
require 'tabula'
pdf_file_path = "czechmaybe.pdf"
outfilename = "czechmaybe.csv"
out = open(outfilename, 'w')
extractor = Tabula::Extraction::ObjectExtractor.new(pdf_file_path, [5] ) #:all ) # 1..2643
extractor.extract.each do |pdf_page|
  pdf_page.spreadsheets.each do |spreadsheet|
@jeremybmerrill
jeremybmerrill / gender.rb
Last active Dec 24, 2015
first pass at ruby version of global name data
View gender.rb
require 'csv'
require 'set'
class Gender
  def initialize(options={})
    countries = Set.new([:us, :uk])
    @threshold = options[:threshold] || 0.99
    @names_counts = {}
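The preview ends before showing how the 0.99 threshold is used. One plausible reading is that a name is assigned a gender only when at least that share of recorded people with the name have that gender. The sketch below illustrates that idea only. The class name `NameGenderGuesser`, the shape of the counts hash, and the `guess` method are all assumptions, not the gist's code.

```ruby
# Hypothetical illustration of threshold-based classification.
# names_counts is assumed to map a lowercased name to
# { female: Integer, male: Integer }.
class NameGenderGuesser
  def initialize(names_counts, options = {})
    @threshold = options[:threshold] || 0.99
    @names_counts = names_counts
  end

  def guess(name)
    counts = @names_counts[name.to_s.downcase]
    return :unknown if counts.nil?

    total = counts[:female] + counts[:male]
    return :unknown if total.zero?
    return :female if counts[:female].to_f / total >= @threshold
    return :male if counts[:male].to_f / total >= @threshold

    :unknown
  end
end
```

A high default such as 0.99 trades coverage for precision: ambiguous names fall through to `:unknown` rather than being guessed.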
@jeremybmerrill
jeremybmerrill / count_scraper.rb
Created Sep 8, 2013
Scrape the Los Angeles Review of Books for contributors and the authors of reviewed books, then classify them by gender using the pronouns in their biographies (or by statistical probability, if it's clear)
View count_scraper.rb
require 'upton'
require 'date'
require 'guess'
GLOBAL_VERBOSE = true
# - any lowercased pronoun is okay
# - capitalized pronouns are okay unless they're in a book title, which is a series of capitalized words;
# that is, capitalized pronouns are okay if there are zero alphabetic characters between them and a sentence-final punct
FEMALE_REGEXES = [/ she[\.,\s!?\' ]/, / her[\.,\s!?\' ]/,
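The lowercase-pronoun rule described in the comments above could be sketched roughly as follows. Only the two female patterns are visible in the preview, so the male patterns, the constant names, the method name `guess_gender_from_bio`, and the tie handling are assumptions. The capitalized-pronoun rule for book titles is not implemented here.

```ruby
# Hypothetical patterns modeled on the two female patterns visible above.
FEMALE_PRONOUNS = [/ she[.,\s!?']/, / her[.,\s!?']/].freeze
MALE_PRONOUNS = [/ he[.,\s!?']/, / his[.,\s!?']/, / him[.,\s!?']/].freeze

# Counts lowercase pronoun matches in a biography and returns
# :female, :male, or :unknown when the counts are tied (including 0 to 0).
def guess_gender_from_bio(bio)
  text = " #{bio} " # pad so pronouns at the very start or end can match
  female_hits = FEMALE_PRONOUNS.sum { |pattern| text.scan(pattern).length }
  male_hits = MALE_PRONOUNS.sum { |pattern| text.scan(pattern).length }
  return :female if female_hits > male_hits
  return :male if male_hits > female_hits

  :unknown
end
```

The leading space in each pattern is what keeps "he" from matching inside "she" or "the".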
@jeremybmerrill
jeremybmerrill / astrazeneca.rb
Last active Dec 20, 2015
How to scrape AstraZeneca's ASP.net disclosure page with Upton
View astrazeneca.rb
require 'upton'
class AstraZenecaScraper < Upton::Scraper
  ROWS_PER_PAGE = 50
  def initialize(index_url_array, site_meta)
    @sleep_time_between_requests = 15
    @site_meta = site_meta
    @total_pages = @site_meta[:total_pages]
    @az_time_period_identifier = @site_meta[:az_time_period_identifier]