require 'upton'
class AstraZenecaScraper < Upton::Scraper
  ROWS_PER_PAGE = 50
  def initialize(index_url_array, site_meta)
    @sleep_time_between_requests = 15
    @site_meta = site_meta
    @total_pages = @site_meta[:total_pages]
    @az_time_period_identifier = @site_meta[:az_time_period_identifier]
require 'upton'
require 'date'
require 'guess'
GLOBAL_VERBOSE = true
# - any lowercased pronoun is okay
# - capitalized pronouns are okay unless they're in a book title, which is a series of capitalized words;
#   that is, capitalized pronouns are okay if there are zero alphabetic characters between them and a sentence-final punct
FEMALE_REGEXES = [/ she[\.,\s!?\' ]/, / her[\.,\s!?\' ]/,
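The first rule in the comments above (any lowercased pronoun counts) can be sketched roughly as follows. This is only an illustration: `SKETCH_FEMALE` mirrors the two patterns visible above, while `SKETCH_MALE` and the `pronoun_counts` helper are assumptions that do not appear in the original file, and the book-title rule for capitalized pronouns is not handled.

```ruby
# Hypothetical sketch: count lowercase pronoun hits in a piece of text.
# SKETCH_FEMALE copies the two patterns shown in the source;
# SKETCH_MALE is an assumed counterpart, not taken from the source.
SKETCH_FEMALE = [/ she[.,\s!?' ]/, / her[.,\s!?' ]/]
SKETCH_MALE   = [/ he[.,\s!?' ]/, / him[.,\s!?' ]/, / his[.,\s!?' ]/]

def pronoun_counts(text)
  # Sum the number of matches across every regex in a list.
  count = ->(regexes) { regexes.sum { |re| text.scan(re).size } }
  { female: count.call(SKETCH_FEMALE), male: count.call(SKETCH_MALE) }
end

counts = pronoun_counts("In the novel, she finds her voice. Then he leaves.")
puts counts.inspect
```

Because each pattern requires a leading space, a pronoun at the very start of the text is not counted in this sketch.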
require 'csv'
require 'set'
class Gender
  def initialize(options={})
    countries = Set.new([:us, :uk])
    @threshold = options[:threshold] || 0.99
    @names_counts = {}
require 'tabula'
pdf_file_path = "czechmaybe.pdf"
outfilename = "czechmaybe.csv"
out = open(outfilename, 'w')
extractor = Tabula::Extraction::ObjectExtractor.new(pdf_file_path, [5] ) #:all ) # 1..2643
extractor.extract.each do |pdf_page|
  pdf_page.spreadsheets.each do |spreadsheet|
require 'tabula'
require 'fileutils'
folder_name = "EDC"
output_folder_name = "EDCcsvs"
#########################################################################
#########################################################################
FileUtils.mkdir_p(output_folder_name + "/")
require 'sinatra'
require 'csv'
$csv_read_path = "my_thing.uncoded.csv"
$csv_write_path = "my_thing.coded.csv"
# keyword options work on both Ruby 2.x and 3.x; a positional options hash raises on Ruby 3
$data = CSV.read($csv_read_path, headers: true)
def write_csv!
  CSV.open($csv_write_path, 'wb') do |csv|
create table flight_segments as
SELECT hexid,start_time,end_time,callsign,point,
  -- take a substring if the length remaining in the segment is greater than 5280 feet (1609.34 m)
  -- otherwise take the remainder
  ST_LineSubstring(geom, 1609.34*n/length,
    CASE
      WHEN 1609.34*(n+1) < length THEN 1609.34*(n+1)/length
      ELSE 1
    END) as geom
FROM
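The fraction arithmetic passed to `ST_LineSubstring` above can be checked outside the database with a small Ruby sketch. The part of the query that generates `n` is not shown, so the range `0...ceil(length / 1609.34)` used here is an assumption, as are the names `MILE_M` and `segment_fractions`.

```ruby
# Hypothetical sketch of the start/end fractions for each one-mile segment.
MILE_M = 1609.34 # 5280 feet in meters, as in the SQL comment

def segment_fractions(length)
  # Assumed: n runs over every mile-long piece, including the final remainder.
  n_segments = (length / MILE_M).ceil
  (0...n_segments).map do |n|
    start_frac = MILE_M * n / length
    # Same rule as the CASE expression: a full mile if one remains, else run to the end.
    end_frac = MILE_M * (n + 1) < length ? MILE_M * (n + 1) / length : 1.0
    [start_frac, end_frac]
  end
end

fractions = segment_fractions(4000.0)
fractions.each { |s, e| puts format("%.4f -> %.4f", s, e) }
```

For a 4000 m line this gives two full one-mile segments followed by a shorter remainder that ends at fraction 1.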
<!DOCTYPE html>
<html>
<head>
<title>How To Cause Trouble With Events' isTrusted Attribute</title>
<meta charset="UTF-8">
<script
src="http://code.jquery.com/jquery-2.2.4.min.js"
integrity="sha256-BbhdlvQf/xTY9gja0Dq3HiwQF8LaCRTXxZKRutelT44="
crossorigin="anonymous"></script>
</head>
offense_id,occur_date,UC2 Literal,neighborhood,npu | |
110171050,01/14/2012,LARCENY-NON VEHICLE,Sweet Auburn,M | |
110181057,08/22/2011,LARCENY-NON VEHICLE,Glenrose Heights,Z | |
112032439,07/22/2011,AUTO THEFT,Downtown,M | |
112152334,08/03/2011,AUTO THEFT,Perkerson,X | |
113491709,12/07/2011,LARCENY-FROM VEHICLE,Hills Park,D | |
120010023,01/01/2012,AGG ASSAULT,The Villages at Carver,Y | |
120010069,12/31/2011,LARCENY-FROM VEHICLE,Old Fourth Ward,M | |
120010072,12/31/2011,LARCENY-FROM VEHICLE,English Avenue,L | |
120010086,01/01/2012,LARCENY-FROM VEHICLE,Morningside/Lenox Park,F |
#
# on mac, replace TABGOESHERE with a tab by typing Ctrl-V then the Tab key
#
mysql -u USERNAME --database=dbname --host=HOST --batch -e "select * from tablename" |
sed 's/TABGOESHERE/","/g' | sed 's/^/"/g' | sed 's/$/"/g' > destination.csv
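The sed pipeline above wraps every field in double quotes but does not escape quotes that already appear inside a value. A minimal Ruby sketch of the same tab-to-CSV step, using the standard `csv` library to handle that escaping, might look like the following. The helper name `tsv_to_csv` is hypothetical, and the sketch assumes the input is plain tab-separated text with one record per line.

```ruby
require 'csv'

# Hypothetical helper: convert tab-separated text into fully quoted CSV.
def tsv_to_csv(tsv_text)
  tsv_text.each_line.map do |line|
    # -1 keeps trailing empty fields instead of dropping them.
    fields = line.chomp.split("\t", -1)
    # force_quotes wraps every field in quotes, like the sed version,
    # and embedded double quotes are doubled per CSV convention.
    CSV.generate_line(fields, force_quotes: true)
  end.join
end

csv_text = tsv_to_csv("id\tname\n1\tSay \"hi\"\n")
puts csv_text
```

In practice the input could come from the `mysql --batch` output on standard input, for example `tsv_to_csv($stdin.read)`.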