Candido Zuriaga czuriaga

#! /bin/bash
# Generate a file per page
for i in {1..448}; do pdftotext -f ${i} -l ${i} mueller-report-searchable.pdf mueller_page_${i}.txt; done
# Double the embedded double quotes (CSV escaping)
for i in {1..448}; do perl -pi -e 's/\"/\"\"/g' mueller_page_${i}.txt; done
# Set CSV header
echo "page","text" > mueller_pages.csv
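# Generate CSV rows: the page number, then the page text enclosed in quotes.
# The command substitution is quoted so that whitespace is preserved and glob characters are not expanded.
for i in {1..448}; do printf '%s,"%s"\n' "${i}" "$(cat "mueller_page_${i}.txt")" >> mueller_pages.csv; done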
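The quote-doubling and row-building steps above can be tried on a small sample before running them over all 448 pages. The following is a minimal sketch, not part of the original script: `csv_row` is a hypothetical helper name, and the sample file is a throwaway created with `mktemp`. It assumes bash and keeps line breaks inside the quoted field.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): print one CSV row for a page.
# $1 = page number, $2 = path to the text file for that page.
csv_row() {
  local page="$1" file="$2"
  local q='"'
  local text
  # Read the whole file; $(...) strips trailing newlines only.
  text="$(cat "$file")"
  # Double every embedded double quote, as CSV escaping requires.
  text="${text//$q/$q$q}"
  # Emit: page number, comma, then the text wrapped in double quotes.
  printf '%s,"%s"\n' "$page" "$text"
}

# Usage on a throwaway sample file (assumed content, for illustration only).
sample="$(mktemp)"
printf 'He said "no".\nSecond line.\n' > "$sample"
csv_row 7 "$sample"
rm -f "$sample"
```

Doing the escaping in the shell avoids modifying the per-page text files in place, which may be preferable if they are reused for other purposes.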
"Business ID","Business name","Address","City","State","Postal code","Latitude","Longitude","Phone number","Inspection","Inspection score","Score type","Inspection date","Inspection type","Violation description"
"114","GOOD MONG KOK","1039 STOCKTON ST ","San Francisco","CA","94108","37.795594","-122.408204","","Yes","65","Poor","2012-07-13","routine","Unclean or unsanitary food contact surfaces"
"114","GOOD MONG KOK","1039 STOCKTON ST ","San Francisco","CA","94108","37.795594","-122.408204","","Yes","65","Poor","2012-07-13","routine","Unclean or degraded floors walls or ceilings"
"114","GOOD MONG KOK","1039 STOCKTON ST ","San Francisco","CA","94108","37.795594","-122.408204","","Yes","65","Poor","2012-07-13","routine","Unapproved or unmaintained equipment or utensils"
"114","GOOD MONG KOK","1039 STOCKTON ST ","San Francisco","CA","94108","37.795594","-122.408204","","Yes","65","Poor","2012-07-13","routine","Unclean hands or improper use of gloves"
"114","GOOD MONG KOK","1039 STOCKTON ST ","San Francisco","CA"
czuriaga / bind_oracle.sql
Created May 21, 2012 15:24
BigML-Oracle
SET LINESIZE 1000
SET PAGESIZE 9999
SET NUMWIDTH 20
SET TRIMSPOOL ON
SET TRIMOUT ON
SET VERIFY OFF
SET SERVEROUTPUT ON
SET UNDERLINE OFF
SET FEEDBACK OFF
SET HEAD OFF