def processRecord(RecordClass, row):
    language_dict = {
        "en": "eng",
        "fr": "fre",
        "eng": "eng"
    }
    record = RecordClass("mphillips")
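
The `language_dict` in `processRecord` looks like a lookup from short language codes to three-letter codes. The rest of the function is not visible here, so the following is only a minimal sketch of how such a lookup might be applied. The `normalize_language` helper and its `default` parameter are hypothetical and not part of the original snippet.

```python
# Mapping copied from the snippet above; everything else is an assumption.
LANGUAGE_DICT = {
    "en": "eng",
    "fr": "fre",
    "eng": "eng",
}


def normalize_language(code, default=None):
    """Return the three-letter code for `code`, or `default` if it is unmapped."""
    # Tolerate stray whitespace and mixed case in the incoming value.
    key = code.strip().lower()
    return LANGUAGE_DICT.get(key, default)
```

Returning a default instead of raising keeps one unexpected code from aborting a whole batch; whether the original script behaves that way is unknown.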
{
    "001": "Anderson County",
    "003": "Andrews County",
    "005": "Angelina County",
    "007": "Aransas County",
    "009": "Archer County",
    "011": "Armstrong County",
    "013": "Atascosa County",
    "015": "Austin County",
    "017": "Bailey County",
for dirname in *
do
    #echo moving into $dirname
    cd "$dirname"
    for pdf in *.pdf
    do
        #echo extract pages in $pdf
        pagesfrompdf=`pdfinfo "$pdf" | grep -a "Pages:" | awk '{print $2}'`
        echo $pagesfrompdf
        pages=$(($(tr -d '\r' <<< $pagesfrompdf) - 1))
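
The last line of the loop above strips a stray carriage return from the `pdfinfo` page count and subtracts one. What the script then does with that value is not visible in the truncated snippet. As a rough sketch of the same parsing step in Python, assuming the input is text in the `Pages:   N` form that `pdfinfo` prints; the `last_page_index` name is hypothetical:

```python
import re


def last_page_index(pdfinfo_output):
    """Return (page count - 1) parsed from pdfinfo-style text, or None if absent."""
    # Match a line such as "Pages:          12"; a trailing \r is tolerated.
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, flags=re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)) - 1
```

Parsing the number with a regular expression and `int()` avoids the separate `tr -d '\r'` step, since the carriage return never reaches the arithmetic.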
for i in *
do
    echo "processing $i ======="
    echo "creating pdfs for"
    find "$i" -maxdepth 1 -name "*.tif" | xargs -i -n 1 -P 6 convert -density 200x200 -compress jpeg -quality 80 "{}" "{}".pdf
    echo "compressing pdfs"
    find "$i" -maxdepth 1 -name "*.tif.pdf" | xargs -i -n 1 -P 5 gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile={}_out.pdf {}
    cd "$i"
    echo "concatenating pdfs"
    pdftk *_out.pdf cat output temp.pdf
for i in *
do
    echo "creating pdfs"
    find "$i" -maxdepth 1 -name "*.jpg" | xargs -i -n 1 -P 6 convert -density 200x200 -compress jpeg -quality 80 "{}" "{}".pdf
    echo "compressing pdfs"
    find "$i" -maxdepth 1 -name "*.jpg.pdf" | xargs -i -n 1 -P 5 gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile={}_out.pdf {}
    cd "$i"
    echo "concatenating pdfs"
    pdftk *_out.pdf cat output temp.pdf
    echo "linearizing pdf"
# coding=UTF-8
import sys
import re
import json
from dateutil import parser
from bs4 import BeautifulSoup

if len(sys.argv) != 2:
    print("usage: parse_bill_html.py <bill_html>")
<form method="GET" action="http://texashistory.unt.edu/explore/partners/TSU/browse/">
    <fieldset style="display: inline-block">
        <legend>Tarleton State University - Portal to Texas History</legend>
        <input type="text" name="q" size="80"/>
        <input type="hidden" name="t" value="fulltext" />
        <input type="submit" value="submit" />
    </fieldset>
</form>
gs -dUseCIEColor -dFAPIDEBUG -sDEVICE=jpeg -o out-%d.jpg -sFONTPATH=/Users/mphillips/Desktop/auditors/msfonts-master/fonts/ -sFontMap="/CenturyGothic,Bold (/Users/mphillips/Desktop/auditors/msfonts-master/fonts/gothicb.ttf) ;" -r400x400 13-704.pdf
{
    "placenames": [
        "Afghanistan",
        "Africa",
        "Albania",
        "Algeria",
        "Algeria - Alger Department - Algiers",
        "America",
        "American Samoa",
        "Angola",
{
    "placenames": {
        "Afghanistan": 196,
        "Africa": 379,
        "Albania": 4,
        "Algeria": 16,
        "Algeria - Alger Department - Algiers": 1,
        "American Samoa": 8,
        "Angola": 6,
        "Antarctica": 29,
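
The two placename files shown above appear to hold a similar vocabulary in two shapes: a plain list of names, and an object mapping each name to a count. A rough sketch of reading the counts form, assuming the full file parses as JSON with a top-level `placenames` object. The sample data is just the first few entries shown above, and both helper names are hypothetical:

```python
import json

# First few entries of the counts file, used here as stand-in data.
SAMPLE = """
{
    "placenames": {
        "Afghanistan": 196,
        "Africa": 379,
        "Albania": 4,
        "Algeria": 16
    }
}
"""


def top_placenames(raw_json, n=3):
    """Return the n most frequent (name, count) pairs, highest count first."""
    counts = json.loads(raw_json)["placenames"]
    # Sort by descending count, then by name so ties come out deterministically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def placename_list(raw_json):
    """Return the sorted names only, matching the shape of the list file."""
    return sorted(json.loads(raw_json)["placenames"])
```

Deriving the list from the counts object would keep the two files in step, though nothing in the snippets confirms that one was generated from the other.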