Mark Phillips vphill

@vphill
vphill / dspace_metadata_mapping
Created September 3, 2013 21:38
Mapping a 115-column spreadsheet in 115 lines... (I skip some unneeded fields)
def processRecord(RecordClass, row):
    language_dict = {
        "en": "eng",
        "fr": "fre",
        "eng": "eng"
    }
    record = RecordClass("mphillips")
vphill / tx_fips_to_county.json
Last active April 1, 2021 23:37
Mapping from FIPS codes to county names for the state of Texas
{
  "001": "Anderson County",
  "003": "Andrews County",
  "005": "Angelina County",
  "007": "Aransas County",
  "009": "Archer County",
  "011": "Armstrong County",
  "013": "Atascosa County",
  "015": "Austin County",
  "017": "Bailey County",
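A mapping like this is typically used to turn a county FIPS code into a display name. The following is a minimal Python sketch and not part of the gist: it inlines only the entries visible in the preview above, and the helper name `county_name`, the constant names, and the handling of five-digit codes (Texas state prefix `48`) are assumptions made for illustration.

```python
import json

# Only the entries visible in the preview; the full gist file contains more.
TX_FIPS_PREVIEW = json.loads("""
{
  "001": "Anderson County",
  "003": "Andrews County",
  "005": "Angelina County",
  "007": "Aransas County",
  "009": "Archer County",
  "011": "Armstrong County",
  "013": "Atascosa County",
  "015": "Austin County",
  "017": "Bailey County"
}
""")

# State-level FIPS code for Texas, used as the prefix of five-digit county codes.
TEXAS_STATE_FIPS = "48"


def county_name(fips, mapping=TX_FIPS_PREVIEW):
    """Return the county name for a 3-digit county code or a 5-digit
    state+county code, or None when the code is not in the mapping."""
    code = str(fips).strip()
    # Strip the Texas state prefix from codes such as "48001".
    if len(code) == 5 and code.startswith(TEXAS_STATE_FIPS):
        code = code[2:]
    # Left-pad short codes (for example 9 -> "009") before the lookup.
    return mapping.get(code.zfill(3))
```

For example, `county_name("48015")` and `county_name("015")` both resolve to the same entry, while a code missing from the preview returns `None` rather than raising.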
vphill / processPDF.sh
Created April 24, 2014 03:00
Bash script that takes a set of folders, each containing a single PDF, and creates the format used at UNT for importing into Aubrey
for dirname in *
do
    #echo moving into $dirname
    cd "$dirname"
    for pdf in *.pdf
    do
        #echo extract pages in $pdf
        pagesfrompdf=$(pdfinfo "$pdf" | grep -a "Pages:" | awk '{print $2}')
        echo "$pagesfrompdf"
        pages=$(($(tr -d '\r' <<< "$pagesfrompdf") - 1))
for i in *
do
    echo "processing $i ======="
    echo "creating pdfs for"
    find "$i" -maxdepth 1 -name "*.tif" | xargs -i -n 1 -P 6 convert -density 200x200 -compress jpeg -quality 80 "{}" "{}".pdf
    echo "compressing pdfs"
    find "$i" -maxdepth 1 -name "*.tif.pdf" | xargs -i -n 1 -P 5 gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile={}_out.pdf {}
    cd "$i"
    echo "concatenating pdfs"
    pdftk *_out.pdf cat output temp.pdf
vphill / jpg2pdf.sh
Created April 24, 2014 03:02
Bash script used to create PDF files for the Texas State Documents Scanning process
for i in *
do
    echo "creating pdfs"
    find "$i" -maxdepth 1 -name "*.jpg" | xargs -i -n 1 -P 6 convert -density 200x200 -compress jpeg -quality 80 "{}" "{}".pdf
    echo "compressing pdfs"
    find "$i" -maxdepth 1 -name "*.jpg.pdf" | xargs -i -n 1 -P 5 gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile={}_out.pdf {}
    cd "$i"
    echo "concatenating pdfs"
    pdftk *_out.pdf cat output temp.pdf
    echo "linearizing pdf"
# coding=UTF-8
import sys
import re
import json
from dateutil import parser
from bs4 import BeautifulSoup

# Expect exactly one argument: the path to the bill HTML file.
if len(sys.argv) != 2:
    print("usage: parse_bill_html.py <bill_html>")
vphill / gist:26ae8bf38267e4557de0
Created August 19, 2014 11:48
Tarleton Search Box
<form method="GET" action="http://texashistory.unt.edu/explore/partners/TSU/browse/">
  <fieldset style="display: inline-block">
    <legend>Tarleton State University - Portal to Texas History</legend>
    <input type="text" name="q" size="80"/>
    <input type="hidden" name="t" value="fulltext" />
    <input type="submit" value="submit" />
  </fieldset>
</form>
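Submitting this form issues a GET request to the `action` URL with the text box sent as `q` and the hidden field sent as `t=fulltext`. As a rough illustration, the same URL can be assembled in Python with the standard library. The function name `build_search_url` and the constant `BROWSE_URL` are hypothetical and only the endpoint and parameter names come from the form itself.

```python
from urllib.parse import urlencode

# The form's action attribute, copied from the snippet above.
BROWSE_URL = "http://texashistory.unt.edu/explore/partners/TSU/browse/"


def build_search_url(query):
    """Build the URL a browser would request when the form is submitted
    with `query` typed into the text box."""
    # "q" is the visible text input; "t" is the hidden field fixed to "fulltext".
    params = {"q": query, "t": "fulltext"}
    # urlencode percent-encodes values and turns spaces into "+".
    return BROWSE_URL + "?" + urlencode(params)
```

Calling `build_search_url("John Tarleton")` yields a URL whose query string is `q=John+Tarleton&t=fulltext`, matching what the HTML form would send.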
vphill / gist:37cb4f9b21a43cfa9d37
Created October 14, 2014 11:26
Ghostscript pdf to jpeg
gs -dUseCIEColor -dFAPIDEBUG -sDEVICE=jpeg -o out-%d.jpg -sFONTPATH=/Users/mphillips/Desktop/auditors/msfonts-master/fonts/ -sFontMap="/CenturyGothic,Bold (/Users/mphillips/Desktop/auditors/msfonts-master/fonts/gothicb.ttf) ;" -r400x400 13-704.pdf
vphill / untl_placenames_2014-11-20.json
Last active August 29, 2015 14:10
Coverage Locations in the UNT Libraries Digital Collections as of 2014-11-20
{
  "placenames": [
    "Afghanistan",
    "Africa",
    "Albania",
    "Algeria",
    "Algeria - Alger Department - Algiers",
    "America",
    "American Samoa",
    "Angola",
This file has been truncated, but you can view the full file.
{
  "placenames": {
    "Afghanistan": 196,
    "Africa": 379,
    "Albania": 4,
    "Algeria": 16,
    "Algeria - Alger Department - Algiers": 1,
    "American Samoa": 8,
    "Angola": 6,
    "Antarctica": 29,
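A count mapping in this shape lends itself to simple ranking. The sketch below is illustrative only: it uses just the entries visible in the truncated preview above, and the names `PLACENAME_COUNTS_PREVIEW` and `top_placenames` are hypothetical rather than taken from the gist.

```python
# Only the entries visible in the truncated preview; the full file has more.
PLACENAME_COUNTS_PREVIEW = {
    "placenames": {
        "Afghanistan": 196,
        "Africa": 379,
        "Albania": 4,
        "Algeria": 16,
        "Algeria - Alger Department - Algiers": 1,
        "American Samoa": 8,
        "Angola": 6,
        "Antarctica": 29,
    }
}


def top_placenames(data, n=3):
    """Return the n most frequent placenames as (name, count) pairs,
    highest count first, with ties broken alphabetically."""
    counts = data["placenames"]
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

With the preview data, `top_placenames(PLACENAME_COUNTS_PREVIEW)` ranks Africa first, then Afghanistan, then Antarctica. Results for the full file would differ.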