Mark Phillips vphill

United States - Oklahoma - Oklahoma County - Oklahoma City 35.467560 -97.516430
United States - Texas - Dallas County - Dallas 32.783060 -96.806670
United States - Texas 31.250440 -99.250610
United States 39.760000 -98.500000
United States - Texas - Tarrant County - Fort Worth 32.725410 -97.320850
United States - Texas - Denton County - Denton 33.214840 -97.133070
United States - Texas - Bexar County - San Antonio 29.424120 -98.493630
United States - Texas - Taylor County - Abilene 32.448740 -99.733140
United States - Oklahoma - Cleveland County - Norman 35.222570 -97.439480
United States - Texas - Harris County - Houston 29.763280 -95.363270
vphill / tfc_2_csv.py
Created March 27, 2023 18:37
Script to convert OAI-PMH repository for the TFC to CSV
"""untl_breaker script for processing OAI-PMH 2.0 Repository XML Files"""
import argparse
import sys
from xml.etree import ElementTree
import csv
UNTL_NAMESPACE = "{http://digital2.library.unt.edu/untl/}"
UNTL_NSMAP = {"untl": UNTL_NAMESPACE}

File used for captions. https://texashistory.unt.edu/ark:/67531/metapth845109/

File length = 00:15:50.52

Computer  Device                       Model     Time
libcpt01  CPU                          medium    16m11s
libcpt01  NVIDIA GeForce GTX 1080 8GB  small.en  1m24s
libcpt01  NVIDIA GeForce GTX 1080 8GB  medium    4m35s
libcpt10  NVIDIA GeForce GTX 1080 8GB  large     N/A
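
The timings above are easier to compare as a multiple of real time (file length divided by processing time). The following is a minimal sketch of that arithmetic using only the figures listed; the helper names `to_seconds` and `speed_factors` are illustrative and not part of the original notes, and the `large` run is left out because no time was recorded for it.

```python
def to_seconds(text):
    """Convert 'HH:MM:SS.ss' or '16m11s' style durations to seconds."""
    if "m" in text:
        minutes, rest = text.split("m")
        return int(minutes) * 60 + float(rest.rstrip("s"))
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# File length as given in the notes: 00:15:50.52
FILE_LENGTH = to_seconds("00:15:50.52")

# (device, model, elapsed time) copied from the table
RUNS = [
    ("CPU", "medium", "16m11s"),
    ("GTX 1080", "small.en", "1m24s"),
    ("GTX 1080", "medium", "4m35s"),
]


def speed_factors(runs, file_length):
    """Return (device, model, file_length / elapsed) rounded to 2 places."""
    return [
        (device, model, round(file_length / to_seconds(elapsed), 2))
        for device, model, elapsed in runs
    ]


for device, model, factor in speed_factors(RUNS, FILE_LENGTH):
    print(f"{device:10} {model:10} {factor:.2f}x real time")
```

By this measure the CPU run with the medium model is roughly real time (about 0.98x), while the GPU runs are about 11.3x (small.en) and 3.5x (medium).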
"""script for converting NOVAXCHANGE tape files into record block files"""
import sys
def iterate_stream(stream, delimiter, max_read_size=1024 * 4):
    """ Reads `delimiter` separated strings or bytes from `stream`. """
    empty = '' if isinstance(delimiter, str) else b''
    chunks = []
    delimiter_len = len(delimiter)
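
The preview stops after the first few lines of `iterate_stream`, so the rest of the body is not shown. As a rough sketch only, a delimiter-splitting reader with the same signature could look like the following. The name `iterate_stream_sketch` and everything after the `empty` line are assumptions, not the gist's actual code; this version keeps a single buffer instead of the `chunks` list and `delimiter_len` seen in the preview.

```python
import io


def iterate_stream_sketch(stream, delimiter, max_read_size=1024 * 4):
    """Yield `delimiter`-separated pieces read from `stream` (hypothetical body)."""
    empty = '' if isinstance(delimiter, str) else b''
    buffer = empty
    while True:
        chunk = stream.read(max_read_size)
        if not chunk:
            break
        buffer += chunk
        # Emit every complete piece; keep the unfinished tail in the buffer
        # so a delimiter split across two reads is still recognized.
        *pieces, buffer = buffer.split(delimiter)
        yield from pieces
    # Whatever remains at end of stream is the final, undelimited piece.
    if buffer:
        yield buffer


# Usage: small reads force the two-byte delimiter to span chunk boundaries.
records = list(iterate_stream_sketch(io.StringIO("a--b--c"), "--", max_read_size=2))
print(records)
```

Carrying the tail over between reads is the key design point: splitting each chunk on its own would miss delimiters that straddle a read boundary.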
00000000 50 00 56 4f 4c 31 30 32 38 30 30 32 20 20 20 20 |P.VOL1028002 |
00000010 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 | |
*
00000050 20 31 50 00 50 00 48 44 52 31 30 32 38 30 30 32 | 1P.P.HDR1028002|
00000060 2d 30 30 30 30 30 30 30 30 30 30 30 32 38 30 30 |-000000000002800|
00000070 32 30 30 30 31 30 30 30 31 30 30 30 31 30 30 30 |2000100010001000|
00000080 30 35 30 30 36 20 30 30 30 30 30 20 30 30 30 30 |05006 00000 0000|
00000090 30 30 4e 4f 56 41 58 43 48 41 4e 47 45 20 20 20 |00NOVAXCHANGE |
000000a0 20 20 20 20 20 20 50 00 50 00 48 44 52 32 46 31 | P.P.HDR2F1|
000000b0 36 33 38 34 30 34 30 39 36 20 30 20 20 20 20 20 |638404096 0 |
00000000 50 00 56 4f 4c 31 30 32 38 30 30 30 20 20 20 20 |P.VOL1028000 |
00000010 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 | |
*
00000050 20 31 50 00 50 00 48 44 52 31 30 32 38 30 30 30 | 1P.P.HDR1028000|
00000060 2d 30 30 30 30 30 30 30 30 30 30 30 32 38 30 30 |-000000000002800|
00000070 30 30 30 30 31 30 30 30 31 30 30 30 31 30 30 30 |0000100010001000|
00000080 30 35 30 30 36 20 30 30 30 30 30 20 30 30 30 30 |05006 00000 0000|
00000090 30 30 4e 4f 56 41 58 43 48 41 4e 47 45 20 20 20 |00NOVAXCHANGE |
000000a0 20 20 20 20 20 20 50 00 50 00 48 44 52 32 46 31 | P.P.HDR2F1|
000000b0 36 33 38 34 30 34 30 39 36 20 30 20 20 20 20 20 |638404096 0 |

Keybase proof

I hereby claim:

  • I am vphill on github.
  • I am vphill (https://keybase.io/vphill) on keybase.
  • I have a public key ASDhDNBIIwjmaMYPtjU37t9hgxZV5HBAjpZjmN8b0LGM6Ao

To claim this, I am signing this object:

mapping = {
    'ATT-': 'ADJ',
    '==LOC': 'ADP',
    '-ADV': 'ADV',
    'adv': 'ADV',
    'ADVR': 'ADV',
    '==DDET': 'DET',
    'interj': 'INTJ',
    'n': 'NOUN',
    'n:Any': 'NOUN',
vphill / tags
Created December 5, 2018 01:32
1 -APX
1 -BE
1 -INTEND
1 -NES
1 -PROHB
1 -SOLCT
1 -SUP
1 ==CTE
1 =INT
1 =pdet
def lemmatize(token_list):
    """very simple implementation"""
    out_tokens = []
    for t in token_list:
        if t.endswith('ies'):
            t = t[:-3] + 'y'
        elif t.endswith("'s"):
            t = t[:-2]
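
The preview ends before the function returns anything. A minimal runnable sketch, assuming the only rules are the two visible above and that each token is then appended and the list returned, might read as follows. The name `lemmatize_sketch`, the `append` and the `return` are assumptions; the original may apply further rules that are not shown.

```python
def lemmatize_sketch(token_list):
    """Suffix-stripping sketch using only the two rules visible in the preview."""
    out_tokens = []
    for t in token_list:
        if t.endswith('ies'):
            # plural in -ies becomes -y
            t = t[:-3] + 'y'
        elif t.endswith("'s"):
            # drop the possessive marker
            t = t[:-2]
        # Assumed step: collect the (possibly rewritten) token.
        out_tokens.append(t)
    return out_tokens


tokens = lemmatize_sketch(["libraries", "Denton's", "map"])
print(tokens)
```

Rules this simple will over-apply: a word such as "series" also ends in "ies" and would be rewritten to "sery".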