
Etienne Posthumus epoz

@epoz
epoz / dbtxt_to_csv.py
Created October 19, 2018 11:42
Dumps a collection of dbtxt files to a single CSV file, including the expanded text of the IC field
from __future__ import print_function
# Export details from a collection of dmp files found under a given path to a CSV file
import os
import iconclass
import sys
from progress.bar import Bar
import textbase
print('Reading files...')
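Here is a minimal sketch of the CSV-writing step, using only the standard `csv` module. It is not the gist's code. It assumes the records have already been parsed into dicts that map field names to lists of values (the `textbase` parsing is not shown in the preview). `ic_texts` is a hypothetical dict standing in for the `iconclass` lookups that expand each code in the `IC` field.

```python
import csv
import io


def records_to_csv(records, ic_texts, out):
    """Write parsed records as CSV, adding an expanded-text column for IC.

    records  -- list of dicts mapping a field name to a list of values
    ic_texts -- hypothetical dict mapping an ICONCLASS code to its text
    out      -- any writable text file object
    """
    # Collect every field name seen in any record, in first-seen order.
    fields = []
    for rec in records:
        for name in rec:
            if name not in fields:
                fields.append(name)
    writer = csv.writer(out)
    writer.writerow(fields + ['IC_TXT'])
    for rec in records:
        # Multi-valued fields are joined with newlines inside a single cell.
        row = ['\n'.join(rec.get(name, [])) for name in fields]
        # Look up each code in the IC field; unknown codes expand to ''.
        row.append('\n'.join(ic_texts.get(code, '') for code in rec.get('IC', [])))
        writer.writerow(row)


buf = io.StringIO()
records_to_csv([{'ID': ['1'], 'IC': ['45C1']}], {'45C1': 'example text'}, buf)
```

When writing to a real file instead of a `StringIO`, open it with `newline=''`, as the `csv` documentation recommends.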
epoz / example.py
Created October 12, 2018 14:32
Example: parse some XML files and extract data from them
import os
import xml.etree.ElementTree as ET
from progress.bar import Bar
buf = []
errors = []
def g(filepath, doc, path):
    elem = doc.find(path)
    if elem is None:
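A helper like `g` usually exists so that a missing element does not raise `AttributeError` when `.text` is read. The following is a hedged sketch of that idea, assuming the helper should fall back to a default string. The name `get_text` and its `default` parameter are hypothetical and do not reproduce the gist's own signature.

```python
import xml.etree.ElementTree as ET


def get_text(doc, path, default=''):
    """Return the stripped text at `path`, or `default` if absent or empty."""
    elem = doc.find(path)
    if elem is None or elem.text is None:
        return default
    return elem.text.strip()


doc = ET.fromstring('<rec><title> Vanitas </title><date/></rec>')
print(get_text(doc, 'title'))         # Vanitas
print(repr(get_text(doc, 'date')))    # ''
print(get_text(doc, 'maker', 'n/a'))  # n/a
```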
epoz / planodo.py
Last active June 23, 2021 11:54
Create a huge tiled (potentially gigapixel) image from a bunch of loose images.
import os
import sys
import re
import math
import random
import PIL.Image
import warnings
import json
from progress.bar import Bar
import textbase
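The layout step of such a tool can be sketched without any imaging library. The idea is to place N equally sized tiles on a near-square grid, then compute the canvas size and each tile's offset. This is an assumed approach, not planodo's actual algorithm, and `grid_layout` is a hypothetical name.

```python
import math


def grid_layout(n, tile_w, tile_h):
    """Return (canvas_size, offsets) for n tiles on a near-square grid."""
    if n == 0:
        return (0, 0), []
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    # Offsets are filled row by row, left to right.
    offsets = [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(n)]
    return (cols * tile_w, rows * tile_h), offsets


size, offsets = grid_layout(5, 256, 256)
print(size)     # (768, 512)
print(offsets)  # [(0, 0), (256, 0), (512, 0), (0, 256), (256, 256)]
```

In a real run, each image would first be thumbnailed to `tile_w` x `tile_h` and then pasted onto a `PIL.Image.new(...)` canvas with `canvas.paste(img, offset)`.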
epoz / expand.py
Created September 14, 2015 10:08
How to expand texts for a given set of ICONCLASS codes
import json
import urllib
import urllib2
codes = ['31D11222', '34B11', '45(+26)', '45C1', '45D12', '48C7341']
codes = [urllib.quote(x) for x in codes]
paths = set()
for obj in json.loads(urllib2.urlopen('http://iconclass.org/json/?notation='+'&notation='.join(codes)).read()):
    paths.update(obj.get('p') or [])  # 'p' may be missing
    paths.add(obj.get('n'))
txts = []
kws = set()
# Path notations can contain '(', '+' and ')', so they must be quoted too
for p in json.loads(urllib2.urlopen('http://iconclass.org/json/?notation='+'&notation='.join(urllib.quote(x) for x in paths if x)).read()):
    txts.append(p.get('txt', {}).get('de', u''))
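Below is a Python 3 sketch of the same two-step expansion. It first fetches the codes to learn their hierarchical paths, then fetches the texts for every code on those paths. It assumes the JSON endpoint and the `n`, `p` and `txt` keys used above. The HTTP call is kept apart from the pure functions so that they can be checked offline. All function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = 'http://iconclass.org/json/'  # endpoint as used in the gist


def notation_url(codes):
    """Build a query with one URL-encoded notation= parameter per code."""
    return BASE + '?' + urlencode([('notation', c) for c in codes])


def collect_paths(objs):
    """Union of each code's path ('p') plus the code itself ('n')."""
    paths = set()
    for obj in objs:
        paths.update(obj.get('p') or [])
        if obj.get('n'):
            paths.add(obj['n'])
    return paths


def texts(objs, lang='de'):
    """Text in `lang` for each object, '' when it is missing."""
    return [(obj.get('txt') or {}).get(lang, '') for obj in objs]


def fetch(codes):
    # Network call; not exercised by the example below.
    with urlopen(notation_url(codes)) as resp:
        return json.loads(resp.read().decode('utf-8'))


def expand(codes, lang='de'):
    paths = collect_paths(fetch(codes))
    return texts(fetch(sorted(paths)), lang)


print(notation_url(['45(+26)', '45C1']))
# http://iconclass.org/json/?notation=45%28%2B26%29&notation=45C1
```

`urlencode` quotes every value, so notations such as `45(+26)` survive the round trip in both requests.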
epoz / gist:f4a1024d89616df6fac5
Last active August 29, 2015 14:03
Elasticsearch Python client adaptor to be used with Django Paginator
class ElasticSearchPaginatorListException(Exception):
    pass
class ElasticSearchPaginatorList(object):
    def __init__(self, client, *args, **kwargs):
        self.client = client
        self.args = args
        self.kwargs = kwargs
        self._count = None
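Django's `Paginator` only needs its `object_list` to report a count (through a no-argument `count()` method or `len()`) and to support slicing. The following is a hedged sketch of that contract, not the gist's class. `search_fn(offset, size)` is a hypothetical stand-in for the Elasticsearch client call and must return the total hit count together with the hits for that window.

```python
class SearchResultsList(object):
    """List-like wrapper so Django's Paginator can page through search hits."""

    def __init__(self, search_fn):
        # search_fn(offset, size) -> (total, hits); hypothetical stand-in
        # for a client.search(...) call that uses from_/size.
        self.search_fn = search_fn
        self._count = None

    def count(self):
        # Cache the total so repeated calls do not hit the backend again.
        if self._count is None:
            self._count, _ = self.search_fn(0, 0)
        return self._count

    def __len__(self):
        return self.count()

    def __getitem__(self, key):
        # Paginator.page() slices object_list[bottom:top]; steps are ignored.
        if not isinstance(key, slice):
            raise TypeError('only slicing is supported')
        start = key.start or 0
        stop = key.stop if key.stop is not None else self.count()
        total, hits = self.search_fn(start, max(stop - start, 0))
        self._count = total
        return hits


# A fake backend over a plain list, for illustration only.
DATA = list(range(23))


def fake_search(offset, size):
    return len(DATA), DATA[offset:offset + size]


results = SearchResultsList(fake_search)
print(results.count())  # 23
print(results[20:30])   # [20, 21, 22]
```

With Django installed, `Paginator(results, 10).page(3).object_list` would then issue a single windowed query for the last page.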
epoz / old_planodo.py
Last active March 22, 2016 11:20
Planodo: turn a bunch of files into a big zoomable image
#!./bin/python
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import cm
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from PIL import Image
import PIL
import os
import json
# Working out the number of quires from an STCN collation
examples = [
    {'coll': '[*]2 2*-4*4 A-3Q4 2*2 `︠LO`3Q2 3R-5S4 5T2 5V-5Y4, 2A-G4 2H2 2I4 (3Q4 blank; lacks 3*4, blank?)',
     'url': 'http://picarta.pica.nl/xslt/DB=3.11/XMLPRS=Y/PPN?PPN=318093766',
     'req': 121,
     # A-Z 23, A-Z 23, A-Q 16, Q 1, R-Z 7, A-Z 23, A-S 18, T 1, V-Y 3, A-G 7, H 1, I 1,
     },
    {'coll': 'A-V8 W8 X-Z8',  # if W found, then noted 'loose' like A-V8 W8 X-Z8
     'req': 24},
]
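The examples above suggest one way a quire count could be computed. This is a sketch under several assumptions:

- The register alphabet has 23 letters (no J, U or W), which matches counts in the comment such as "A-Z 23", "R-Z 7" and "V-Y 3".
- A numeric prefix such as `3Q` means the third alphabet.
- An explicit `W` counts as one loose gathering.
- A range end without its own prefix inherits the start's prefix.
- Only plain signature tokens are counted, so starred, bracketed or annotated parts are skipped.

The function names are hypothetical.

```python
import re

# Assumed 23-letter register alphabet (J, U and W omitted).
ALPHABET = 'ABCDEFGHIKLMNOPQRSTVXYZ'
# [prefix]Letter, optional -[prefix]Letter, then the leaf count, e.g. 'A-3Q4'.
TOKEN = re.compile(r'(\d*)([A-Z])(?:-(\d*)([A-Z]))?\d+')


def position(prefix, letter):
    """Absolute position of a signature such as '3Q' (third alphabet, Q)."""
    alphabet_no = int(prefix) if prefix else 1
    return (alphabet_no - 1) * len(ALPHABET) + ALPHABET.index(letter) + 1


def count_gatherings(collation):
    """Count gatherings in the plain signature tokens of a collation."""
    total = 0
    for raw in collation.split():
        m = TOKEN.fullmatch(raw.strip(',;'))
        if m is None:
            continue  # starred, bracketed or annotated token: not handled
        start_prefix, start, end_prefix, end = m.groups()
        if end is None:
            total += 1  # a single gathering such as 'W8' or '5T2'
        else:
            end_prefix = end_prefix or start_prefix
            total += position(end_prefix, end) - position(start_prefix, start) + 1
    return total


print(count_gatherings('A-V8 W8 X-Z8'))  # 24
print(count_gatherings('A-3Q4'))         # 62
```

This reproduces `req: 24` for the second example. A range that spans a `W` raises `ValueError`, because W is not in the assumed alphabet.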
epoz / STCN raw parser
Last active December 27, 2015 19:39
Read an STCN (http://picarta.pica.nl/xslt/DB=3.11/) raw data dump, parse it, and write it out as a columnar tab-separated file that is easier to open in Excel
'''
Read in an STCN data dump file and convert it to a CSV file (delimited with tabs)
The data looks something like this:
SET: S0 [10000] TTL: 5 PPN: 339722142 PAG: 1 .
Ingevoerd: 1996:31-01-12 Gewijzigd: 1996:07-02-12 09:12:25 Status: 1996:31-01-12
0500 Aav
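The following sketch of the conversion is based only on the excerpt above. It assumes the following format rules, which are drawn from the excerpt rather than from STCN documentation:

- Each record starts at a `SET:` line that carries the `PPN`.
- Every field line is a four-digit code, then a space, then its value.
- Other lines, such as `Ingevoerd:`, are skipped.

Repeated fields are joined with ` ; `. The function names are hypothetical.

```python
import csv
import io
import re

FIELD = re.compile(r'^(\d{4}) (.*)$')
PPN = re.compile(r'PPN: (\S+)')


def parse_records(lines):
    """Yield one dict per record, mapping field code -> list of values."""
    record = None
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('SET:'):
            if record:
                yield record
            m = PPN.search(line)
            record = {'PPN': [m.group(1)]} if m else {}
            continue
        m = FIELD.match(line)
        if m and record is not None:
            record.setdefault(m.group(1), []).append(m.group(2))
    if record:
        yield record


def write_tsv(records, out):
    """One column per field code seen anywhere, one row per record."""
    records = list(records)
    codes = sorted({code for rec in records for code in rec})
    writer = csv.writer(out, delimiter='\t')
    writer.writerow(codes)
    for rec in records:
        writer.writerow([' ; '.join(rec.get(c, [])) for c in codes])


sample = [
    'SET: S0 [10000] TTL: 5 PPN: 339722142 PAG: 1 .',
    'Ingevoerd: 1996:31-01-12 Gewijzigd: 1996:07-02-12 09:12:25 Status: 1996:31-01-12',
    '0500 Aav',
]
out = io.StringIO()
write_tsv(parse_records(sample), out)
```

For a real dump, pass the open file object (it is iterated line by line) and write to a file opened with `newline=''`.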
epoz / gimmesrc.py
Created October 1, 2012 19:02
Retrieves the full source of a title from Wikisource
#!/usr/bin/env python
# Example: python gimmesrc.py De_Cive > txt
import sys, urllib, urllib2
URL = 'http://en.wikisource.org/w/index.php?action=raw&title='
if __name__ == '__main__':
    title = sys.argv[1]
    title_parts = []
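MediaWiki's `action=raw` returns a page's unrendered wikitext. Below is a Python 3 sketch of a single-page fetch. It assumes the following:

- Titles use underscores for spaces, as in `De_Cive`.
- HTTPS is used, since Wikimedia sites redirect plain HTTP to it.
- A placeholder `User-Agent` is sent, because Wikimedia asks clients to identify themselves.

The preview does not show how the original builds `title_parts`, so this sketch does not try to reproduce it.

```python
import sys
from urllib.parse import quote
from urllib.request import Request, urlopen

URL = 'https://en.wikisource.org/w/index.php?action=raw&title='
# Placeholder User-Agent; replace with something that identifies you.
HEADERS = {'User-Agent': 'gimmesrc-sketch/0.1 (replace with your contact)'}


def raw_url(title):
    """URL of the raw wikitext for `title`; spaces become underscores."""
    # quote() keeps '/' by default, so subpage titles such as 'A/B' still work.
    return URL + quote(title.replace(' ', '_'))


def fetch_raw(title):
    req = Request(raw_url(title), headers=HEADERS)
    with urlopen(req) as resp:
        return resp.read().decode('utf-8')


if __name__ == '__main__' and len(sys.argv) > 1:
    # Example: python gimmesrc.py De_Cive > txt
    sys.stdout.write(fetch_raw(sys.argv[1]))
```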
epoz / gist:3760964
Created September 21, 2012 11:26
Markdown watcher and auto-regenerator
#!/usr/bin/env python
'''
Markdown watcher and auto-regenerator
While sitting in an aeroplane, I found myself editing a bunch of Markdown
files and needing to regenerate the HTML and preview in a browser.
It was tedious re-typing the 'markdown' command every time, so I made
this little script to watch the *.markdown files and create the corresponding
.html file whenever the markdown file has been modified more recently than the
html, or the html does not exist yet.
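The rebuild rule described above regenerates a file when the `.html` is missing or older than its `.markdown` source, so it reduces to a modification-time comparison. Below is a minimal polling sketch under these assumptions:

- `render` is a hypothetical callable that turns Markdown text into HTML, for example `markdown.markdown` from the Python-Markdown package.
- The directory is re-scanned every `interval` seconds.

The function names are hypothetical.

```python
import glob
import os
import time


def needs_rebuild(md_path):
    """True if the matching .html is missing or older than the .markdown."""
    html_path = os.path.splitext(md_path)[0] + '.html'
    if not os.path.exists(html_path):
        return True
    return os.path.getmtime(md_path) > os.path.getmtime(html_path)


def rebuild_all(directory, render):
    """Regenerate every stale .html file; return the paths written."""
    written = []
    for md_path in glob.glob(os.path.join(directory, '*.markdown')):
        if needs_rebuild(md_path):
            html_path = os.path.splitext(md_path)[0] + '.html'
            with open(md_path, encoding='utf-8') as src:
                html = render(src.read())
            with open(html_path, 'w', encoding='utf-8') as dst:
                dst.write(html)
            written.append(html_path)
    return written


def watch(directory, render, interval=1.0):
    # Poll forever; stop with Ctrl-C.
    while True:
        rebuild_all(directory, render)
        time.sleep(interval)
```

Polling keeps the sketch dependency-free. An event-based watcher would avoid the repeated scans, but it would require a third-party package.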