Dragon Dave McKee scraperdragon

  • Durham, United Kingdom
@scraperdragon
scraperdragon / pdftables.bas
Last active August 29, 2015 14:24
Visual Basic PDF Tables demo
'--- https://support.microsoft.com/en-us/kb/195763
' NB: remove PtrSafe if old Excel
Private Declare PtrSafe Function GetTempPath Lib "kernel32" _
Alias "GetTempPathA" (ByVal nBufferLength As Long, _
ByVal lpBuffer As String) As Long
'--- https://support.microsoft.com/en-us/kb/195763
' NB: remove PtrSafe if old Excel
Private Declare PtrSafe Function GetTempFileName Lib "kernel32" _
Alias "GetTempFileNameA" (ByVal lpszPath As String, _
scraperdragon / writtenkitten.html
Created May 19, 2015 16:58
writtenkitten.html
<html>
<head>
<script type="text/javascript" src="https://www.dropbox.com/static/api/2/dropins.js" id="dropboxjs" data-app-key="w7qbnscwwlxgtz0"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/Nijikokun/5192472/raw/4c80b2c2688841ffb086f8c2b3f57520b0bd817d/base64-utf8.module.js"></script>
</head>
<body>
<a href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" data-filename="reddot.png" alt="Red dot" id=save class="dropbox-saver" onclick='save.href="data:text/text;base64,"+base64.encode(foo.value);save.dataset.filename = "mytext"'></a>
<input type='text
scraperdragon / h2so4.py
Created October 16, 2014 09:55
text to unicode subscript / superscript
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import warnings
import unicodedata
def build_lookup(macro, micro):
    assert len(macro) == len(micro), (len(macro), len(micro))
    output = {}
    for pair in zip(macro, micro):
        if pair[1] != ' ':
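The preview of `h2so4.py` stops inside `build_lookup`, so the rest of the gist is not shown here. As a minimal sketch of the general idea only (not the gist's actual code), the following maps digits to their Unicode subscript forms. The names `sketch_lookup`, `SUBSCRIPT` and `to_subscript` are hypothetical, and treating a space as "no equivalent character" is an assumption based on the visible `!= ' '` check.

```python
# -*- coding: utf-8 -*-
# Hypothetical sketch: these names do not come from the original gist.


def sketch_lookup(macro, micro):
    """Pair each character in macro with the same-position character in micro."""
    assert len(macro) == len(micro), (len(macro), len(micro))
    # Assumption: a space in micro means "no small form exists", so skip it.
    return {big: small for big, small in zip(macro, micro) if small != ' '}


# Unicode subscript digits U+2080..U+2089.
SUBSCRIPT = sketch_lookup('0123456789', '₀₁₂₃₄₅₆₇₈₉')


def to_subscript(text, lookup=SUBSCRIPT):
    """Replace every character that has a subscript form; keep the rest."""
    return ''.join(lookup.get(ch, ch) for ch in text)


print(to_subscript('H2SO4'))
```

Characters without an entry in the lookup, such as the letters in `H2SO4`, pass through unchanged.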

html5lib - ihatexml.py - line 254: DataLossWarning: Coercing non-XML name

In my case this was caused by a malformed attribute: ``

scraperdragon / html5.py
Created September 2, 2014 10:18
HTML5 parser which may be compatible with lxml
import xml.etree.ElementTree as etree
import html5lib
def fromstring(s):
    tb = html5lib.getTreeBuilder("lxml", implementation=etree)
    p = html5lib.HTMLParser(tb, namespaceHTMLElements=False)
    return p.parse(s)
scraperdragon / batchsaver.py
Created December 2, 2013 16:49
Paul's BatchSaver
class BatchSaver(object):
    def __init__(self, max_queue=2000, save_callback=None):
        self.queue = []
        self.max_queue = max_queue
        self.callback = save_callback
        import atexit
        atexit.register(self.save_now)

    def push(self, row):
        self.queue.append(row)
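The preview ends partway through `push`, and `save_now` is referenced but never shown. The sketch below is one plausible way such a batching queue could behave, under two assumptions that the source does not confirm: the queue is flushed once it reaches `max_queue` rows, and the callback receives the whole list of queued rows. `SketchBatchSaver` is a hypothetical name, not the original class.

```python
import atexit


class SketchBatchSaver(object):
    """Hypothetical batching queue; not the original gist's full code."""

    def __init__(self, max_queue=2000, save_callback=None):
        self.queue = []
        self.max_queue = max_queue
        self.callback = save_callback
        # Flush whatever is still queued when the interpreter exits.
        atexit.register(self.save_now)

    def push(self, row):
        self.queue.append(row)
        # Assumed behaviour: flush as soon as the queue is full.
        if len(self.queue) >= self.max_queue:
            self.save_now()

    def save_now(self):
        if not self.queue:
            return
        if self.callback is not None:
            # Assumed contract: the callback takes the list of queued rows.
            self.callback(self.queue)
        # Rebind rather than clear, so the list handed to the callback survives.
        self.queue = []


batches = []
saver = SketchBatchSaver(max_queue=2, save_callback=batches.append)
for n in range(5):
    saver.push({"n": n})
saver.save_now()  # flush the final partial batch explicitly
```

Registering `save_now` with `atexit` means a partial batch is still written if the script ends without an explicit flush.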
scraperdragon / silent
Created December 2, 2013 11:37
Run a process which spams focus windows silently, like Selenium. And view it if you want to.
Xvfb :99 -ac &
export DISPLAY=:99
# your command here
killall Xvfb
scraperdragon / installtemplate
Created October 7, 2013 10:27
Install Scraper Template
#!/bin/bash
mkdir -p ~/BAK && mv ~/tool ~/incoming ~/http ~/BAK
git clone http://bitbucket.org/scraperwikids/data-services-scraper-template
./data-services-scraper-template/install.sh .
rm -rf ./data-services-scraper-template
echo -n "Remote repository: "
read -r REMOTE
git remote add origin "$REMOTE"
git push -u origin --all
./tool/first_run.sh
pip install line_profiler
kernprof.py -l worldbank.py
python -m line_profiler worldbank.py.lprof
# put @profile before the functions you care about
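A short sketch of what "put @profile before the functions you care about" can look like. When a script runs under `kernprof -l`, the `profile` decorator is injected into builtins. The fallback below is an added convenience (an assumption, not part of the original notes) so the same file also runs under plain `python`. `total_squares` is a hypothetical example function.

```python
try:
    profile  # provided by kernprof -l at run time
except NameError:
    # Fallback for plain `python script.py`: a decorator that does nothing.
    def profile(func):
        return func


@profile
def total_squares(n):
    """Hypothetical hot function whose lines we want timed."""
    total = 0
    for i in range(n):
        total += i * i
    return total


print(total_squares(4))
```

Without the fallback, a script decorated with `@profile` raises `NameError` when it is run outside `kernprof`.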
//jsonData contains data in the appropriate format
var json_table = new google.visualization.Table(document.getElementById('table_div_json'));
var json_data = new google.visualization.DataTable(jsonData, 0.6);
json_table.draw(json_data, {showRowNumber: true});