Dragon Dave McKee scraperdragon

scraperdragon / pdftables.bas
Last active Aug 29, 2015
Visual Basic PDF Tables demo
' NB: remove PtrSafe if old Excel
Private Declare PtrSafe Function GetTempPath Lib "kernel32" _
Alias "GetTempPathA" (ByVal nBufferLength As Long, _
ByVal lpBuffer As String) As Long
' NB: remove PtrSafe if old Excel
Private Declare PtrSafe Function GetTempFileName Lib "kernel32" _
Alias "GetTempFileNameA" (ByVal lpszPath As String, _
View writtenkitten.html
<script type="text/javascript" src="" id="dropboxjs" data-app-key="w7qbnscwwlxgtz0"></script>
<script type="text/javascript" src=""></script>
<a href="
9TXL0Y4OHwAAAABJRU5ErkJggg==" data-filename="reddot.png" alt="Red dot" id=save class="dropbox-saver" onclick='save.href="data:text/text;base64,"+base64.encode(foo.value);save.dataset.filename = "mytext"'></a>
<input type='text
scraperdragon /
Created Oct 16, 2014
text to unicode subscript / superscript
# -*- coding: utf-8
from __future__ import unicode_literals
import warnings
import unicodedata
def build_lookup(macro, micro):
    assert len(macro) == len(micro), (len(macro), len(micro))
    output = {}
    for pair in zip(macro, micro):
        if pair[1] != ' ':
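The preview above stops mid-loop. As a minimal sketch of the idea (not the gist's actual code), a lookup like this could map each character to its subscript or superscript twin, skipping positions where the target string holds a space. The names `SUPERSCRIPT`, `SUBSCRIPT` and `translate` are hypothetical, the digit tables are an assumption, and Python 3 is assumed.

```python
# -*- coding: utf-8 -*-
# Sketch only: the digit tables and translate() are illustrative assumptions.


def build_lookup(macro, micro):
    """Map each character of `macro` to the same-position character of `micro`.

    A space in `micro` means "no equivalent", so that pair is skipped.
    """
    assert len(macro) == len(micro), (len(macro), len(micro))
    return {big: small for big, small in zip(macro, micro) if small != ' '}


DIGITS = "0123456789"
SUPERSCRIPT = build_lookup(DIGITS, "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT = build_lookup(DIGITS, "₀₁₂₃₄₅₆₇₈₉")


def translate(text, lookup):
    """Replace every character that has an entry in `lookup`; keep the rest."""
    return "".join(lookup.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(translate("H2O", SUBSCRIPT))
    print(translate("x2", SUPERSCRIPT))
```

Characters without an entry pass through unchanged, so mixed text such as "H2O" only has its digits converted.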
View gist:6696b4797d7abb3d599c

html5lib - - line 254: DataLossWarning: Coercing non-XML name

In my case this was caused by a malformed attribute: <td colspan=""4"">
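To see why such markup triggers the warning: the doubled quotes most likely leave the parser with an attribute whose name is `4""`, which is not a legal XML name and so has to be coerced. The check below is a rough sketch using a simplified, ASCII-only approximation of the XML name rule. The helper `is_xml_name` is hypothetical and is not part of html5lib.

```python
import re

# Simplified, ASCII-only approximation of the XML 1.0 "Name" production:
# a letter, underscore or colon, followed by letters, digits, '_', '.', ':' or '-'.
XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")


def is_xml_name(name):
    """Return True if `name` looks like a valid (ASCII) XML name."""
    return XML_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # 'colspan' is fine; '4""' is the kind of name the malformed tag can produce.
    for name in ["colspan", '4""']:
        print(repr(name), is_xml_name(name))
```

Running a check like this over attribute names in suspect input can help locate the offending tag before parsing.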

scraperdragon /
Created Sep 2, 2014
HTML5 parser which may be compatible with lxml
import xml.etree.ElementTree as etree
import html5lib
def fromstring(s):
    tb = html5lib.getTreeBuilder("lxml", implementation=etree)
    p = html5lib.HTMLParser(tb, namespaceHTMLElements=False)
    return p.parse(s)
class BatchSaver(object):
    def __init__(self, max_queue=2000, save_callback=None):
        self.queue = []
        self.max_queue = max_queue
        self.callback = save_callback
        import atexit
    def push(self, row):
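The preview ends at the `push` signature, so the rest of the class is not shown. One plausible way to finish a batching helper like this is sketched below. The `flush` method, the threshold check in `push`, and the `atexit` registration are assumptions, not the gist's actual code.

```python
import atexit


class BatchSaver(object):
    """Collect rows and hand them to a callback in batches (illustrative sketch)."""

    def __init__(self, max_queue=2000, save_callback=None):
        self.queue = []
        self.max_queue = max_queue
        self.callback = save_callback
        # Assumption: leftover rows should be saved when the interpreter exits.
        atexit.register(self.flush)

    def push(self, row):
        """Queue one row and flush once the queue reaches max_queue."""
        self.queue.append(row)
        if len(self.queue) >= self.max_queue:
            self.flush()

    def flush(self):
        """Pass any queued rows to the callback, then start a fresh queue."""
        if not self.queue:
            return
        batch, self.queue = self.queue, []
        if self.callback is not None:
            self.callback(batch)


if __name__ == "__main__":
    saved = []
    saver = BatchSaver(max_queue=2, save_callback=saved.append)
    for row in ({"id": 1}, {"id": 2}, {"id": 3}):
        saver.push(row)
    saver.flush()
    print(saved)
```

Swapping the queue out before calling the callback means a failing callback cannot cause the same rows to be sent twice on the next flush, at the cost of losing that batch.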
scraperdragon / silent
Created Dec 2, 2013
Run a process that spams focus-stealing windows (such as Selenium) silently, and view it if you want to.
Xvfb :99 -ac &
export DISPLAY=:99
# your command here
killall Xvfb
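The same pattern can be driven from Python, which makes it easier to stop only the Xvfb instance that was started rather than every one on the machine. This is a rough sketch: the helper names `display_env` and `run_headless` are hypothetical, the one-second startup delay is a crude assumption, and Xvfb must be installed for `run_headless` to work.

```python
import os
import subprocess
import time


def display_env(display_number):
    """Return a copy of the environment with DISPLAY pointing at the virtual screen."""
    env = dict(os.environ)
    env["DISPLAY"] = ":%d" % display_number
    return env


def run_headless(command, display_number=99, startup_delay=1.0):
    """Start Xvfb, run `command` (a list of arguments) against it, then stop Xvfb."""
    xvfb = subprocess.Popen(["Xvfb", ":%d" % display_number, "-ac"])
    try:
        # Assumption: a short sleep is enough for the server to come up.
        time.sleep(startup_delay)
        return subprocess.call(command, env=display_env(display_number))
    finally:
        # Stop only the server started here, even if the command failed.
        xvfb.terminate()
        xvfb.wait()
```

A call such as `run_headless(["python", "my_selenium_script.py"])` would then run the (hypothetical) script with its windows hidden on display `:99`.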
scraperdragon / installtemplate
Created Oct 7, 2013
Install Scraper Template
mkdir -p ~/BAK && mv ~/tool ~/incoming ~/http ~/BAK
git clone
./data-services-scraper-template/ .
rm -rf ./data-services-scraper-template
echo -n "Remote repository: "
# read the URL typed at the prompt; without this, $REMOTE is empty
read REMOTE
git remote add origin "$REMOTE"
git push -u origin --all
View gist:5715431
pip install line_profiler -l
python -m line_profiler
# put @profile before the functions you care about
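As an illustration of the decorator step, the sketch below marks one function with `@profile`. When a script is run through line_profiler's `kernprof -l`, the name `profile` is injected into builtins. The fallback at the top is a common idiom (an assumption here, not part of the original note) that lets the same file also run under plain Python. The function `sum_of_squares` is a hypothetical example.

```python
try:
    profile  # provided by kernprof -l when profiling
except NameError:
    def profile(func):
        """No-op stand-in so the script still runs without kernprof."""
        return func


@profile
def sum_of_squares(n):
    """Deliberately loop-heavy so per-line timings have something to show."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(sum_of_squares(10))
```

Only functions carrying the decorator appear in the line-by-line report, so it is usually enough to mark the few functions suspected of being slow.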
View gist:5083607
//jsonData contains data in the appropriate format
var json_table = new google.visualization.Table(document.getElementById('table_div_json'))
var json_data = new google.visualization.DataTable(jsonData, 0.6);
json_table.draw(json_data, {showRowNumber: true});