Dragon Dave McKee scraperdragon

@scraperdragon
scraperdragon / pdftables.bas
Last active Aug 29, 2015
Visual Basic PDF Tables demo
View pdftables.bas
'--- https://support.microsoft.com/en-us/kb/195763
' NB: remove PtrSafe if old Excel
Private Declare PtrSafe Function GetTempPath Lib "kernel32" _
Alias "GetTempPathA" (ByVal nBufferLength As Long, _
ByVal lpBuffer As String) As Long
'--- https://support.microsoft.com/en-us/kb/195763
' NB: remove PtrSafe if old Excel
Private Declare PtrSafe Function GetTempFileName Lib "kernel32" _
Alias "GetTempFileNameA" (ByVal lpszPath As String, _
View writtenkitten.html
<html>
<head>
<script type="text/javascript" src="https://www.dropbox.com/static/api/2/dropins.js" id="dropboxjs" data-app-key="w7qbnscwwlxgtz0"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/Nijikokun/5192472/raw/4c80b2c2688841ffb086f8c2b3f57520b0bd817d/base64-utf8.module.js"></script>
</head>
<body>
<a href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" data-filename="reddot.png" alt="Red dot" id=save class="dropbox-saver" onclick='save.href="data:text/text;base64,"+base64.encode(foo.value);save.dataset.filename = "mytext"'></a>
<input type='text
scraperdragon / h2so4.py
Created Oct 16, 2014
text to unicode subscript / superscript
View h2so4.py
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import warnings
import unicodedata
def build_lookup(macro, micro):
    assert len(macro) == len(micro), (len(macro), len(micro))
    output = {}
    for pair in zip(macro, micro):
        if pair[1] != ' ':
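
As an illustration of the idea in the description (plain text to Unicode subscript / superscript), here is a minimal Python 3 sketch. It is independent of the gist's own `build_lookup`: the names `build_table`, `to_subscript` and `to_superscript` are hypothetical, and only digits and the plus and minus signs are covered.

```python
# Hypothetical helpers, not taken from h2so4.py.
PLAIN = "0123456789+-"
SUB = "₀₁₂₃₄₅₆₇₈₉₊₋"
SUPER = "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻"


def build_table(plain, fancy):
    """Map each plain character to its counterpart, skipping blank slots."""
    if len(plain) != len(fancy):
        raise ValueError("alphabets must be the same length")
    # str.translate expects a dict keyed by code point.
    return {ord(p): f for p, f in zip(plain, fancy) if f != " "}


SUBSCRIPT = build_table(PLAIN, SUB)
SUPERSCRIPT = build_table(PLAIN, SUPER)


def to_subscript(text):
    """Replace digits and +/- in text with Unicode subscripts."""
    return text.translate(SUBSCRIPT)


def to_superscript(text):
    """Replace digits and +/- in text with Unicode superscripts."""
    return text.translate(SUPERSCRIPT)


print(to_subscript("H2SO4"))
```

Characters with no entry in the table pass through unchanged, so letters are left alone.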
View gist:6696b4797d7abb3d599c

html5lib - ihatexml.py - line 254: DataLossWarning: Coercing non-XML name

In my case this was caused by a malformed attribute: <td colspan=""4"">

scraperdragon / html5.py
Created Sep 2, 2014
HTML5 parser which may be compatible with lxml
View html5.py
import xml.etree.ElementTree as etree
import html5lib
def fromstring(s):
    tb = html5lib.getTreeBuilder("lxml", implementation=etree)
    p = html5lib.HTMLParser(tb, namespaceHTMLElements=False)
    return p.parse(s)
View batchsaver.py
class BatchSaver(object):
    def __init__(self, max_queue=2000, save_callback=None):
        self.queue = []
        self.max_queue = max_queue
        self.callback = save_callback
        import atexit
        atexit.register(self.save_now)

    def push(self, row):
        self.queue.append(row)
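
The preview above does not show `save_now` or what `push` does once the queue fills up. The following is a self-contained sketch of how a batching queue of this shape might behave, under two assumptions that are mine rather than the gist's: `push` flushes when `max_queue` rows are waiting, and `save_now` hands the queued rows to the callback and then empties the queue. The class name `BatchSaverSketch` is hypothetical.

```python
import atexit


class BatchSaverSketch:
    """Collect rows and pass them to a callback in batches (assumed behaviour)."""

    def __init__(self, max_queue=2000, save_callback=None):
        self.queue = []
        self.max_queue = max_queue
        self.callback = save_callback
        # Flush whatever is left when the interpreter exits.
        atexit.register(self.save_now)

    def push(self, row):
        self.queue.append(row)
        # Assumption: flush as soon as the queue reaches its limit.
        if len(self.queue) >= self.max_queue:
            self.save_now()

    def save_now(self):
        # Assumption: the callback receives the whole batch as a list.
        if self.queue and self.callback is not None:
            self.callback(self.queue)
        self.queue = []


saved = []
saver = BatchSaverSketch(max_queue=3,
                         save_callback=lambda rows: saved.append(list(rows)))
for n in range(7):
    saver.push({"n": n})
saver.save_now()  # flush the final partial batch explicitly
print([len(batch) for batch in saved])
```

Batching like this trades a little latency for far fewer write calls, which tends to matter when each save is a database round trip.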
scraperdragon / silent
Created Dec 2, 2013
Run a process which spams focus windows silently, like Selenium. And view it if you want to.
View silent
Xvfb :99 -ac &
export DISPLAY=:99
# your command here
killall Xvfb
scraperdragon / installtemplate
Created Oct 7, 2013
Install Scraper Template
View installtemplate
#!/bin/bash
mkdir -p ~/BAK && mv ~/tool ~/incoming ~/http ~/BAK
git clone http://bitbucket.org/scraperwikids/data-services-scraper-template
./data-services-scraper-template/install.sh .
rm -rf ./data-services-scraper-template
echo -n "Remote repository: "
read -r REMOTE
git remote add origin "$REMOTE"
git push -u origin --all
./tool/first_run.sh
View gist:5715431
pip install line_profiler
kernprof -l worldbank.py
python -m line_profiler worldbank.py.lprof
# put @profile before the functions you care about
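
To make the `@profile` step concrete, here is a small sketch of what a profiled script might look like. The function `total_squares` is a made-up stand-in for whatever lives in worldbank.py. When the script runs under `kernprof -l`, the `profile` decorator is injected into builtins; the fallback line is an optional idiom so the same file also runs under plain `python`.

```python
import builtins

# Under `kernprof -l`, `profile` exists in builtins. Otherwise use a no-op
# decorator so the script still runs without line_profiler.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def total_squares(n):
    """Hypothetical hot function: sum of i*i for i in range(n)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(total_squares(1000))
```

Only functions carrying the decorator appear in the line-by-line report, so it is usually worth decorating just the few functions under suspicion.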
View gist:5083607
//jsonData contains data in the appropriate format
var json_table = new google.visualization.Table(document.getElementById('table_div_json'))
var json_data = new google.visualization.DataTable(jsonData, 0.6);
json_table.draw(json_data, {showRowNumber: true});
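
For reference, "the appropriate format" for the `DataTable` constructor is an object with a `cols` list (each entry has `id`, `label` and `type`) and a `rows` list (each row has a `c` list of cells, each with a value `v`). The sketch below builds such a structure in Python, assuming the JSON is produced server-side and handed to the page as `jsonData`. The helper name `to_datatable` and the sample records are invented for illustration.

```python
import json


def to_datatable(columns, records):
    """Build a Google Visualization DataTable literal.

    columns: list of (name, type) pairs, e.g. ("population", "number")
    records: list of dicts keyed by column name
    """
    cols = [{"id": name, "label": name, "type": kind} for name, kind in columns]
    rows = [{"c": [{"v": record[name]} for name, _ in columns]}
            for record in records]
    return {"cols": cols, "rows": rows}


# Made-up sample data, purely illustrative.
table = to_datatable(
    [("country", "string"), ("population", "number")],
    [{"country": "A", "population": 1},
     {"country": "B", "population": 2}],
)
json_data = json.dumps(table)
print(json_data)
```

The resulting string can be parsed on the page and passed to `new google.visualization.DataTable(...)` as in the snippet above.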