Dragon Dave McKee scraperdragon

  • Durham, United Kingdom
@scraperdragon
scraperdragon / chrome2requests.py
Created August 22, 2012 11:25
Convert Chrome headers to Python's Requests dictionary
dict((k.strip(), v.strip()) for k, _, v in (h.partition(':') for h in rawheaders.splitlines()) if k.strip())
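The same idea can be written as a small function, which is easier to reuse and test. This is only a sketch: the name `headers_to_dict` and the sample headers are made up for illustration. It splits each line on the first colon, strips surrounding whitespace, and skips lines that have no header name (blank lines, or lines that begin with a colon).

```python
def headers_to_dict(rawheaders):
    """Turn 'Name: value' lines into a dict usable as requests' headers= argument."""
    headers = {}
    for line in rawheaders.splitlines():
        # Split on the first colon only, so values may contain colons (e.g. URLs).
        name, sep, value = line.partition(':')
        name = name.strip()
        if not sep or not name:
            continue  # skip blank lines and lines without a header name
        headers[name] = value.strip()
    return headers


# Hypothetical headers, as they might be pasted from the browser's network panel.
sample = """Accept: text/html
User-Agent: Mozilla/5.0 (X11; Linux x86_64)

Referer: http://example.com/a:b"""

print(headers_to_dict(sample))
```

The resulting dictionary can then be passed to Requests as `requests.get(url, headers=...)`.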
scraperdragon / batchsaver.py
Created December 2, 2013 16:49
Paul's BatchSaver
class BatchSaver(object):
    def __init__(self, max_queue=2000, save_callback=None):
        self.queue = []
        self.max_queue = max_queue
        self.callback = save_callback
        import atexit
        atexit.register(self.save_now)

    def push(self, row):
        self.queue.append(row)
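The preview stops after `push`, so the flushing logic (including `save_now`) is not shown. The following is a guess at how such a batch saver might be completed, not the original code: the class name `BatchSaverSketch`, the flush-on-full behaviour, and the body of `save_now` are all assumptions.

```python
import atexit


class BatchSaverSketch(object):
    """Hypothetical completion: queue rows and hand them to a callback in batches."""

    def __init__(self, max_queue=2000, save_callback=None):
        self.queue = []
        self.max_queue = max_queue
        self.callback = save_callback
        # Flush whatever is still queued when the interpreter exits.
        atexit.register(self.save_now)

    def push(self, row):
        self.queue.append(row)
        # Assumption: flush as soon as the queue reaches max_queue rows.
        if len(self.queue) >= self.max_queue:
            self.save_now()

    def save_now(self):
        # Assumption: the callback receives the whole batch as a list.
        if self.queue and self.callback is not None:
            self.callback(self.queue)
        self.queue = []


saved = []
saver = BatchSaverSketch(max_queue=3, save_callback=lambda rows: saved.append(list(rows)))
for i in range(7):
    saver.push(i)
saver.save_now()  # flush the final partial batch explicitly
print(saved)
```

Batching like this keeps the number of writes low when each save (for example a database commit) is expensive.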
scraperdragon / silent
Created December 2, 2013 11:37
Silently run a process that keeps stealing window focus, such as Selenium, and view it if you want to.
Xvfb :99 -ac &
export DISPLAY=:99
# your command here
killall Xvfb
scraperdragon / installtemplate
Created October 7, 2013 10:27
Install Scraper Template
#!/bin/bash
mkdir -p ~/BAK && mv ~/tool ~/incoming ~/http ~/BAK
git clone http://bitbucket.org/scraperwikids/data-services-scraper-template
./data-services-scraper-template/install.sh .
rm -rf ./data-services-scraper-template
echo -n "Remote repository: "
read -r REMOTE
git remote add origin "$REMOTE"
git push -u origin --all
./tool/first_run.sh
pip install line_profiler
kernprof.py -l worldbank.py
python -m line_profiler worldbank.py.lprof
# put @profile before the functions you care about
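As a sketch of what "put @profile before the functions you care about" looks like in practice: `kernprof -l` injects a `profile` decorator into builtins while the script runs. The fallback below is an optional convenience (an assumption, not part of the original snippet) so the same file still runs under plain `python`. The function `sum_of_squares` is a made-up example.

```python
try:
    profile  # defined by kernprof -l at run time
except NameError:
    # Not running under kernprof: make @profile a no-op decorator.
    def profile(func):
        return func


@profile
def sum_of_squares(n):
    """Toy function to profile line by line."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(sum_of_squares(1000))
```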
// jsonData contains data in the appropriate format
var json_table = new google.visualization.Table(document.getElementById('table_div_json'));
var json_data = new google.visualization.DataTable(jsonData, 0.6);
json_table.draw(json_data, {showRowNumber: true});
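The snippet does not show what "the appropriate format" is. Assuming it means the DataTable JSON literal layout (a `cols` list describing the columns and a `rows` list whose cells sit under `c` as `{"v": value}`), a server-side script could build it roughly as follows. The helper name `to_datatable` and the sample data are invented for illustration.

```python
import json


def to_datatable(columns, rows):
    """Build a dict in the DataTable JSON literal layout.

    columns: iterable of (id, label, type) tuples
    rows:    iterable of row tuples, one value per column
    """
    return {
        "cols": [{"id": cid, "label": label, "type": ctype}
                 for cid, label, ctype in columns],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }


# Made-up sample data.
table = to_datatable(
    [("country", "Country", "string"), ("pop", "Population", "number")],
    [("Iceland", 320000), ("Malta", 420000)],
)
print(json.dumps(table, indent=2))
```

The printed JSON is what the page would receive as `jsonData`.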
scraperdragon / gist:5081659
Last active December 14, 2015 11:48
Make R talk SQLite.
install.packages("RSQLite")  # note: compiles SQLite
library(RSQLite)
drv <- dbDriver("SQLite")
con <- dbConnect(drv, "demo.sqlite")
dbListTables(con)
dbListFields(con, "table_name")
scraperdragon / run
Last active December 14, 2015 09:19
Run command with no output if no error.
#!/bin/bash
# Run the given command, logging all output; show the log only on failure.
x=$(date +%Y%m%dT%H%M%S)
mkdir -p ~/log
"$@" > ~/log/"$x" 2>&1
error=$?
if [ "$error" -ne 0 ]
then
    echo "Error code: $error"
    cat ~/log/"$x"
    curl --data "type=error" https://x.scraperwiki.com/api/status > /dev/null 2>&1
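The same "quiet unless it fails" pattern can be sketched in Python with `subprocess`. This is an illustration rather than a port: the function name `run_quietly` is made up, output is held in memory instead of being written to `~/log`, and the status notification is left out.

```python
import subprocess
import sys


def run_quietly(argv):
    """Run argv; print its combined output only if it exits non-zero."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    if proc.returncode != 0:
        print("Error code: %d" % proc.returncode)
        sys.stdout.write(proc.stdout.decode("utf-8", "replace"))
    return proc.returncode


# A succeeding command prints nothing; a failing one shows its output.
run_quietly([sys.executable, "-c", "print('all fine')"])
run_quietly([sys.executable, "-c", "import sys; print('boom'); sys.exit(3)"])
```

This behaviour suits cron jobs, where any output typically triggers an email.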
scraperdragon / gist:3634011
Last active December 1, 2015 16:53
Let Python redirect Unicode stdout to files without crashing. Requires LANG=C.UTF-8 or similar in .profile.
import codecs
import sys
sys.stdout = codecs.getwriter('utf-8')(sys.__stdout__)
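To see what the wrapper does, the sketch below applies `codecs.getwriter('utf-8')` to an in-memory byte stream instead of stdout: text written to the wrapper comes out as UTF-8 bytes. The buffer and sample string are illustrative only. On Python 3 the stream to wrap would be a byte stream such as `sys.stdout.buffer` rather than `sys.__stdout__`.

```python
import codecs
import io

# Stand-in for a byte-oriented output stream (a file or a pipe).
buf = io.BytesIO()

# getwriter returns a StreamWriter class; instantiating it wraps the stream.
writer = codecs.getwriter("utf-8")(buf)
writer.write(u"na\u00efve caf\u00e9\n")

print(repr(buf.getvalue()))
```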
scraperdragon / gist:4260142
Created December 11, 2012 16:38
Parse date cleanly, fail if partial
def parsedate(datestring, silent=False):
    import dateutil.parser
    import re
    if not datestring: return None
    # Already in ISO form (YYYY-MM-DD): pass it through unchanged.
    if re.match(r'\d{4}-\d{2}-\d{2}', datestring): return datestring
    info = dateutil.parser.parserinfo(dayfirst=True)
    # _parse is a private dateutil method; its behaviour may differ between versions.
    value = dateutil.parser.parser(info)._parse(datestring)
    if value is None: return None
    retval = [value.year, value.month, value.day]
    nones = retval.count(None)
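The preview ends before the partial-date check is shown. As a standard-library sketch of the same "fail if partial" idea, the version below swaps dateutil for an explicit list of day-first `strptime` formats, so a string lacking a day, month or year matches none of them. The function name `parse_full_date`, the format list, and the choice to return `None` on failure are assumptions, not the original behaviour.

```python
from datetime import datetime

# Day-first formats accepted by this sketch; extend as needed.
FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%d %b %Y")


def parse_full_date(datestring):
    """Return an ISO date string, or None if the input is empty or partial."""
    if not datestring:
        return None
    text = datestring.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


print(parse_full_date("03/04/2012"))  # day first: 3 April 2012
print(parse_full_date("April 2012"))  # no day given, so no format matches
```

Month names in `%B` and `%b` follow the current locale, so this assumes an English or C locale.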