Dragon Dave McKee (scraperdragon)

  • Durham, United Kingdom
scraperdragon / gist:5081659
Last active December 14, 2015 11:48
Make R talk SQLite.
install.packages("RSQLite")  # note: compiles SQLite
library(RSQLite)
drv <- dbDriver("SQLite")
con <- dbConnect(drv, "demo.sqlite")  # opens (or creates) demo.sqlite in the working directory
dbListTables(con)
dbListFields(con, "table_name")
scraperdragon / run
Last active December 14, 2015 09:19
Run command with no output if no error.
#!/bin/bash
# Run the given command, capturing all output in a timestamped log file.
x=$(date +%Y%m%dT%H%M%S)
mkdir -p ~/log
"$@" > ~/log/"$x" 2>&1
error=$?
if [ "$error" -ne 0 ]
then
    # On failure, show the error code and the captured output,
    # then ping the ScraperWiki status API.
    echo "Error code: $error"
    cat ~/log/"$x"
    curl --data "type=error" https://x.scraperwiki.com/api/status > /dev/null 2>&1
fi
scraperdragon / gist:4260142
Created December 11, 2012 16:38
Parse date cleanly, fail if partial
def parsedate(datestring, silent=False):
    import dateutil.parser
    import re
    if not datestring: return None
    if re.match(r'\d{4}-\d{2}-\d{2}', datestring): return datestring  # already ISO-formatted
    info = dateutil.parser.parserinfo(dayfirst=True)
    # _parse is a private dateutil API; it leaves any field missing from the string as None
    value = dateutil.parser.parser(info)._parse(datestring)
    if value is None: return None
    retval = [value.year, value.month, value.day]
    nones = retval.count(None)
    # the gist preview ends here; plausible completion: fail (or stay silent) if the date is partial
    if nones == 3: return None
    if 0 < nones < 3:
        if silent: return None
        raise ValueError("partial date: %r" % datestring)
    return "%04d-%02d-%02d" % (value.year, value.month, value.day)
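A hypothetical usage sketch for the function above, assuming the completion of the last few lines:
parsedate("2012-12-11")                   # -> "2012-12-11" (already ISO, passed through)
parsedate("11 December 2012")             # -> "2012-12-11"
parsedate("December 2012")                # raises ValueError (month and year only)
parsedate("December 2012", silent=True)   # -> None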
scraperdragon / gist:4001096
Created November 2, 2012 12:36
Autoretry on requests
import requests
# works with the pre-1.0 requests API, where defaults lived in a module-level dict
requests.defaults.defaults['max_retries'] = 5
# ... rest of code ...
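With requests 1.0 and later the defaults dict is gone; a minimal sketch of the equivalent, configuring retries on a Session via HTTPAdapter (example.com is a placeholder URL):
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
session.mount('http://', HTTPAdapter(max_retries=5))
session.mount('https://', HTTPAdapter(max_retries=5))
response = session.get('https://example.com')  # placeholder URL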
scraperdragon / gist:3946977
Created October 24, 2012 16:04
Import scraperwiki.json (coffeescript)
fs=require 'fs'
settings = fs.readFileSync 'scraperwiki.json'
settings = JSON.parse settings
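Most of the other snippets here are Python, so for comparison, a minimal sketch of reading the same settings file with the standard json module:
import json
with open('scraperwiki.json') as f:
    settings = json.load(f)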
scraperdragon / double-encoding-fixes.py
Created September 24, 2012 10:38 — forked from robertklep/double-encoding-fixes.py
Functions to detect/fix double-encoded UTF-8 strings in Python
import re
# functions to detect/fix double-encoded UTF-8 strings
# Based on http://blogs.perl.org/users/chansen/2010/10/coping-with-double-encoded-utf-8.html
DOUBLE_ENCODED = re.compile("""
\xC3 (?: [\x82-\x9F] \xC2 [\x80-\xBF] # U+0080 - U+07FF
| \xA0 \xC2 [\xA0-\xBF] \xC2 [\x80-\xBF] # U+0800 - U+0FFF
| [\xA1-\xAC] \xC2 [\x80-\xBF] \xC2 [\x80-\xBF] # U+1000 - U+CFFF
| \xAD \xC2 [\x80-\x9F] \xC2 [\x80-\xBF] # U+D000 - U+D7FF
| [\xAE-\xAF] \xC2 [\x80-\xBF] \xC2 [\x80-\xBF] # U+E000 - U+FFFF
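The preview cuts off before the repair functions. A minimal Python 3 sketch of the usual fix, an assumption rather than the forked gist's own code (which worked on Python 2 byte strings):
def fix_double_encoded(s):
    # Mojibake like 'Ã©' appears when UTF-8 bytes are mistaken for Latin-1
    # and re-encoded; encoding back to Latin-1 recovers the original bytes.
    try:
        return s.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s  # not double-encoded (or not recoverable): leave unchanged

fix_double_encoded('Ã©')  # -> 'é'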
scraperdragon / gist:3665070
Created September 7, 2012 10:51
Make an identifier out of a string
def makeidentifier(s):
    import string
    s = s.strip().replace(' ', '_')
    valid_chars = "_%s%s" % (string.ascii_letters, string.digits)
    out = ''.join(c for c in s if c in valid_chars)
    if len(out) == 0:
        return '_'
    else:
        return out
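Hypothetical example outputs for the function above:
makeidentifier("Total cost (£)")  # -> 'Total_cost_'
makeidentifier("2012 figures")    # -> '2012_figures'
makeidentifier("")                # -> '_'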
scraperdragon / gist:3634011
Last active December 1, 2015 16:53
Let Python redirect unicode stdout to files without crashing. Requires LANG=C.UTF-8 or similar in .profile
import codecs
import sys
# wrap raw stdout in a UTF-8 writer so printing unicode to a file or pipe
# cannot raise UnicodeEncodeError (Python 2 idiom)
sys.stdout = codecs.getwriter('utf-8')(sys.__stdout__)
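On Python 3 (3.7+) the same effect needs no codecs wrapper; a minimal sketch:
import sys
# force UTF-8 output regardless of locale, e.g. when stdout is piped to a file
sys.stdout.reconfigure(encoding='utf-8')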
scraperdragon / gist:3621256
Created September 4, 2012 13:41
Get the value of a SELECT dropdown box in LXML
def get_select_value(node):
    # node is an LXML element (SELECT tag)
    try:
        return node.cssselect("option[selected='selected']")[0].text
    except IndexError:
        # nothing marked selected: fall back to the first option,
        # which is what a browser would submit by default
        return node.cssselect("option")[0].text
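A hypothetical usage sketch with lxml.html (the cssselect package must be installed for .cssselect() to work):
import lxml.html
doc = lxml.html.fromstring(
    "<select><option>One</option><option selected='selected'>Two</option></select>")
print(get_select_value(doc))  # -> Two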
scraperdragon / chrome2requests.py
Created August 22, 2012 11:25
Convert Chrome headers to Python's Requests dictionary
dict([[h.partition(':')[0], h.partition(':')[2]] for h in rawheaders.split('\n')])
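The one-liner keeps whatever whitespace follows each colon; a slightly more careful sketch (chrome_headers_to_dict is a hypothetical name, and rawheaders is assumed to be the header block copied from Chrome's Network tab):
def chrome_headers_to_dict(rawheaders):
    headers = {}
    for line in rawheaders.splitlines():
        name, _, value = line.partition(':')
        if name.strip():
            headers[name.strip()] = value.strip()
    return headers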