Nothing, nowhere and all of it

Rufus Pollock rufuspollock

rufuspollock /
Last active November 15, 2016 15:58
PDF 2 XXX. Tools, libraries and tutorials for converting PDFs to something more machine usable

Additions wanted - please just fork and add.


  • Parsing PDFs by Thomas Levine
  • Get Started With Scraping – Extracting Simple Tables from PDF Documents

Generic (PDF -> text)

rufuspollock / london-spend-csvs-grepping-for-headings.txt
Last active December 19, 2015 10:29
Analysis of where the "header" row actually appears in GLA spend data CSVs. Result of running this script. For details of files see
2010-11-P01.csv:4:Vendor,Expense Description,Amount,Doc No,,,^M
2010-11-P02.csv:6:Vendor,Expense Description,Amount,Doc No,,,^M
2010-11-P03.csv:6:Document No","Amount
2010-11-P04-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P05-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P06-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P07-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P08-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P09-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P10-500.csv:1:Vendor ID,Vendor Name,Cos
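The output above gives, for each file, the line number on which the header row was found. A minimal Python sketch of the same kind of check follows. It assumes a header row is the first row that mentions every expected column keyword; the function name `find_header_line`, the keyword choice and the sample data are illustrative and do not come from the original script.

```python
import csv
import io


def find_header_line(text, keywords=("Vendor", "Amount")):
    """Return the 1-based line number of the first row that mentions
    every keyword, or None if no such row exists.

    `keywords` is an assumption about what marks a header row.
    """
    # newline="" keeps the CRLF endings intact, as the csv module expects
    reader = csv.reader(io.StringIO(text, newline=""))
    for row in reader:
        joined = ",".join(row)
        if all(k in joined for k in keywords):
            # line_num counts the source lines consumed so far
            return reader.line_num
    return None


# Made-up sample: three preamble lines, then the header on line 4
sample = (
    "GLA spend report\r\n"
    "\r\n"
    "\r\n"
    "Vendor,Expense Description,Amount,Doc No,,,\r\n"
    "ACME Ltd,Stationery,512.00,1001,,,\r\n"
)
print(find_header_line(sample))
```

Parsing with the `csv` module rather than matching raw lines means a header cell that contains a quoted line break is still read as one row.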
rufuspollock / pdf2json-tryout.js
Created July 7, 2013 17:37
Trying out pdf2json
var nodeUtil = require("util"),
PFParser = require("pdf2json")
var pdfParser = new PFParser();
pdfParser.on("pdfParser_dataReady", function(data) {
rufuspollock / geocode.js
Created September 21, 2013 17:24
Example of Javascript Geocode function using Mapquest Nominatim API
// Geocoding using Mapquest Nominatim API
// Documentation for the API:
// Here's an example query:
// geocode function
// :param place: is a place name like "Detroit" or "London"
// :callback: function receiving arguments (error, {lon: ..., lat: ...})
function geocode(place, callback) {
rufuspollock / sluggify.js
Last active December 24, 2015 19:49
Generate a slug (url-usable string) from a title or other string
// convert a title to a slug
// lowercase, replace ' ' by '-' and remove everything that is not alphanumeric, underscore or dash
var slug = title
.replace(/ /g, '-')
.replace(/--+/g, '-')
.replace(/[^\w-]+/g, '')
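For comparison, the same steps can be sketched in Python. This is a hedged port rather than the author's code: the function name `slugify` is illustrative, the lowercasing step is taken from the comment above (it is not visible in the excerpt), and `re.ASCII` is used so that `\w` matches only ASCII characters, as it does in JavaScript.

```python
import re


def slugify(title):
    """Lowercase, replace spaces with dashes, collapse runs of dashes,
    and drop anything that is not alphanumeric, underscore or dash."""
    slug = title.lower()
    slug = slug.replace(" ", "-")
    slug = re.sub(r"--+", "-", slug)
    slug = re.sub(r"[^\w-]+", "", slug, flags=re.ASCII)
    return slug


print(slugify("PDF 2 XXX. Tools"))
```

Because stray characters are removed after dash runs are collapsed, an input such as `a -!- b` can still end up with a double dash; swapping the last two substitutions would avoid that.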
rufuspollock /
Created March 1, 2014 16:49
Automated use of CKAN DataPusher from python code
import urlparse
import json
import requests
# set your api key for this work
apikey = 'XXXXX'
datapusher_url = ''
ckan_url = ''
# gold prices
res_id = 'b9aae52b-b082-4159-b46f-7bb9c158d013'
rufuspollock /
Created March 6, 2014 19:54
Hackney Spending Cleanup - README is empty


rufuspollock /
Created March 9, 2014 18:58
Global Fossil Fuel CO2 Emissions 1751-Present - README is empty


rufuspollock /
Created May 4, 2014 21:49
Scrape CKAN Extensions on Github
'''Run this script and it will export a list of all CKAN extensions on github
(guessed by repo name containing ckanext) to json and csv files in this directory
'''
import urllib
import json
import csv
jsonfp = 'extensions-gh.json'
csvfp = 'extensions-gh.csv'
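
# The excerpt ends before any logic is shown. What follows is a hedged sketch
# (Python 3) of the step the docstring describes: guess extensions by repo
# name containing "ckanext", then export them to JSON and CSV. It assumes the
# repositories are already available as a list of dicts with "name" and
# "html_url" keys; the function names and that input shape are illustrative
# and not taken from the original script. Fetching from GitHub is left out.

```python
import csv
import json


def filter_extensions(repos):
    """Keep repos whose name contains 'ckanext' (case-insensitive)."""
    return [r for r in repos if "ckanext" in r["name"].lower()]


def export(repos, jsonfp, csvfp):
    """Write the repo list to a JSON file and a two-column CSV file."""
    with open(jsonfp, "w") as f:
        json.dump(repos, f, indent=2)
    with open(csvfp, "w", newline="") as f:
        # extrasaction="ignore" skips any keys other than the two columns
        writer = csv.DictWriter(
            f, fieldnames=["name", "html_url"], extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(repos)
```

# Usage under these assumptions: export(filter_extensions(repos), jsonfp, csvfp)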