Rufus Pollock rufuspollock

## buttondown.css
/*
    Buttondown
    A Markdown/MultiMarkdown/Pandoc HTML output CSS stylesheet
    Author: Ryan Gray
    Date: 15 Feb 2011
    Revised: 21 Feb 2012

    General style is clean, with minimal re-definition of the defaults or
    overrides of user font settings. The body text and header styles are
    left alone except title, author and date classes are centered. A Pandoc TOC

## csv2sqlite.py
#!/usr/bin/env python
# A simple Python script to convert csv files to sqlite (with type guessing)
#
# @author: Rufus Pollock
# Placed in the Public Domain
import csv
import sqlite3

def convert(filepath_or_fileobj, dbpath, table='data'):
    if isinstance(filepath_or_fileobj, basestring):
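
The preview stops at the first line of `convert`, so the rest of the script is not shown here. As a rough illustration of what "type guessing" could mean, here is a minimal Python 3 sketch. It is not the author's implementation: the helpers `guess_type` and `quote_identifier`, the three candidate types, and the treatment of empty cells as NULL are all assumptions made for this example, and `str` stands in for the Python 2 `basestring` check.

```python
import csv
import sqlite3

# Candidate column types, tried from most to least specific (an assumption).
CANDIDATE_TYPES = (("INTEGER", int), ("REAL", float), ("TEXT", str))


def guess_type(values):
    """Return (sql_type, cast) for the first type that fits every non-empty value."""
    for sql_type, cast in CANDIDATE_TYPES:
        try:
            for value in values:
                if value != "":
                    cast(value)
        except ValueError:
            continue  # this type does not fit, try the next one
        return sql_type, cast


def quote_identifier(name):
    """Quote a table or column name for use in an SQL statement."""
    return '"' + name.replace('"', '""') + '"'


def convert(filepath_or_fileobj, dbpath, table="data"):
    """Load a CSV (path or open text file object) into a new SQLite table."""
    if isinstance(filepath_or_fileobj, str):
        with open(filepath_or_fileobj, newline="") as fileobj:
            rows = list(csv.reader(fileobj))
    else:
        rows = list(csv.reader(filepath_or_fileobj))
    if not rows:
        return  # empty input: nothing to create

    headers = rows[0]
    width = len(headers)
    # Pad or trim every record to the header width so columns line up.
    records = [(row + [""] * width)[:width] for row in rows[1:]]
    guesses = [guess_type([row[i] for row in records]) for i in range(width)]

    columns = ", ".join(
        quote_identifier(header) + " " + sql_type
        for header, (sql_type, _) in zip(headers, guesses)
    )
    placeholders = ", ".join("?" * width)
    conn = sqlite3.connect(dbpath)
    try:
        conn.execute("CREATE TABLE %s (%s)" % (quote_identifier(table), columns))
        conn.executemany(
            "INSERT INTO %s VALUES (%s)" % (quote_identifier(table), placeholders),
            [
                [cast(value) if value != "" else None
                 for value, (_, cast) in zip(row, guesses)]
                for row in records
            ],
        )
        conn.commit()
    finally:
        conn.close()
```

The sketch reads the whole file into memory and assumes the first row holds unique, non-empty column names; a streaming version would need to sample rows for the type guess instead.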

## readme.md

max-mapper / readme.md
Last active October 20, 2020 03:21

node modules for converting PDFs into other formats

Wrappers


- hummus - c++ pdf manipulator
- mimeograph - api on a conglomeration of tools (poppler, tesseract, imagemagick etc)
- pdftotextjs - wrapper around pdftotext
- pdf-text-extract - another wrapper around pdftotext
- pdf-extract - wrapper around pdftotext, pdftk, tesseract, ghostscript
- pdfutils - poppler wrapper
- scissors - pdftk, ghostscript wrapper w/ high level api
- textract - pdftotext wrapper


## pdf2xxx.md

rufuspollock / pdf2xxx.md
Last active November 15, 2016 15:58

PDF 2 XXX. Tools, libraries and tutorials for converting PDFs to something more machine usable

Additions wanted - please just fork and add.

Tutorials


- Parsing PDFs by Thomas Levine
- [Get Started With Scraping – Extracting Simple Tables from PDF Documents][scoda-simple-tables]

Generic (PDF -> text)


## london-spend-csvs-grepping-for-headings.txt
2010-11-P01.csv:4:Vendor,Expense Description,Amount,Doc No,,,^M
2010-11-P02.csv:6:Vendor,Expense Description,Amount,Doc No,,,^M
2010-11-P03.csv:6:Document No","Amount
2010-11-P04-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P05-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P06-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P07-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P08-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P09-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P10-500.csv:1:Vendor ID,Vendor Name,Cos
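
The grep output above shows the header row sitting on line 1, 4, or 6 depending on the file, with trailing empty columns and carriage returns in some of them. A minimal sketch of locating and cleaning that row in Python follows. The function name `find_header_line` and the choice of `"Amount"` as the marker word are assumptions for illustration; the pattern actually used for the grep is not shown in the listing.

```python
import csv


def find_header_line(lines, marker="Amount"):
    """Return (line_number, header_cells) for the first line containing marker.

    line_number is 1-based, like grep -n. Returns None if no line matches.
    """
    for number, line in enumerate(lines, start=1):
        if marker in line:
            # Drop the line ending (including a stray carriage return).
            cells = next(csv.reader([line.rstrip("\r\n")]))
            # Drop the empty trailing columns produced by ",,,".
            while cells and cells[-1] == "":
                cells.pop()
            return number, cells
    return None
```

A header that is split across physical lines inside a quoted field, which is what the `2010-11-P03.csv` match suggests, would only be partly recovered by this line-based approach; parsing the whole file with `csv.reader` would be needed there.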

## grid.css.sass
// Main containers
.container
  @include outer-container

// Rows
.row
  @include row()


// A basic column without a defined width or height

## README.md

rufuspollock / README.md
Created March 6, 2014 19:54

Hackney Spending Cleanup - README is empty

README is empty

  
## humanitarian-datastore-data-api-examples.md

rufuspollock / humanitarian-datastore-data-api-examples.md
Last active August 29, 2015 14:03

Humanitarian dataset example queries

We've loaded the HDX Common Humanitarian Dataset data into a CKAN instance (we used datahub.io for convenience):
http://datahub.io/dataset/hdx-common-humanitarian-dataset
We've loaded the (indicator) value table and the indicator table separately into the CKAN DataStore
(we have not bothered loading the dataset table for the present), and we've also created a Python
script to automate this (which can also serve as an example of how to work with the CKAN API).
Setting this up was pretty fast (most of the work was actually tidying up the data and then writing
some scripts to make this repeatable and testable).

  
## gist:ca4ac7d2511ee41237b9
// replace this with your CKAN website
var ckanSite = 'http://datahub.io'

var sql = 'Your SQL goes here';

// =================
// Using jQuery only
// =================

var data = encodeURIComponent(JSON.stringify({sql: sql}));
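
The preview ends before the request itself is sent. For readers working in Python, the following sketch mirrors the encoding step above and shows one way a query URL could be built. It assumes the CKAN DataStore `datastore_search_sql` action is reachable at `/api/3/action/datastore_search_sql` on the target site; the function names `encode_payload` and `build_sql_query_url` are made up for this example, and nothing here is taken from the rest of the gist.

```python
import json
from urllib.parse import quote, urlencode


def encode_payload(sql):
    """Python counterpart of encodeURIComponent(JSON.stringify({sql: sql}))."""
    # Compact separators and ensure_ascii=False match JSON.stringify output.
    body = json.dumps({"sql": sql}, separators=(",", ":"), ensure_ascii=False)
    # encodeURIComponent leaves these punctuation characters unescaped.
    return quote(body, safe="!~*'()")


def build_sql_query_url(ckan_site, sql):
    """Build a GET URL for the DataStore SQL search action (assumed endpoint)."""
    base = ckan_site.rstrip("/") + "/api/3/action/datastore_search_sql"
    return base + "?" + urlencode({"sql": sql})
```

The resulting URL can be fetched with any HTTP client; the SQL string itself would reference a DataStore resource id as the table name, which is left out here because no id appears in the listing.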

## schema_proposal.yaml
# This is an alternate proposal for a metadata structure for OpenSpending
# data models. The most significant change is that data is modelled in a
# way that highlights logical connections between fields, rather based on
# columns. This also means that column naming conventions are not needed.
#
# This proposal uses YAML to represent the model, but implementations
# would probably use JSON instead.
# The proposed format is currently supported by spendb and cubepress.
#
# The following is a data model for a fictitious budget/spending dataset.