- hummus - c++ pdf manipulator
- mimeograph - api on a conglomeration of tools (poppler, tesseract, imagemagick etc)
- pdftotextjs - wrapper around pdftotext
- pdf-text-extract - another wrapper around pdftotext
- pdf-extract - wrapper around pdftotext, pdftk, tesseract, ghostscript
- pdfutils - poppler wrapper
- scissors - pdftk, ghostscript wrapper w/ high level api
- textract - pdftotext wrapper
#!/usr/bin/env python
# A simple Python script to convert csv files to sqlite (with type guessing)
#
# @author: Rufus Pollock
# Placed in the Public Domain
import csv
import sqlite3
def convert(filepath_or_fileobj, dbpath, table='data'):
    if isinstance(filepath_or_fileobj, basestring):
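The preview above is cut off, so the following is not the author's script. It is a minimal, independent sketch of the same idea (load a CSV into SQLite and guess column types), assuming Python 3, a header row, and rectangular data. The names `guess_type` and `csv_to_sqlite` are hypothetical.

```python
import csv
import io
import sqlite3


def guess_type(values):
    """Guess a SQLite column type from a list of strings (empty cells are ignored)."""
    for sql_type, cast in (("INTEGER", int), ("REAL", float)):
        try:
            for value in values:
                if value != "":
                    cast(value)
            return sql_type
        except ValueError:
            continue
    return "TEXT"


def csv_to_sqlite(fileobj, conn, table="data"):
    """Load a CSV file object into `table`, guessing one type per column."""
    rows = list(csv.reader(fileobj))
    headers, records = rows[0], rows[1:]
    types = [guess_type([r[i] for r in records]) for i in range(len(headers))]
    columns = ", ".join('"%s" %s' % (h, t) for h, t in zip(headers, types))
    conn.execute('CREATE TABLE "%s" (%s)' % (table, columns))
    placeholders = ", ".join("?" for _ in headers)
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, placeholders), records)
    conn.commit()
    return types


# Example with an in-memory database and made-up sample data.
conn = sqlite3.connect(":memory:")
sample = io.StringIO("name,qty,price\na,1,2.5\nb,2,3\n")
print(csv_to_sqlite(sample, conn))
```

The values are inserted as strings and SQLite's column affinity converts them to the declared type on storage, which keeps the sketch short. Table and column names are interpolated directly, so this is only suitable for trusted input.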
Additions wanted - please just fork and add.
- Parsing PDFs by Thomas Levine
- [Get Started With Scraping – Extracting Simple Tables from PDF Documents][scoda-simple-tables]
2010-11-P01.csv:4:Vendor,Expense Description,Amount,Doc No,,,^M
2010-11-P02.csv:6:Vendor,Expense Description,Amount,Doc No,,,^M
2010-11-P03.csv:6:Document No","Amount
2010-11-P04-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P05-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P06-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P07-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P08-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P09-500.csv:1:Vendor ID,Vendor Name,Cost Element,Expenditure Account Code Description,SAP Document No,Amount £,Clearing Date^M
2010-11-P10-500.csv:1:Vendor ID,Vendor Name,Cos
// Main containers
.container
  @include outer-container
// Rows
.row
  @include row()
// A basic column without a defined width or height
README is empty
Loading HDX Common Humanitarian Dataset data into a CKAN instance (we used datahub.io for convenience).
http://datahub.io/dataset/hdx-common-humanitarian-dataset
We've loaded the (indicator) value table and the indicator table separately into the CKAN DataStore (we have not loaded the dataset table for now). We've also created a Python script to automate this, which can also serve as an example of how to work with the CKAN API.
Setting this up was pretty fast (most of the work was actually tidying up the data and then making some scripts to make this repeatable and testable).
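The automation script itself is not shown here. As a rough sketch of what pushing a table into the CKAN DataStore can look like, the snippet below builds a `datastore_create` request with the standard library. The function name, resource id, API key, and field names are all hypothetical placeholders, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_datastore_create(ckan_site, api_key, resource_id, fields, records):
    """Build (but do not send) a POST request for CKAN's datastore_create action."""
    payload = {
        "resource_id": resource_id,  # must refer to an existing CKAN resource
        "fields": fields,            # list of {"id": column name, "type": column type}
        "records": records,          # list of row dicts keyed by field id
    }
    return urllib.request.Request(
        ckan_site.rstrip("/") + "/api/3/action/datastore_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


# Hypothetical values for illustration only.
request = build_datastore_create(
    "http://datahub.io",
    "YOUR-API-KEY",
    "your-resource-id",
    fields=[{"id": "indicator_id", "type": "text"}, {"id": "value", "type": "float"}],
    records=[{"indicator_id": "example", "value": 1.5}],
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect and test before anything is written to a live CKAN instance.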
// replace this with your CKAN website
var ckanSite = 'http://datahub.io';
var sql = 'Your SQL goes here';
// =================
// Using jQuery only
// =================
var data = encodeURIComponent(JSON.stringify({sql: sql}));
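The JavaScript preview stops before the request is made. For comparison, here is a small Python sketch of building a query URL for CKAN's `datastore_search_sql` action, assuming that is the endpoint the snippet targets. The helper name `datastore_sql_url` and the resource id in the query are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def datastore_sql_url(ckan_site, sql):
    """Return the GET URL for CKAN's datastore_search_sql action with `sql` encoded."""
    query = urlencode({"sql": sql})
    return ckan_site.rstrip("/") + "/api/3/action/datastore_search_sql?" + query


# Hypothetical resource id; replace with a real one from your CKAN site.
url = datastore_sql_url("http://datahub.io", 'SELECT * FROM "your-resource-id" LIMIT 5')
# The encoded query round-trips back to the original SQL string.
print(parse_qs(urlparse(url).query)["sql"][0])
```

The response is JSON, so it can be fetched with `urllib.request.urlopen(url)` and parsed with `json.load` once a real site and resource id are filled in.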
# This is an alternate proposal for a metadata structure for OpenSpending
# data models. The most significant change is that data is modelled in a
# way that highlights logical connections between fields, rather than
# being based on columns. This also means that column naming conventions
# are not needed.
#
# This proposal uses YAML to represent the model, but implementations
# would probably use JSON instead.
# The proposed format is currently supported by spendb and cubepress.
#
# The following is a data model for a fictitious budget/spending dataset.