
@rossjones
rossjones / gist:4761582
Created February 12, 2013 11:00
Content negotiation in CKAN
wget --header="Accept:application/rdf+xml" http://demo.ckan.org/dataset/gold-prices
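The same request can be made from Python; a minimal stdlib sketch, using the URL and Accept header from the wget line above (the fetch itself is left commented out so this runs offline):

```python
import urllib.request

# Ask CKAN for an RDF/XML representation of the dataset page
# by setting the Accept header, exactly as the wget example does.
url = "http://demo.ckan.org/dataset/gold-prices"
req = urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})

print(req.get_header("Accept"))  # application/rdf+xml
# response = urllib.request.urlopen(req)  # perform the fetch when online
```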
rossjones / datahub-despam.py
Last active December 15, 2015 02:29
Simple (but slow) script to remove groups from datahub.io
import re
import sys
import ckanclient
import dateutil.parser
import datetime
em = re.compile('.*@(.*)')
spammy = ["yahoo.com", "hotmail.com", 'mindpowerup.com',
          "yahoo.co.uk", 'acumenwit.com', 'hotmail.fr',
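The `em` regex above captures the domain part of an email address; a small sketch of how it could drive the spam check (the user emails here are made up for illustration, and only a slice of the domain list is shown):

```python
import re

em = re.compile('.*@(.*)')
spammy = ["yahoo.com", "hotmail.com", "mindpowerup.com"]

def is_spammy(email):
    # Pull out the domain with the regex and check it against the blocklist.
    m = em.match(email)
    return bool(m) and m.group(1) in spammy

users = ["alice@example.org", "bob@mindpowerup.com"]
flagged = [u for u in users if is_spammy(u)]
print(flagged)  # ['bob@mindpowerup.com']
```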
rossjones / core.clj
Created April 2, 2013 20:10
Start of hangman
(ns hangman.core
  (:gen-class)
  (:require [clojure.string :as str]))

(defn char_to_draw
  "Chooses whether to draw a _ or a char"
  [ch guesses]
  (if (is_char_in_words ch guesses)
    (str ch " ")
    (str "_ ")
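The truncated `char_to_draw` decides, letter by letter, whether to show the character or a blank. The same idea as a rough Python sketch (the `draw_word` helper and the guessed-letter set are illustrative, not from the gist):

```python
def char_to_draw(ch, guesses):
    # Draw the letter if it has been guessed, otherwise a blank.
    return ch + " " if ch in guesses else "_ "

def draw_word(word, guesses):
    # Render the whole word the way hangman displays it.
    return "".join(char_to_draw(ch, guesses) for ch in word)

print(draw_word("hangman", {"a", "n"}))  # _ a n _ _ a n
```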
rossjones / lovematch.py
Created April 16, 2013 20:01
Our first simple Python program for PyPool
#!/usr/bin/env python
# coding: utf-8
# lovematch asks for two names and determines their compatibility
# IPO imminent
import sys
VOWELS = "aeiou"
def get_score_for(person):
    common = ['e', 't', 'i', 's', 'o', 'n', 'h', 'r', 'a', 'f', 'u', 'l', 'd',
              'g', 'm', 'w', 'p', 'y', 'c', 'b', 'v', 'k', 'x', 'j', 'q', 'z']
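The preview cuts off before the scoring logic. One plausible sketch, scoring each name by the frequency rank of its letters in the `common` list, is below; this is a guess at the approach, not the gist's actual algorithm:

```python
# Letters ordered roughly from most to least common (from the gist).
common = ['e', 't', 'i', 's', 'o', 'n', 'h', 'r', 'a', 'f', 'u', 'l', 'd',
          'g', 'm', 'w', 'p', 'y', 'c', 'b', 'v', 'k', 'x', 'j', 'q', 'z']

def get_score_for(person):
    # Sum a rank per letter: commoner letters contribute higher scores.
    # This scoring rule is an assumption, not taken from the gist.
    name = person.lower()
    return sum(len(common) - common.index(ch) for ch in name if ch in common)

print(get_score_for("Ada"))  # 50 with this ranking
```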
rossjones / import-from-classic.py
Created July 3, 2013 19:17
Simple tool to pull your code and data from ScraperWiki classic and overwrite the current tool
#!/usr/bin/env python
import urllib
import requests
from lxml.html import fromstring
SCRAPER_NAME = "smr"
code = 'http://classic.scraperwiki.com/editor/raw/{0}'.format(SCRAPER_NAME)
db = 'https://classic.scraperwiki.com/scrapers/export_sqlite/{0}/'.format(SCRAPER_NAME)
lang_page = fromstring(requests.get('http://classic.scraperwiki.com/scrapers/{0}'.format(SCRAPER_NAME)).content)
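All three URLs above derive from the scraper name; a small sketch factoring that out (the `build_urls` helper is illustrative, not part of the gist):

```python
def build_urls(name):
    # URLs for a ScraperWiki Classic scraper's raw code,
    # sqlite export, and info page, keyed by purpose.
    return {
        "code": "http://classic.scraperwiki.com/editor/raw/{0}".format(name),
        "db": "https://classic.scraperwiki.com/scrapers/export_sqlite/{0}/".format(name),
        "page": "http://classic.scraperwiki.com/scrapers/{0}".format(name),
    }

urls = build_urls("smr")
print(urls["code"])  # http://classic.scraperwiki.com/editor/raw/smr
```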
rossjones / xls_template.php
Last active December 19, 2015 16:28
Creates an XLS file containing the column titles we want, and adds a drop-down list to the first 500 cells in the Owner column. It doesn't enforce validation of the date in the last column; we'll use dateutil for that.
<?php
error_reporting(E_ALL);
date_default_timezone_set('Europe/London');
$publisher = $_GET["publisher_name"];
// TODO: Validate publisher name
/* For the given publisher, return the entire list of sub-publishers
as a flat array of names (including the one provided). */
function get_subpublishers_for($name) {
rossjones / brute_force_ods.py
Created July 18, 2013 14:09
Fetch the rows from the first sheet in an ODS file by loading the whole tree into memory. Ideally this should be SAX-parsed, but that turns out not to be much better with our test case (a 10 MB content.xml).
#!/usr/bin/env python
from lxml import etree
TABLE_NS = u"urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = u"urn:oasis:names:tc:opendocument:xmlns:text:1.0"
def get_rows_from_file(doc):
    nodes = doc.xpath("//t:table[1]", namespaces={"t": TABLE_NS})
    if nodes:
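The preview stops before the row extraction. A self-contained sketch of the same idea using only the standard library (`xml.etree.ElementTree` in place of lxml, and a tiny inline stand-in for the content.xml found inside an ODS zip, so it runs without a real file):

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
NS = {"t": TABLE_NS, "x": TEXT_NS}

# Minimal stand-in for an ODS content.xml: one table, one row, two cells.
CONTENT = """
<doc xmlns:table="{0}" xmlns:text="{1}">
  <table:table>
    <table:table-row>
      <table:table-cell><text:p>name</text:p></table:table-cell>
      <table:table-cell><text:p>price</text:p></table:table-cell>
    </table:table-row>
  </table:table>
</doc>
""".format(TABLE_NS, TEXT_NS)

def get_rows(doc):
    # Take the first table and pull the text out of each cell, row by row.
    table = doc.find(".//t:table", NS)
    rows = []
    for row in table.findall("t:table-row", NS):
        rows.append([p.text for p in row.findall("t:table-cell/x:p", NS)])
    return rows

doc = ET.fromstring(CONTENT)
print(get_rows(doc))  # [['name', 'price']]
```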
rossjones / slightly_less_brute_force.py
Created July 18, 2013 14:12
Lots of code, but low memory usage (ish, in comparison to the 4 GB odfpy was using), and about 9 seconds for the test file. Still ends up loading a lot into memory.
#!/usr/bin/env python
from lxml import etree
import mmap
def ram_used(where):
    import resource
    print "func:{0}:{1}".format(where, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024)
def _fast_iter(context, func):
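`_fast_iter` is cut off above; the pattern that name usually refers to is iterparse-and-clear, which handles each element as its end tag arrives and then frees it, so the tree never accumulates. A minimal stdlib sketch of that pattern (Python 3 here, unlike the 2013 gist, and the sample XML is made up):

```python
import io
import xml.etree.ElementTree as ET

XML = "<rows>" + "".join("<row>{0}</row>".format(i) for i in range(5)) + "</rows>"

def fast_iter(source, tag, func):
    # Stream the document, call func on each matching element,
    # then clear the element so memory use stays flat.
    results = []
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            results.append(func(elem))
            elem.clear()
    return results

totals = fast_iter(io.StringIO(XML), "row", lambda e: int(e.text))
print(totals)  # [0, 1, 2, 3, 4]
```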
#!/usr/bin/env python
"""
1. Connect to sqlite.
2. Write out data column into name.json
3. For each name, create a folder with that name, write the code into that folder.
4. Download the data file
5. Move the name.json into the folder (it's done)
"""