@davidread
davidread / gist:979a7d63f6d813052255
Created November 3, 2014 13:34
Metadata provenance
Example dataset extra field "metadata_provenance"
This dataset originates at Barnet Open Data, is harvested into the London Datastore, and is then harvested from there into data.gov.uk. When viewed on data.gov.uk, it has this metadata_provenance:
[
{
"activity_occurred": "2014-10-21T09:04:19.753433",
"activity": "harvest",
"harvest_source_url": "https://open.barnet.gov.uk/",
"harvest_source_title": "Barnet Open Data",
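A minimal sketch of reading such a harvest chain, assuming metadata_provenance is a JSON list of activity records with the fields shown above. The first record copies the values shown; the second record and the function name `harvest_chain` are hypothetical, included only to illustrate a two-step chain.

```python
import json

# Illustrative value only: the first record copies the fields shown above;
# the second record is hypothetical, added to show a second harvest step.
SAMPLE = '''
[
  {"activity_occurred": "2014-10-21T09:04:19.753433",
   "activity": "harvest",
   "harvest_source_url": "https://open.barnet.gov.uk/",
   "harvest_source_title": "Barnet Open Data"},
  {"activity_occurred": "2014-10-22T00:00:00",
   "activity": "harvest",
   "harvest_source_url": "https://example.org/",
   "harvest_source_title": "Hypothetical second harvester"}
]
'''


def harvest_chain(provenance_json):
    """Return harvest source titles, ordered by when each activity occurred."""
    records = json.loads(provenance_json)
    harvests = [r for r in records if r.get("activity") == "harvest"]
    # ISO 8601 timestamps in the same format sort chronologically as strings.
    harvests.sort(key=lambda r: r["activity_occurred"])
    return [r["harvest_source_title"] for r in harvests]


print(harvest_chain(SAMPLE))
```

Sorting by timestamp avoids assuming whether the list is stored oldest-first or newest-first.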
import nose.tools
import ckan.model as model
assert_equals = nose.tools.assert_equals
assert_not_equals = nose.tools.assert_not_equals
Resource = model.Resource
class TestResource(object):
@davidread
davidread / pgbouncer.ini
Created March 19, 2015 14:02
data.gov.uk pgbouncer config /etc/pgbouncer/pgbouncer.ini
;; database name = connect string
[databases]
ckan = host=127.0.0.1 dbname=ckan user=dgu password=REDACTED pool_size=40
;;dgudatastore = host=127.0.0.1 dbname=dgudatastore user=dgu password=REDACTED pool_size=10
;;dgudatastore_ro = host=127.0.0.1 dbname=dgudatastore user=dguro password=REDACTED pool_size=10
dgucelery = host=127.0.0.1 dbname=dgucelery user=dgu password=REDACTED pool_size=5
;; Configuration section
[pgbouncer]
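As a rough check on how many server connections the active [databases] entries can open, the pool sizes can be summed. This is a sketch, not part of the original config: it assumes Python's `configparser` can read the file (it treats `;` lines as comments) and that connect-string values contain no spaces. The function name `pool_sizes` is hypothetical.

```python
import configparser

# Active entries from the [databases] section above (passwords already redacted).
PGBOUNCER_INI = """
;; database name = connect string
[databases]
ckan = host=127.0.0.1 dbname=ckan user=dgu password=REDACTED pool_size=40
;;dgudatastore = host=127.0.0.1 dbname=dgudatastore user=dgu password=REDACTED pool_size=10
dgucelery = host=127.0.0.1 dbname=dgucelery user=dgu password=REDACTED pool_size=5
"""


def pool_sizes(ini_text):
    """Map each active database alias to its pool_size from the connect string."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    sizes = {}
    for alias, connect_string in parser["databases"].items():
        # Assumption: a connect string is space-separated key=value pairs
        # with no quoted values containing spaces.
        params = dict(pair.split("=", 1) for pair in connect_string.split())
        sizes[alias] = int(params["pool_size"])
    return sizes


sizes = pool_sizes(PGBOUNCER_INI)
print(sizes, sum(sizes.values()))
```

Commented-out entries are ignored, so only the ckan and dgucelery pools count towards the total.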
dread@dmbp:~$ git clone git@github.com:datagovuk/dgu-vagrant-puppet v-test
Cloning into 'v-test'...
remote: Counting objects: 1742, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 1742 (delta 2), reused 0 (delta 0), pack-reused 1733
Receiving objects: 100% (1742/1742), 40.88 MiB | 2.02 MiB/s, done.
Resolving deltas: 100% (836/836), done.
Checking connectivity... done.
dread@dmbp:~$ cd v-test
dread@dmbp:~/v-test$ git checkout togo
>>> requests.get('http://uk-air.defra.gov.uk/datastore/pcb/2010_TOMPs_PCB_Data.xlsx', headers={'User-agent': 'python-requests'})
<Response [403]>
>>> requests.get('http://uk-air.defra.gov.uk/datastore/pcb/2010_TOMPs_PCB_Data.xlsx', headers={'User-agent': 'python-request'})
<Response [200]>
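The session above suggests the server rejected the 'python-requests' agent string but accepted a slightly different one. Below is a minimal sketch of sending an explicit User-Agent, using the standard library's `urllib.request` instead of `requests` so it runs without third-party packages. The helper names `build_request` and `status_for` are hypothetical.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10):
    """Return the HTTP status code the server gives for this User-Agent."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 when the server rejects the agent string


# Building the request needs no network access.
# urllib stores header names capitalised, hence "User-agent" in the lookup.
URL = "http://uk-air.defra.gov.uk/datastore/pcb/2010_TOMPs_PCB_Data.xlsx"
req = build_request(URL, "python-request")
print(req.get_header("User-agent"))
```

Calling `status_for(URL, ...)` with each of the two agent strings would repeat the comparison, though the server's behaviour may have changed since the session was recorded.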
@davidread
davidread / gist:0a34b59bffa86eb37812
Created December 11, 2015 20:53
Using Met Office weather open data on Azure Data Market
1. Register for a Microsoft account at https://signup.live.com/signup
2. Verify your email: click the link in the verification email (this logs you in)
3. Register on Azure Data Market at https://datamarket.azure.com/register
4. Go to https://datamarket.azure.com/dataset/datagovuk/metofficeweatheropendata and click 'Sign up', then 'agree', then 'sign up'.
Use the web interface to download the CSV:
At https://datamarket.azure.com/dataset/explore/0f2cba12-e5cf-4c6d-83c9-83114d44387a click 'Explore', then 'Three Hourly Forecast', then 'Download Excel (CSV)'.
Or use the API:
Go to https://datamarket.azure.com/account for your account key.
$ paster govuk_publications --config=/var/ckan/ckan.ini scrape
...
After 2387/2387 pages:
Publications:
Created: 94880 ['consultations/gda-of-hitachi-ge-nuclear-energy-ltds-uk-advanced-boiling-water-reactor', 'consultations/postgraduate-doctoral-loans', 'consulta...
Unchanged: 484 ['publications/the-ombudsmans-annual-report-and-accounts-2015-16', 'publications/rg1-8nh-kingfisher-colours-limited-environmental-permit-applicati...
Updated: 83 ['publications/oil-and-gas-public-statements-relating-to-2014-operations', 'statistics/tabulation-tool-questionnaire-statistical-notice', 'publicat...
Error - Incomplete publication - title: 7 ['statistics/womens-smoking-status-at-time-of-delivery-in-england-october-2014-to-december-2014', 'statistics/summary-hospital-level-mortality-indic...
Error - Publication redirect: 1 ['publications/preventing-illegal-working-guidance-for-employers-october-2013']
@davidread
davidread / gist:6a2148797e9ad3807f3cd167b29a05c2
Created November 18, 2016 15:37
Top domains in data.gov.uk resources
SELECT substring(R.url from '.*://([^/]*)') AS hostname,
       count(substring(R.url from '.*://([^/]*)'))
FROM resource R
JOIN resource_group RG ON R.resource_group_id = RG.id
JOIN package P ON P.id = RG.package_id
WHERE R.state = 'active' AND P.state = 'active'
GROUP BY hostname
ORDER BY count DESC;
hostname       | count
---------------+-------
www.gov.uk     | 20151
www.ons.gov.uk
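The same hostname tally can be sketched in Python for a list of resource URLs, for example ones exported from the database. The regular expression is the one used in the SQL above. The function name `count_hostnames` and the sample URLs are hypothetical.

```python
import re
from collections import Counter

# Same pattern as the SQL substring() call:
# capture everything between "://" and the next "/".
HOSTNAME_RE = re.compile(r".*://([^/]*)")


def count_hostnames(urls):
    """Count URLs per hostname, most common first; URLs without '://' are skipped."""
    counts = Counter()
    for url in urls:
        match = HOSTNAME_RE.match(url)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common()


# Hypothetical URLs for illustration only.
sample = [
    "https://www.gov.uk/government/publications/a",
    "https://www.gov.uk/government/publications/b",
    "http://www.ons.gov.uk/dataset",
    "not a url",
]
print(count_hostnames(sample))
```

Unlike the SQL query, which groups non-matching URLs under a NULL hostname, this sketch drops them.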
@davidread
davidread / gist:af650b9227e5ed54d891dbeeb0e2c254
Last active December 6, 2016 11:51
Running 'paster db init -c test-core.ini' after running the tests produces this error. For more info and the fix, see: https://github.com/ckan/ckan/issues/3354
(ckan)vagrant@precise64:/vagrant/src/ckan$ paster --plugin=ckan db init -c test-core.ini
Traceback (most recent call last):
  File "/home/vagrant/ckan/bin/paster", line 11, in <module>
    sys.exit(run())
  File "/home/vagrant/ckan/local/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
    invoke(command, command_name, options, args[1:])
  File "/home/vagrant/ckan/local/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
    exit_code = runner.run(args)
  File "/home/vagrant/ckan/local/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
    result = self.command()