LaSER-opendata-talk.md

LaSER LRC Technical Meeting notes and links

By Charles Roper
charles.roper@sxbrc.org.uk
http://sxbrc.org.uk
http://twitter.com/charlesroper

2014-09-11 GIGL, London

Useful Links

Websites

Articles

Books / Papers

Guides

  • The ODI guides - lots of useful guides about various aspects of open data
  • The Open Data Handbook - A handbook available as HTML or PDF that discusses the legal, social and technical aspects of open data

Online GIS / Spatial IT

  • OpenGeo Suite
  • CartoDB
  • CartoDB Vision - "The future of geo isn’t a single app with hundreds of buttons. The future of geo is hundreds apps with a single button." Note: CartoDB is developed by a company called Vizzuality, who specialise in citizen science and environmental data and mapping. I first became aware of them at the eBiosphere conference.
  • Mapbox

Other useful online tools/systems

  • Storymaps
  • Mode - Collaborative online data upload and analysis
  • Ordnance Survey On-demand - This is OS's commercial subscription WMS/WFS service. Much of the data here is open data and freely available, but OS still sells a convenient, easy-to-use, plug-and-play service to save the considerable hassle associated with manual download and update of their various datasets. Look at the pricing page for pricing structure ideas.
  • ScapeToad - Cartogram software

INSPIRE

Linked Data


Notes for Charles Roper's talk

Slides available on SpeakerDeck.

What is Open Data?

"Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)." -- http://opendefinition.org/

  • The work must be available under an open license, preferably a well-established standard license such as Creative Commons, Open Data Commons or the Open Government Licence.
  • Must be easily and cheaply accessible; e.g., via internet download.
  • Must be in a standard open format; i.e., easy to open and use.
  • May or may not require attribution and/or 'share alike'.
  • Unlike other recent buzzwords like "big data" and "the cloud", open data is a movement that has a very specific meaning, and that's a good thing. It provides a focus and limits marketing spin.

The problem with biodiversity data

  • Peter Desmet and other researchers collectively known as Datafable analysed the licenses of all 11000+ datasets (~416 million records) registered on GBIF.
  • Only 10% of those datasets (26% of the occurrences) have any license at all. The rest carry no license, which renders them practically useless.
  • Moreover, only 1.4% of all datasets (2% of all occurrences) are published with a standard license.
  • The net result is that only a tiny proportion of the data on GBIF is practically useful.

Concrete example

  • Peter downloaded all 13000+ records of American Bullfrog data from GBIF and wanted to plot them on a map for a blog post he was working on. See here.
  • Technically, it's easy to do, but the terms we agree to when using GBIF state we MUST observe the terms of the original provider.
  • In the case of the Bullfrogs, that involves carefully reading 65(!) license statements. Of those statements, only 4 are standard CC licenses, leaving a whopping 61 bespoke licenses to inspect.
  • After considerable work to investigate the licenses, only 4% of the data may be used in a commercial context. If you're a journalist, have adverts on your blog, or are running a blog on behalf of your business, this 4% is all you're allowed to use.
  • 28% of the data may be used in a non-commercial setting.
  • The remaining 72% cannot be used without first contacting 52 individual institutions and either seeking clarification or asking permission.
  • THIS DOES NOT SCALE
  • The result is that people will either ignore the data or ignore the fine print. Either way, this is undesirable.
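
To illustrate why standard licenses matter, here is a minimal Python sketch of the kind of triage described above. It is not Peter's method. The `rights` field name, the licence shortlist and the sample records are all made up for illustration. The point is that a standard licence identifier can be classified by a lookup, whereas every bespoke statement falls through to manual review.

```python
from collections import Counter

# Hypothetical shortlist of standard licence identifiers and the broadest
# use each allows. This is an assumption for illustration, not a complete list.
STANDARD_LICENSES = {
    "CC0": "commercial",
    "CC-BY": "commercial",
    "CC-BY-SA": "commercial",
    "CC-BY-NC": "non-commercial",
    "ODC-BY": "commercial",
    "OGL": "commercial",
}


def classify(rights):
    """Return 'commercial', 'non-commercial' or 'needs review' for a rights statement."""
    if rights is None:
        return "needs review"  # no licence at all
    key = rights.strip().upper()
    # Anything that is not an exact standard identifier is treated as bespoke.
    return STANDARD_LICENSES.get(key, "needs review")


def usage_shares(records):
    """Return the fraction of records that fall into each category."""
    counts = Counter(classify(r.get("rights")) for r in records)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()} if total else {}


# Made-up records; 'rights' is a hypothetical field name.
records = [
    {"id": 1, "rights": "CC0"},
    {"id": 2, "rights": "cc-by-nc"},
    {"id": 3, "rights": "Contact the curator before use"},
    {"id": 4, "rights": None},
]

shares = usage_shares(records)
for category, share in sorted(shares.items()):
    print(f"{category}: {share:.0%}")
```

Records classed as "commercial" can of course also be used non-commercially. Everything in "needs review" still requires a person to read the statement or contact the provider, and that is the part that does not scale.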

"Obscurity is a far greater danger than piracy" -- Tim O'Reilly

Source article from 2002: Piracy is Progressive Taxation, and Other Thoughts on the Evolution of Online Distribution.

That is, the greater challenge is not that people will take our data and use it without permission or payment, but that people do not know our data exists. O'Reilly argues that piracy is a kind of progressive tax: the more exposure data gets and the more need there is for it, the more it will be used illegitimately, but legitimate use goes up too. We should be trying to solve the problem of obscurity rather than limiting use.

Example: at a recent planning conference, lots of the planners had never heard of us, didn't know how to access our data, and wouldn't know what to do with it or how to interpret it if they did. This is an opportunity!

Clients will generally do the right thing in exchange for a fair price and a convenient service. Look at iTunes, Spotify and Netflix vs the illegal file sharing networks.

  • O'Reilly is a great example of this. Despite it being incredibly easy to download pirated copies of their books from the likes of it-ebooks.info, their online ebook subscription service - which gives access to their full catalogue - is a huge success. Even more interestingly, among the books they publish online, in full or in part, as web pages or as part of the documentation associated with programming languages, the regular printed editions are some of their most popular. People will pay for convenience, packaging and quality.
  • iTunes is now the biggest music retailer in the world.

"Free" is eventually replaced by a higher-quality paid service. Again, look at something like Spotify, Ordnance Survey, BGS.

  • This is known as the freemium business model.

Open data offers us a tool to answer these challenges and reach these goals:

  • It can increase exposure and enhance reputation.
  • It invites greater participation and catalyses further data generation.
  • It invites scrutiny.
  • It is far better to have data out there and being used than sitting unused in obscurity.

Record Centres should invest in data-as-a-service and enhanced products.

  • We should make more of the unique services and skills we can offer.
  • Open data will shed light on these services and demonstrate capability.
  • We can make use of our own open data - and others' - in services such as planning screening.
  • Crucially, we could open a large proportion of our data while keeping the "premium" data back in order to fund operations and reinvestment: the freemium model.
  • Package the data so that it is easy to use and hard to ignore.
  • Make the premium stuff exceptionally high quality; add value.
  • Require that freely available data is share-alike; clients can 'buy' private use if they don't want to share their work.
  • Innovate: there are over 8000 government datasets available on data.gov.uk with a value estimated at £16 billion to the economy. Make use of this data. Become part of that ecosystem.
  • A business case guide from The ODI.

At the very least have a position, a policy or a strategy to incorporate open data into your business.

  • What can I safely open? Experiment!
  • At the start of every project, consider if the output can be open data either in part or in whole.
  • Go through The ODI's self-certification process.
  • Work with small amounts of data first. Target, trial, repeat.
  • Get with the movement! There is kudos attached, recognition, funding pots available, and a common vocabulary, understanding and set of values. It's a way of positively promoting data and its value.