@tmcw
Created December 13, 2012 18:39
open data chatter

How I would do an open data program if I were not just a lil' hacker dude.

The principles of open data:

  • the program needs to make the agency look good, so the agency can keep getting funding for it, because it's expensive
  • the open data tech needs to match the internal tech, or everyone is constantly porting data back and forth and will eventually give up, since it's a lot of work
  • the open data infrastructure must choose a spot on the continuum between service and resource: should other services rely on it being online all the time, or should it just let them download tons of copies of things?

Lots of Copies Keep Stuff Safe

This is the biggest thing open data programs get wrong, because:

  • They have an overinflated idea of 'authenticity'
  • They have an overinflated fear of 'inauthentic' documents or inappropriate uses harming them.

But the point is that most open data programs eventually meet a bad fate:

  • Their tech doesn't scale with the real world: think WMS.
  • Their data and tech aren't modern enough to use: old Excel files.
  • They run out of funding and go offline.

The solution to this is many copies. To make this possible, you need to make entire open data programs cloneable. This can be easy for simple open data pushes: make the entire site & database static & available as a multi-gigabyte download.
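
As a sketch of how small that can be: assuming the program's data happens to live in a single SQLite file (a hypothetical setup, nothing above says it does), a one-off dump script is roughly all it takes to turn the whole database into a static archive.

```python
# One-off dump: turn a program's database into a static, mirrorable archive.
# The database path and output names are hypothetical; the point is that this
# is a script you run once, not a service you keep online.
import csv
import sqlite3
import tarfile
from pathlib import Path

DB_PATH = "opendata.db"            # hypothetical source database
OUT_DIR = Path("opendata-dump")    # per-table CSVs land here
ARCHIVE = "opendata-dump.tar.gz"   # the thing people actually download

OUT_DIR.mkdir(exist_ok=True)
conn = sqlite3.connect(DB_PATH)

# Dump every user table to its own CSV file.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
for table in tables:
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    with open(OUT_DIR / f"{table}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor)

# One archive anyone can mirror, torrent, or hand around on a drive.
with tarfile.open(ARCHIVE, "w:gz") as tar:
    tar.add(str(OUT_DIR))
```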

In a lot of cases the so-called free so-called market will (via invisible hands, or Thiel or something) make lots of copies and pay to keep them safe for you. See Census data. Where this isn't true - in open data pushes that aren't publicly popular yet - you can use good ol' fashioned mirrors and open source tactics: a copy on S3, one on your servers, and so on.
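
A hedged sketch of that mirroring step, assuming the archive from the previous sketch and a made-up bucket name; it uses the third-party boto3 library and whatever AWS credentials are already configured.

```python
# Push one copy of the static archive to S3; the local copy stays on your own
# servers. Bucket name and filename are placeholders, not real resources.
import boto3

ARCHIVE = "opendata-dump.tar.gz"   # the static dump built earlier
BUCKET = "my-agency-open-data"     # hypothetical S3 bucket

s3 = boto3.client("s3")
s3.upload_file(ARCHIVE, BUCKET, ARCHIVE)   # Filename, Bucket, Key
print(f"mirrored {ARCHIVE} to s3://{BUCKET}/{ARCHIVE}")
```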

To address that inauthenticity problem, you can use checksums and so on to let skeptics check to see whether data is real. But it's mostly a theoretical nervous-people problem.
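
A minimal sketch of the checksum idea: the agency publishes a SHA-256 next to the download, and a skeptic runs the same hash over their mirror and compares. The filenames here are placeholders.

```python
# Hash a (possibly multi-gigabyte) file in chunks so it never has to fit in
# memory, then compare the published digest against a mirrored copy.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

published = sha256_of("opendata-dump.tar.gz")     # the agency publishes this
mirrored = sha256_of("my-mirrored-copy.tar.gz")   # a skeptic hashes their copy
print("authentic" if mirrored == published else "tampered or corrupted")
```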

The Government Open Data License Agency

This doesn't exist yet but should. Government agencies that have data that's not quite classified but also not quite public domain (e.g., data made by contractors, thanks invisible market hand) always get nervous about their data.

There needs to be an intra-government resource made up of lots of lawyers who both help decide which license to choose and write new licenses if necessary, making these kinds of decisions less taxing. (get the pun)

No New Technology

The most successful open data program is the one that doesn't require new technology. We're not there yet - there are plenty of needs that are totally unaddressed, but let's not forget that software is what we might call 'hella expensive to develop' and easily a drain on time and money.

For instance, say you have water fountain locations for the whole US. The full download is two gigs, so you want subsets. Use ArcGIS or QGIS to bake them, and it takes two hours and deploying them is free. Write a slicer-dicer and it costs a ton and is a new service to keep online.
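
And if someone would rather script the bake than click through QGIS, it's still a one-off script, not a service to keep online. A rough sketch, assuming the data is one big GeoJSON with a 'state' property on each feature (both assumptions, not anything stated above):

```python
# Bake per-state subsets out of one national GeoJSON, once, as plain files you
# can host statically. The input filename and 'state' property are assumptions.
import json
from collections import defaultdict

with open("fountains-us.geojson") as f:
    collection = json.load(f)

by_state = defaultdict(list)
for feature in collection["features"]:
    by_state[feature["properties"].get("state", "unknown")].append(feature)

for state, features in by_state.items():
    with open(f"fountains-{state}.geojson", "w") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f)
```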

@smathermather

Assets FedIT has that augment Open Data

  • Size in negotiating contracts + cost sharing for imagery and other expensive initiatives
  • Hierarchy and discoverability
  • Capacity to guide IP choices (patent avoidance, open source, etc.)
  • Create infrastructure (not just data sharing) to facilitate public good.
