tommorris/gist:338692

## gistfile1.txt
Received: by 10.223.112.7 with HTTP; Sat, 20 Mar 2010 07:17:28 -0700 (PDT)
Date: Sat, 20 Mar 2010 14:17:28 +0000
Subject: Data.gov.uk format verification preliminary results
From: Tom Morris <tom@tommorris.org>
To: uk-government-data-developers@googlegroups.com
Content-Type: text/plain; charset=ISO-8859-1

Sorry it has taken so long, but here are the aggregate results of the
data.gov.uk format verification exercise.

HTML - 252
XML - 5
Word - 4
RTF - 1
OpenOffice - 1
Something odd - 85
JSON - 9
Nothing there! - 190
CSV - 12
Multiple formats - 1211
PDF - 468
RDF - 10
Excel - 408
TOTAL - 2656

Sadly, this is over-optimistic. I've manually checked some of the data
that has been categorised as JSON and RDF. Most of it is miscategorised:
either people clicked, say, 'RDF' when they meant to click 'PDF', or
they have seen an RSS or Atom feed and categorised it as RDF.

What this admittedly imperfect dataset suggests is that the vast
majority of the 'data' on data.gov.uk is not actually machine-readable
data but human-readable documents.
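
As a rough sanity check on that reading, the tallies above can be grouped
in a few lines of Python. The grouping into "structured" and "document"
formats is an assumption made here for illustration, not part of the
verification exercise. Excel, 'Multiple formats', 'Something odd' and
'Nothing there!' are left out of both groups because the counts alone
don't say which side they belong on.

```python
# Aggregate counts copied from the list above.
counts = {
    "HTML": 252,
    "XML": 5,
    "Word": 4,
    "RTF": 1,
    "OpenOffice": 1,
    "Something odd": 85,
    "JSON": 9,
    "Nothing there!": 190,
    "CSV": 12,
    "Multiple formats": 1211,
    "PDF": 468,
    "RDF": 10,
    "Excel": 408,
}

# Assumed grouping (hypothetical, for illustration only).
STRUCTURED = ("XML", "JSON", "CSV", "RDF")
DOCUMENTS = ("HTML", "Word", "RTF", "OpenOffice", "PDF")

total = sum(counts.values())
structured = sum(counts[name] for name in STRUCTURED)
documents = sum(counts[name] for name in DOCUMENTS)

# Entries that were assigned exactly one recognisable format.
single_format = (
    total
    - counts["Multiple formats"]
    - counts["Something odd"]
    - counts["Nothing there!"]
)

print(f"total entries:            {total}")
print(f"single-format entries:    {single_format}")
print(f"structured (as reported): {structured} "
      f"({structured / single_format:.1%} of single-format)")
print(f"documents (as reported):  {documents} "
      f"({documents / single_format:.1%} of single-format)")
```

Since the JSON and RDF counts are known to be inflated by
miscategorisation, the structured share this prints is, if anything, an
upper bound for the single-format entries.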

I'll publish the complete dataset later.

--
Tom Morris
<http://tommorris.org/>