Andy Cochran andycochran
@chriswhong
chriswhong / idea.md
Created July 1, 2016 20:08
Idea for git-powered distributed dataset management

The Problem:

If you follow the open data scene, you'll often hear that the "feedback loop" for making corrections, leaving comments, or asking questions about datasets is fuzzy, disjointed, or nonexistent. If I know for a fact that something in a government dataset is wrong, how do I get that record fixed? Do I call 311? Will the operator even know what I am talking about if I say I want to make a correction to a single record in a public dataset? There's DAT. There's storing your data as a CSV on GitHub. These approaches work, but they are very much developer-centric: pull requests and diffs are hard to wrap your head around if you spend your day analyzing data in Excel or desktop GIS. The fact of the matter is that most of the people managing datasets in government organizations are not DBAs, data scientists, or programmers.

Idea:

It's basically git for data plus a simple UI for exploration, management, and editing. Users would have to use GitHub SSO to edit in the UI, and behind the scenes

@andrewxhill
andrewxhill / pluto_reverse_geocode.sql
Created August 14, 2013 15:59
PLUTO reverse geocode SQL
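-- Result row: address and owner of a nearby PLUTO record, plus its distance in meters.
CREATE TYPE pluto_reverse_geocode_result AS (address text, ownername text, distance float);
-- Arguments: $1 = latitude, $2 = longitude, $3 = maximum distance in meters.
CREATE OR REPLACE FUNCTION pluto_reverse_geocode(float, float, int) RETURNS SETOF pluto_reverse_geocode_result
AS '
  -- Take the 20 nearest candidates with the KNN operator, then filter by geography distance.
  WITH subq AS (
    SELECT address, the_geom, ownername
    FROM nyc_mappluto_13v1
    ORDER BY the_geom <-> CDB_LatLng($1, $2)
    LIMIT 20
  )
  SELECT address, ownername,
         ST_Distance(the_geom::geography, CDB_LatLng($1, $2)::geography) AS distance
  FROM subq
  WHERE ST_Distance(the_geom::geography, CDB_LatLng($1, $2)::geography) < $3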