@chriswhong
chriswhong / idea.md
Created July 1, 2016 20:08
Idea for git-powered distributed dataset management

The Problem:

If you follow the open data scene, you'll often hear that the "feedback loop" for making corrections, leaving comments, or asking questions about datasets is fuzzy, disjointed, or nonexistent. If I know for a fact that something in a government dataset is wrong, how do I get that record fixed? Do I call 311? Will the operator even know what I am talking about if I say I want to make a correction to a single record in a public dataset? There's DAT. There's storing your data as a CSV on GitHub. These approaches work, but they are very much developer-centric: pull requests and diffs are hard to wrap your head around if you spend your day analyzing data in Excel or desktop GIS. The fact of the matter is that most of the people managing datasets in government organizations are not DBAs, data scientists, or programmers.

Idea:

It's basically git for data plus a simple UI for exploration, management, and editing. Users would have to use GitHub SSO to edit in the UI, and behind the scenes