This page: http://j.mp/onboardinghack
Make a news app. In 2 days.
- Telling a story with software, and software that generates stories
- Readers, not just users — help people find their story in larger stories
- Impact
Apps generating stories: http://projects.propublica.org/schools/schools/63441005650#63441007350,63441007845,63441007352,63441001276,63441005602
http://shaw.al.s3.amazonaws.com/opennews/polltracker.png
Stories generating apps generating stories: https://speakerdeck.com/a_l/caf-seminar-quito-2013?slide=41
- Acquire data
- Clean/bulletproof data
- Find stories/trends in data
- Look for related datasets
- Do additional reporting
- Import data
- Design and build app
- Create graphics for app “lede”
- Deploy app
- 2 days, 3 people, EditorsLab Hackathon: http://projects.propublica.org/graphics/heartsaver
- 1 day, 5ish people, NYT Hack Day: http://happystance.s3-website-us-east-1.amazonaws.com/
- We’re all generalists (designer/developer/reporter)
- Fakey Agile
- We use tickets on sprawling projects and deadlines
- An app has a “captain,” but pulls in others as necessary
- Rigorous editorial and fact-checking process
- Anywhere from adaptive to responsive design, as the app requires, but we’re not religious about it
- Bylines!
- Find data on a public data site such as data.gov
- Request data from an agency, and receive it in a usable format
- Scrape data from a public website and store it in your favorite format
- Request data from an agency, and transform it from a hostile format
- Create your own dataset because the data you want does not exist
- Find the nerds, not the PR office
- Look in the metadata
- FOIA
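Agency data often arrives in a hostile format such as fixed-width text, where a record layout tells you which columns hold which field. A minimal Ruby sketch of turning that into usable rows, assuming a hypothetical layout: the field names, column positions, and sample lines are made up for illustration, not taken from any real agency file.

```ruby
# Hypothetical record layout: field => [1-based start column, width].
# A real layout comes from the agency's documentation.
LAYOUT = {
  "provider_id" => [1, 6],
  "state"       => [7, 2],
  "zip"         => [9, 5],
  "name"        => [14, 20]
}.freeze

# Slice one fixed-width line into a hash. Values stay strings, so ZIP
# codes keep their leading zeros; padding is stripped and empty fields
# become nil.
def parse_fixed_width(line, layout = LAYOUT)
  layout.each_with_object({}) do |(field, (start, width)), row|
    value = line[start - 1, width].to_s.strip
    row[field] = value.empty? ? nil : value
  end
end

# Parse a whole file body, skipping blank lines.
def parse_fixed_width_file(text, layout = LAYOUT)
  text.each_line
      .reject { |line| line.strip.empty? }
      .map { |line| parse_fixed_width(line.chomp, layout) }
end
```

From there the rows can be written out as CSV or JSON, or loaded straight into your database.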
Key idea: Look for "preposterousness"
- Counts and totals
- Limits of Excel & MySQL (e.g. an Excel sheet tops out at 1,048,576 rows)
- Absurd max/mins
- Blanks vs. nulls
- Misspellings
- Data types (ask for a record layout)
- Bad geocoding, duplicate city names
- Check against reports and hard copies
- Call, don't assume
- Do random spot checks
Jen LaFleur's guide to bulletproofing: https://github.com/propublica/guides/blob/master/data-bulletproofing.md
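Several of the checks above (counts, absurd max/mins, blanks vs. nulls, data types, random spot checks) can be scripted as a first pass. A minimal Ruby sketch using the standard `csv` library; the column name and sample data are hypothetical, and none of this replaces calling the agency or checking against hard copies.

```ruby
require "csv"

# First-pass "preposterousness" report for one numeric column of a CSV
# that has a header row.
def bulletproof_report(csv_text, column, sample_size: 3, seed: 42)
  rows = CSV.parse(csv_text, headers: true)
  values = rows.map { |row| row[column] }

  # Ruby's CSV parses an empty unquoted field (,,) as nil and an empty
  # quoted field (,"",) as "", so nulls and blanks can be counted separately.
  nulls  = values.count(&:nil?)
  blanks = values.count { |v| v == "" }

  present = values.reject { |v| v.nil? || v.empty? }
  # Values that don't parse as numbers point to data type problems.
  non_numeric = present.select { |v| Float(v, exception: false).nil? }
  numbers = present.filter_map { |v| Float(v, exception: false) }

  {
    count: rows.size,        # compare to the total the agency reports
    nulls: nulls,
    blanks: blanks,
    non_numeric: non_numeric,
    min: numbers.min,        # look for absurd minimums...
    max: numbers.max,        # ...and maximums (typos, 99999 placeholders)
    total: numbers.sum,
    # Seeded, so the same rows come up when a colleague re-runs the check.
    spot_check: rows.map(&:to_h).sample(sample_size, random: Random.new(seed))
  }
end
```

The spot-check rows are the ones to pull up against the paper records or to ask the agency about.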
rails new yourapp
If you use tabletop + Google Docs, be careful of Google’s arbitrary login walls.
- Try to mimic the schema of the data you’re using
- Find a record layout
- Take note of decimal precision
- ZIP codes are strings; some start with 0
- Lat/longs don’t need 15 decimal places of precision; at that point you’re mapping atoms.
Example record layout: http://shaw.al.s3.amazonaws.com/opennews/nfhl-record-layout.png
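To make the ZIP and lat/long notes concrete, here is a small Ruby sketch. It restores leading zeros that a trip through Excel or an integer column can strip, and it rounds coordinates to six decimal places (about 0.1 m, since a degree of latitude is roughly 111 km). The helper names are hypothetical.

```ruby
# Restore a 5-digit ZIP that lost its leading zero upstream ("02134" read
# as the integer 2134). Real ZIPs start at 00501, so 3 to 5 digits are
# padded; anything else (ZIP+4, stray text) returns nil to flag for review.
def normalize_zip(raw)
  digits = raw.to_s.strip
  return nil unless digits.match?(/\A\d{3,5}\z/)
  digits.rjust(5, "0")
end

# Six decimal places of a degree is roughly 0.1 m: plenty for a news map.
def round_coordinate(value, places = 6)
  Float(value).round(places)
end
```

In a migration, the equivalent choices are a string column for ZIPs and a decimal column with a modest scale for coordinates.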
Migrations: http://guides.rubyonrails.org/migrations.html
Rake: http://rake.rubyforge.org/
Keep track of your changes with git: http://git-scm.com/
- Rake importer example, which assumes the column names in your source data match the column names in your database: https://gist.github.com/ashaw/eb9769bcfc86ca0663da
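The linked gist isn't reproduced here. As a plain-Ruby sketch of the same idea (copy each CSV column whose header matches a database column), a `Struct` stands in for an ActiveRecord model; in a Rails app you would use the model's `column_names` and `create!` instead. The `Provider` model and its fields are hypothetical.

```ruby
require "csv"

# Hypothetical stand-in for an ActiveRecord model.
Provider = Struct.new(:provider_id, :name, :state, :zip, keyword_init: true)

# Build one record per CSV row, copying only columns whose header matches
# an attribute name. Headers with no matching attribute are returned too,
# so a renamed or misspelled column doesn't silently disappear.
def import_rows(csv_text, model = Provider)
  attributes = model.members.map(&:to_s)
  table = CSV.parse(csv_text, headers: true)
  unmatched = table.headers - attributes
  records = table.map do |row|
    model.new(**row.to_h.slice(*attributes).transform_keys(&:to_sym))
  end
  [records, unmatched]
end
```

Because CSV values come in as strings, ZIP codes like "02134" survive the import intact.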
- What can be joined on your dataset? Sometimes the news is in the join.
Examples:
- SBA data + flood zone data = news: http://projects.propublica.org/sandy-sba/
- FEC IDs add relevance to two apps:
- When news apps collide, more news happens! http://www.propublica.org/article/top-medicare-prescribers-rake-in-speaking-fees-from-drugmakers
Just because you can join doesn’t mean you should. Focused apps are better than sprawling ones.
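A join doesn't need a database. Here is a minimal Ruby sketch of an inner join on a shared ID, such as an FEC ID. It also keeps the rows that didn't match, since unmatched IDs are either a data problem or a lead. The dataset names, fields, and IDs used with it are hypothetical.

```ruby
# Inner-join two arrays of hashes on `key`. Returns [joined, unmatched],
# where unmatched holds left-hand rows with no partner on the right.
def join_on(left, right, key)
  index = right.group_by { |row| row[key] } # key => all matching right rows
  joined = []
  unmatched = []
  left.each do |row|
    matches = index.fetch(row[key], [])
    if matches.empty?
      unmatched << row
    else
      # merge lets right-hand values win if both sides share a field name
      matches.each { |match| joined << row.merge(match) }
    end
  end
  [joined, unmatched]
end
```

For example, joining a list of donations to a list of candidates on `"fec_id"` gives donations annotated with candidate names, plus any donations whose ID didn't turn up.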
- The “far”
- Look at maximums/minimums, clumps, outliers
- Look at correlations
- Geographic trends
- Break down by state or brand
- Let people sort
- Show people what to look for in individual views
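Breaking down by state and looking for clumps and outliers can start as a few lines of Ruby before it becomes a chart. Here is a minimal sketch with hypothetical field names. The 3x-the-median threshold is an arbitrary rule of thumb for deciding what to look at, not a statistical test.

```ruby
# Roll rows up by a grouping column (state, brand, ...) and sort the
# groups largest first, so the extremes sit at either end of the list.
def breakdown(rows, group_key, value_key)
  rows.group_by { |row| row[group_key] }
      .transform_values { |group| group.sum { |row| row[value_key] } }
      .sort_by { |_group, total| -total }
end

# Flag groups whose total is more than `factor` times the median group.
def outliers(totals, factor: 3)
  values = totals.map { |_group, total| total }.sort
  return [] if values.empty?
  mid = values.size / 2
  median = values.size.odd? ? values[mid] : (values[mid - 1] + values[mid]) / 2.0
  totals.select { |_group, total| total > factor * median }
end
```

A flagged group is a question for reporting, not yet a finding: it may be a real story, a population effect, or a data error.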
Varieties of “far”
- Big picture: http://projects.propublica.org/schools/
- Example results: http://projects.propublica.org/graphics/backscatter
- Part in whole: http://www.propublica.org/article/income-inequality-near-you
Inspiration: http://collection.marijerooze.nl/
Tools:
- TableSorter: http://tablesorter.com/
- All news apps should have “nears” and “fars”
- Telescoping “nears”
- You’re on deadline — whiteboard sketch your views, instead of Photoshopping (http://37signals.com/svn/posts/1061-why-we-skip-photoshop)
- Always include a search box if you can, even if it’s just a “filterer” (e.g. http://projects.propublica.org/alec-contributions/)
- Avoid “Here’s some data, see what you can find in it” (e.g. http://www.nola.com/politics/index.ssf/2013/11/database_search_louisiana_camp.html#incart_river)
- Don’t give your users shit work http://zachholman.com/posts/shit-work/?utm_source=bronto
- Most “blue box” apps have these endpoints:
- items/index
- items/show
- search
- geo/show
- Use canonical IDs if you have them.
- FIPS codes are your friends
- Otherwise, use friendly_id: https://github.com/norman/friendly_id
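A county FIPS code is a 5-digit string made from a 2-digit state code and a 3-digit county code, both zero-padded (Los Angeles County, Calif., is 06037). That makes it a handy canonical ID for URLs and joins. Here is a small Ruby sketch. The `slugify` helper is a hypothetical stand-in for the readable-URL idea behind friendly_id, not that gem's API.

```ruby
# Build a 5-digit county FIPS code. Parsing with an explicit base 10
# keeps "08" or "037" from being read as octal.
def county_fips(state_code, county_code)
  format("%02d%03d", Integer(state_code.to_s, 10), Integer(county_code.to_s, 10))
end

# Fallback when there's no canonical ID: a lowercase, hyphenated slug.
# Note it drops accented and other non-ASCII letters.
def slugify(name)
  name.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end
```

Slugs can collide (two hospitals with the same name), which is one reason a real canonical ID is preferable when one exists.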
- Use AP/NYT style! https://github.com/ascheink/nytimes-style
- Don’t make population maps: http://www.ericson.net/content/2011/10/when-maps-shouldnt-be-maps/
- ProPublica news apps style guide: https://github.com/propublica/guides/blob/master/news-apps.md
- Apps have a few lines of “guff” by way of explanation. If you need more than that, you may want to refine your UI to be more intuitive.
- Move excess guff into tooltips
- Only get as complex as you need to. Don’t use Elasticsearch or Sphinx if you can get away with a SQL LIKE query:
def search
  # Binding params[:q] to the ? placeholder lets ActiveRecord quote it,
  # which guards against SQL injection without a separate sanitize step.
  drugs = Drug.where("name LIKE ?", "%#{params[:q]}%").limit(50)
  render json: drugs
end
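One refinement to that search: a bound parameter keeps the query safe from injection, but `%` and `_` typed by a reader still act as LIKE wildcards. Rails provides `sanitize_sql_like` for this. The plain-Ruby sketch below uses a hypothetical `escape_like` helper to show what that escaping does. Backslash is the default LIKE escape character in MySQL and PostgreSQL. SQLite has no default, so there you need an explicit `ESCAPE '\'` clause.

```ruby
# Put a backslash before %, _ and backslash itself so they match
# literally inside a LIKE pattern.
def escape_like(term)
  term.to_s.gsub(/[\\%_]/) { |char| "\\" + char }
end

# Wrap the escaped term for a "contains" search, like the controller above.
def contains_pattern(term)
  "%#{escape_like(term)}%"
end
```

With this in place, a search for "50%" matches the literal text "50%" instead of everything starting with "50".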
- AWS + Capistrano: http://capistranorb.com/
- Dump/load your local database. If it’s SQLite, scp it!
- Heroku: https://www.heroku.com/
- Introduction
- Scrum/discussion of ideas
- Claim specialties (1-2 people each), for example:
- App/backend
- Design/JS/CSS
- Front page “lede” graphic/far view
- Reporting, data analysis, guff-writing
- Other specialties?
- Working time
- End of day scrum
- Scrum
- Working time
- Gather around lunchtime for editing/critique
- Working time
- Deployment — if we finish
- Presentation