# NICAR 2016
## Interactive News Spreadsheet
- Creating tools that take you 80-90% of the way there frees you up to create the bespoke content that is hard, rather than the bespoke content that is easy.
- Making code sharable is hard: it requires an extra layer of work to scrub it. Is it worth it?
- Open sourcing can force you to be honest with your own development process.
- Data is available in an interactive by default; people can already cite your work, so why not make it clearly available in a public way?
## New things with old data
- improving the process lessens the toll on people
## Data for breaking news
- consider breaking news/disaster scenarios ahead of time: wildfires, tornadoes, hurricanes, floods, bridge collapses, building collapses, train/plane/boat crashes, active shooters, terror attacks, mining accidents, chemical spills, power outages. Build tools when it's quiet.
## Deep Dives
- had to overcome conventional wisdom with technology and data
- buy in was very important
- required a different approach
- testing conventional wisdom with data in order to knock it down
- asked what types of things NORMALLY impact performance, and tested each factor with data
- beyond traditional data: called other districts, checked promises that were broken
- built a database out of kids' stories
- not one big analysis, tiny analysis after tiny analysis
- technology helped: used database of kids in an interactive story...
- data: 4.4 million Uber pickup lat/lngs
- geocoded 93 million lat/lng pickups, counting the number that fell into each NY Census tract
- wanted to turn it around quickly... and failed
- mistake #1: Python; 23 weeks to run. Know when your data needs a database.
- estimate how long your code will take to run by testing small chunks
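The estimation trick above can be sketched in a few lines of Python; the `work` function, sample, and row counts here are invented stand-ins, not the actual Uber pipeline:

```python
import time

def estimate_runtime(work, sample, total_count):
    """Time `work` on a small sample, then extrapolate to the full dataset."""
    start = time.perf_counter()
    for item in sample:
        work(item)
    elapsed = time.perf_counter() - start
    per_item = elapsed / len(sample)
    return per_item * total_count  # estimated seconds for the full run

# Hypothetical per-record work, timed on 100 records, scaled to 4.4M rows.
records = list(range(100))
est = estimate_runtime(lambda r: sum(i * i for i in range(500)), records, 4_400_000)
print(f"Estimated full run: {est / 3600:.2f} hours")
```

If the estimate comes back in weeks, that is the cue to reach for a database instead of a script.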
- mistake #2: reinventing GIS; should have used PostGIS
- generate shape files in QGIS
- put them in postgres and assigned each point a census tract
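The real pipeline used Postgres with PostGIS polygon containment (e.g. `ST_Contains`); as a rough stdlib-only sketch of the same join-and-count pattern, here is the idea with made-up tract IDs and bounding boxes standing in for true tract geometries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracts (tract TEXT, min_lng REAL, max_lng REAL, min_lat REAL, max_lat REAL);
CREATE TABLE pickups (lng REAL, lat REAL);
""")
# Invented tract boxes and pickup points, purely for illustration.
conn.executemany("INSERT INTO tracts VALUES (?,?,?,?,?)", [
    ("36061000100", -74.02, -74.00, 40.70, 40.72),
    ("36061000200", -74.00, -73.98, 40.72, 40.74),
])
conn.executemany("INSERT INTO pickups VALUES (?,?)", [
    (-74.01, 40.71), (-73.99, 40.73), (-73.99, 40.735),
])
# Assign each point a tract, then count points per tract. The PostGIS
# version replaces the BETWEEN clauses with ST_Contains(tract.geom, point).
rows = conn.execute("""
SELECT t.tract, COUNT(*) AS n
FROM pickups p JOIN tracts t
  ON p.lng BETWEEN t.min_lng AND t.max_lng
 AND p.lat BETWEEN t.min_lat AND t.max_lat
GROUP BY t.tract ORDER BY t.tract
""").fetchall()
print(rows)  # [('36061000100', 1), ('36061000200', 2)]
```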
- mistake #3: projection sloppiness
- mistake #4: know how your tools work. PGCOPY beat ogr2ogr and passing files back and forth from tool to tool.
- mistake #5: didn't index
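The cost of a missing index is easy to see in a query plan; a small SQLite demo (table and column names invented, and exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pickups (tract TEXT, lng REAL, lat REAL)")
conn.executemany("INSERT INTO pickups VALUES (?,?,?)",
                 [(f"tract{i % 100}", 0.0, 0.0) for i in range(10_000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows put the human-readable detail in column 3.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT COUNT(*) FROM pickups WHERE tract = 'tract7'"
print(plan(query))  # full table scan before indexing

conn.execute("CREATE INDEX idx_tract ON pickups (tract)")
print(plan(query))  # now a SEARCH using idx_tract instead of a scan
```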
- mistake #6: too much corner cutting. Didn't normalize data formats, ignored messy datetime formatting, deleted columns that didn't matter (at the time); all that corner cutting was fine for the first story, but not for future analysis.
- STRUCTURE your data so it can answer any question
- normalize your data
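One concrete form of normalizing: parse every messy timestamp variant into a single canonical format on the way in, so later analyses never re-handle the mess (the format strings below are invented examples, not the Uber data's actual formats):

```python
from datetime import datetime

# Every messy input format we know about; store everything as ISO 8601.
FORMATS = ["%m/%d/%Y %H:%M", "%Y-%m-%d %H:%M:%S", "%d-%b-%y %I:%M %p"]

def normalize_ts(raw):
    """Return an ISO 8601 string, or raise if no known format matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).isoformat()
        except ValueError:
            pass
    raise ValueError(f"unrecognized timestamp: {raw!r}")

print(normalize_ts("04/07/2014 18:05"))   # 2014-04-07T18:05:00
print(normalize_ts("07-Apr-14 6:05 PM"))  # 2014-04-07T18:05:00
```

Raising on unknown formats (rather than silently skipping rows) is the anti-corner-cutting move: it surfaces every new variant the first time it appears.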
- know the right tool, spend time learning the basics
- don't invent something new
- beware sunk costs that aren't panning out
- visually validate your data
- designing for mobile changes your plan from the beginning
- designers are used to working mobile first, but reporters and editors need to think that way too
- stop doing charts that inherently don't work on mobile
- choose design patterns that work on both (single column layouts)
- keep it light
- scrollmaster.js ?
- GIF the dataviz
- make mobile a part of the process
- SVG crowbar from NYT http://nytimes.github.io/svg-crowbar/
- make interactions lazy (scroll v tap)
http://paldhous.github.io/NICAR/2016/infodesign.html
- visualization: encoding data by visual cues
- our brains don't treat all visual cues equally.
- accuracy hierarchy (most to least accurate): position (aligned scale), length, slope, angle, area, color intensity, volume, color hue
- multiple data sets over time: a dot-and-line chart, but no more than 5 series
- many data sets (58 counties in CA), put counties on Y-axis, encode value with color intensity
- move up and down the hierarchy, but think about it
- ColorBrewer, Color Oracle
guide: https://github.com/mattwaite/JOUR407-Data-Journalism/blob/master/Examples/FebruaryHeatWave.ipynb
- examples: bingo cards (oscars, debate)
- a lot was initially done in Google Sheets, then converted to JSON
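The Sheets-to-JSON step is usually just an exported CSV run through a converter; a minimal stdlib sketch (the column names and rows are invented, echoing the bingo-card examples):

```python
import csv, io, json

# Pretend this string came from File > Download > CSV in Google Sheets.
csv_text = """nominee,film,won
Leonardo DiCaprio,The Revenant,yes
Matt Damon,The Martian,no
"""

# DictReader turns each row into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(json.dumps(rows, indent=2))
```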
- ended up adding a new app to their Django-powered graphics rig with an editing interface
- quicker-turn stuff sticks with Google Sheets; ongoing stuff might lean toward the Django-powered rig
- tabletop.js
- a Google Form feeding into Google Docs as janky user-submitted content
- using the GitLab GUI to commit to raw JSON files
- so many people, needed to know who was working on what
- Preview: shows each project, who updated, when, version history, pulls from git server
- can even preview old versions
- command line tool to create new things
- `preview create $TEMPLATE`
- `preview publish`
- pushes to the CMS; the CMS allows free-form HTML via an API, and a free-form asset can be embedded or used at its own URL
- USE Google Docs (not spreadsheets) with a custom markup language (ArchieML), pulled into the graphic as JSON by the preview command-line tool
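ArchieML itself is much richer (arrays, freeform text, official parsers at archieml.org); a toy parser for just its `key: value` subset shows the doc-to-JSON idea, with an invented document:

```python
import json

def parse_kv(text):
    """Parse only simple `key: value` lines -- a tiny subset of ArchieML."""
    data = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            if key.strip() and " " not in key.strip():
                data[key.strip()] = value.strip()
    return data

doc = """headline: Super Tuesday results
byline: The Upshot
This stray sentence is dropped, much as ArchieML ignores non-key text.
"""
print(json.dumps(parse_kv(doc)))
```

The appeal of the real thing is exactly this shape: reporters edit a readable Google Doc, and the graphic consumes structured JSON.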
- example of one of these pages: http://www.nytimes.com/interactive/2016/03/02/us/super-tuesday-results-delegates.html
- can preview changes in the preview server automatically
https://github.com/jsfenfen/parsing-prickly-pdfs
- LA precinct election example: https://github.com/jsfenfen/parsing-prickly-pdfs/blob/master/examples/la-precinct-bulletin/la-precinct-bulletin.ipynb
repo: https://github.com/jonkeegan/command-line-graphics
slides: https://docs.google.com/presentation/d/1YEP9VJM16foortYfbaLrcwCR8X8V_XA5uEWPLoDjJ9Y/edit#slide=id.p
- command line is worth it when it comes to visuals
- ImageMagick for an Avengers cover analysis
- ImageMagick = command-line Photoshop
- batch-resize images in place with `mogrify` (`convert` is the write-to-a-new-file variant)
`mogrify -resize 200x *.jpg`
- ffmpeg
- node exif package extracts metadata from images
- http://source.opennews.org/en-US/articles/cassini/
https://github.com/cjdd3b/nicar2016
- two flavors: supervised and unsupervised learning
- consider precision v. recall when evaluating models
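Precision and recall come straight from the confusion matrix; the counts below are a made-up classifier's output, not from any session dataset:

```python
def precision_recall(tp, fp, fn):
    """precision: of everything flagged, how much was right;
       recall: of everything that should be flagged, how much we caught."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical: 80 true positives, 20 false positives, 40 false negatives.
p, r = precision_recall(80, 20, 40)
print(p, r)  # precise (0.8) but misses a third of the real cases
```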
- k-fold cross-validation
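K-fold cross-validation just rotates which slice of the data is held out for testing; the index bookkeeping in plain Python (no sklearn assumed), as a sketch:

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists; each of the k folds is the test set once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# 10 rows, 5 folds: test slices are [0, 1], [2, 3], ..., [8, 9].
for train, test in k_fold_indices(10, 5):
    print(test)
```

Averaging precision/recall across the k held-out folds gives a steadier model estimate than a single train/test split.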