# NICAR 2016
## Interactive News Spreadsheet
- Creating tools that take you 80-90% of the way there frees you up to create the bespoke content that is hard, rather than the bespoke content that is easy.
- Making code sharable is hard: it requires an extra layer of work to scrub it. Is it worth it?
- Open sourcing can force you to be honest with your own development process.
- Data is available in an interactive by default; people can already cite your work, so why not make it clearly available in a public way?
## New things with old data
- improving the process lessens the toll on people
## Data for breaking news
- consider breaking news/disaster scenarios ahead of time: wildfires, tornadoes, hurricanes, floods, bridge collapses, building collapses, train/plane/boat crashes, active shooters, terror attacks, mining accidents, chemical spills, power outages. Build tools when it's quiet.
## Deep Dives
- had to overcome conventional wisdom with technology and data
- buy in was very important
- required a different approach
- testing conventional wisdom with data in order to knock it down
- asked what types of things NORMALLY impact performance, and tested each factor with data
- beyond traditional data: called other districts, checked promises that were broken
- built a database out of kids' stories
- not one big analysis, tiny analysis after tiny analysis
- technology helped: used database of kids in an interactive story...
- data: 4.4 million Uber pickup lat/lngs
- geocoded 93 million lat/lng pickups, counting the number that fell into each NY Census tract
- wanted to turn it around quickly... and failed
- mistake #1: Python; 23 weeks to run. Know when your data needs a database.
- estimate how long your code will take to run by testing small chunks
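The estimation trick above can be sketched in a few lines of Python; the `work` function, sample, and row counts here are invented stand-ins, not the actual Uber pipeline:

```python
import time

def estimate_runtime(work, sample, total_count):
    """Time `work` on a small sample, then extrapolate to the full dataset."""
    start = time.perf_counter()
    for item in sample:
        work(item)
    elapsed = time.perf_counter() - start
    per_item = elapsed / len(sample)
    return per_item * total_count  # estimated seconds for the full run

# Hypothetical per-record work, timed on 100 records, scaled to 4.4M rows.
records = list(range(100))
est = estimate_runtime(lambda r: sum(i * i for i in range(500)), records, 4_400_000)
print(f"Estimated full run: {est / 3600:.2f} hours")
```

If the estimate comes back in weeks, that is the cue to reach for a database instead of a script.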
- mistake #2: reinventing GIS; should have used PostGIS
- generate shape files in QGIS
- put them in postgres and assigned each point a census tract
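The real pipeline used Postgres with PostGIS polygon containment (e.g. `ST_Contains`); as a rough stdlib-only sketch of the same join-and-count pattern, here is the idea with made-up tract IDs and bounding boxes standing in for true tract geometries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracts (tract TEXT, min_lng REAL, max_lng REAL, min_lat REAL, max_lat REAL);
CREATE TABLE pickups (lng REAL, lat REAL);
""")
# Invented tract boxes and pickup points, purely for illustration.
conn.executemany("INSERT INTO tracts VALUES (?,?,?,?,?)", [
    ("36061000100", -74.02, -74.00, 40.70, 40.72),
    ("36061000200", -74.00, -73.98, 40.72, 40.74),
])
conn.executemany("INSERT INTO pickups VALUES (?,?)", [
    (-74.01, 40.71), (-73.99, 40.73), (-73.99, 40.735),
])
# Assign each point a tract, then count points per tract. The PostGIS
# version replaces the BETWEEN clauses with ST_Contains(tract.geom, point).
rows = conn.execute("""
SELECT t.tract, COUNT(*) AS n
FROM pickups p JOIN tracts t
  ON p.lng BETWEEN t.min_lng AND t.max_lng
 AND p.lat BETWEEN t.min_lat AND t.max_lat
GROUP BY t.tract ORDER BY t.tract
""").fetchall()
print(rows)  # [('36061000100', 1), ('36061000200', 2)]
```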
- mistake #3: projection sloppiness
- mistake #4: know how your tools work. PGCOPY beat ogr2ogr and passing files back and forth from tool to tool.
- mistake #5: didn't index
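The cost of a missing index is easy to see in a query plan; a small SQLite demo (table and column names invented, and exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pickups (tract TEXT, lng REAL, lat REAL)")
conn.executemany("INSERT INTO pickups VALUES (?,?,?)",
                 [(f"tract{i % 100}", 0.0, 0.0) for i in range(10_000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows put the human-readable detail in column 3.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT COUNT(*) FROM pickups WHERE tract = 'tract7'"
print(plan(query))  # full table scan before indexing

conn.execute("CREATE INDEX idx_tract ON pickups (tract)")
print(plan(query))  # now a SEARCH using idx_tract instead of a scan
```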
- mistake #6: too much corner cutting. Didn't normalize data formats, ignored messy datetime formatting, deleted columns that didn't matter (at the time); all that corner cutting was fine for the first story, but not for future analysis.
- STRUCTURE your data so it can answer any question
- normalize your data
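One concrete form of normalizing: parse every messy timestamp variant into a single canonical format on the way in, so later analyses never re-handle the mess (the format strings below are invented examples, not the Uber data's actual formats):

```python
from datetime import datetime

# Every messy input format we know about; store everything as ISO 8601.
FORMATS = ["%m/%d/%Y %H:%M", "%Y-%m-%d %H:%M:%S", "%d-%b-%y %I:%M %p"]

def normalize_ts(raw):
    """Return an ISO 8601 string, or raise if no known format matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).isoformat()
        except ValueError:
            pass
    raise ValueError(f"unrecognized timestamp: {raw!r}")

print(normalize_ts("04/07/2014 18:05"))   # 2014-04-07T18:05:00
print(normalize_ts("07-Apr-14 6:05 PM"))  # 2014-04-07T18:05:00
```

Raising on unknown formats (rather than silently skipping rows) is the anti-corner-cutting move: it surfaces every new variant the first time it appears.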
- know the right tool, spend time learning the basics
- don't invent something new
- beware sunk costs that aren't panning out
- visually validate your data
- designing for mobile changes your plan from the beginning
- designers are used to working mobile first, but reporters and editors need to think that way too
- stop doing charts that inherently don't work on mobile
- choose design patterns that work on both (single column layouts)
- keep it light
- scrollmaster.js ?
- GIF the dataviz
- make mobile a part of the process
- SVG crowbar from NYT http://nytimes.github.io/svg-crowbar/
- make interactions lazy (scroll v tap)
http://paldhous.github.io/NICAR/2016/infodesign.html
- visualization: encoding data by visual cues
- our brains don't treat all visual cues equally.
- accuracy hierarchy (most to least accurate): position (aligned scale), length, slope, angle, area, color intensity, volume, color hue
- multiple data sets over time: a dot-and-line chart, but no more than 5 series
- many data sets (58 counties in CA), put counties on Y-axis, encode value with color intensity
- move up and down the hierarchy, but think about it
- ColorBrewer, Color Oracle
guide: https://github.com/mattwaite/JOUR407-Data-Journalism/blob/master/Examples/FebruaryHeatWave.ipynb
- examples: bingo cards (oscars, debate)
- a lot was initially done in Google Sheets, then converted to JSON
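The Sheets-to-JSON step is usually just an exported CSV run through a converter; a minimal stdlib sketch (the column names and rows are invented, echoing the bingo-card examples):

```python
import csv, io, json

# Pretend this string came from File > Download > CSV in Google Sheets.
csv_text = """nominee,film,won
Leonardo DiCaprio,The Revenant,yes
Matt Damon,The Martian,no
"""

# DictReader turns each row into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(json.dumps(rows, indent=2))
```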
- ended up adding a new app to their Django-powered graphics rig with an editing interface
- quicker-turn stuff sticks with Google Sheets; ongoing stuff might lean toward the Django-powered rig
- tabletop.js
- a Google Form feeding into Google Docs as janky user-submitted content
- using the GitLab GUI to commit to raw JSON files
- so many people, needed to know who was working on what
- Preview: shows each project, who updated, when, version history, pulls from git server
- can even preview old versions
- command line tool to create new things
- `preview create $TEMPLATE`
- `preview publish`
- pushes to the CMS; the CMS allows free-form HTML via an API, and a free-form asset can be embedded or used at its own URL
- USE Google Docs (not spreadsheets) with a custom markup language (ArchieML), pulled into the graphic as JSON by the preview command-line tool
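ArchieML itself is much richer (arrays, freeform text, official parsers at archieml.org); a toy parser for just its `key: value` subset shows the doc-to-JSON idea, with an invented document:

```python
import json

def parse_kv(text):
    """Parse only simple `key: value` lines -- a tiny subset of ArchieML."""
    data = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            if key.strip() and " " not in key.strip():
                data[key.strip()] = value.strip()
    return data

doc = """headline: Super Tuesday results
byline: The Upshot
This stray sentence is dropped, much as ArchieML ignores non-key text.
"""
print(json.dumps(parse_kv(doc)))
```

The appeal of the real thing is exactly this shape: reporters edit a readable Google Doc, and the graphic consumes structured JSON.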
- example of one of these pages: http://www.nytimes.com/interactive/2016/03/02/us/super-tuesday-results-delegates.html
- can preview changes in the preview server automatically
https://github.com/jsfenfen/parsing-prickly-pdfs
- LA precinct election example: https://github.com/jsfenfen/parsing-prickly-pdfs/blob/master/examples/la-precinct-bulletin/la-precinct-bulletin.ipynb
repo: https://github.com/jonkeegan/command-line-graphics
slides: https://docs.google.com/presentation/d/1YEP9VJM16foortYfbaLrcwCR8X8V_XA5uEWPLoDjJ9Y/edit#slide=id.p
- command line is worth it when it comes to visuals
- ImageMagick for an Avengers cover analysis
- ImageMagick = command-line Photoshop
- batch-resize images in place with `mogrify` (`convert` is the write-to-a-new-file variant)
`mogrify -resize 200x *.jpg`
- ffmpeg
- node exif package extracts metadata from images
- http://source.opennews.org/en-US/articles/cassini/
https://github.com/cjdd3b/nicar2016
- two flavors: supervised and unsupervised learning
- consider precision v. recall when evaluating models
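Precision and recall come straight from the confusion matrix; the counts below are a made-up classifier's output, not from any session dataset:

```python
def precision_recall(tp, fp, fn):
    """precision: of everything flagged, how much was right;
       recall: of everything that should be flagged, how much we caught."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical: 80 true positives, 20 false positives, 40 false negatives.
p, r = precision_recall(80, 20, 40)
print(p, r)  # precise (0.8) but misses a third of the real cases
```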
- k-fold cross-validation
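K-fold cross-validation just rotates which slice of the data is held out for testing; the index bookkeeping in plain Python (no sklearn assumed), as a sketch:

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists; each of the k folds is the test set once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# 10 rows, 5 folds: test slices are [0, 1], [2, 3], ..., [8, 9].
for train, test in k_fold_indices(10, 5):
    print(test)
```

Averaging precision/recall across the k held-out folds gives a steadier model estimate than a single train/test split.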