@mitchthorson
Last active March 14, 2016 14:46

# NICAR 2016

## Interactive News Spreadsheet

http://depot.thethrust.net/

  • Creating tools that take you 80-90% of the way there frees you up to create the bespoke content that is hard, rather than the bespoke content that is easy.
  • Making code sharable is hard. It requires an extra layer of work to scrub it. Is it worth it?
  • Open sourcing can force you to be honest with your own development process.
  • Data is available in the interactive by default, so people can already cite your work. Why not make it clearly available in a public way?

## New things with old data

  • improving the process lessens the toll on people

## Data for breaking news

  • consider breaking news/disaster scenarios ahead of time: wildfires, tornadoes, hurricanes, floods, bridge collapses, building collapses, train, plane, and boat crashes, active shooters, terror, mining, chemical spills, power outages. Build tools when it's quiet.

## Deep Dives

Failure Factories

  • had to overcome conventional wisdom with technology and data
  • buy in was very important
  • required a different approach
  • testing conventional wisdom with data in order to knock it down
  • asking what types of things NORMALLY impact performance, and tested each factor with data
  • beyond traditional data: called other districts, checked promises that were broken
  • built a database out of kids' stories
  • not one big analysis, tiny analysis after tiny analysis
  • technology helped: used database of kids in an interactive story...

working with big geodata (without messing up)

  • data: 4.4 million Uber pickup lat/lngs
  • geocode 93 million lat/long pickups, counting the number that fell into each NY Census tract
  • wanted to turn around quickly...failed
  • mistake 1: Python. 23 weeks to run. Know when your data needs a database.
  • estimate how long your code will take to run by testing small chunks
  • mistake 2: reinventing GIS. Should have used PostGIS
  • generate shape files in QGIS
  • put them in postgres and assigned each point a census tract
  • mistake 3: projection sloppiness
  • mistake 4: not knowing how your tools work. PGCOPY is better than ogr2ogr and passing files back and forth from tool to tool
  • mistake 5: didn't index
  • mistake 6: too much corner cutting. Didn't normalize data formats, ignored messy date-time formatting, deleted columns that didn't matter (at the time). All the corner cutting was fine for the first story, but not for future analysis
  • STRUCTURE your data for any question
  • normalize your data
  • know the right tool, spend time learning the basics
  • don't invent something new
  • beware sunk costs that aren't panning out
  • visually validate your data
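
The advice above to estimate runtime by testing small chunks can be sketched roughly as follows. This is a minimal illustration, not the speakers' code: the function names, the square "tract", and the sample points are all made up, and a plain ray-casting test stands in for the per-point work that the notes recommend handing to PostGIS.

```python
import time


def point_in_polygon(lng, lat, polygon):
    """Ray-casting test: True if (lng, lat) falls inside the polygon.

    `polygon` is a list of (lng, lat) vertices. Holes and projection
    issues are ignored here; a real GIS handles those for you.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def estimate_total_seconds(work, sample, total_rows):
    """Time `work` on a small sample and extrapolate linearly to total_rows."""
    start = time.perf_counter()
    for row in sample:
        work(row)
    elapsed = time.perf_counter() - start
    return elapsed / len(sample) * total_rows


# Hypothetical data: one square "tract" and a handful of sample points.
tract = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sample_points = [(0.5, 0.5), (1.5, 0.5), (0.25, 0.75), (-0.1, 0.2)]

hits = sum(point_in_polygon(lng, lat, tract) for lng, lat in sample_points)
projected = estimate_total_seconds(
    lambda p: point_in_polygon(p[0], p[1], tract), sample_points, 1_000_000
)
print(f"{hits} of {len(sample_points)} sample points fall inside the tract")
print(f"projected runtime for 1,000,000 points: {projected:.2f} s")
```

A linear extrapolation like this is only a rough guide, but it is usually enough to tell hours from weeks before committing to an approach.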

Dataviz for all, mobile, accessible

  • working for mobile changes your plan from the beginning
  • designers are used to working mobile first, but reporters and editors need to think that way too
  • stop doing charts that inherently don't work on mobile
  • choose design patterns that work on both (single column layouts)
  • keep it light
  • scrollmaster.js ?
  • GIF the dataviz
  • make mobile a part of the process
  • SVG crowbar from NYT http://nytimes.github.io/svg-crowbar/
  • make interactions lazy (scroll v tap)

Which chart should I use and why? Information design for the human brain!

http://paldhous.github.io/NICAR/2016/infodesign.html

  • visualization: encoding data by visual cues
  • our brains don't treat all visual cues equally.
  • accuracy (most to least): length (aligned), length, slope, angle, area, color intensity, volume, color hue
  • multiple data sets over time: dotted line chart but no more than 5
  • many data sets (58 counties in CA), put counties on Y-axis, encode value with color intensity
  • move up and down the hierarchy, but think about it
  • ColorBrewer, Color Oracle
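
Encoding a value with color intensity, as in the counties example above, amounts to mapping each number onto a light-to-dark ramp of a single hue. A minimal sketch, assuming a simple linear blend in RGB; the function name, the two blue endpoints, and the county values are illustrative, not from the talk.

```python
def intensity_color(value, vmin, vmax, light=(239, 243, 255), dark=(8, 81, 156)):
    """Map a value onto a single-hue ramp from light to dark, as a hex color."""
    if vmax == vmin:
        t = 0.0
    else:
        t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp out-of-range values to the ramp ends
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(light, dark)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Made-up values for three hypothetical counties.
rates = {"County A": 2.0, "County B": 5.0, "County C": 8.0}
low, high = min(rates.values()), max(rates.values())
for county, rate in rates.items():
    print(county, intensity_color(rate, low, high))
```

A linear RGB blend is not perceptually uniform, so for published work a tested palette is probably the safer choice.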

Agate

guide: https://github.com/mattwaite/JOUR407-Data-Journalism/blob/master/Examples/FebruaryHeatWave.ipynb

Admin tools

Jon Schleuss, LA Times

  • examples: bingo cards (oscars, debate)
  • a lot was initially done in Google Sheets, then converted to JSON
  • ended up adding a new app to their Django-powered graphics rig with an editing interface
  • quicker-turn stuff sticks with Google Sheets, ongoing stuff might lean towards Django
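
The Sheets-to-JSON step can be as small as the sketch below. It assumes the sheet has been exported as CSV with a header row; the function name and the bingo-card rows are invented for illustration and are not the LA Times workflow.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (header row plus data rows) to a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


# Hypothetical bingo-card rows, as they might be exported from a sheet.
sample = "square,phrase\n1,Mentions the economy\n2,Interrupts the moderator\n"
print(csv_to_json(sample))
```

Every value comes through as a string, so numeric columns would need converting before use.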

Lindsey Cook, US News and World Report

  • tabletop.js
  • Google Form into Google Docs as janky user-submitted content
  • using the GitLab GUI to commit to raw JSON files

Gregor Aisch, NYT

  • so many people, needed to know who was working on what
  • Preview: shows each project, who updated, when, version history, pulls from git server
  • can even preview old versions
  • command-line tool to create new things: preview create $TEMPLATE
  • preview publish pushes to CMS
  • CMS allows for free-form HTML via an API, free-form asset can be embedded or used as its own URL
  • uses Google Docs (not spreadsheets) with a custom markup language (ArchieML), pulled into the graphic as JSON by the preview command-line tool
  • example of one of these pages: http://www.nytimes.com/interactive/2016/03/02/us/super-tuesday-results-delegates.html
  • can preview changes in the preview server automatically
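
To give a feel for the document-to-JSON step, here is a deliberately simplified sketch that handles only top-level key: value lines and ignores everything else. It is not a full ArchieML parser (no nesting, arrays, or multi-line values) and not the NYT tool; the function name and the sample text are made up.

```python
import json
import re

# Simplified: a key is letters, digits, underscores or hyphens, then a colon.
KEY_VALUE = re.compile(r"^\s*([A-Za-z0-9_-]+)\s*:\s*(.*?)\s*$")


def parse_key_values(text):
    """Parse `key: value` lines into a dict, ignoring every other line.

    This covers only the simplest part of ArchieML; nesting, arrays and
    multi-line values are deliberately left out.
    """
    result = {}
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            key, value = match.groups()
            result[key] = value  # later keys overwrite earlier ones
    return result


doc = """
Notes for the desk, ignored by the parser.
headline: Super Tuesday results
byline: Graphics desk
"""
print(json.dumps(parse_key_values(doc), indent=2))
```

The appeal of this style of markup is that editors can keep writing free-form notes in the document while the graphic only picks up the keyed lines.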

Parsing Prickly PDFs

https://github.com/jsfenfen/parsing-prickly-pdfs

Command Line Graphics

repo: https://github.com/jonkeegan/command-line-graphics

slides: https://docs.google.com/presentation/d/1YEP9VJM16foortYfbaLrcwCR8X8V_XA5uEWPLoDjJ9Y/edit#slide=id.p

  • command line is worth it when it comes to visuals
  • ImageMagick for avengers cover analysis
  • ImageMagick = command-line Photoshop
  • resize images: convert in.jpg -resize 200x out.jpg (for a batch, mogrify -resize 200x *.jpg resizes in place)
  • ffmpeg
  • node exif package extracts metadata from images
  • http://source.opennews.org/en-US/articles/cassini/
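
Batch resizing like this is easy to script. The sketch below only builds the ImageMagick command and prints it; the function name, file names, and output directory are hypothetical, and it uses mogrify with -path so the originals are left untouched.

```python
import shlex


def resize_command(paths, width, out_dir):
    """Build an ImageMagick `mogrify` call that resizes images to `width` px wide.

    `-path` writes the results into out_dir instead of overwriting the originals.
    """
    return ["mogrify", "-path", out_dir, "-resize", f"{width}x", *paths]


cmd = resize_command(["cover1.jpg", "cover2.jpg"], 200, "resized")
print(shlex.join(cmd))
# To run it (requires ImageMagick and an existing output directory):
#   import subprocess; subprocess.run(cmd, check=True)
```

On ImageMagick 7 installs the same call is typically written as magick mogrify.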

Machine Learning basics

https://github.com/cjdd3b/nicar2016

  • 2 flavors: supervised and unsupervised learning
  • consider precision v. recall when evaluating models
  • k-fold cross-validation
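
Precision, recall, and k-fold splitting can each be written in a few lines. This is a plain-Python sketch for illustration, not code from the session repo; the function names and the labels are made up, and in practice a library such as scikit-learn would do this for you.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels where 1 means positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # true positives
    fp = sum(1 for a, p in pairs if not a and p)    # false positives
    fn = sum(1 for a, p in pairs if a and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists for k roughly equal, contiguous folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


# Made-up labels: 1 = relevant, 0 = not relevant.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
print(precision_recall(actual, predicted))

for train, test in k_fold_indices(len(actual), 3):
    print("train:", train, "test:", test)
```

Precision asks how many flagged items were right; recall asks how many of the right items were flagged. Rows are normally shuffled before splitting into folds, which this sketch skips.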