@cjdd3b
cjdd3b / make_mn_precincts.sh
Created July 19, 2018 16:44
Make a Minnesota precinct map with 2014 turnout data
wget ftp://ftp.gisdata.mn.gov/pub/gdrs/data/pub/us_mn_state_sos/bdry_votingdistricts/shp_bdry_votingdistricts.zip && \
unzip shp_bdry_votingdistricts.zip && \
shp2json bdry_votingdistricts.shp | \
ndjson-join --left 'd.properties.COUNTYCODE + d.properties.PCTCODE' 'd.county_id + d.precinct_num' <(ndjson-split 'd.features') <(csv2json -n precincts_2014.txt -r ";") | \
ndjson-map 'Object.assign(d[0].properties, d[1]), d[0]' | \
ndjson-reduce 'p.features.push(d), p' '{type: "FeatureCollection", features: []}' | \
geoproject 'd3.geoIdentity().reflectY(true).fitSize([960, 960], d)' | \
geo2topo precincts=- | \
toposimplify -f -p 0.05 | \
topoquantize 1e5 > ./mn-precincts-albers-d3.json && \
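The join step in the pipeline above matches each precinct feature to a turnout row by concatenating the county and precinct codes, then copies the row's fields into the feature's properties. A minimal Python sketch of that left join follows. It assumes all codes are strings, and the sample field `total_votes` in the usage note is hypothetical, not taken from precincts_2014.txt. Unlike ndjson-join, it keeps only the last turnout row when a key repeats.

```python
def left_join_turnout(features, turnout_rows):
    """Left-join turnout rows onto GeoJSON features by county + precinct code."""
    # Index turnout rows by the same concatenated key the pipeline uses.
    # Simplification: a repeated key keeps only the last row.
    by_key = {row["county_id"] + row["precinct_num"]: row for row in turnout_rows}

    joined = []
    for feature in features:
        props = feature["properties"]
        key = props["COUNTYCODE"] + props["PCTCODE"]
        # A left join keeps every feature; unmatched ones gain no extra fields.
        merged = {**props, **by_key.get(key, {})}
        joined.append({**feature, "properties": merged})

    # Wrap the joined features back into a single FeatureCollection.
    return {"type": "FeatureCollection", "features": joined}
```

Calling `left_join_turnout(features, rows)` with one matching and one unmatched precinct returns both features, and only the matching one picks up the turnout fields.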
@cjdd3b
cjdd3b / hp-data-sample.json
Created June 8, 2018 19:39
Sample Strib homepage tracker JSON.
[
{
"html_raw":"<div class=\"tease is-lead \">\n<h3><a class=\"tease-headline\" data-content-id=\"484959641\" data-linkname=\"Accused football players sue U over sexual misconduct case\" data-linktype=\"headline\" data-modulename=\"homepage left\" data-moduletype=\"zone1-well-left\" data-position=\"0-1-lead\" href=\"http://www.startribune.com/accused-football-players-sue-university-of-minnesota-over-sexual-misconduct-case/484959641/\">Accused football players sue U over sexual misconduct case</a></h3>\n<div class=\"tease-timestamp js-timestamp \" data-st-timestamp=\"2018-06-08T18:21:57.000Z\">\n\n 1:21pm\n </div>\n<div class=\"tease-summary \">The lawsuit against the University of Minnesota seeks unspecified damages for \"being falsely cast as sex offenders.\"</div>\n<div class=\"tease-related\">\n<ul class=\"tease-list\">\n<li class=\"tease-list-item related-icn-article\">\n<a class=\"tease-list-item-link\" data-linkname=\"Report: U followed rules in football suspensions, cites 'break
  • Pablo J. Boczkowski is a professor in the School of Communication at Northwestern University.
  • Umbreen Bhatti is the director of the KQED Lab, the northern California public media organization’s innovation lab.
  • Yvonne Leow is president of the Asian American Journalists Association.
  • Jennifer Coogan is chief content officer of Newsela.
  • Nikki Usher is an assistant professor at the George Washington University’s School of Media and Public Affairs.
  • Hossein Derakhshan is a journalist and analyst, and coauthor of Information Disorder: Toward an interdisciplinary framework for research and policy making.
  • Millie Tran is global growth editor at The New York Times. Stine Bauer Dahlberg is managing director, brand at The New York Times.
  • Raju Narisetti is CEO of Gizmodo Media Group.
  • Lam Thuy Vo is a data reporter at BuzzFeed News.
  • Amie Ferris-Rotman is Foreign Policy’s Moscow correspondent and founder of Sahar Speaks.
@cjdd3b
cjdd3b / strib-suicides.txt
Created December 11, 2017 22:27
Data from the first chart on this interactive about Minnesota suicides: http://www.startribune.com/suicide-rate-in-minnesota-has-been-rising/440778623/
Year Suicides
1981-01-01 442
1982-01-01 470
1983-01-01 444
1984-01-01 443
1985-01-01 459
1986-01-01 541
1987-01-01 546
1988-01-01 488
1989-01-01 515
@cjdd3b
cjdd3b / data-journalism-software.md
Last active August 31, 2016 11:52
Software installation guide for Mizzou's Advanced Data Journalism course, Fall 2016.

Advanced Data Journalism (J4432) software requirements

Below is a list of the key software you'll need for class, along with some resources offering tips about how to get it installed.

Text editor

A good programming text editor will help you organize your code, catch typos and generally make your life a lot easier. We recommend Sublime Text 2, which you can easily download and install from their website.

Terminal client

@cjdd3b
cjdd3b / s3count.md
Last active June 18, 2020 18:31
How to count files in an S3 bucket

Counting files in S3 buckets and folders is harder than it should be. But here's a way to get it done using s3cmd:

  1. Install S3cmd
  • On Mac, brew install s3cmd
  • On Windows, go here
  2. From the command line, run s3cmd --configure

  3. Add your credentials when prompted.
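Once s3cmd is configured, one way to get a count is to list the bucket recursively and count the object lines. The sketch below is an assumption about how the remaining steps could look, not the original instructions. It assumes `s3cmd ls --recursive` prints one line per object containing an `s3://` URI, and that prefix lines start with `DIR`. The bucket name in the usage note is hypothetical.

```python
import subprocess


def count_s3cmd_objects(listing: str) -> int:
    """Count object lines in the text printed by `s3cmd ls`."""
    count = 0
    for line in listing.splitlines():
        stripped = line.strip()
        # Skip blank lines and "DIR" lines, which describe prefixes, not objects.
        if not stripped or stripped.startswith("DIR"):
            continue
        # Assumption: every object line contains its s3:// URI.
        if "s3://" in stripped:
            count += 1
    return count


def count_bucket(uri: str) -> int:
    """Run `s3cmd ls --recursive` on a bucket or prefix URI and count the objects."""
    result = subprocess.run(
        ["s3cmd", "ls", "--recursive", uri],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_s3cmd_objects(result.stdout)
```

For example, `count_bucket("s3://example-bucket/photos/")` would count everything under that prefix, provided s3cmd is installed and configured as described above.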

@cjdd3b
cjdd3b / virtualenv.txt
Last active April 26, 2016 17:00
Virtual environment configuration instrux
sudo pip install virtualenvwrapper
export WORKON_HOME=~/Envs
mkdir -p $WORKON_HOME
source /usr/local/bin/virtualenvwrapper.sh
echo 'export WORKON_HOME=$HOME/Envs; source /usr/local/bin/virtualenvwrapper.sh' >> ~/.bash_profile
mkvirtualenv dataj
pip install jupyter
pip install agate
pip install WHATEVER_ELSE
@cjdd3b
cjdd3b / scraping_solution.py
Created April 13, 2016 15:50
Solution to scraping assignment
import csv

import mechanize
from bs4 import BeautifulSoup

# Get the output file ready
# datafile = open('output.csv', 'w')
# writer = csv.writer(datafile)

# Open the county results page in a mechanize browser session
br = mechanize.Browser()
br.open('http://enr.sos.mo.gov/EnrNet/CountyResults.aspx')
@cjdd3b
cjdd3b / cluster.py
Last active July 27, 2023 08:16
Example of perceptual hashing for near-duplicate image detection
'''
cluster.py
Uses the Hamming distance between perceptual hashes to surface near-duplicate
images.
To install and run:
1. pip install imagehash
2. Put some .dat files in a folder someplace (script assumes ./data/imgs/*.dat)
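The idea described in the docstring above, comparing perceptual hashes by Hamming distance, can be sketched without any image library. This is a minimal illustration, not the script's own code. It assumes each hash is already available as an equal-length hex string, and the file names and the `max_distance` threshold are hypothetical.

```python
from itertools import combinations


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def near_duplicates(hashes: dict, max_distance: int = 5) -> list:
    """Return (name_a, name_b, distance) for every pair within max_distance bits."""
    pairs = []
    # Compare every unordered pair once, in name order for stable output.
    for (name_a, hash_a), (name_b, hash_b) in combinations(sorted(hashes.items()), 2):
        distance = hamming_distance(hash_a, hash_b)
        if distance <= max_distance:
            pairs.append((name_a, name_b, distance))
    return pairs
```

Comparing every pair is quadratic in the number of images, which is fine for a small folder but would need an index or bucketing scheme for a large collection.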
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"failed": 0
},
"hits": {
"total": 1,