Sung to the tune of 'Take It Easy' by The Eagles
Well, I'm in debug mode
tryin' to rewrite my code
I've got algorithms on my mind
Five that Chase told me,
Two that someone showed me,
But K-means is a friend of mine
"id","tweet_hashtag","tweet_date_time","tweet_avatar","tweet_user_name","tweet_screen_name","tweet_text","tweet_id","tweet_user_location","session_date","session_slug" | |
10304,"#wjchat","2014-01-30 00:32:06+00","http://pbs.twimg.com/profile_images/1830163265/wjchat-twitter-icon_normal.png","wjchat","wjchat","Coming up in half an hour: Tonight on #wjchat, we're clearing the air -- what does it mean to be a ""digital"" journalist?","428687027458953216","Everywhere you are","2014-01-30 04:03:57.791376+00","1-29-2014"
10303,"#wjchat","2014-01-30 00:33:29+00","http://pbs.twimg.com/profile_images/378800000748762455/a475050acb1966ce2eb8562f056c09fd_normal.jpeg","Rachel C Stella","rachelcstella","RT @wjchat: Coming up in half an hour: Tonight on #wjchat, we're clearing the air -- what does it mean to be a ""digital"" journalist?","428687376551272448","La Salle, Ill.","2014-01-30 04:03:57.791376+00","1-29-2014"
10302,"#wjchat","2014-01-30 00:37:12+00","http://pbs.twimg.com/profile_images/378800000319335614/cdc1c0ce87570
Moved the CSV file - about 4.5 MB - to GitHub as it was too big for a gist. Find it here.
<!DOCTYPE html>
<html>
<head>
<title>Search the Congressional Record using Capitol Words API</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" id="meta-viewport" content="width=device-width,minimum-scale=1,maximum-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black" />
<link rel="apple-touch-icon" href="http://media.scpr.org/assets/images/icon.png">
<meta name="format-detection" content="telephone=no"/>
year | overall | big_ten | big_ten_finish | big_ten_tournament | ncaa_tournament | date | round | opponent | opponent_seed | opponent_half | opponent_final | badgers_seed | badgers_half | badgers_final | badgers_result | url
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2014 | 26-7 | 12-6 | second | 1-1 | | 3/20/2014 | First Round | American | 15 | | | 2 | | | |
2013 | 23-12 | 12-6 | | 2-1 | 0-1 | 3/22/2013 | First Round | Ole Miss | 12 | 22 | 57 | 5 | 25 | 46 | Loss | http://espn.go.com/ncb/boxscore?gameId=330810275
2012 | 26-10 | 12-6 | | 1-1 | 2-1 | 3/22/2012 | Sweet Sixteen | Syracuse | 1 | 33 | 64 | 4 | 27 | 63 | Loss | http://espn.go.com/ncb/boxscore?gameId=320820183
2012 | 26-10 | 12-6 | | 1-1 | 2-1 | 3/17/2012 | Second Round | Vanderbilt | 5 | 31 | 57 | 4 | 32 | 50 | Win | http://espn.go.com/ncb/boxscore?gameId=320770275
2012 | 26-10 | 12-6 | | 1-1 | 2-1 | 3/15/2012 | First Round | Montana | 13 | 29 | 49 | 4 | 39 | 73 | Win | http://espn.go.com/ncb/boxscore?gameId=320750275
2011 | 25-9 | 13-5 | | 0-1 | 2-1 | 3/24/2011 | Sweet Sixteen | Butler | 8 | 33 | 61 | 4 | 24 | 54 | Loss | http://espn.go.com/ncb/boxscore?gameId=310830275
2011 | 25-9 | 13-5 | | 0-1 | 2-1 | 3/19/2011 | Second Round | Kansas State | 5 | 30 | 65 | 4 | 36 | 70 | Win | http://espn.go.com/ncb/boxscore?gameId=310780275
from __future__ import with_statement
import os
import time, datetime
from fabric.operations import prompt
from fabric.api import *
from fabric.contrib.console import confirm
from fabric.colors import green
env.hosts = ['{{YOUR WEBFACTION PATH}}']
Code allows us to make all kinds of visuals and tools that display data for analysis.
But when you're starting to mix code, data and journalism - and you lack a deep statistics background to draw upon - everything looks like a nail that you can whack with your shiny hammer. And every method - from scatterplots to nearest neighbor to regression - seems important.
So how do you move from citing only the average, median and percent change in all of your work and begin to build the skills and knowledge that lead to a deeper analysis of datasets?
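As a point of reference, the three baseline figures named above can be computed with Python's standard library alone. This is only a rough sketch: the `percent_change` helper and the sample numbers are hypothetical and do not come from any dataset mentioned here.

```python
from statistics import mean, median


def percent_change(old, new):
    """Return the percent change from old to new (hypothetical helper)."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) / abs(old) * 100.0


# Hypothetical figures, for illustration only.
salaries = [30000, 32000, 35000, 38000, 250000]

average = mean(salaries)    # pulled upward by the single very large value
middle = median(salaries)   # the middle value, unaffected by that outlier
change = percent_change(salaries[0], salaries[1])
```

With one large value in the list, the average lands well above the median, which is the kind of gap that hints a dataset may need more than these three numbers.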
I propose a discussion that helps beginning data journalists/news apps developers better understand which analytical and statistical methods are best suited to different data situations.
For example:
# Uses wrapper found here: https://github.com/datadesk/python-googlegeocoder
# pip install python-googlegeocoder
from googlegeocoder import GoogleGeocoder
import time
import logging
logging.basicConfig(format='\033[1;36m%(levelname)s:\033[0;37m %(message)s', level=logging.DEBUG)
def create_list_addresses_from_file():
getCheckboxIds: function(event){
    var activeCheckboxes = [];
    if (!$("input:checkbox").is(":checked")) {
        console.log("box is not checked");
    }
    $("input:checkbox").each(function(){
        var $this = $(this);
        if($this.is(":checked")){
            var categoryId = $this.attr("id");
            if (categoryId != undefined){
# Uses wrapper found here: https://github.com/datadesk/python-googlegeocoder
# pip install python-googlegeocoder
import csv
import logging
import time
import datetime
from googlegeocoder import GoogleGeocoder
logger = logging.getLogger("root")