Using t and csvkit to quickly collect and analyze #nicar16 tweets from the command line
The t command-line Twitter tool is a great way to get Twitter data into a form you can work with in a spreadsheet.
Its homepage, which has good installation instructions, is here: https://github.com/sferik/t
And I've written some related instructions about how to get an authentication token from Twitter:
http://www.compjour.org/tutorials/twitter-app-authentication-process/
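If you haven't installed t yet, the short version (assuming you have Ruby set up and have registered a Twitter app per the instructions above; the exact steps may differ on your system) looks roughly like this:
$ gem install t
$ t authorize
The authorize step will then walk you through connecting your app's keys to your Twitter account.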
Doing a basic query for a term
Once you have it installed and you're authenticated, you can do a basic search for Tweets like this:
$ t search all 'nicar16'
The default behavior is to present the tweets in a human-readable format:
@mailbackwards
Good morning Denver, I'm at #NICAR16. Find me and say hi (and then come to
our talk on Sunday)
@tbtprojx
RT @MarshallProj: And about building your own criminal justice data w
@ultracasual @gabrieldance, @kenandavis + more at 3:30
https://t.co/mTNK1a1Xox #NICAR16
@sickmund
RT @MarshallProj: And about building your own criminal justice data w
@ultracasual @gabrieldance, @kenandavis + more at 3:30
https://t.co/mTNK1a1Xox #NICAR16
@tbtprojx
RT @MarshallProj: #NICAR16: Learn how to keep those news apps skills sharp at
11:30, with @gabrieldance http://bit.ly/1nwH4Zd
@rdmurphy
RT @A_L: Want to learn how to work with satellite data? @esagara and I will
be sharing our secrets today at 11:30 #NICAR16
Getting data in CSV format
But you can get them in CSV format using the --csv flag:
$ t search all 'nicar16' --csv
ID | Posted at | Screen name | Text |
---|---|---|---|
707951040855982080 | 2016-03-10 15:27:35 +0000 | MaiAndy | RT @nkhensley: Saturday. #NICAR16 https://t.co/IBbqmP8KIo |
707950349508739072 | 2016-03-10 15:24:51 +0000 | ashlynstill | RT @Lindzcook: Join @ashlynstill and me in Denver 4 at 9am to learn programming concepts using fun games! Great place to start for newcomers #NICAR16 |
707950090355216384 | 2016-03-10 15:23:49 +0000 | karanormal | It's a beautiful day to live in Denver... Because #NICAR16. |
707949741179428864 | 2016-03-10 15:22:26 +0000 | HBCompass | Starting off #NICAR16 by tilting off a bench just in case everyone didn't know I'm awkward as hell. https://t.co/9HJ1Z6lvFT |
707949606831665153 | 2016-03-10 15:21:53 +0000 | nkhensley | Saturday. #NICAR16 https://t.co/IBbqmP8KIo |
707949340040548352 | 2016-03-10 15:20:50 +0000 | AlexSecanove | RT @biologypartners: Investigative journalists & data miners: welcome to Colorado. There are some exciting data analytics startups here for you to meet. #NICAR16 |
707949060238344193 | 2016-03-10 15:19:43 +0000 | natecarlisle | And @TonySemerad and I just landed at DEN. Next stop: #NICAR16 |
707949028881731585 | 2016-03-10 15:19:36 +0000 | michelleminkoff | Let #nicar16 officially begin -- my uniform is on! It's go time! https://t.co/K2Z2DIfu04 |
707948651151122433 | 2016-03-10 15:18:06 +0000 | ryanngro | My sixth NICAR conf and the first where I fell asleep before midnight on the first night. Losing my touch. #NICAR16 |
707948445131268096 | 2016-03-10 15:17:17 +0000 | 1GKh | RT @FerretScot: If you're interested in investigative journalism it's worth keeping an eye on #NICAR16 as it unfolds |
707948358275444736 | 2016-03-10 15:16:56 +0000 | cjsinner | SUPER excited for my first #NICAR16 |
Getting the max number of tweet results
By default, 20 of the most recent tweets are returned. You can change this with the -n flag; I believe the maximum number of results is capped at 3,200, or however many tweets have been posted with the queried term in the last 7 days, whichever is fewer.
And of course, you most likely want to redirect this output directly into a text file that you can open up in Excel or what have you:
$ t search all 'nicar16' --csv -n 3200 > nicar16tweets.csv
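A quick sanity check on what actually came down; these are just standard Unix tools, nothing t-specific (the line count is only a rough tweet count, give or take the header row and any multi-line tweet text):
$ wc -l nicar16tweets.csv
$ head -n 3 nicar16tweets.csv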
Searching more specific streams
The t search subcommand lets you narrow the query to just your own timeline (t search timeline 'nicar16') or even to a specific list. Run t search help to see the descriptions:
t search all QUERY # Returns the 20 most recent Tweets that match the specified query.
t search favorites [USER] QUERY # Returns Tweets you've favorited that match the specified query.
t search help [COMMAND] # Describe subcommands or one specific subcommand
t search list [USER/]LIST QUERY # Returns Tweets on a list that match the specified query.
t search mentions QUERY # Returns Tweets mentioning you that match the specified query.
t search retweets [USER] QUERY # Returns Tweets you've retweeted that match the specified query.
t search timeline [USER] QUERY # Returns Tweets in your timeline that match the specified query.
t search users QUERY # Returns users that match the specified query.
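As far as I can tell, these subcommands take the same flags as before, so you can mix and match. For example, to grab tweets from a Twitter list that mention 'panel' as CSV (the list name here, IRE_NICAR/nicar16, is just a made-up placeholder, and I'm assuming --csv behaves the same for list searches):
$ t search list IRE_NICAR/nicar16 'panel' --csv > list-tweets.csv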
Try csvkit
This is also a good time to try out csvkit, rather than using a spreadsheet.
Use csvcut with the -n flag to see the headers:
$ csvcut -n nicar16tweets.csv
1: ID
2: Posted at
3: Screen name
4: Text
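If you just want to eyeball the data without opening a spreadsheet at all, csvlook (also part of csvkit) renders the CSV as a table; piping through head keeps the preview short:
$ csvlook nicar16tweets.csv | head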
Here's how to get the most frequent users (by screen name) of the hashtag in the set of tweets you've downloaded:
$ csvcut -c 'Screen name' nicar16tweets.csv | sort | uniq -c | sort -rn
82 BizJournalism
20 MacDiva
19 ultracasual
18 Jeremy_CF_Lin
17 IRE_NICAR
15 tbtprojx
15 RajneeshB
14 palewire
13 brentajones
13 KateReports
13 DanielleAlberti
12 seecmb
12 benlkeith
12 KarrieKehoe
12 HacksHackersCO
11 livlab
11 dougfisher
10 wjchat
10 harrisj
9 onyxfish
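And since it's all plain text, you can keep chaining things together. For example, csvgrep (also part of csvkit) can pull out just the tweets from one of those prolific accounts; the screen name here is simply one picked from the list above:
$ csvgrep -c 'Screen name' -m 'palewire' nicar16tweets.csv | csvcut -c 'Text'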
A note about using Excel
If you need yet another example of why you should stay away from Excel (and any other spreadsheet, but mostly Excel on OS X) until you absolutely need a spreadsheet, here it is: if you're on OS X, opening up the CSV file produced by t will get you an inexplicable error.
The reason? When the first two characters in a file are ID, Excel decides it's a SYLK file and shits itself. It's hard to imagine the logic that went into the decision to hardcode ID as a magic word: https://support.microsoft.com/en-us/kb/215591
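If you absolutely have to open the file in Excel, one workaround (just a sketch; pick whatever filenames and header name you like) is to rename that leading ID header so Excel doesn't mistake the file for SYLK:
$ sed '1s/^ID/Tweet ID/' nicar16tweets.csv > nicar16tweets-excel.csv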