@luciovilla
Forked from dannguyen/t-nicar16-cli.md
Last active June 3, 2016 15:11
Using the command-line tools t and csvkit to track the #Migrahack hashtag

Using t and csvkit to quickly collect and analyze #migrahack tweets from the command line

The t command-line Twitter tool is a great way to pull Twitter data into a format you can open in a spreadsheet.

Its homepage with good installation instructions is here:

https://github.com/sferik/t

And I've written some related instructions about how to get an authentication token from Twitter:

http://www.compjour.org/tutorials/twitter-app-authentication-process/

Doing a basic query for a term

Once you have it installed and you're authenticated, you can do a basic search for Tweets like this:

$ t search all 'migrahack'

The default behavior is to present the tweets in a human-readable format:

   @JeronimoSaldana
   Anyone else at JFK's Terminal 5 with 2 hours to kill before heading to Chicago for #MigraHack?
   Pues, let's hang out! :D

   @MSFTChicago
   Tomorrow — uncover the trends and stories hidden in #data at @Migrahack:
   https://t.co/ROh6PcUPr4

   @fairtrade_Ervin
   RT @Lyndab08: Excited for @Migrahack 2016 this weekend. #migrahack https://t.co/8FDNnJBUeh

   @fairtrade_Ervin
   RT @JeronimoSaldana: So excited to attend @Migrahack #Migrahack in Chicago this weekend!
   #Not1More https://t.co/AzYZcB3OaR

   @fairtrade_Ervin
   RT @Not1_More: Roll-Call! Who’s coming to #Migrahack in Chicago this weekend?

   @ijjnews
   RT @Don_Rubiel: Ready for #migrahack https://t.co/Upwe1UeAMR

   @marisa_franco
   RT @JeronimoSaldana: So excited to attend @Migrahack #Migrahack in Chicago this weekend!
   #Not1More https://t.co/AzYZcB3OaR

   @Laura_GlobalEd
   @JeronimoSaldana @Migrahack Looks like an awesome idea!

Getting data in CSV format

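But you can get them in CSV format using the --csv flag (the sample output below is carried over from the #NICAR16 version of this guide, which is why it shows #NICAR16 tweets):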

$ t search all 'migrahack' --csv
ID,Posted at,Screen name,Text
707951040855982080,2016-03-10 15:27:35 +0000,MaiAndy,RT @nkhensley: Saturday. #NICAR16 https://t.co/IBbqmP8KIo
707950349508739072,2016-03-10 15:24:51 +0000,ashlynstill,RT @Lindzcook: Join @ashlynstill and me in Denver 4 at 9am to learn programming concepts using fun games! Great place to start for newcomers #NICAR16
707950090355216384,2016-03-10 15:23:49 +0000,karanormal,It's a beautiful day to live in Denver... Because #NICAR16.
707949741179428864,2016-03-10 15:22:26 +0000,HBCompass,Starting off #NICAR16 by tilting off a bench just in case everyone didn't know I'm awkward as hell. https://t.co/9HJ1Z6lvFT
707949606831665153,2016-03-10 15:21:53 +0000,nkhensley,Saturday. #NICAR16 https://t.co/IBbqmP8KIo
707949340040548352,2016-03-10 15:20:50 +0000,AlexSecanove,RT @biologypartners: Investigative journalists & data miners: welcome to Colorado. There are some exciting data analytics startups here for you to meet. #NICAR16
707949060238344193,2016-03-10 15:19:43 +0000,natecarlisle,And @TonySemerad and I just landed at DEN. Next stop: #NICAR16
707949028881731585,2016-03-10 15:19:36 +0000,michelleminkoff,Let #nicar16 officially begin -- my uniform is on! It's go time! https://t.co/K2Z2DIfu04
707948651151122433,2016-03-10 15:18:06 +0000,ryanngro,My sixth NICAR conf and the first where I fell asleep before midnight on the first night. Losing my touch. #NICAR16
707948445131268096,2016-03-10 15:17:17 +0000,1GKh,RT @FerretScot: If you're interested in investigative journalism it's worth keeping an eye on #NICAR16 as it unfolds
707948358275444736,2016-03-10 15:16:56 +0000,cjsinner,SUPER excited for my first #NICAR16 😁😁😁

Getting the max number of tweet results

By default, the 20 most recent tweets are returned. You can change this with the -n flag; I believe the maximum number of results is capped at 3200, or at however many tweets containing the queried term have been posted in the last 7 days, whichever is smaller.

And of course, you'll most likely want to redirect the output straight into a text file that you can open up in Excel or what have you:

$ t search all 'migrahack' --csv -n 3200 > migrahack16tweets.csv
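
If the search really does reach back only about 7 days, one way to build a longer archive is to run the command every few days and merge the resulting files. The following is a minimal sketch of that idea, not a feature of t or csvkit: the file names pull1.csv, pull2.csv, and merged.csv and the sample rows are made up for illustration, and it assumes no tweet text contains a line break, because awk is not CSV-aware.

```shell
# Hypothetical sample pulls standing in for two runs of
# `t search all 'migrahack' --csv -n 3200` on different days.
cat > pull1.csv <<'EOF'
ID,Posted at,Screen name,Text
1002,2016-06-02 10:00:00 +0000,user_b,Second tweet #migrahack
1001,2016-06-01 09:00:00 +0000,user_a,First tweet #migrahack
EOF
cat > pull2.csv <<'EOF'
ID,Posted at,Screen name,Text
1003,2016-06-03 11:00:00 +0000,user_c,Third tweet #migrahack
1002,2016-06-02 10:00:00 +0000,user_b,Second tweet #migrahack
EOF

# Write the header once, then append the data rows of every pull,
# keeping only the first row seen for each tweet ID (field 1).
{
  head -n 1 pull1.csv
  for f in pull1.csv pull2.csv; do
    tail -n +2 "$f"
  done | awk -F, '!seen[$1]++'
} > merged.csv

cat merged.csv
```

Deduplicating on the ID column rather than on whole lines means a tweet that shows up in both pulls is kept once even if other columns were to differ between runs.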

Searching more specific streams

The t search subcommand lets you narrow the query to just your own timeline (t search timeline 'migrahack') or even to a specific list. Run t search help to see the descriptions:

  t search all QUERY               # Returns the 20 most recent Tweets that match the specified query.
  t search favorites [USER] QUERY  # Returns Tweets you've favorited that match the specified query.
  t search help [COMMAND]          # Describe subcommands or one specific subcommand
  t search list [USER/]LIST QUERY  # Returns Tweets on a list that match the specified query.
  t search mentions QUERY          # Returns Tweets mentioning you that match the specified query.
  t search retweets [USER] QUERY   # Returns Tweets you've retweeted that match the specified query.
  t search timeline [USER] QUERY   # Returns Tweets in your timeline that match the specified query.
  t search users QUERY             # Returns users that match the specified query.

Try csvkit

This is also a good time to try out csvkit, rather than using a spreadsheet.

Use csvcut with the -n flag to see the headers:

$ csvcut -n migrahack16tweets.csv
  1: ID
  2: Posted at
  3: Screen name
  4: Text

Here's how to get the 20 most frequent users (by screen name) of the hashtag in the set of tweets you've downloaded; tail -n +2 drops the header row so it isn't counted as a user:

$ csvcut -c 'Screen name' migrahack16tweets.csv | tail -n +2 | sort | uniq -c | sort -rn | head -n 20
  82 BizJournalism
  20 MacDiva
  19 ultracasual
  18 Jeremy_CF_Lin
  17 IRE_NICAR
  15 tbtprojx
  15 RajneeshB
  14 palewire
  13 brentajones
  13 KateReports
  13 DanielleAlberti
  12 seecmb
  12 benlkeith
  12 KarrieKehoe
  12 HacksHackersCO
  11 livlab
  11 dougfisher
  10 wjchat
  10 harrisj
   9 onyxfish
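
The same sort | uniq -c idiom works for other columns. As a sketch, here is a count of tweets per day based on the Posted at column. It uses awk on a made-up sample file (sample-tweets.csv and its rows are hypothetical) so that it runs without csvkit installed. On real data, csvcut -c 'Posted at' would be the safer way to extract the column, since awk splits on every comma and does not understand quoted or multi-line fields.

```shell
# Hypothetical sample rows; the real file would be migrahack16tweets.csv.
cat > sample-tweets.csv <<'EOF'
ID,Posted at,Screen name,Text
1003,2016-06-03 11:00:00 +0000,user_c,Third tweet #migrahack
1002,2016-06-02 10:00:00 +0000,user_b,Second tweet #migrahack
1001,2016-06-02 09:00:00 +0000,user_a,First tweet #migrahack
EOF

# Skip the header, keep the date part (first 10 characters) of the
# second column, then count how many rows fall on each day.
# Assumes every tweet sits on a single line.
awk -F, 'NR > 1 { print substr($2, 1, 10) }' sample-tweets.csv | sort | uniq -c
```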

A note about using Excel

If you need yet another reason to stay away from Excel (and any other spreadsheet, but mostly Excel on OS X) until you absolutely need a spreadsheet: on OS X, opening the CSV file produced by t gives you this inexplicable error:

[Screenshot: the error Excel shows when opening the file]

The reason? When the first two characters of a file are the uppercase letters ID, Excel assumes it is looking at a SYLK file rather than a CSV, and shits itself. It's hard to imagine the logic that went into the decision to hardcode ID as a magic word: https://support.microsoft.com/en-us/kb/215591
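
One possible workaround, if you do need to open the file in Excel, is to change those first two characters before opening it. The sketch below lowercases the leading ID with sed. The file names sample.csv and sample-excel.csv and the sample row are hypothetical, and it assumes the first header really is ID followed by a comma, as in the output of t shown earlier.

```shell
# Hypothetical two-line sample standing in for the CSV that t produces.
printf 'ID,Posted at,Screen name,Text\n1001,2016-06-01 09:00:00 +0000,user_a,Hello #migrahack\n' > sample.csv

# Lowercase the leading "ID" on the first line only, writing to a new
# file so the original download stays untouched.
sed '1s/^ID,/id,/' sample.csv > sample-excel.csv

head -n 1 sample-excel.csv
```

Writing to a separate file keeps the original header intact for csvkit commands that refer to the column by name.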
