@mxbees
Created October 31, 2015 18:45
Extracts data from your Twitter archive (the .js files, not the CSV).
#!/bin/bash
# This script requires 'jq' to be installed (for command-line JSON parsing): https://stedolan.github.io/jq/
# It assumes that it is run from the directory where you unzipped your Twitter archive.

# Iterate over each file in data/js/tweets/
for f in data/js/tweets/*
do
  # The .js files are not valid JSON because of their first line,
  # so 'tail -n +2' passes along everything except that line.
  # The 'jq' command uses the -r flag so that the output isn't wrapped in quotes.
  # The filter '.[] | .text' grabs the text of each tweet; change .text to any
  # other key to extract whatever data you want.
  # '>>' appends the output to twitter_text.txt rather than overwriting it,
  # so delete that file before re-running the script.
  tail -n +2 "$f" | jq -r '.[] | .text' >> twitter_text.txt
done
#Inspiration for this came from here: http://www.drmaciver.com/2013/03/exploring-your-twitter-archive-with-unix/
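To pull more than one key per tweet, you can build an array in the jq filter and format it with @tsv (available in jq 1.5 and later). The sketch below is self-contained: it writes a hypothetical two-tweet file into a temporary directory, shaped the way the script above expects (one non-JSON first line, then a JSON array of tweet objects), and extracts created_at and text as tab-separated rows. The sample data and the presence of a created_at key are assumptions for illustration; to see which keys your own archive actually has, run something like `tail -n +2 <one of your .js files> | jq '.[0] | keys'`.

```shell
#!/bin/bash
# Sketch: extract several keys per tweet as tab-separated lines.
# ASSUMPTION: tweet objects have "created_at" and "text" keys; check your archive.
set -euo pipefail

# Hypothetical sample file: a non-JSON first line, then a JSON array of tweets.
demo_dir=$(mktemp -d)
cat > "$demo_dir/2015_10.js" <<'EOF'
hypothetical_non_json_first_line =
[ {
  "created_at" : "2015-10-31 18:45:00 +0000",
  "text" : "hello world"
}, {
  "created_at" : "2015-10-30 09:00:00 +0000",
  "text" : "second tweet"
} ]
EOF

out="$demo_dir/tweets.tsv"
: > "$out"   # truncate first, so re-running doesn't duplicate rows

for f in "$demo_dir"/*.js
do
  # Skip the first line, then emit one row per tweet: created_at<TAB>text.
  # Current jq releases escape tabs, newlines and backslashes inside fields,
  # so each tweet stays on a single line.
  tail -n +2 "$f" | jq -r '.[] | [.created_at, .text] | @tsv' >> "$out"
done

cat "$out"
```

To run this against a real archive, replace "$demo_dir"/*.js with data/js/tweets/* and point out at a file of your choice. Truncating the output file once before the loop keeps the append-per-file pattern of the original script while avoiding duplicate rows on a second run.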