Created October 31, 2015 18:45
Extracts data from your Twitter archive (the .js files, not the CSV).
#!/bin/bash
# This script requires 'jq' to be installed (for CLI JSON parsing). https://stedolan.github.io/jq/
# It assumes it is run from the directory where you unzipped your Twitter archive.

# Iterate over each tweet file in the archive.
for f in data/js/tweets/*.js
do
  # The .js files are not valid JSON because their first line is a JavaScript assignment,
  # so 'tail -n +2' passes along everything except that line.
  # The 'jq' command uses the -r flag so that the output isn't wrapped in quotes.
  # The filter '.[] | .text' grabs the text of each tweet; change .text to any other key
  # to extract different data.
  # '>>' appends to the output file rather than overwriting it.
  tail -n +2 "$f" | jq -r '.[] | .text' >> twitter_text.txt
done

# Inspiration for this came from here: http://www.drmaciver.com/2013/03/exploring-your-twitter-archive-with-unix/
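
The script's comments note that the jq filter can be changed to grab other keys. As a rough sketch of that idea, the snippet below pulls several keys at once and prints them as tab-separated columns. The helper name `extract_fields`, the sample file, and its contents are hypothetical and made up for illustration; the first line of the sample only imitates the non-JSON first line described above. It assumes the chosen keys hold simple values (strings or numbers), since jq's `@tsv` rejects nested arrays and objects.

```shell
#!/bin/bash
# Sketch only: extract_fields is a hypothetical helper, not part of the original script.
# Usage: extract_fields FILE KEY [KEY...]
extract_fields() {
  local f=$1
  shift
  # Build a jq array expression such as [.created_at, .text] from the key names.
  local filter
  filter=$(printf '.%s, ' "$@")
  filter="[${filter%, }]"
  # Skip the non-JSON first line, then emit one tab-separated row per tweet.
  tail -n +2 "$f" | jq -r ".[] | $filter | @tsv"
}

# Made-up sample file standing in for one file under data/js/tweets/.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Grailbird.data.tweets_2015_10 =
[ { "created_at": "2015-10-31 18:45:00 +0000", "text": "hello world" },
  { "created_at": "2015-10-30 09:00:00 +0000", "text": "second tweet" } ]
EOF

extract_fields "$sample" created_at text
```

To run it over a real archive, the same function could be called inside the loop from the script above, for example `extract_fields "$f" created_at text >> twitter_data.tsv` (the output file name is also just an example).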