@adewale
Created May 29, 2020 22:47
Converting a Twitter archive into a human-readable list of epigrams
= Convert the JavaScript data into JSON by stripping the variable assignment at the start of the file
sed -e '1s/^window\.YTD\.tweet\.part0 = //' ./data/tweet.js > ./data/tweet.json
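The sed expression can be generalized slightly. This is a minimal sketch, assuming the first line always starts with a prefix of the form "window.YTD.<name>.part<N> = " (the gist itself only shows tweet.part0); the function name strip_ytd_prefix and the sample input are hypothetical.

```shell
#!/bin/sh
# Strip the JavaScript assignment from line 1 so only the JSON array remains.
# Assumption: the prefix always looks like "window.YTD.<name>.part<N> = ".
# With no file arguments, sed reads standard input.
strip_ytd_prefix() {
  sed -e '1s/^window\.YTD\.[A-Za-z_]*\.part[0-9]* = //' "$@"
}

# Hypothetical sample shaped like the start of tweet.js.
sample='window.YTD.tweet.part0 = [ {
  "tweet" : { "id_str" : "1", "full_text" : "Theory: example" }
} ]'

printf '%s\n' "$sample" | strip_ytd_prefix
```

Against the real file this would be run as: strip_ytd_prefix ./data/tweet.js > ./data/tweet.json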
= Convert the JSON array into newline-delimited JSON (one tweet object per line)
jq -c '.[]' data/tweet.json > newline.json
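A quick sanity check for this step: the number of lines in newline.json should equal the number of elements in the array, even for tweets whose text contains a newline, because jq -c keeps such newlines escaped as \n. A sketch using a hypothetical two-tweet sample shaped like the archive's entries:

```shell
#!/bin/sh
# Verify that "jq -c '.[]'" emits exactly one line per array element.
# The sample data below is hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/tweet.json" <<'EOF'
[ { "tweet": { "id_str": "1", "full_text": "Theory: one\ntwo" } },
  { "tweet": { "id_str": "2", "full_text": "Not a theory" } } ]
EOF

jq -c '.[]' "$dir/tweet.json" > "$dir/newline.json"

# Array length vs. line count (tr strips the padding some wc versions add).
array_len=$(jq 'length' "$dir/tweet.json")
line_count=$(wc -l < "$dir/newline.json" | tr -d ' ')
echo "array elements: $array_len, NDJSON lines: $line_count"
```

On the real data the same comparison is jq 'length' data/tweet.json versus wc -l < newline.json.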
= Extract all the tweets that match the desired format (here, tweets containing "Theory:")
grep 'Theory:' newline.json > theories.json
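grep matches "Theory:" anywhere on the JSON line, including fields other than the tweet text. A sketch of a stricter filter, assuming only matches inside full_text are wanted; the two sample tweets (including the expanded_url value) are hypothetical.

```shell
#!/bin/sh
# Compare a plain grep with a jq filter that only inspects full_text.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/newline.json" <<'EOF'
{"tweet":{"id_str":"1","full_text":"Theory: ideas compound"}}
{"tweet":{"id_str":"2","full_text":"A link","entities":{"urls":[{"expanded_url":"https://example.com/Theory:Notes"}]}}}
EOF

# grep sees the whole JSON line, so it also matches the URL in tweet 2.
grep_count=$(grep -c 'Theory:' "$dir/newline.json")

# jq checks only the tweet text; .[] unwraps the {"tweet": ...} object.
jq -c 'select(.[].full_text | contains("Theory:"))' "$dir/newline.json" > "$dir/theories.json"
jq_count=$(wc -l < "$dir/theories.json" | tr -d ' ')

echo "grep matched $grep_count lines, jq matched $jq_count"
```

The jq output keeps the same one-object-per-line shape as the grep output, so the later extraction commands work on it unchanged.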
= Extract the text and URL of each matching tweet. This is where we lose data, either because some tweets contain newlines (verify by comparing the number of tweets in theories.json with the number of entries in theories.txt) or because some tweets are retweets.
jq -r '.[].full_text' theories.json > theories.txt
jq -r '.[].id_str, .[].full_text' theories.json > theories.txt
jq -r '.[].id_str, .[].full_text, "\n"' theories.json
jq -r '"https://twitter.com/ade_oshineye/status/\(.[].id_str)", .[].full_text, "\n"' theories.json > theories.txt
jq -r '.[].full_text, "https://twitter.com/ade_oshineye/status/\(.[].id_str)", "\n"' theories.json > theories.txt
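Both losses described above can be reduced in the final jq step. This is a sketch under two assumptions not stated in the gist: retweets can be recognized by a full_text starting with "RT @", and flattening a tweet's internal newlines to spaces is acceptable. The sample tweets are hypothetical; the status URL pattern is the one used above.

```shell
#!/bin/sh
# Emit text, URL and a blank separator per tweet, one line each.
# Assumptions: retweets start with "RT @"; newlines may become spaces.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/theories.json" <<'EOF'
{"tweet":{"id_str":"101","full_text":"Theory: first line\nsecond line"}}
{"tweet":{"id_str":"102","full_text":"RT @someone: Theory: not mine"}}
EOF

jq -r '.[]
  | select(.full_text | startswith("RT @") | not)
  | (.full_text | gsub("\n"; " ")),
    "https://twitter.com/ade_oshineye/status/\(.id_str)",
    ""' "$dir/theories.json" > "$dir/theories.txt"

cat "$dir/theories.txt"
```

Using "" rather than "\n" as the separator gives a single blank line between entries, because jq -r already ends every output string with a newline. Binding .[] once also avoids repeating it in each expression.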