
Last active Apr 28, 2022
Twitter Archive to JSON

If you download your personal Twitter archive, you don't quite get the data as JSON, but as a series of .js files, one for each month (these are meant to replicate the Twitter API responses for the front-end part of the downloadable archive).

The data in those files is far richer than the CSV export. If you want to use it for analysis or an app, just run this script.

Run the script with sh in the same directory as the /tweets folder that comes with the archive download, and you'll get two files:

  • tweets.json — a JSON list of the objects
  • tweets_dict.json — a JSON dictionary where each Tweet's key is its id_str

You'll also get a /json-tweets directory which has the individual JSON files for each month of tweets.
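For context, each monthly file in the older archive format begins with a JavaScript assignment, and the script's first step is simply dropping that line. A minimal sketch with made-up file contents (the file name and tweet data below are illustrative, not from a real archive):

```shell
# Illustrative sample of the older per-month archive format.
mkdir -p demo/tweets
cat > demo/tweets/2018_01.js <<'EOF'
Grailbird.data.tweets_2018_01 =
 [ {
  "id_str" : "123"
} ]
EOF

# Dropping the first line (the JS assignment) leaves plain JSON:
tail -n +2 demo/tweets/2018_01.js > demo/2018_01.json
```

The real script does this for every file matching tweets/*.js.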

#!/usr/bin/env bash
mkdir json-tweets
mkdir .tmp-json-tweets
touch .tmp-tweets.json
touch tweets.json
echo "" > tweets.json
echo "" > .tmp-tweets.json

echo "Processing Tweet.js files..."
# Drop the leading JavaScript assignment line from each monthly .js file,
# leaving JSON in json-tweets/.
for f in tweets/*.js; do
  tail -n +2 "$f" > json-"${f%.js}".json
done

echo "Creating tweets.json..."
echo "[ {" >> .tmp-tweets.json
# Strip each file's opening "[ {" and closing "} ]" lines, then join the
# tweet objects with "}, {" separators.
for f in json-tweets/*.json; do
  tail -n +2 "$f" | sed '$d' > .tmp-"$f"
  echo "}, {" >> .tmp-"$f"
  cat .tmp-"$f" >> .tmp-tweets.json
  rm .tmp-"$f"
done
rmdir .tmp-json-tweets

# Remove the trailing "}, {" separator and close the list.
sed '$d' .tmp-tweets.json > tweets.json
echo "} ]" >> tweets.json
rm .tmp-tweets.json

# Build a dictionary keyed by each Tweet's id_str.
jq 'map({"key": .id_str | tostring, "value": .}) | from_entries' tweets.json > tweets_dict.json
echo "DONE"

vineethjose commented Feb 14, 2019

Can someone help with this?

Processing Tweet.js files...
tail: tweets/*.js: No such file or directory
Creating tweets.json...
/Users/xxx/Downloads/twitter/ line 27: jq: command not found


edsu commented Apr 25, 2019

You will want to install jq (e.g. brew install jq on macOS, or sudo apt-get install jq on Debian/Ubuntu).


thibaultmol commented Jul 4, 2019

Is it just me, or does Twitter now export a single .js file instead?


thibaultmol commented Jul 4, 2019

nvm, ignore that


amandabee commented Nov 21, 2019

It is one giant JSON file these days. You just have to strip off the leading window.YTD.tweet.part0 = to make it valid JSON.


tsuliwaensis commented May 12, 2020

Running the script just returns a zero-byte file for me.


amandabee commented May 12, 2020

@tsuliwaensis You shouldn't need to run it anymore. Your archived tweets are already JSON, and once you edit the file to remove window.YTD.tweet.part0 = it will be valid JSON.
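To make that concrete, here is a sketch of the prefix-stripping step for the newer single-file archives. The sample contents below are made up for illustration; in recent downloads the real file lives at data/tweet.js.

```shell
# Illustrative sample of the newer archive format: one big .js file with a
# "window.YTD.tweet.part0 = " assignment prefix.
cat > tweet.js <<'EOF'
window.YTD.tweet.part0 = [ {
  "tweet" : { "id_str" : "123" }
} ]
EOF

# Remove the prefix in place (works with both GNU and BSD sed; keeps a .bak copy).
sed -i.bak 's/^window\.YTD\.tweet\.part0 = //' tweet.js
# tweet.js is now valid JSON.
```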


hackingbutlegal commented Aug 2, 2020

It is a giant json these days. You just have to strip off the leading window.YTD.tweet.part0 = to make it valid JSON

Thank you!


almereyda commented Mar 15, 2021

A batch job for creating JSON digests from the .js files of the Twitter archive distribution, run from within its data directory, could look like:

rsync -I --backup --suffix='.json' --backup-dir='json' --exclude='manifest.js' ./*.js ./
sed -i -r 's/^window.*\ \=\ (.*)$/\1/' json/*

You can then dig into your data at will:

jq '.[] | .tweet | select(.entities.urls != []) | .entities | .urls | map(.expanded_url) | .[]' tweet.js.json | cut -d'/' -f3 | sed 's/\"//g' | sort | uniq -c | sort -g

Please note this will update the file modification times for the *.js files from the ones provided by the archive to the moment of running the command, due to the -I ignore switch, which makes rsync copy every file over itself.

Adapted from
