@jmikola
Last active February 3, 2023 18:58
Using gallery-dl to backup a Twitter account

The following was adapted from:

Installation and configuration

First, I installed gallery-dl using one of the methods suggested in its README file. I then created the following configuration file in $HOME/.config/gallery-dl/config.json, based on advice from the aforementioned Reddit thread and the project's config docs:

{
  "extractor": {
    "twitter": {
      "text-tweets": true,
      "conversations": true,
      "expand": true,
      "logout": true,
      "pinned": true,
      "quoted": true,
      "replies": true,
      "retweets": true,
      "postprocessors": [
        { "name": "metadata", "event": "post", "filename": "{tweet_id}_main.json" }
      ],
      "cookies": {
        "_twitter_sess": "<REDACTED>",
        "ct0": "<REDACTED>",
        "lang": "en"
      }
    }
  }
}

Manually copying the cookies from the browser's web inspector seemed preferable to installing an extension that dumps them to a cookies.txt file. It wasn't clear which cookies were required, but the ones above worked for me. _twitter_sess definitely sounds relevant, and the config docs reference ct0 for generating CSRF tokens.
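Because config.json now holds live session cookies, it may be worth making it readable only by your own user. The following is a minimal sketch, not part of gallery-dl: the secure_config function name is hypothetical, and it assumes the default $HOME/.config/gallery-dl location used above (a different directory can be passed as the first argument).

```shell
#!/bin/bash
# Hypothetical helper (not part of gallery-dl): restrict access to the
# gallery-dl config directory, since config.json contains session cookies.
secure_config() {
  # Default to the config location used above; an argument overrides it.
  local dir="${1:-$HOME/.config/gallery-dl}"

  mkdir -p "$dir"
  chmod 700 "$dir"                  # only the owner may list or enter the directory

  if [ -f "$dir/config.json" ]; then
    chmod 600 "$dir/config.json"    # only the owner may read or write the file
  fi
}
```

Sourcing the snippet and running secure_config after each edit to the config is enough; chmod leaves the file contents untouched.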

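Note: the extractor.twitter.expand option is potentially very expensive, since it expands each tweet into its full conversation or thread and therefore needs at least one additional API request per tweet. You may want to disable that option if you find yourself hitting rate limits (e.g. "[twitter][info] Waiting until HH:MM:SS for rate limit reset.").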

Backing up an account

This backup.sh script can be used to dump tweets from a few URLs (as suggested in the thread). It takes a username as its first and only parameter.

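#!/bin/bash

if [ $# -ne 1 ]; then
  echo "Usage: $0 USERNAME" >&2
  exit 1
fi

# Quote each URL so the "?" in the search URL is never treated as a glob.
gallery-dl "https://twitter.com/${1}/tweets" --write-metadata
gallery-dl "https://twitter.com/${1}/media" --write-metadata
gallery-dl "https://twitter.com/${1}/with_replies" --write-metadata
gallery-dl "https://twitter.com/search?q=from:${1}" --write-metadata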

This approach worked for me, but the extractor.twitter.timeline.strategy option may be worth reading about if you would prefer to invoke gallery-dl once on a profile URL (e.g. https://www.twitter.com/USERNAME).

If you're worried about missing tweets, I found it helpful to run the script a second time and call du -s ./gallery-dl/twitter/$USERNAME before and after that second run to compare the size of the output directory. Assuming the account hasn't tweeted anything new in the meantime, the du result should remain the same between runs.
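The before/after du comparison can be wrapped in a small helper. This is a sketch that assumes the output directory already exists from a first run; the check_unchanged name is hypothetical. Keep in mind that du -s reports disk usage in blocks rather than an exact byte count, so this is a coarse check.

```shell
#!/bin/bash
# Hypothetical helper: run a command and report whether the disk usage
# of a directory changed. Returns 0 if unchanged, 1 otherwise.
check_unchanged() {
  local dir="$1"
  shift

  local before after
  # "du -s" prints "<size><TAB><path>"; keep only the size column.
  before="$(du -s "$dir" | cut -f1)"
  "$@"
  after="$(du -s "$dir" | cut -f1)"

  echo "before: $before, after: $after"
  [ "$before" = "$after" ]
}
```

For example: check_unchanged "./gallery-dl/twitter/$USERNAME" ./backup.sh "$USERNAME". The function compares sizes even if the command exits with an error, so check gallery-dl's own output as well.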

Backing up a list of followed users

Additionally, this backup-follows.sh script can be used to dump the list of accounts a user follows. Like backup.sh, it takes a username as its only parameter. I primarily used it to back up the list of accounts I follow.

#!/bin/bash

gallery-dl "https://twitter.com/${1}/following" --dump-json > "${1}_following.json"

You can then use a tool like jq to parse that file and extract a list of usernames (-r prints them without surrounding quotes):

jq -r '.[][2].legacy.screen_name' <username>_following.json
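To keep a plain-text copy of that list, the jq output can be sorted and deduplicated. A minimal sketch, assuming the --dump-json output has the shape that the .[][2].legacy.screen_name filter expects; the extract_usernames name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the unique screen names found in a file
# produced by backup-follows.sh, one per line, sorted alphabetically.
# Assumes the same JSON shape as the jq filter above.
extract_usernames() {
  # "// empty" skips entries that have no screen name instead of printing "null".
  jq -r '.[][2].legacy.screen_name // empty' "$1" | sort -u
}
```

For example: extract_usernames "${USERNAME}_following.json" > "${USERNAME}_following.txt".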

Related issues

These are a few issues I came across in the gallery-dl project that seemed relevant:
