@andrew-d
Created November 18, 2022 02:52
Extremely hacky set of scripts/instructions to fetch all user profiles from a Twitter archive
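#!/bin/bash
# resolve.sh: replace a t.co-shortened profile URL in one fetched user JSON
# file with the URL it redirects to.
set -euo pipefail

main() {
    local url
    url="$(jq -rc .response.data.url < "$1")"
    if [[ -z "$url" ]]; then
        echo "EMPTY: $1"
        return
    fi

    # Only t.co short links need resolving; leave anything else untouched.
    if [[ "$url" != "https://t.co/"* ]] && [[ "$url" != "http://t.co/"* ]]; then
        echo "SKIP: $1"
        return
    fi

    # Follow redirects, discard the body, and print only the final URL.
    local resolved
    resolved="$(curl -fLs -o /dev/null -w '%{url_effective}' "$url")"

    # Write to a temporary file first so a failed jq run cannot truncate the original.
    jq -c --arg url "$resolved" '.response.data.url |= $url' < "$1" > "$1.new"
    mv "$1.new" "$1"
    echo "RESOLVED: $1"
}

main "$@"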

⚠️ This is pretty hacky ⚠️

Get the recipient/sender ID pair for every DM in your Twitter archive (pairs repeat, one per message; the IDs are deduplicated in the next step):

jq -r '.[] | .dmConversation.messages[] | [.messageCreate.recipientId, .messageCreate.senderId] | @csv' < direct-messages.json > dm-pairs.csv

Collect the unique user IDs from both columns:

{ xsv select -n 1 < dm-pairs.csv ; xsv select -n 2 < dm-pairs.csv; } | sort -u > dm-users.csv
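If xsv is not installed, the same list can probably be produced with coreutils alone. This is a minimal sketch, not part of the original scripts: `unique_ids` is a hypothetical helper name, and it assumes every field is a plain numeric ID with no embedded commas or quotes, which is what the jq command above emits.

```shell
# Hypothetical helper (not in the original gist): print every distinct ID
# found in a two-column CSV of quoted IDs, one ID per line.
# Assumes fields never contain commas or quotes.
unique_ids() {
    # Split the two columns onto separate lines, strip the CSV quoting
    # (and any carriage returns), then sort and deduplicate.
    tr ',' '\n' < "$1" | tr -d '"\r' | sort -u
}
```

Usage would be `unique_ids dm-pairs.csv > dm-users.csv`.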

Go to https://developer.twitter.com/en/docs/twitter-api/users/lookup/api-reference/get-users-id, click "Try a live request", then authorize oauth-playground.glitch.me to access your Twitter account. Return to the page, open your browser's developer tools, copy the value of the token cookie, and paste it into the token variable in run.sh, below. Then run:

mkdir users
./run.sh dm-users.csv
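run.sh skips IDs that already have a non-empty file under users/ and deletes the file when a fetch fails, so it can be re-run until everything is fetched. To see what is still outstanding, something like the following might help. It is a sketch only: `missing_users` is a hypothetical helper name, not part of the gist.

```shell
# Hypothetical helper (not in the original gist): print each ID from the list
# file ($1) that has no non-empty <id>.json in the output directory ($2).
missing_users() {
    # The "|| [ -n ... ]" also handles a final line without a trailing newline.
    while read -r u || [ -n "$u" ]; do
        [ -z "$u" ] && continue              # ignore blank lines
        [ -s "$2/$u.json" ] || echo "$u"     # missing or empty file
    done < "$1"
}
```

Usage would be `missing_users dm-users.csv users`. If it prints nothing, every ID has a saved response.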

Finally, run resolve.sh (parallel invocation below) to resolve profile URLs (shortened with t.co) to real ones:

parallel -j 8 ./resolve.sh ::: users/*.*
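If GNU parallel is not available, xargs can likely do the same job. The sketch below is an assumption-laden alternative rather than the author's method: `resolve_all` is a hypothetical helper name, it relies on the `-print0`, `-0`, and `-P` extensions found in GNU and BSD find/xargs, it matches only `*.json` files (narrower than the `users/*.*` glob above), and it assumes the directory contains at least one such file.

```shell
# Hypothetical helper (not in the original gist): run a command ($1) once per
# *.json file in a directory ($2), up to 8 invocations at a time.
resolve_all() {
    # NUL-delimited names keep unusual file names intact;
    # -n 1 passes exactly one file to each invocation of the command.
    find "$2" -type f -name '*.json' -print0 | xargs -0 -n 1 -P 8 "$1"
}
```

Usage would be `resolve_all ./resolve.sh users`.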
#!/bin/bash
# run.sh: fetch the profile of every user ID listed in the file given as $1
# (one ID per line) and save each response to users/<id>.json.
set -euo pipefail

# Value of the "token" cookie from oauth-playground.glitch.me.
token='redacted'

# Join the remaining arguments, using the first argument as the delimiter.
join_by() {
    local d=${1-} f=${2-}
    if shift 2; then
        printf %s "$f" "${@/#/$d}"
    fi
}

main() {
    local fields=(
        "id"
        "name"
        "username"
        "description"
        "location"
        "url"
        "created_at"
        "protected"
    )
    # Render the field names as a JSON array so jq can take them via --argjson.
    local fieldsjson="[\"$(join_by '","' "${fields[@]}")\"]"

    local total
    total="$(wc -l < "$1" | tr -d "[:space:]")"

    local req
    local i=0
    while read -r u; do
        i=$((i + 1))

        # Skip users that already have a non-empty response on disk.
        if [[ -s "users/$u.json" ]]; then
            echo "($i / $total) SKIP: $u"
            continue
        fi

        echo -n "($i / $total) FETCH: $u"

        # Build the JSON body that describes the API call to make.
        req="$(jq \
            --arg id "$u" \
            --argjson fields "$fieldsjson" \
            -cn \
            '{url: ("https://api.twitter.com/2/users/" + $id + "?user.fields=" + ($fields | join("%2C"))), method:"GET"}')"

        if ! curl -fsSL \
            'https://oauth-playground.glitch.me/request' \
            -H 'authority: oauth-playground.glitch.me' \
            -H 'accept: application/json' \
            -H 'accept-language: en-US,en;q=0.9' \
            -H 'content-type: application/json' \
            -H "cookie: token=$token" \
            -H 'origin: https://oauth-playground.glitch.me' \
            -H 'referer: https://oauth-playground.glitch.me/?id=findUserById&params=%28%27query%21%28%29%7Ebody%21%28%29%7Epath%21%28%21id%21%272244994945%27%29%7Eid%21%271000714279358644224%27%7Euser.fields%21%27id%2C*user*description%2Clocation%27%29*name%2C%01*_' \
            -H 'sec-ch-ua: "Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"' \
            -H 'sec-ch-ua-mobile: ?0' \
            -H 'sec-ch-ua-platform: "macOS"' \
            -H 'sec-fetch-dest: empty' \
            -H 'sec-fetch-mode: cors' \
            -H 'sec-fetch-site: same-origin' \
            -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36' \
            --output "users/$u.json" \
            --no-clobber \
            --data-raw "$req" ; then
            echo " FAIL"
            # Remove any partial output so the next run retries this user.
            rm -f "users/$u.json"
        else
            echo " OK"
        fi

        # Pause between requests.
        sleep 1.1
    done < "$1"
}

main "$@"