@joelnitta
Created November 18, 2022 03:03
Archive a user's tweets
library(rtweet)
library(tidyverse)
# Initial authorization setup, only need to do once
# auth_setup_default() #nolint
# Authorize
auth_as("default")
# Set user name
user_name <- "PUT USER NAME HERE"
# Download all tweets from user (as many as possible anyways)
# The docs say it should work for up to 3200 tweets
# https://docs.ropensci.org/rtweet/reference/get_timeline.html
my_tweets <- get_timeline(
  user = user_name,
  n = Inf,
  retryonratelimit = TRUE
)
# And save them forever!
saveRDS(my_tweets, "my_tweets.RDS")
# Extract URLs of media (images etc)
# We can map back to the tweet the image came from by `tweet_id`, which
# matches `id` of my_tweets
media_url <-
  my_tweets %>%
  rename(tweet_id = id) %>%
  mutate(media = map(entities, "media")) %>%
  select(tweet_id, media) %>%
  unnest(media) %>%
  filter(!is.na(id)) %>%
  mutate(
    filename = str_match(media_url, "\\/([^\\/]*)$") |>
      magrittr::extract(, 2)
  )
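# Download media to "media" folder (create it first if it doesn't exist)
dir.create("media", showWarnings = FALSE)
# mode = "wb" keeps binary files such as images intact on Windows
walk2(
  media_url$media_url,
  media_url$filename,
  ~download.file(.x, glue::glue("media/{.y}"), mode = "wb"))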
@darachm

darachm commented Nov 18, 2022

Hey works for me! Good job

Is there a way to allow non-R users to use it? Something so simple that someone could download R and run it with Rscript on Macs and ???? on Windows? I don't know how.

FYI y'all: new users of rtweet will need to run auth_setup_default() once to log in. rtweet also wants httpuv for that, so you'd need to install it too (rtweet prompts you to install it, but that doesn't help much if you're running a script).

@joelnitta
Author

@darachm glad it worked for you!

It should be possible to wrap this into a function that could be run from the command line - basically, the only input you need is the user name. But as you note they would still need to authorize using rtweet.
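Something like the following is what I have in mind. It's only a rough sketch: `archive_tweets()`, `clean_user_name()`, and the script name `archive_tweets.R` are made-up names for illustration, and it assumes rtweet is installed and `auth_setup_default()` has already been run once on that machine.

```r
# Hypothetical helper: tidy up the user name passed on the command line
# (trims whitespace and drops a leading "@")
clean_user_name <- function(x) {
  sub("^@", "", trimws(x))
}

# Hypothetical wrapper around the same calls used in the gist above
archive_tweets <- function(user_name, out_file = "my_tweets.RDS") {
  # Assumes the default authorization was already set up and saved
  rtweet::auth_as("default")
  tweets <- rtweet::get_timeline(
    user = user_name,
    n = Inf,
    retryonratelimit = TRUE
  )
  saveRDS(tweets, out_file)
  invisible(out_file)
}

# Only run the download when exactly one argument (the user name) is given
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 1) {
  archive_tweets(clean_user_name(args[1]))
} else {
  message("Usage: Rscript archive_tweets.R <user_name>")
}
```

Saved as `archive_tweets.R`, this would be run as `Rscript archive_tweets.R joelnitta`. The media download step is left out to keep it short.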

However, there's another reason the current script wouldn't work well for non-useRs: the data are nested, so you can't write them out to a flat CSV (they originally come from the Twitter API as JSON). I save them as an RDS file, which you can only really work with in R. So if you wanted to make a "general purpose" script, it might be better just to save the JSON.
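As a rough idea of what saving JSON could look like, here is a sketch using the jsonlite package (my choice for illustration, not something the script above uses). `toy_tweets` is a made-up stand-in that mimics the nesting of `my_tweets`; I haven't checked this against real rtweet output.

```r
library(jsonlite)  # assumed to be installed

# Toy stand-in for my_tweets: a data frame with a nested list column
toy_tweets <- data.frame(
  id = c("1", "2"),
  full_text = c("hello", "world"),
  stringsAsFactors = FALSE
)
toy_tweets$entities <- list(
  list(media = data.frame(media_url = "https://example.com/a.jpg")),
  list(media = NULL)
)

# Nested columns survive as nested JSON, which a flat CSV can't represent
json_file <- tempfile(fileext = ".json")
write_json(toy_tweets, json_file, pretty = TRUE, auto_unbox = TRUE)

# Read it back as plain nested lists to check nothing was flattened away
round_trip <- fromJSON(json_file, simplifyVector = FALSE)
```

With the real data, the equivalent call would presumably be `write_json(my_tweets, "my_tweets.json")`, and the resulting file can be opened from Python, JavaScript, or anything else that reads JSON.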
