Last active Jun 27, 2022
#youtubearchive has a bot that provides some control over a YouTube archiving setup. Videos are downloaded with a patched yt-dlp and saved to cloud storage.

In 2018, deleted videos were uploaded to the Internet Archive (IA) as youtube_CHANNELID items, which are now deindexed. Uploads were suspended later in 2018 for a major software rewrite. In 2022, IA informed us that it no longer wants the deleted YouTube videos collected by this project.

We do not yet have a nice way to browse the archived videos, but they can be manually retrieved.

If you have rack space, bandwidth, servers, or hard drives available for storing deleted videos, please let us know in the channel. This is somewhat urgent, as the cloud storage may go away at an unknown time.

Using the IRC bot

You do not necessarily need access to the IRC bot. If you have a good list of channels to archive, please link to the list in the IRC channel. If something is urgent, ping benjins, JAA, ivan, or other active users with voice or ops.

  • !help shows help
  • !password PMs you the username and password for accessing the dashboard, logs, and stats
  • !status shows how many tasks and task-starting scripts are running
  • [+v/+o req.] !a <channel, user, playlist, or /watch?v= URL> [rationale] starts a task to grab videos that have not yet been grabbed
    • Note: there is no notification on completion; use !s or run !a again to make sure everything was grabbed
    • Include a rationale for channels that don't obviously meet the inclusion criteria
  • !s <channel, user, playlist, /watch?v= URL, or folder> shows how many videos are saved in the main folder for the corresponding channel or playlist; for /watch?v= URLs, the number of captures saved for the video
    • Beware: the latest N videos shown here are reverse-sorted by upload time, not publish time, which can be confusing
  • [+v/+o req.] !sa <channel, user, playlist, or /watch?v= URL> equivalent to !s followed by !a
  • [+v/+o req.] !abort TASK_NAME aborts a task
    • Use the task name from the dashboard or the -> TASK_NAME message, not a URL.
  • [+v/+o req.] !delist FOLDER removes a folder from the lists used by the task-starting scripts
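The !a, !s, and !sa commands take a channel, user, playlist, or /watch?v= URL. As a rough illustration of which URL shapes correspond to which kind of target, here is a minimal Python sketch. The helper name classify_target and the exact path patterns are assumptions for illustration only, not the bot's actual parser, and it does not cover the folder names that !s also accepts.

```python
from urllib.parse import urlparse, parse_qs

# Hostnames treated as YouTube (assumption for this sketch).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def classify_target(url: str) -> str:
    """Guess which kind of target a YouTube URL points at (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.hostname not in YOUTUBE_HOSTS:
        return "unknown"
    path = parsed.path.rstrip("/")
    query = parse_qs(parsed.query)
    if path == "/watch" and "v" in query:
        return "video"       # single video: /watch?v=ID
    if path == "/playlist" and "list" in query:
        return "playlist"    # /playlist?list=ID
    if path.startswith("/channel/"):
        return "channel"     # /channel/UC...
    if path.startswith("/user/"):
        return "user"        # legacy /user/NAME URLs
    return "unknown"
```

For example, classify_target("https://www.youtube.com/playlist?list=PL123") returns "playlist".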

If a video is already downloaded, it will not be downloaded again. Only the first-downloaded version of the video and metadata is preserved.

Livestream videos that are currently streaming are always skipped.
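The two rules above (the first download wins, and streams that are live right now are skipped) can be sketched as follows. This is a minimal illustration that assumes each video is identified by its YouTube video ID and that the grabber knows whether a video is currently live. Archive, should_download, record, and the is_live flag are hypothetical names, not the project's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    # Video ID -> metadata of the first saved copy (hypothetical structure).
    saved: dict = field(default_factory=dict)

    def should_download(self, video_id: str, is_live: bool) -> bool:
        # Streams that are currently live are always skipped.
        if is_live:
            return False
        # Videos that were already downloaded are never fetched again.
        return video_id not in self.saved

    def record(self, video_id: str, metadata: dict) -> None:
        # setdefault keeps the first-recorded metadata and ignores later versions.
        self.saved.setdefault(video_id, metadata)
```

Because already-saved IDs are skipped, rerunning a task is harmless, which is why simply running !a again is a safe response to a crash. yt-dlp's --download-archive option provides a similar skip-if-already-downloaded mechanism, though this page does not say whether the project relies on it.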

Dealing with problems

A task may crash without notice, e.g. after database downtime. If this happens, just run !a again.

What to archive

If a channel has just released a video, please wait until a high-resolution format is available.

We have enough channels of people playing popular games.

Please don't archive channels with tons of autogenerated content, or channels with pirated movies (unless they have very high view counts). You will be banned for archiving Webdriver Torso.

We can only get about 1% of everything uploaded to YouTube. The goal is not to "save YouTube" but to maximize the value of the subset that we can collect.

If the majority of a channel is not worth archiving, please use a Copy Links extension and archive just the popular or otherwise desirable videos. These can be submitted to #youtubearchive-spam to avoid flooding the main channel.
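Link lists copied from a channel page often contain duplicates and extra URL parameters such as timestamps. Here is a minimal Python sketch for cleaning such a list before submitting it. It assumes the links are ordinary youtube.com/watch?v= or youtu.be URLs. canonical_watch_urls is a hypothetical helper, not part of the project's tooling.

```python
from urllib.parse import urlparse, parse_qs

# Hostnames treated as full YouTube watch pages (assumption for this sketch).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def canonical_watch_urls(links):
    """Return deduplicated https://www.youtube.com/watch?v=ID URLs, in input order."""
    seen = set()
    result = []
    for link in links:
        parsed = urlparse(link.strip())
        host = parsed.hostname or ""
        video_id = None
        if host == "youtu.be":
            # Short links carry the video ID as the path: https://youtu.be/ID
            video_id = parsed.path.lstrip("/") or None
        elif host in YOUTUBE_HOSTS and parsed.path == "/watch":
            # Keep only the v= parameter; drop t=, list=, and similar extras.
            video_id = parse_qs(parsed.query).get("v", [None])[0]
        if video_id and video_id not in seen:
            seen.add(video_id)
            result.append(f"https://www.youtube.com/watch?v={video_id}")
    return result
```

Links that are not recognizable YouTube video URLs are dropped silently. You may want to review the input by hand if the count looks wrong.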

Please archive:

  • Official news channels (all)
  • Official government channels (all)
  • Official corporate channels (all with useful content)
  • Official music artist channels (all not terrible)
  • "Featured channels" listed by official channels
  • Channels by politicians or notable leaders
  • Channels by political parties or interest groups
  • Newsworthy/popular but at-risk content that might be removed by uploader or YouTube
  • News dumps and clippings (if non-trivial view counts)
  • TV commercial dumps (almost all)
  • Non-English content
  • Talks, lectures
  • Information that is only available in video form
  • High-quality speedruns, playthroughs with interesting commentary, reference playthroughs
  • Music (rare or interesting or popular)
  • Very popular channels of any sort
  • Other projects approved by a channel operator

We are also looking for:

  • someone with 1.4PB (or a large fraction of that) to store the videos that have disappeared from YouTube
  • an institutional sponsor
  • help with watching the dashboard for things that should not be archived
  • help with frontend programming (possibly in Svelte)
