@Rican7
Created October 29, 2020 01:58
Download multi-file items from the Internet Archive (archive.org)

Internet Archive (archive.org) multi-file item downloading

Finding an item to download

  1. Find an item that you'd like to download (for example: https://archive.org/details/SuperNintendoUSACollectionByGhostware)
  2. Grab its "identifier": the last part of the URL of the item's "details" page (here, SuperNintendoUSACollectionByGhostware). It also appears in the HTML of the page.
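
If you'd rather not copy the identifier by hand, it can be cut out of a "details" URL with plain shell parameter expansion. This is only a minimal sketch, assuming the input really is an `https://archive.org/details/...` URL; `identifier_from_url` is a made-up helper name, not part of any tool:

```shell
# Hypothetical helper: print the item identifier from a "details" page URL.
identifier_from_url() {
  id=${1#*/details/}     # drop everything up to and including "/details/"
  id=${id%%[/?#]*}       # drop any trailing path, query string, or fragment
  printf '%s\n' "$id"
}

identifier_from_url "https://archive.org/details/SuperNintendoUSACollectionByGhostware"
```

The last line prints the identifier for the example item above.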

Downloading the files

At this point, you could just download the files by clicking on them on the HTML page, but that's slow and the number of concurrent downloads is limited... Sooo, install the Internet Archive command-line tool (ia) instead:

  1. Go to https://github.com/jjjake/internetarchive
  2. Follow the installation instructions

Download all files from the "item"

Replace item_identifier_goes_here with the identifier found in the first step:

ia download item_identifier_goes_here

(Optionally, add the -C option, which skips files whose checksums already match, so an interrupted download can be resumed by re-running the same command.)
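
Since `-C` skips files based on their checksum, re-running the same command picks up roughly where an interrupted download left off. A rough sketch of a retry loop built on that idea follows. The helper name `download_with_retries`, the `RETRY_DELAY` variable, and the default of 5 attempts are all illustrative assumptions, and it also assumes `ia download` exits non-zero when something fails:

```shell
# Hypothetical wrapper: re-run "ia download -C" until it succeeds,
# giving up after a fixed number of attempts (default: 5).
download_with_retries() {
  item=$1
  max_attempts=${2:-5}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -C skips files whose checksum already matches, so a retry
    # only fetches what is still missing or incomplete.
    if ia download -C "$item"; then
      return 0
    fi
    echo "attempt $attempt failed" >&2
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"   # pause before the next attempt (seconds)
  done
  return 1
}
```

Usage would look something like `download_with_retries item_identifier_goes_here 10`.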

Download specific files from the "item"

Replace item_identifier_goes_here with the identifier found in the first step, and the file names with the files you want:

ia download item_identifier_goes_here file-name-here other-file-name-here
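
When the files you want share a naming pattern, typing each name gets tedious. The `ia download` command also has `--glob` and `--dry-run` options, so a sketch like the following might be handy. The two helper names are made up for illustration:

```shell
# Hypothetical helper: show the URLs a glob would download, without downloading.
preview_matching() {
  # --glob limits the download to matching file names; --dry-run only prints URLs.
  ia download "$1" --glob="$2" --dry-run
}

# Hypothetical helper: download only the files whose names match the glob.
download_matching() {
  ia download -C "$1" --glob="$2"
}
```

For example, `preview_matching item_identifier_goes_here '*.zip'` to check what would be fetched, then `download_matching item_identifier_goes_here '*.zip'` to fetch it. Quote the pattern so the shell doesn't expand it first.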

Download a specific list of files from the "item"

Replace item_identifier_goes_here with the identifier found in the first step:

  1. List the files in the item and write them to a file
    ia list item_identifier_goes_here > file-list.txt
  2. Modify the list to your liking
    vim file-list.txt
  3. Download the files based on the list
    parallel 'ia download item_identifier_goes_here {} -C' < file-list.txt
  4. ???
  5. Profit
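
The list-editing and download steps above can also be done without an editor or GNU parallel. A minimal sketch, assuming one file name per line on stdin; `filter_file_list` and `download_from_list` are hypothetical helper names, and the loop downloads one file at a time rather than in parallel:

```shell
# Hypothetical helper: keep only the file names that match a pattern.
# Reads names on stdin, writes the matching ones to stdout (case-insensitive).
filter_file_list() {
  grep -i -- "$1"
}

# Hypothetical helper: download each file named on stdin, one after another.
download_from_list() {
  item=$1
  while IFS= read -r file; do
    [ -n "$file" ] || continue   # skip blank lines
    # </dev/null keeps ia from consuming the rest of the list on stdin.
    ia download -C "$item" "$file" </dev/null
  done
}
```

For example: `ia list item_identifier_goes_here | filter_file_list '\.zip$' | download_from_list item_identifier_goes_here`. This is slower than the parallel version, but it copes with file names that contain spaces.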