@PatrickLang
Last active July 30, 2018 16:38
PowerShell script to scrape the Apple II "4am" collection from archive.org
$feedUrl = "https://archive.org/services/collection-rss.php?collection=apple_ii_library_4am"
# Known issue: the RSS feed only seems to return the latest 50 entries.
# Parse the feed as XML and keep the <item> elements, one per archive.org item.
$feedContents = @(([xml](Invoke-WebRequest -UseBasicParsing -Uri $feedUrl).Content).rss.channel.item)
Write-Host "Found $($feedContents.Count) items"
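# Capture the last path segment of a URL, ignoring any trailing slash.
$r = [regex]"([^/]+)/?$" # Thanks https://stackoverflow.com/a/8798550
$feedContents | ForEach-Object {
    # Use capture group 1 so a trailing slash never ends up in the identifier.
    $identifier = $r.Match($_.link).Groups[1].Value
    # Fetch the item's file list and parse the response body as XML.
    $fileList = [xml](Invoke-WebRequest -UseBasicParsing -Uri "https://archive.org/download/$($identifier)/$($identifier)_files.xml").Content
    # Keep only entries whose name ends in ".zip".
    $zipFileNames = $fileList.files.file | Where-Object { $_.name -match "\.zip$" }
    $zipFileNames | ForEach-Object {
        $zipFileUri = "https://archive.org/download/$($identifier)/$($_.name)"
        Write-Host $zipFileUri
        # Download the zip with BITS.
        Start-BitsTransfer -Source $zipFileUri
    }
}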
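
The script above notes that the RSS feed only seems to return the latest 50 entries. One possible workaround, sketched here and not part of the original gist, is to page through archive.org's advancedsearch.php endpoint instead. This assumes the endpoint accepts the `q`, `fl[]`, `rows`, `page` and `output` query parameters and returns JSON shaped like `response.docs[].identifier`. The function names `Get-CollectionSearchUri` and `Get-CollectionIdentifiers` are hypothetical helpers invented for this illustration, and the page size of 100 is an arbitrary choice.

```powershell
# Hypothetical helper (not in the original gist): build the URL for one page
# of an advancedsearch.php query that asks only for item identifiers as JSON.
function Get-CollectionSearchUri {
    param(
        [string]$Collection,
        [int]$Page = 1,
        [int]$Rows = 100   # assumed page size, chosen arbitrarily
    )
    $query = [uri]::EscapeDataString("collection:$Collection")
    "https://archive.org/advancedsearch.php?q=$query&fl[]=identifier&rows=$Rows&page=$Page&output=json"
}

# Hypothetical helper: request successive pages until one comes back empty,
# emitting each identifier to the pipeline.
function Get-CollectionIdentifiers {
    param(
        [string]$Collection,
        [int]$Rows = 100
    )
    $page = 1
    while ($true) {
        $uri = Get-CollectionSearchUri -Collection $Collection -Page $page -Rows $Rows
        $result = Invoke-RestMethod -Uri $uri
        # Assumed response shape: response.docs is an array of { identifier = ... }.
        $docs = @($result.response.docs | Where-Object { $_ })
        if ($docs.Count -eq 0) { break }
        $docs | ForEach-Object { $_.identifier }
        $page++
    }
}
```

If those assumptions hold, `Get-CollectionIdentifiers -Collection "apple_ii_library_4am"` could be piped into the same `ForEach-Object` download loop in place of `$feedContents`, using each emitted string directly as `$identifier` rather than extracting it from a link.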