@butlerblog
Created December 9, 2023 14:58
Saves the JSON manifest of an archive.org book to data.json
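import argparse
import json

import requests

# Parse the book ID from the command line; without it the URL cannot be built.
my_parser = argparse.ArgumentParser()
my_parser.add_argument('-id', '--id', help='ID of the book (https://archive.org/details/XXXX).', type=str, required=True)
args = my_parser.parse_args()
book_id = args.id

# Fetch the book's details page as text.
url = "https://archive.org/details/" + book_id
r = requests.get(url).text

# Pull the manifest URL out of the bookManifestUrl="..." attribute.
# The "https:" prefix assumes the attribute value is protocol-relative.
infos_url = "https:" + r.split('bookManifestUrl="')[1].split('"\n')[0]
response = requests.get(infos_url)

# Write the manifest to data.json, pretty-printed and with non-ASCII characters preserved.
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(response.json(), f, ensure_ascii=False, indent=4)

print("=" * 40)
print(f"Saved JSON manifest to data.json for: https://archive.org/details/{book_id}")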
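
The `split` chain in the script raises an `IndexError` when the page does not contain `bookManifestUrl="`, and returns the rest of the page when the closing quote is not followed by a newline. Below is a minimal sketch of a more defensive extraction, assuming the attribute still has the form `bookManifestUrl="..."`. The helper name `extract_manifest_url`, the regex, and the sample HTML fragment are illustrative and not part of the original gist.

```python
import re
from typing import Optional

# Hypothetical pattern: captures everything between the quotes of bookManifestUrl="...".
MANIFEST_RE = re.compile(r'bookManifestUrl="([^"]+)"')


def extract_manifest_url(html: str) -> Optional[str]:
    """Return the absolute manifest URL found in html, or None if it is absent."""
    match = MANIFEST_RE.search(html)
    if match is None:
        return None
    url = match.group(1)
    # Assumption: the attribute usually holds a protocol-relative URL ("//host/path").
    if url.startswith("//"):
        url = "https:" + url
    return url


# Made-up page fragment, used only to exercise the helper.
sample = '<div bookManifestUrl="//example.org/manifest.json"\n other="x">'
print(extract_manifest_url(sample))
print(extract_manifest_url("<html></html>"))
```

In the script, this could replace the `infos_url = ...` line, with a check for `None` before requesting the manifest.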