

Archive.org Scanned Book Downloader Bookmarklet


A simple "1-click" JavaScript approach to downloading a scanned book from archive.org to read at your leisure on the device of your choosing w/out having to manually screenshot every page of the book. In short, it's a glorified "Save Image As..." approach consolidated down to "1 click". BTW there may be a much better option than this out there; I just built this as an autistic project to see if it would work.

Demo Video

Archive.org SBDL Demo

Obligatory Legal/Disclaimer:

By using this script you agree to delete all book files/images after your 1-hour or 14-day loan is up! I don't support using this script for any other use cases. After all, none of us have ever kept a library book past its return date, right?

NOTES:

  • Scanned Books Only: This only works on "scanned" books where each page is an image file. This means A) you won't be able to search the text of the book and B) the book file size will be tens of megabytes, not kilobytes like an EPUB/etc. Given the above, always try to find the book in a text format first (EPUB, etc.) before using this method.
  • Compatibility: As of 11/2021 I've tested this on a few books w/no problems so it seems pretty stable but if someone finds a book that doesn't work w/it LMK in comments. It's very possible (likely?) at some point archive.org will change something that either requires some adjustments to this script and/or makes this approach no longer possible. Feel free to recommend tweaks or fixes if anyone has any suggestions btw.
  • Borrowed and?: I've only tested this for "Borrowed" books, but I suppose you could use it on Free books too, although those normally already offer a PDF download so there's not really a reason to do that.
  • Support: This is just a basic javascript thing so there's no real danger here but I can't/don't provide any support if this doesn't work for you and/or your browser crashes while trying it.

Instructions

  1. Create a bookmarklet in your browser using the code below via https://mrcoles.com/bookmarklet/
  2. Go to archive.org and "Borrow" the book for 1 hour or 14 days (only tested with the 1 hour)
  3. Once the borrowed book page reloads, click the zoom icon at least 2 times to zoom into the 1st page of the book (otherwise you'll get a low-res version of the book images)
  4. Write down or make a mental note of how many pages the book has
  5. Use the browser's "Inspect Element" on the first page of the book to find the page image URL, then right-click it and "open link" in a new tab.
  6. Once on the new tab looking at the book's 1st page image, click the bookmarklet button made in step 1 and type in the number of pages the book has that you noted in step 4. Tip: Add 5-10 more pages than the book has just in case the covers/final pages of the book actually add up to a higher number.
  7. As soon as you click 'OK' after entering the page count watch for the browser's "Allow Multiple Downloads from this Site" type message in your browser and click 'Accept' or whatever. Otherwise the process will fail. Some browsers may not do this - so disregard if this isn't an issue w/your browser.
  8. Wait for the process to finish - a 300 page book takes around 3-5 minutes. Note: You can minimize the browser tab/window while the pages are downloading.
  9. Once all pages have been downloaded, an "alert" message will pop up.
  10. At this point you'll have a bunch of book page images in your Downloads folder like mybookwhatever_000.jpg, mybookwhatever_001.jpg etc.
  11. If you want to make a PDF of the pages, go to https://tools.pdf24.org/en/images-to-pdf and drag all these images into the upload area. While the images are uploading, click the "A-Z sort" button at the bottom of the page to make sure the pages sort by filename.
  12. Click the "Create PDF" button when it's ready and download the PDF when it's done.
  13. Now you can enjoy reading the book at your leisure, wherever you want without having to wait for the annoying page load times of archive.org, etc!
function downloadFile(filePath){
    var link=document.createElement('a');
    link.href = filePath;
    link.download = filePath.substr(filePath.lastIndexOf('/') + 1);
    link.click();
}

function getNewURL(pageCount){
	if(pageCount == null) pageCount = 1;
	var url = document.location.href; 	
	var urlParts = url.split(".jp2");	
	var urlPrefixParts = urlParts[0].split("_");	
	var urlPageNumber = urlPrefixParts[urlPrefixParts.length-1];	
	var nextPageNumberString = String(parseInt(urlPageNumber)+pageCount).padStart(4,'0');  	
	var newURLPrefix = ''; 
	for(var p=0;p<urlPrefixParts.length-1;p++) newURLPrefix += urlPrefixParts[p] + '_';	
	var newURL = newURLPrefix + nextPageNumberString + '.jp2' + urlParts[1];	
	return newURL;
}

var confirm1 = confirm('Archive.org Scanned Book Downloader:\n\nReady Check: Are you on a window/tab viewing *just* the IMAGE of the 1st page of the book? If not cancel and run this when you are.');
if(!confirm1) return false;
var pageCount = prompt('Archive.org Scanned Book Downloader:\n\nHow many pages are in this book?');
var pageCounter = 0;
var pageInterval = null;
if(pageCount == null || pageCount == undefined || parseInt(pageCount) == NaN){
   console.log('no page count provided.. giving up.');
}else{
	pageInterval = window.setInterval(function(){	
		if(pageCounter > parseInt(pageCount)){
			window.clearInterval(pageInterval);
			pageInterval = null;			
			console.log('downloading done!..');			
			var pdfTime = confirm('All pages downloaded! (some files may still be downloading though)\n\nWould you like to go to a site to create a PDF with them now?');
			if(pdfTime){
				window.open('https://tools.pdf24.org/en/images-to-pdf','_blank');			
			}
		}else{
			var nextFile = getNewURL(pageCounter);
			downloadFile(nextFile);
			console.log('downloading next page! (' + nextFile + ')');
		}
		pageCounter += 1;
	},900);
}
@henryyjjames

henryyjjames commented Dec 3, 2021

I did get everything working after a while; this is a very handy tool and I appreciate it. I have been taking the images and making them into a PDF using Adobe Acrobat. The only issue I've experienced since is that a lot of the images fail to download, and this seems to be a browser issue rather than an internet speed issue (I'm on Ethernet with 600 Mbps down/up). I had to click "Resume" for over half of the pages in a 474-page book, and it wasn't that the latter half failed; it was random clumps. Maybe I'm wrong. LMK if there's a fix or a better way to do this.

Also, if an image link opens when it's not supposed to after the first time you open the image, the book is returned on archive.org and all your downloads fail.

Also, since the resolution of the image is coded into the URL, I bet you could just write that in to ensure the highest quality.

cc: @cemerson

@henryyjjames

henryyjjames commented Dec 3, 2021

Currently having a hell of a time with https://archive.org/details/behindscreenhowg00mann
I keep getting this error on archive.org after all the downloads fail and I try to borrow it again.
[Screenshot of the archive.org error, taken 2021-12-03]

@cemerson
Author

cemerson commented Dec 3, 2021

Thanks for the feedback @henryyjjames - good to know. Yeah I noticed the scale is in the URL of the image but archive does something sneaky where if you don't load at least 1 of the images at that scale it sometimes expires/drops your book borrow status - there are a few spots in their server/process that are very touchy like that. But yeah as I use it more I'll tweak things like that if/where I can. Also I haven't tested on a book larger than 300 pages so you may be right it may need some tweaks for larger books too.

@jan1980

jan1980 commented Jan 19, 2022

It works wonderfully. I followed the tutorial and it worked on a 260-page book. Then I merged the images into a PDF with the online tool that you suggested. Wonderful!

If anyone receives the error "0NaN.jp2undefined", it's because you haven't "borrowed for 1 hour". Only after you do that can you proceed with the rest of the steps.

I had never created a bookmarklet before either, so I had to work through that process first. You enter the code, make the bookmarks bar visible in your browser, drag the resulting blue button to that bar, and the bookmarklet is created with the name you gave it.

Many thanks for this. I will delete and return the book for sure, but now I can move it around devices much more easily.

@ColdCactus

ColdCactus commented Jan 23, 2022

Outstanding, works like a charm!
It did take me some minutes to figure out which link to get exactly. After that it did exactly what I hoped for.
Thank you! :)

@ColdCactus

ColdCactus commented Jan 24, 2022

@cemerson I used it on a 550-page book and it works great, too. It also works for books that don't need to be borrowed but can only be downloaded in a format unsuitable for OCR (and with printing/exporting disabled, too). This just saved me a LOT of work manually screenshotting all pages (or building an AutoHotkey script).

Similar to what @henryyjjames reported, for some reason my browser didn't download about 10% of the files (Edge reported "network errors" in the download list). If someone else has that happen, it's very easy to scroll through the list and click the "resume/restart download" button next to the ones that failed; that quickly sorted it out without running the tool again.

That's likely a problem with my browser or computer, not with your script, because it happened in clusters at random intervals throughout the process. Maybe something timed out, I did have other things claim a lot of bandwidth and CPU cycles at the same time.

I would suggest amending the fifth item of the step-by-step list ("Inspect Element") to say which link to open in a new tab. I know it's in the video, but since I have zero clue about web dev, it took me some messing around to figure out what I'm looking at in the video and what to look for in the Inspector/source.

Fun Fact: I used PDF24 (rather than an online service) and that one taps out at 300 images haha. I split my book into two PDFs and then merged those. The result then went into Omnipage for OCR to make it searchable.

@cemerson
Author

cemerson commented Jan 24, 2022

Very glad to hear it worked for you @ColdCactus!

@acoiman

acoiman commented Feb 26, 2022

It works; you only need to execute the code in the dev tools console and comment out "if(!confirm1) return false;" because it generates an error.

@cemerson
Author

cemerson commented Mar 23, 2022

Good idea on dev console execution, @acoiman. I'm not sure what the error is about (I just tested a couple books myself and it went fine) but glad you found way around it!

@alxpsr

alxpsr commented Apr 20, 2022

Confirmed, it works. I launched this via the devtools console. Just wrap the code above in a function like this:

const grabber = function () {
    function downloadFile(filePath) {
        var link = document.createElement('a');
        link.href = filePath;
        link.download = filePath.substr(filePath.lastIndexOf('/') + 1);
        link.click();
    }

    function getNewURL(pageCount) {
        if (pageCount == null) pageCount = 1;
        var url = document.location.href;
        var urlParts = url.split(".jp2");
        var urlPrefixParts = urlParts[0].split("_");
        var urlPageNumber = urlPrefixParts[urlPrefixParts.length - 1];
        var nextPageNumberString = String(parseInt(urlPageNumber) + pageCount).padStart(4, '0');
        var newURLPrefix = '';
        for (var p = 0; p < urlPrefixParts.length - 1; p++) newURLPrefix += urlPrefixParts[p] + '_';
        var newURL = newURLPrefix + nextPageNumberString + '.jp2' + urlParts[1];
        return newURL;
    }

    var confirm1 = confirm('Archive.org Scanned Book Downloader:\n\nReady Check: Are you on a window/tab viewing *just* the IMAGE of the 1st page of the book? If not cancel and run this when you are.');
    if (!confirm1) return false;
    var pageCount = prompt('Archive.org Scanned Book Downloader:\n\nHow many pages are in this book?');
    var pageCounter = 0;
    var pageInterval = null;
    if (pageCount == null || pageCount == undefined || parseInt(pageCount) == NaN) {
        console.log('no page count provided.. giving up.');
    } else {
        pageInterval = window.setInterval(function () {
            if (pageCounter > parseInt(pageCount)) {
                window.clearInterval(pageInterval);
                pageInterval = null;
                console.log('downloading done!..');
                var pdfTime = confirm('All pages downloaded! (some files may still be downloading though)\n\nWould you like to go to a site to create a PDF with them now?');
                if (pdfTime) {
                    window.open('https://tools.pdf24.org/en/images-to-pdf', '_blank');
                }
            } else {
                var nextFile = getNewURL(pageCounter);
                downloadFile(nextFile);
                console.log('downloading next page! (' + nextFile + ')');
            }
            pageCounter += 1;
        }, 900);
    }
}

Then execute it:

grabber()

@cemerson
Author

cemerson commented Apr 20, 2022

@alxpsr nice

@hex20dec

hex20dec commented May 22, 2022

Great tool! Works flawlessly. Thanks.

@cemerson
Author

cemerson commented May 23, 2022

Thanks @hex20dec, glad to hear! Now if only someone would invent/share some way to get me better at reading w/out losing focus so quickly heh.

@hex20dec

hex20dec commented May 23, 2022

Thanks @hex20dec, glad to hear! Now if only someone would invent/share some way to get me better at reading w/out losing focus so quickly heh.

Haha, have you tried any nootropics long term?

@cemerson
Author

cemerson commented May 24, 2022

No, never actually. Worth it?

@chaquit0

chaquit0 commented Jul 15, 2022

Thanks m8, really appreciated, it works really well, tried with this one: https://archive.org/details/zeropollutionfor0000neme

@cemerson
Author

cemerson commented Jul 15, 2022

Thanks - glad to hear it worked, @chaquit0!

@mikkovedru

mikkovedru commented Jul 16, 2022

It worked nicely. Thank you!

May I suggest that you add "(Firefox) Before starting the download, go to browser settings, choose the directory to download files in, and select to save files automatically."

@gagrotxgb

gagrotxgb commented Aug 19, 2022

This very same script can also be used to download books from Pustak.org with just a few changes: replace the '_' with '/Image', replace the '.jp2' with '.jpg', and finally replace the parse command's (4,0) with (3,0).

@agatakotecka

agatakotecka commented Sep 20, 2022

Does this method above still work for anyone?
In Firefox the bookmarklet does nothing for me. When I enter the number of pages, nothing further happens.
In Chrome it produces "Failed - No file", and at the same time archive.org's lending for the book throws a lending error. Is this a new anti-debugger protection against this script, and is that why it fails?

@cemerson
Author

cemerson commented Sep 20, 2022

Hey @agatakotecka - I just tested using Brave (basically Chrome) and it did work and also tried in Firefox which also worked for me. It's possible your browser version or something else is causing an error - maybe try a Chrome browser? It's also possible some books have something unique about them that causes the script to fail - if you have a book that isn't working let me know and I'm happy to try it for you. GL.

@agatakotecka

agatakotecka commented Sep 20, 2022

Thanks Cemerson! I did a third test on the latest Firefox browser (104) and it works like a charm.
The previous tests were done on older browsers such as FF 63 and Chrome 80, which don't seem to be compatible (they're old by now), or I messed up their settings. Also, on the latest FF the loan is no longer ended automatically.
Thank you very much for your great work and tutorial.

@cemerson
Author

cemerson commented Sep 20, 2022

Thank you, @agatakotecka - glad it worked for you! Happy reading! ;)

@nenabunena

nenabunena commented Sep 23, 2022

I tried this 2 weeks ago on the Internet Archive and it worked, but starting last week it stopped working. Maybe they changed something, because I can't find the image file on the Internet Archive now.

@nenabunena

nenabunena commented Sep 23, 2022

I tried 3 different books and I can't get it to work or get the inspect element of the first page. For example, I can't get it to work for this one:

https://archive.org/details/elviscloseuprare00levi

I also tried different browsers and still no go.

@agatakotecka

agatakotecka commented Sep 24, 2022

I tried 3 different books and I can't get it to work nor get the inspect element of the first page. Like this, I can't get it to work for this as an example:

https://archive.org/details/elviscloseuprare00levi

I also tried different browsers and still no go.

It doesn't work for me either with the latest Firefox. But the main reason is that this book uses a different link format than cemerson's script expects:

Page 1:
https://ia600202.us.archive.org/BookReader/BookReaderPreview.php?id=elviscloseuprare00levi&subPrefix=elviscloseuprare00levi&itemPath=/1/items/elviscloseuprare00levi&server=ia600202.us.archive.org&page=leaf1&fail=preview&&scale=2&rotate=0

Page 2:
https://ia600202.us.archive.org/BookReader/BookReaderPreview.php?id=elviscloseuprare00levi&subPrefix=elviscloseuprare00levi&itemPath=/1/items/elviscloseuprare00levi&server=ia600202.us.archive.org&page=leaf2&fail=preview&&scale=2&rotate=0

You can see that the variable here is the number after the word 'leaf', and this format isn't compatible with cemerson's script. Also, the links don't contain 'jp2' at all.

Not sure if this is something that can be accounted for in a script update and whether it's feasible to do (I have no idea how many more link variations archive.org uses; there may be a couple or there may be hundreds).

Anyway, I'd say use Excel for this; it's not difficult to create a link for each page of your book. The only variable will be the number after the word 'leaf', different for each spreadsheet row (148 rows in total for this book). Then use e.g. FlashGet to download them all.

@cemerson
Author

cemerson commented Sep 26, 2022

Just curious @agatakotecka does it work if you try Chrome or Brave?

@Alchemytr

Alchemytr commented Sep 27, 2022

@cemerson, thanks so much for creating this script (and the instructions to go with it so twits like me can use it)!
As others have said, I'll be deleting the 'book' as soon as I've referenced the sections I need to. I'm just glad I don't need to wrangle with awful DRM to do this.

@nenabunena @agatakotecka - confirming that I've just successfully run the script on the latest version of Brave [Version 1.43.93 Chromium: 105.0.5195.127 (Official Build) (64-bit)].

I was getting the same error some people referred to further up the conversation ("0NaN.jp2undefined" appearing as the 'filename' for every empty file) and I discovered it was because I'd opened the thumbnail version of the first page of the book in a new tab. Of course, the script was expecting a different URL and it broke.

The key for me was looking at the video again and identifying exactly what class the line @cemerson selected was declared as (note - it's 'BRpageimage' with no quotation marks). This is unique to the correct image, so you can follow the below steps to ensure you also get it right :)

  1. Complete the 'inspect element' steps (you need to click on the header of the page to do this - right clicking on the book image you're looking at won't work)
  2. Click into the console i.e. the text you're looking at trying to locate the correct image link and press CTRL + F to search/filter
  3. Type (or copy/paste in) 'BRpageimage' with no quotation marks and you'll be taken to the correct line
  4. Right click on the image URL/link (preceded by src=, it will start with https:// as with any URL) and select 'Open in a new tab'
  5. You're in business! NOW, on the new tab you just opened (you should see a LARGE image of the first page of your new book), run the bookmarklet you created and added to your bookmarks bar and follow the prompts.

Hope this helps someone else. Enjoy and happy reading.

@cemerson
Author

cemerson commented Sep 27, 2022

Thank you @Alchemytr for sharing all those details w/people - hopefully that will help folks having issues. Very glad it worked for you too btw :)

@nenabunena

nenabunena commented Sep 29, 2022

Thanks @Alchemytr & @cemerson. I was able to get the book by following this and copy/pasting the link, which didn't work for some books before (or perhaps I did it wrong before). I will try this new method; thank you so much for taking the time to figure it out for me! I'm sure I'll use this very soon, and I'll update everyone on how it goes!

https://www.isolveit.xyz/2021/05/download-borrow-books-from-archiveorg.html
