  • Save cemerson/043d3b455317d762bb1378aeac3679f3 to your computer and use it in GitHub Desktop.
Archive.org Scanned Book Downloader Bookmarklet

A simple "1-click" javascript approach to downloading a scanned book from archive.org to read at your leisure on the device of your choosing w/out having to screenshot every page of the book by hand. In short it's a glorified "Save Image As..." approach but consolidated down to "1 click". BTW there may be a much better option than this out there - I just built this as an autistic project to see if it would work.

Demo Video

Archive.org SBDL Demo

Obligatory Legal/Disclaimer:

By using this script you agree to delete all book files/images after your 1-hour or 14-day loan is up! I don't support using this script for any other use cases. After all, none of us have ever kept a library book past its return date, right?

NOTES:

  • Scanned Books Only: This only works on "scanned" books where each page is an image file. This means A) you won't be able to search the text of the book and B) the book file size will be tens of megabytes, not kilobytes like an EPUB/etc. Given the above, always try to find the book in a text format first (EPUB, etc.) before using this method.
  • Compatibility: As of 11/2021 I've tested this on a few books w/no problems so it seems pretty stable but if someone finds a book that doesn't work w/it LMK in comments. It's very possible (likely?) at some point archive.org will change something that either requires some adjustments to this script and/or makes this approach no longer possible. Feel free to recommend tweaks or fixes if anyone has any suggestions btw.
  • Borrowed and?: I've only tested this for "Borrowed" books but I suppose you could use it on Free books too - although those normally already offer a PDF download, so there's not really a reason to do that.
  • Support: This is just a basic javascript thing so there's no real danger here but I can't/don't provide any support if this doesn't work for you and/or your browser crashes while trying it.

Instructions

  1. Create a bookmarklet in your browser using the code below via https://mrcoles.com/bookmarklet/
  2. Go to archive.org and "Borrow" the book for 1 hour or 14 days (only tested with the 1 hour)
  3. Once the borrowed book page reloads, click the zoom icon at least 2 times to zoom into the 1st page of the book (otherwise you'll get low-res versions of the book images)
  4. Write down or make a mental note of how many pages the book has
  5. Use the browser's "Inspect Element" on the first page of the book to find the page image URL, then right-click it and "open link" in a new tab.
  6. Once on the new tab looking at the book's 1st page image, click the bookmarklet button made in step 1 and type in the number of pages the book has that you noted in step 4. Tip: Add 5-10 more pages than the book has just in case the covers/final pages of the book actually add up to a higher number.
  7. As soon as you click 'OK' after entering the page count watch for the browser's "Allow Multiple Downloads from this Site" type message in your browser and click 'Accept' or whatever. Otherwise the process will fail. Some browsers may not do this - so disregard if this isn't an issue w/your browser.
  8. Wait for the process to finish - a 300 page book takes around 3-5 minutes. Note: You can minimize the browser tab/window while the pages are downloading.
  9. Once all pages have been downloaded, a popup message will appear (it also offers to open a site for creating a PDF from the images).
  10. At this point you'll have a bunch of book page images in your Downloads folder like mybookwhatever_000.jpg, mybookwhatever_001.jpg etc.
  11. If you want to make a PDF of the pages, go to https://tools.pdf24.org/en/images-to-pdf and drag all these images into the upload area. While the images are uploading, click the "A-Z sort" button at the bottom of the page to make sure the pages are sorted by filename.
  12. Click the "Create PDF" button when it's ready and download the PDF when it's done.
  13. Now you can enjoy reading the book at your leisure, wherever you want without having to wait for the annoying page load times of archive.org, etc!
(function(){
// Wrapped in a function so the early `return` below is valid even outside a bookmarklet generator.

// Saves filePath by clicking a temporary <a download> link.
function downloadFile(filePath){
	var link = document.createElement('a');
	link.href = filePath;
	link.download = filePath.substring(filePath.lastIndexOf('/') + 1);
	link.click();
}

// Returns the URL of the page `pageCount` pages after the one in the current tab,
// e.g. ..._0001.jp2... -> ..._0004.jp2... for pageCount = 3.
function getNewURL(pageCount){
	if(pageCount == null) pageCount = 1;
	var url = document.location.href;
	var urlParts = url.split(".jp2");              // text before / after ".jp2"
	var urlPrefixParts = urlParts[0].split("_");   // last piece is the page number
	var urlPageNumber = urlPrefixParts[urlPrefixParts.length-1];
	var nextPageNumberString = String(parseInt(urlPageNumber, 10)+pageCount).padStart(4,'0');
	var newURLPrefix = '';
	for(var p=0;p<urlPrefixParts.length-1;p++) newURLPrefix += urlPrefixParts[p] + '_';
	var newURL = newURLPrefix + nextPageNumberString + '.jp2' + urlParts[1];
	return newURL;
}

var confirm1 = confirm('Archive.org Scanned Book Downloader:\n\nReady Check: Are you on a window/tab viewing *just* the IMAGE of the 1st page of the book? If not cancel and run this when you are.');
if(!confirm1) return false;
var pageCount = prompt('Archive.org Scanned Book Downloader:\n\nHow many pages are in this book?');
var pageCounter = 0;
var pageInterval = null;
// parseInt(x) == NaN is always false, so isNaN() is needed to reject non-numeric input.
if(pageCount == null || isNaN(parseInt(pageCount, 10))){
	console.log('no page count provided.. giving up.');
}else{
	// Every 900 ms, download the next page; pageCounter runs from 0 (the current page) to pageCount.
	pageInterval = window.setInterval(function(){
		if(pageCounter > parseInt(pageCount, 10)){
			window.clearInterval(pageInterval);
			pageInterval = null;
			console.log('downloading done!..');
			var pdfTime = confirm('All pages downloaded! (some files may still be downloading though)\n\nWould you like to go to a site to create a PDF with them now?');
			if(pdfTime){
				window.open('https://tools.pdf24.org/en/images-to-pdf','_blank');
			}
		}else{
			var nextFile = getNewURL(pageCounter);
			downloadFile(nextFile);
			console.log('downloading next page! (' + nextFile + ')');
		}
		pageCounter += 1;
	},900);
}
})();
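The page arithmetic in getNewURL is easier to check when the URL is passed in rather than read from document.location.href. The sketch below is a hypothetical refactor (nextPageURL and the example URL are made up for illustration) that mirrors the same steps: split on ".jp2", take the last "_"-separated piece as the page number, add the offset, and zero-pad to four digits. Note that the interval runs pageCounter from 0 (the current page) up to and including pageCount, so the bookmarklet requests pageCount + 1 pages.

```javascript
// Hypothetical pure version of getNewURL: same steps, but the URL is an argument.
// Assumes the URL contains ".jp2" immediately after a "_<digits>" page number.
function nextPageURL(url, offset) {
  const urlParts = url.split('.jp2');              // [text before ".jp2", text after it]
  const prefixParts = urlParts[0].split('_');      // last element is the page number
  const page = parseInt(prefixParts[prefixParts.length - 1], 10);
  const padded = String(page + offset).padStart(4, '0');
  // Re-join every piece except the page number, keeping each trailing "_".
  const prefix = prefixParts.slice(0, -1).map((part) => part + '_').join('');
  return prefix + padded + '.jp2' + urlParts[1];
}

// Example with a made-up URL in the "..._0001.jp2..." shape:
const first = 'https://example.org/BookReaderImages.php?file=book_jp2/book_0001.jp2&scale=2';
console.log(nextPageURL(first, 9));
// → https://example.org/BookReaderImages.php?file=book_jp2/book_0010.jp2&scale=2
```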
@cemerson
Author

Thanks - glad to hear it worked, @chaquit0!

@mikkovedru

It worked nicely. Thank you!

May I suggest that you add "(Firefox) Before starting the download, go to browser settings, choose the directory to download files in, and select to save files automatically."

@gagrotxgb

This very same script can also be used to download books from Pustak.org with just a few changes: replace the '_' with '/Image', replace '.jp2' with '.jpg', and finally change padStart(4,'0') to padStart(3,'0').
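The three changes described for Pustak.org (the separator before the page number, the file extension, and the padStart width) can be turned into options instead of edits. This is an untested sketch: pageURL and its option names are hypothetical, and the example URL only assumes the ".../Image001.jpg" shape implied by the comment, not a real Pustak.org address.

```javascript
// Hypothetical generalization of getNewURL's arithmetic. Defaults match the
// archive.org bookmarklet ('_', '.jp2', 4); override them for other sites.
function pageURL(url, offset, { sep = '_', ext = '.jp2', width = 4 } = {}) {
  // If the extension is missing entirely, afterExt defaults to '' instead of the original's "undefined".
  const [beforeExt, afterExt = ''] = url.split(ext);
  const parts = beforeExt.split(sep);              // last piece is the page number
  const page = parseInt(parts[parts.length - 1], 10);
  const prefix = parts.slice(0, -1).map((p) => p + sep).join('');
  return prefix + String(page + offset).padStart(width, '0') + ext + afterExt;
}

// Pustak.org-style settings from the comment above, on a made-up URL:
const pustak = { sep: '/Image', ext: '.jpg', width: 3 };
console.log(pageURL('https://example.org/book/Image001.jpg', 1, pustak));
// → https://example.org/book/Image002.jpg
```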

@agatakotecka

agatakotecka commented Sep 20, 2022

Does this method above still work for anyone?
In Firefox the bookmarklet does nothing for me. When I enter the number of pages, nothing further happens.
In Chrome it produces "Failed - No file", and at the same time archive.org throws a lending error for the book. Is this new anti-debugger protection against this script, and the reason it fails?

@cemerson
Author

Hey @agatakotecka - I just tested using Brave (basically Chrome) and it did work and also tried in Firefox which also worked for me. It's possible your browser version or something else is causing an error - maybe try a Chrome browser? It's also possible some books have something unique about them that causes the script to fail - if you have a book that isn't working let me know and I'm happy to try it for you. GL.

@agatakotecka

Thanks Cemerson! I did a third test on the latest Firefox browser (104) and it works like a charm.
The previous tests were done on older browsers such as FF 63 and Chrome 80, which don't seem to be compatible (they're old by now), or I messed up their settings. Also, on the latest FF I'm no longer being lent out automatically.
Thank you very much for your great work and tutorial.

@cemerson
Author

Thank you, @agatakotecka - glad it worked for you! Happy reading! ;)

@nenabunena

I tried this on the Internet Archive 2 weeks ago and it worked, but it stopped working last week. Maybe they changed something, because I can't find the image file on the Internet Archive now.

@nenabunena

nenabunena commented Sep 23, 2022

I tried 3 different books and I can't get it to work, nor can I find the first page's image with Inspect Element. For example, I can't get it to work for this one:

https://archive.org/details/elviscloseuprare00levi

I also tried different browsers and still no go.

@agatakotecka

agatakotecka commented Sep 24, 2022

It doesn't work for me either with the latest Firefox, but the main reason is that this book uses a different link format from the one cemerson's script expects:

Page 1:
https://ia600202.us.archive.org/BookReader/BookReaderPreview.php?id=elviscloseuprare00levi&subPrefix=elviscloseuprare00levi&itemPath=/1/items/elviscloseuprare00levi&server=ia600202.us.archive.org&page=leaf1&fail=preview&&scale=2&rotate=0

Page 2:
https://ia600202.us.archive.org/BookReader/BookReaderPreview.php?id=elviscloseuprare00levi&subPrefix=elviscloseuprare00levi&itemPath=/1/items/elviscloseuprare00levi&server=ia600202.us.archive.org&page=leaf2&fail=preview&&scale=2&rotate=0

The only part that changes is the number after 'leaf', which is not a format cemerson's script handles. These URLs also don't contain '.jp2' at all.

I'm not sure whether this can be handled in a script update, or whether that's feasible (I have no idea how many link variations archive.org uses; there may be a couple or there may be hundreds).

Anyway, I'd suggest using Excel: it's not difficult to create a link for each page of your book. The only variable is the number after 'leaf', which differs in each spreadsheet row (you need 148 rows in total for this book). Then use a download manager such as FlashGet to download them all.
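As an alternative to building the links in a spreadsheet, the same list can be generated with a few lines of JavaScript, for example in the browser console. This sketch assumes the BookReaderPreview.php format shown above, where only the number after page=leaf changes; leafURLs is a hypothetical name. It uses a regular expression instead of URLSearchParams because re-serializing the query with URLSearchParams would percent-encode characters such as the slashes in itemPath.

```javascript
// Hypothetical helper: one URL per leaf, from leaf 1 to leaf `count`,
// built by rewriting only the number in the "page=leafN" query parameter.
function leafURLs(firstURL, count) {
  const urls = [];
  for (let leaf = 1; leaf <= count; leaf++) {
    // A replacer function avoids "$1" + digits being read as a different capture group.
    urls.push(firstURL.replace(/([?&]page=leaf)\d+/, (match, prefix) => prefix + leaf));
  }
  return urls;
}

// Usage: paste the page-1 URL and the leaf count (148 for the book above),
// then hand the printed list to a download manager.
// console.log(leafURLs(page1URL, 148).join('\n'));
```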

@cemerson
Author

Just curious @agatakotecka does it work if you try Chrome or Brave?

@Alchemytr

@cemerson, thanks so much for creating this script (and the instructions to go with it so twits like me can use it)!
As others have said, I'll be deleting the 'book' as soon as I've referenced the sections I need to. I'm just glad I don't need to wrangle with awful DRM to do this.

@nenabunena @agatakotecka - confirming that I've just successfully run the script on the latest version of Brave [Version 1.43.93 Chromium: 105.0.5195.127 (Official Build) (64-bit)].

I was getting the same error some people referred to further up the conversation ("0NaN.jp2undefined" appearing as the 'filename' for every empty file) and I discovered it was because I'd opened the thumbnail version of the first page of the book in a new tab. Of course, the script was expecting a different URL and it broke.

The key for me was looking at the video again and identifying exactly what class the line @cemerson selected was declared as (note - it's 'BRpageimage' with no quotation marks). This is unique to the correct image, so you can follow the below steps to ensure you also get it right :)

  1. Complete the 'inspect element' steps (you need to click on the header of the page to do this - right clicking on the book image you're looking at won't work)
  2. Click into the Elements panel (i.e. the markup you're looking through to locate the correct image link) and press CTRL + F to search/filter
  3. Type (or copy/paste in) 'BRpageimage' with no quotation marks and you'll be taken to the correct line
  4. Right click on the image URL/link (preceded by src=, it will start with https:// as with any URL) and select 'Open in a new tab'
  5. You're in business! NOW, on the new tab you just opened (you should see a LARGE image of the first page of your new book), run the bookmarklet you created and added to your bookmarks bar and follow the prompts.
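The search for the BRpageimage element can also be done from the DevTools console. A sketch, assuming archive.org still uses that class name; findPageImageSrc is a hypothetical name. document.querySelector returns the first matching element in document order, which is not guaranteed to be page 1 (for example in two-page view), so check the printed URL before running the bookmarklet on it.

```javascript
// Hypothetical console helper: returns the src of the first element with the
// class "BRpageimage", or null if there is none. Pass `document` in the console.
function findPageImageSrc(doc) {
  const img = doc.querySelector('.BRpageimage');
  return img ? img.src : null;
}

// In the DevTools console on the book page:
// const src = findPageImageSrc(document);
// console.log(src);           // check that it contains a page token like _0001.jp2
// window.open(src, '_blank'); // open it in a new tab, then run the bookmarklet there
```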

Hope this helps someone else. Enjoy and happy reading.

@cemerson
Author

Thank you @Alchemytr for sharing all those details w/people - hopefully that will help folks having issues. Very glad it worked for you too btw :)

@nenabunena

Thanks @Alchemytr & @cemerson. I was able to get the book by following the guide below and copying/pasting the link; that hadn't worked for some books before, or perhaps I did it wrong before. I will try this new method. Thank you so much for taking the time to figure it out for me! I'm sure I will use this very soon, and I'll update everyone on how it goes!

https://www.isolveit.xyz/2021/05/download-borrow-books-from-archiveorg.html

@miluoshi

miluoshi commented Dec 14, 2022

For me the original script didn't work because the page URL didn't contain the text .jp2. It was in this format: https://archive.org/details/<id>/mode/1up?view=theater

The following getNewURL worked:

function getNewURL(pageCount) {
  if (pageCount == null) pageCount = 1;
  var firstImageURL = document.querySelector('.BRpageimage').src;
  var newURL = new URL(firstImageURL);
  var filePath = newURL.searchParams.get('file');
  var idLength = filePath.match(/_(\d+)\.jp2/)[1].length;
  var newFileParam = newURL.searchParams.get('file').replace(/_\d+\.jp2/, `_${String(pageCount).padStart(idLength, '0')}.jp2`);
  newURL.searchParams.set('file', newFileParam);
  return newURL.origin + newURL.pathname + decodeURIComponent(newURL.search);
}

It reads the URL of the first image (the first element with class BRpageimage) and generates the URL for each following page by replacing the page number in that URL.
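Because that version derives page URLs from the `file` query parameter of the first image rather than from the tab's address, its core step can be tested in isolation. The sketch below is a pure-function restatement, assuming the image URL carries a `file` parameter ending in _<digits>.jp2; pageImageURL and the example URL are hypothetical. Like the rewrite above (and unlike the original getNewURL), it treats its argument as an absolute page number rather than an offset. The final decodeURIComponent undoes the percent-encoding (for example of "/") that URL.searchParams applies when the query is re-serialized.

```javascript
// Hypothetical pure restatement: replace the page number inside the `file`
// query parameter of the first page-image URL.
function pageImageURL(firstImageURL, pageNumber) {
  const url = new URL(firstImageURL);
  const file = url.searchParams.get('file');
  const width = file.match(/_(\d+)\.jp2/)[1].length; // keep the original zero-padding width
  const padded = String(pageNumber).padStart(width, '0');
  url.searchParams.set('file', file.replace(/_\d+\.jp2/, `_${padded}.jp2`));
  // searchParams re-encodes "/" as %2F; decoding restores the original look.
  // This assumes no parameter value needed escaping in the first place.
  return url.origin + url.pathname + decodeURIComponent(url.search);
}

const firstImage = 'https://example.org/BookReader/BookReaderImages.php?zip=/x/book_jp2.zip&file=book_jp2/book_0001.jp2&id=book&scale=2';
console.log(pageImageURL(firstImage, 5));
// → https://example.org/BookReader/BookReaderImages.php?zip=/x/book_jp2.zip&file=book_jp2/book_0005.jp2&id=book&scale=2
```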

@Yupoman

Yupoman commented Dec 29, 2022

For those who (like me) don't understand how to add the bookmarklet: copy the code at the end of the instructions, paste it into the main box on the "mrcoles" page, click "Convert", and simply drag the link where it says "this link" to your bookmarks.
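For the curious, a bookmarklet is just a javascript: URL. A minimal sketch of the wrapping step, assuming no minification: the script is placed inside an immediately invoked function (which is what lets the script's `return false` work) and then percent-encoded. toBookmarklet is a hypothetical name, not part of the mrcoles tool.

```javascript
// Hypothetical sketch of a bookmarklet generator: wrap the source in an
// immediately invoked function expression, then percent-encode it.
// No minification is done, so the resulting URL is longer than necessary.
function toBookmarklet(source) {
  // The newline before "})()" keeps a trailing // comment in `source` from swallowing the closing.
  return 'javascript:' + encodeURIComponent('(function(){' + source + '\n})()');
}

// Usage: paste the returned string into a new bookmark's URL field.
// console.log(toBookmarklet(scriptSource));
```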

@Xekep

Xekep commented Jan 6, 2023

Fix:

function getNewURL(pageCount) {
  if (pageCount == null) pageCount = 1;
  var url = document.location.href;
  url = url.replace(/_(\d+)\.jp2/, "_" + ("0000" + pageCount).slice(-4) + '.jp2' );
  return url;
}
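The same regular expression also works as a pre-flight check. If the tab's URL contains no ".jp2" at all (for example when a thumbnail was opened instead of the full page image, as described earlier in this thread), the original getNewURL typically hands parseInt non-numeric text, String(NaN).padStart(4,'0') becomes "0NaN", and the missing text after ".jp2" is appended as undefined, which produces file names ending in "0NaN.jp2undefined". With the replace() fix above, a non-matching URL is returned unchanged, so the same page would be requested repeatedly. A minimal guard, assuming the page token is always an underscore, digits, then .jp2; isPageImageURL is a hypothetical name:

```javascript
// Hypothetical guard: true only if the URL contains a page token like "_0001.jp2",
// the pattern that getNewURL and the regex-based fixes rely on.
function isPageImageURL(url) {
  return /_\d+\.jp2/.test(url);
}

// Possible use at the top of the bookmarklet, before any prompts:
// if (!isPageImageURL(document.location.href)) {
//   alert('This tab does not look like a full-size page image (no _NNNN.jp2 in the URL).');
//   return;
// }
```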

@TTneedsbooks

@cemerson, could you please help me with these 2 books? https://archive.org/details/myguruhisdiscipl00ishe_1 & https://archive.org/details/srimadbhagavatam0000unse_y6u5
I am using the Chrome browser. I can't find the correct element with Inspect Element.

@raindog308

Just tested with Firefox 109 on macOS Ventura 13.2 and it worked fine.

The suggestion by @mikkovedru to set a download folder and save files automatically is vital, otherwise you'll be prompted to save each page.

BTW, on macOS you can assemble PDFs using Preview. Just select all files and open with Preview, check page order because sometimes a few are mixed up, and then export as PDF.
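If some pages come out of order when assembling the PDF, sorting the file names numerically instead of character by character may help, for example when a tool compares "book_10" and "book_9" as plain text. Whether this explains the ordering seen in Preview is an assumption. A sketch using the standard String.prototype.localeCompare with the numeric option; sortPageFiles is a hypothetical name:

```javascript
// Hypothetical helper: sort page-image file names so that embedded numbers
// compare by value ("book_9" before "book_10"), without mutating the input.
function sortPageFiles(names) {
  return [...names].sort((a, b) => a.localeCompare(b, undefined, { numeric: true }));
}

const sorted = sortPageFiles(['book_10.jpg', 'book_9.jpg', 'book_100.jpg']);
// sorted is ['book_9.jpg', 'book_10.jpg', 'book_100.jpg']
```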

@UrbanIXOrbit

why not create a browser extension for this?

@Rangerrick2018

Hey, I've gotten as far as setting up the bookmarklet; I'm just stuck trying to find the first page's URL while inspecting the element. I don't understand code. Can anyone help a bit?

Screenshot:
https://imgur.com/a/toMcp61

Book link:
https://archive.org/details/waffensshitlerse00stei/page/n7/mode/2up

Book title:
The Waffen SS : Hitler's elite guard at war 1939-1945

I'm using Chrome. I've pasted the code and I get the prompt asking me if I am on the page with JUST the first image. Any help would be greatly appreciated.

@cemerson
Author

cemerson commented Mar 22, 2023

@Rangerrick2018 Normally when just 1 image downloads it is because you didn't set the site to allow multiple downloads - sometimes this prompt can be easy to miss/dismiss.

Anyhow - it looks like someone else has already pulled that one here :)

Interesting book btw - may add that to my list to check out.

@KOFESSE

KOFESSE commented Apr 4, 2023

Thank you very much, but I can't understand it. Please help me download this book: https://archive.org/details/chevre19071919190000aubi
Thanks in advance.

@etozhepizteh

etozhepizteh commented Apr 5, 2023

Thanks for your code! It works well on loan-free books.
Unfortunately, archive seems to interrupt the loan after a number of pages have been downloaded and sends an error that just creates new empty tabs in the browser. Have you encountered this issue yet? Maybe there is a way to overcome it by raising the time interval between the downloads somehow?
Anyway, if you are still interested in this project, I would be really glad to know your opinion on how to handle this issue.

UPD: Weirdly enough, Brave browser doesn't suffer from this issue, everything goes like clockwork. I have encountered difficulties only on Mozilla.

@mabba18

mabba18 commented Apr 14, 2023

Thanks for this great tool. As an alternate suggestion, I just put the images into a zip file, renamed it .cbz and will use a Comic viewer to read.

@xwx1829

xwx1829 commented Apr 21, 2023

whiteboard

@RaveHunter05

It worked pretty well, thank you so much!

@cemerson
Author

cemerson commented May 4, 2023

Nice idea, @mabba18!

@cemerson
Author

cemerson commented May 4, 2023

Nice to know Brave is still working OK, @etozhepizteh. I do expect this solution to break at some point due to changes they make on the site or just general JavaScript deprecation. If/when that happens I may try to fix it if I get inspired, I suppose. Here's hoping that doesn't happen for a while though! :)

@shivangsorout

It worked in Firefox for me!! But not in Chrome, I don't know why though!!

@akfoutkonnen

Can you please help me with this book: https://archive.org/details/alternativemedic0000cros/page/n11/mode/2up
After a few pages in Chrome it returns "Failed - No file".
TIA

@shivangsorout

Use Firefox!! It didn't work on the first try, but after that it worked for me!!

@ulissesBR

ulissesBR commented May 31, 2023

I tried Chrome, Brave, and Firefox, with no success. I type the number of pages and absolutely nothing happens next... =/
I followed the instructions step by step, but even after doing it more than once, the result is the same =/.
Is anyone else experiencing the same problem?

@bitounu

bitounu commented Jun 12, 2023

Here's working code:
function downloadFile(filePath){
	var link=document.createElement('a');
	link.href = filePath;
	link.download = filePath.substr(filePath.lastIndexOf('/') + 1);
	link.click();
}

function getNewURL(pageCount){
	if(pageCount == null) pageCount = 1;
	var url = document.location.href;
	var urlParts = url.split(".jp2");
	var urlPrefixParts = urlParts[0].split("_");
	var urlPageNumber = urlPrefixParts[urlPrefixParts.length-1];
	var nextPageNumberString = String(parseInt(urlPageNumber, 10)+pageCount).padStart(4,'0');
	var newURLPrefix = '';
	for(var p=0;p<urlPrefixParts.length-1;p++) newURLPrefix += urlPrefixParts[p] + '_';
	var newURL = newURLPrefix + nextPageNumberString + '.jp2' + urlParts[1];
	return newURL;
}

var confirm1 = confirm('Archive.org Scanned Book Downloader:\n\nReady Check: Are you on a window/tab viewing just the IMAGE of the 1st page of the book? If not cancel and run this when you are.');
if(!confirm1) return false;
var pageCount = prompt('Archive.org Scanned Book Downloader:\n\nHow many pages are in this book?');
var pageCounter = 0;
var pageInterval = null;
// parseInt(x) == NaN is always false, so isNaN() is needed to reject non-numeric input.
if(pageCount == null || isNaN(parseInt(pageCount, 10))){
	console.log('no page count provided.. giving up.');
}else{
	pageInterval = window.setInterval(function(){
		if(pageCounter > parseInt(pageCount, 10)){
			window.clearInterval(pageInterval);
			pageInterval = null;
			console.log('downloading done!..');
			var pdfTime = confirm('All pages downloaded! (some files may still be downloading though)\n\nWould you like to go to a site to create a PDF with them now?');
			if(pdfTime){
				window.open('https://tools.pdf24.org/en/images-to-pdf','_blank');
			}
		}else{
			var nextFile = getNewURL(pageCounter);
			downloadFile(nextFile);
			console.log('downloading next page! (' + nextFile + ')');
		}
		pageCounter += 1;
	},900);
}

Say thanks for it to all smart people who built ChatGPT.

@bitounu

bitounu commented Jun 12, 2023

And for the future, here's the prompt:

I have a bookmarklet for downloading images.
The instructions for the bookmarklet say:

Once on the new tab looking at the book's 1st page image, click the bookmarklet button made in step 1 and type in the number of pages the book has.
As soon as you click 'OK' after entering the page count watch for the browser's "Allow Multiple Downloads from this Site" type message in your browser and click 'Accept' or whatever. Otherwise the process will fail. Some browsers may not do this - so disregard if this isn't an issue w/your browser.
Wait for the process to finish - a 300 page book takes around 3-5 minutes. Note: You can minimize the browser tab/window while the pages are downloading.
Once all pages have been downloaded an "alert" message will popup when the pages have all been downloaded.
At this point you'll have a bunch of book page images in your Downloads folder like mybookwhatever_000.jpg, mybookwhatever_001.jpg etc.

I have a bookmarklet that should download all pages, but it doesn't work.
The bookmarklet looks like this:
<paste current not working code of bookmarklet>

When I have a new tab with the book's 1st page image, the URL is:
<paste first page image URL>
and the 2nd page image URL is:
<paste second page image URL>

Help me make it work.

:)

@cemerson
Author

@bitounu Interesting. I tried that code you pasted and it didn't work for me but maybe I used it wrong. In any case yeah nice idea to have AI try it out. Old bookmarklet code still works for me so I'll stick w/that for now but won't be surprised if/when the AI version is better. Maybe the entire task can be altered for AI's approach to it (like "here's a link to a book, please download all the pages and number and download them for me" or something?).

@bitounu

bitounu commented Jun 13, 2023

Ha ha! 😄 I haven't tried it, but I don't think it will do it better than you. I had some problems: it looks like the archive.org loan expired early when I tried to download, so I had to be quick and run the script right after I borrowed the book. I ran it in Firefox on Ubuntu with a lot of other add-ons.

@horribleCCguru

horribleCCguru commented Jul 2, 2023

Hey, so the .md file isn't very beginner-friendly. Basically, what you need to do is:

let prog = decodeURIComponent(`(function()%7Bfunction%20downloadFile(filePath)%7Bvar%20link%3Ddocument.createElement('a')%3Blink.href%20%3D%20filePath%3Blink.download%20%3D%20filePath.substr(filePath.lastIndexOf('%2F')%20%2B%201)%3Blink.click()%3B%7Dfunction%20getNewURL(pageCount)%7Bif(pageCount%20%3D%3D%20null)%20pageCount%20%3D%201%3Bvar%20url%20%3D%20document.location.href%3Bvar%20urlParts%20%3D%20url.split(%22.jp2%22)%3Bvar%20urlPrefixParts%20%3D%20urlParts%5B0%5D.split(%22_%22)%3Bvar%20urlPageNumber%20%3D%20urlPrefixParts%5BurlPrefixParts.length-1%5D%3Bvar%20nextPageNumberString%20%3D%20String(parseInt(urlPageNumber)%2BpageCount).padStart(4%2C'0')%3Bvar%20newURLPrefix%20%3D%20''%3Bfor(var%20p%3D0%3Bp%3CurlPrefixParts.length-1%3Bp%2B%2B)%20newURLPrefix%20%2B%3D%20urlPrefixParts%5Bp%5D%20%2B%20'_'%3Bvar%20newURL%20%3D%20newURLPrefix%20%2B%20nextPageNumberString%20%2B%20'.jp2'%20%2B%20urlParts%5B1%5D%3Breturn%20newURL%3B%7Dvar%20confirm1%20%3D%20confirm('Archive.org%20Scanned%20Book%20Downloader%3A%5Cn%5CnReady%20Check%3A%20Are%20you%20on%20a%20window%2Ftab%20viewing%20*just*%20the%20IMAGE%20of%20the%201st%20page%20of%20the%20book%3F%20If%20not%20cancel%20and%20run%20this%20when%20you%20are.')%3Bif(!confirm1)%20return%20false%3Bvar%20pageCount%20%3D%20prompt('Archive.org%20Scanned%20Book%20Downloader%3A%5Cn%5CnHow%20many%20pages%20are%20in%20this%20book%3F')%3Bvar%20pageCounter%20%3D%200%3Bvar%20pageInterval%20%3D%20null%3Bif(pageCount%20%3D%3D%20null%20%7C%7C%20pageCount%20%3D%3D%20undefined%20%7C%7C%20parseInt(pageCount)%20%3D%3D%20NaN)%7Bconsole.log('no%20page%20count%20provided..%20giving%20up.')%3B%7Delse%7BpageInterval%20%3D%20window.setInterval(function()%7Bif(pageCounter%20%3E%20parseInt(pageCount))%7Bwindow.clearInterval(pageInterval)%3BpageInterval%20%3D%20null%3Bconsole.log('downloading%20done!..')%3Bvar%20pdfTime%20%3D%20confirm('All%20pages%20downloaded!%20(some%20files%20may%20still%20be%20downloading%20though)%5Cn%5CnWould%20you%20like%20to%20go%20to%20a%20site%20to%20create%20a%20PDF%20with%20them%20now%3F')%3Bif(pdfTime)%7Bwindow.open('https%3A%2F%2Ftools.pdf24.org%2Fen%2Fimages-to-pdf'%2C'_blank')%3B%7D%7Delse%7Bvar%20nextFile%20%3D%20getNewURL(pageCounter)%3BdownloadFile(nextFile)%3Bconsole.log('downloading%20next%20page!%20('%20%2B%20nextFile%20%2B%20')')%3B%7DpageCounter%20%2B%3D%201%3B%7D%2C900)%3B%7D%7D)()`);

eval(prog);

You run this in the DevTools console while the browser tab is showing the page-1 .jp2 image URL, which you find by inspecting the archive.org book page and hovering over the book page until you see it.

@ahaji2002

Thanks @Alchemytr, it works for me. The only issue is that although I allowed multiple downloads in the browser, Windows keeps asking me to save each file. I can see in the file manager that temporary files are being downloaded automatically, so I need to click through them manually one by one. Any idea how to stop Windows from asking for confirmation to save each file?

@TheAmazingAceLeo

If anyone is finding it hard to get it to work or having it stop halfway through, it keeps downloading if you stay on the page that you first started downloading on.

@amochkin

Fixes and refactor:

function downloadFile(filePath) {
	const link = document.createElement('a');
	link.href = filePath;
	link.download = filePath.substring(filePath.lastIndexOf('/') + 1);
	link.click();
}

function getNewURL(pageCount) {
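	// Compare against null: `!pageCount` is also true for 0, so the first call (page 0) would
	// fetch page 1 instead, skipping page 0000 and downloading page 0001 twice.
	if (pageCount == null) pageCount = 1;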
	let url = document.location.href;
	url = url.replace(/_(\d+)\.jp2/, '_' + ('0000' + pageCount).slice(-4) + '.jp2');
	return url;
}

if (
	confirm(
		'Archive.org Scanned Book Downloader:\n\nReady Check: Are you on a window/tab viewing *just* the IMAGE of the 1st page of the book? If not cancel and run this when you are.',
	)
) {
	const pageCount = prompt('Archive.org Scanned Book Downloader:\n\nHow many pages are in this book?');
	let pageCounter = 0;
	let pageInterval = null;
	if (pageCount == null || isNaN(parseInt(pageCount))) {
		console.log('no page count provided.. giving up.');
	} else {
		pageInterval = window.setInterval(function () {
			if (pageCounter > parseInt(pageCount)) {
				window.clearInterval(pageInterval);
				pageInterval = null;
				console.log('downloading done!..');
				if (
					confirm(
						'All pages downloaded! (some files may still be downloading though)\n\nWould you like to go to a site to create a PDF with them now?',
					)
				) {
					window.open('https://tools.pdf24.org/en/images-to-pdf', '_blank');
				}
			} else {
				const nextFile = getNewURL(pageCounter);
				downloadFile(nextFile);
				console.log('downloading next page! (' + nextFile + ')');
			}
			pageCounter += 1;
		}, 900);
	}
}

@cockfighter

cockfighter commented Feb 17, 2024

How are you activating the bookmarklet in one tab while still "viewing just the IMAGE" in another tab (with Chrome on macOS)?
Doesn't toggling between tabs dismiss the bookmarklet prompts (e.g. 'how many pages')? Otherwise I keep getting an error message.

error msg(s): Could not download - No file

Also, for anyone else (like me) having difficulty isolating the specific string in Inspect > Elements: the very same image (location) is also available from the Sources tab (usually in the last, or next-to-last, folder).
fyi: images appear to be .jp2

Edit: I can only get it to download "download (1).html" documents (as many as the requested page count). I installed Brave (i.e. the same browser used in the demo video) with the same results. Any help?

@Enissay

Enissay commented Feb 28, 2024

I can confirm it's not working anymore:

  • borrow the book for 1hr
  • open the first page/image in a new tab
  • run the bookmarklet (10 pages as a test xD)
  • All downloads fail
  • when I go back to the book page, my loan has somehow expired!

I tried the above 3+ times, always with the same result... Quite sad :<
