@nazarovsky
Forked from drelatgithub/docin-dl.py
Last active October 17, 2022 06:21
Docin document downloader
###############################################################################
#
# Docin document downloader
#
# Valid as of 2022-16-08
#
###############################################################################
import argparse
import os
from types import SimpleNamespace
import urllib.error
import urllib.request

# Runtime configuration, filled in from the command-line arguments
conf = SimpleNamespace(
    docin_pid = 0,
    output_dir = ""
)

def download_image(pid):
    # Fetch pages 1, 2, 3, ... until the server answers with an HTTP error
    i = 0
    while True:
        i += 1
        try:
            urllib.request.urlretrieve(
                "http://211.147.220.164/index.jsp?file={}&pageno={}&width=1836&height=2376".format(pid, i),
                os.path.join(conf.output_dir, "{0:03d}.png".format(i))
            )
        except urllib.error.HTTPError:
            # An HTTP error is treated as "no more pages"
            break
        else:
            print("Page", i, "saved.")

if __name__ == "__main__":
    # Parse the arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("docin_pid", type=str, help="The number after \"p-\" in docin url")
    parser.add_argument("output_dir", type=str, help="The output directory")
    args = parser.parse_args()
    conf.docin_pid = args.docin_pid
    conf.output_dir = args.output_dir
    # Do the work
    download_image(conf.docin_pid)
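
The script expects the number after "p-" in the Docin URL as its first argument. A minimal sketch of pulling that number out of a full URL is below. The helper name `extract_pid` and the example URL shape are assumptions for illustration, not part of the original script, and the pattern is deliberately loose.

```python
import re

def extract_pid(url):
    """Return the digits that follow "p-" in a URL, or None if there are none.

    Hypothetical helper: assumes the document id appears as "p-<digits>".
    """
    match = re.search(r"p-(\d+)", url)
    return match.group(1) if match else None

# Example URL is made up; only the "p-<digits>" part matters here.
print(extract_pid("https://www.docin.com/p-123456789.html"))  # prints 123456789
```

The returned string can be passed as the `docin_pid` argument, followed by an existing output directory.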
@matthewpenkala
Wish we could find a way to auto-remove the watermark!

If the page is just a simple B&W image (white background and black-ish text), then a very primitive levels adjustment can be made: move the white point (I believe) so that the light grey watermark values get crushed to pure white. This obviously isn't ideal, though...
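
A rough sketch of the levels idea described above, assuming 8-bit greyscale pages and a watermark that is lighter than the text. The function names and the default `white_point=200` are guesses for illustration, so the threshold would need tuning per document. `clean_page` relies on the third-party Pillow package, which the downloader itself does not use.

```python
def build_levels_lut(white_point=200):
    """Build a 256-entry lookup table for 8-bit greyscale values.

    Values at or above white_point map to 255 (pure white); darker values
    are stretched linearly so that black stays black.
    """
    if not 1 <= white_point <= 255:
        raise ValueError("white_point must be in 1..255")
    return [min(255, v * 255 // white_point) for v in range(256)]

def clean_page(src_path, dst_path, white_point=200):
    """Apply the levels adjustment to one saved page (requires Pillow)."""
    from PIL import Image  # imported here so the table can be built without Pillow
    page = Image.open(src_path).convert("L")  # force 8-bit greyscale
    page.point(build_levels_lut(white_point)).save(dst_path)
```

For example, `clean_page("001.png", "001_clean.png")` would process the first downloaded page. Anything as dark as the text itself survives, so this only helps when the watermark is noticeably lighter than the text.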

@nazarovsky
nazarovsky commented Oct 17, 2022 via email