Created
June 1, 2014 03:52
PDF download and merge scripts (these scripts specifically download the Ouyang Family Book from the National Library of China and merge the pages into one PDF)
#!/bin/bash
# (book part 1)
for i in {1..101}
do
    wget "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05/1302019/$i" -O "001_$i.pdf"
done
#!/bin/bash
# (book part 2)
for i in {1..131}
do
    wget "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05/1302020/$i" -O "002_$i.pdf"
done
#!/bin/bash
# (book part 3)
for i in {1..103}
do
    wget "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05/1302021/$i" -O "003_$i.pdf"
done
#!/bin/bash
# (book part 4)
for i in {1..225}
do
    wget "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05/1302022/$i" -O "004_$i.pdf"
done
#!/bin/bash
# (book part 5)
for i in {1..233}
do
    wget "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05/1302023/$i" -O "005_$i.pdf"
done
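The five download loops above differ only in the document ID, the page count, and the file-name prefix. A minimal Python sketch of the same job list, driven by one table, might look like the following. The names `BASE_URL`, `PARTS`, and `page_jobs` are hypothetical and not part of the original scripts; the values are copied from the loops above. The sketch only builds the URL and file-name pairs and does not download anything.

```python
# Hypothetical consolidation of the five download loops above.
BASE_URL = "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05"

# (file-name prefix, document ID, page count), copied from the shell scripts.
PARTS = [
    ("001", 1302019, 101),
    ("002", 1302020, 131),
    ("003", 1302021, 103),
    ("004", 1302022, 225),
    ("005", 1302023, 233),
]


def page_jobs(parts=PARTS, base_url=BASE_URL):
    """Return (url, output file name) pairs in download order."""
    jobs = []
    for prefix, doc_id, pages in parts:
        # Pages are numbered from 1, matching the {1..N} ranges in the loops.
        for i in range(1, pages + 1):
            url = "%s/%d/%d" % (base_url, doc_id, i)
            jobs.append((url, "%s_%d.pdf" % (prefix, i)))
    return jobs
```

Each pair could then be passed to a downloader of your choice, for example `urllib.request.urlretrieve(url, name)` from the Python 3 standard library, which would play the role of the `wget ... -O` call.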
#!/usr/bin/env python
# Requires PyPDF2 (e.g. "pip install PyPDF2").
# PdfFileMerger is the class name in older PyPDF2 releases; PyPDF2 3.x renames it to PdfMerger.
import os
from PyPDF2 import PdfFileMerger

PDF_PATH = './pdftest'    # directory holding the downloaded "XXX_XXX.pdf" files
EXPORT_FILE = 'all.pdf'   # merged output file

filelist = [pdfname for pdfname in os.listdir(PDF_PATH) if pdfname.endswith('.pdf')]
# Sort file names by their numeric parts (e.g. "XXX_XXX.pdf"), not as plain strings.
filelist.sort(key=lambda x: [int(y) for y in x.replace('.pdf', '').split('_')])

merger = PdfFileMerger()
for fn in filelist:
    # file() exists only in Python 2; open() works in both Python 2 and 3.
    merger.append(open(os.path.join(PDF_PATH, fn), 'rb'))
with open(EXPORT_FILE, 'wb') as outputfile:
    merger.write(outputfile)
merger.close()
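The sort key in the merge script matters because the page numbers in the file names are not zero-padded, so a plain string sort would place page 10 before page 2. A small sketch of the difference, using made-up file names in the same "XXX_XXX.pdf" pattern (`numeric_key` is a hypothetical name for the lambda used in the script):

```python
def numeric_key(name):
    """Split "NNN_M.pdf" into integers, e.g. "001_10.pdf" -> [1, 10]."""
    return [int(part) for part in name.replace(".pdf", "").split("_")]


names = ["001_10.pdf", "002_1.pdf", "001_2.pdf", "001_1.pdf"]

# Plain string sort: "001_10.pdf" lands before "001_2.pdf".
by_name = sorted(names)

# Numeric sort: pages come out in reading order within each part.
by_number = sorted(names, key=numeric_key)

print(by_name)    # ['001_1.pdf', '001_10.pdf', '001_2.pdf', '002_1.pdf']
print(by_number)  # ['001_1.pdf', '001_2.pdf', '001_10.pdf', '002_1.pdf']
```

Note that the key raises `ValueError` for any `.pdf` file whose name is not made of underscore-separated integers, so the input directory should contain only the downloaded pages.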