@terryoy
Created June 1, 2014 03:52
PDF download and merge scripts (written specifically to download the Ouyang Family Book from the National Library of China)
#!/bin/bash
# (book part 1)
for i in {1..101}
do
    wget "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05/1302019/$i" -O "001_$i.pdf"
done
#!/bin/bash
# (book part 2)
for i in {1..131}
do
    wget "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05/1302020/$i" -O "002_$i.pdf"
done
#!/bin/bash
# (book part 3)
for i in {1..103}
do
    wget "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05/1302021/$i" -O "003_$i.pdf"
done
#!/bin/bash
# (book part 4)
for i in {1..225}
do
    wget "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05/1302022/$i" -O "004_$i.pdf"
done
#!/bin/bash
# (book part 5)
for i in {1..233}
do
    wget "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05/1302023/$i" -O "005_$i.pdf"
done
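The five loops above differ only in the document id, the page count, and the file name prefix, so the same download list can be generated from one table. The following is a minimal sketch of that idea, not part of the original gist: the names `PARTS`, `build_download_plan`, and `download_all` are hypothetical, the ids and page counts are copied from the loops above, and the sketch assumes Python 3 for `urllib.request`.

```python
from urllib.request import urlretrieve

# Base URL shared by all five shell loops above.
BASE_URL = "http://mylib.nlc.gov.cn/system/doc/pdfBooks/books/9831679/20120824_05"

# (document id, page count) for book parts 1 to 5, copied from the loops above.
PARTS = [(1302019, 101), (1302020, 131), (1302021, 103), (1302022, 225), (1302023, 233)]


def build_download_plan(parts=PARTS, base_url=BASE_URL):
    """Return (url, filename) pairs in the order the shell loops would fetch them."""
    plan = []
    for part_no, (doc_id, pages) in enumerate(parts, start=1):
        for page in range(1, pages + 1):
            url = "%s/%d/%d" % (base_url, doc_id, page)
            # Same naming scheme as the wget -O arguments: "001_1.pdf", "001_2.pdf", ...
            filename = "%03d_%d.pdf" % (part_no, page)
            plan.append((url, filename))
    return plan


def download_all(plan):
    """Fetch every URL in the plan to its file name (hypothetical helper, needs network access)."""
    for url, filename in plan:
        urlretrieve(url, filename)
```

Calling `download_all(build_download_plan())` would then stand in for running the five scripts one after another. Building the list separately from fetching it also makes it easy to inspect the URLs before downloading anything.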
#!/usr/bin/python
# Requires PyPDF2 older than 3.0, where PdfFileMerger is still available
# (e.g. pip install "PyPDF2<3")
import os

from PyPDF2 import PdfFileMerger

PDF_PATH = './pdftest'
EXPORT_FILE = 'all.pdf'

filelist = [pdfname for pdfname in os.listdir(PDF_PATH) if pdfname.endswith('.pdf')]
# Sort file names by their numeric parts, not alphabetically (names look like "XXX_XXX.pdf")
filelist.sort(key=lambda x: [int(y) for y in x.replace('.pdf', '').split('_')])

merger = PdfFileMerger()
for fn in filelist:
    # open() replaces the Python 2-only file() builtin
    inputfile = open(os.path.join(PDF_PATH, fn), 'rb')
    merger.append(inputfile)

with open(EXPORT_FILE, 'wb') as outputfile:
    merger.write(outputfile)
merger.close()
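The merge order depends entirely on the sort key, because a plain alphabetical sort would place page 10 before page 2. The short sketch below illustrates the difference on a few sample names. The helper name `numeric_key` and the sample list are made up for illustration and assume the "XXX_XXX.pdf" naming produced by the download scripts.

```python
def numeric_key(name):
    """Turn a name like "001_10.pdf" into a list of ints, here [1, 10]."""
    return [int(part) for part in name.replace('.pdf', '').split('_')]


# Sample file names in the same format the download scripts produce.
names = ["001_10.pdf", "002_1.pdf", "001_2.pdf", "001_1.pdf"]

# Alphabetical order compares character by character, so "10" sorts before "2".
alphabetical = sorted(names)

# Numeric order compares [part, page] as integers, which matches the page order.
numeric = sorted(names, key=numeric_key)

print(alphabetical)  # ['001_1.pdf', '001_10.pdf', '001_2.pdf', '002_1.pdf']
print(numeric)       # ['001_1.pdf', '001_2.pdf', '001_10.pdf', '002_1.pdf']
```

Note that the key raises `ValueError` for any `.pdf` file in the folder whose name is not made up of numbers and underscores, so the input folder should contain only the downloaded pages.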