@metaxy
Last active March 4, 2024 04:32
Making good PDFs on Linux

1. Fixing PDFs

Rotate and split pages

PDF Arranger: [GitHub] | [ubuntuusers.de] (5/5 ⭐)

2. Create a good Scan

2.1. Extract images of the pages

sudo apt-get install -y poppler-utils
pdftoppm -jpeg -r 300 file.pdf out
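
The `pdftoppm` call above writes its images into the current directory. A minimal sketch that wraps the same call so the pages land in their own folder, ready for the next step; `extract_pages` and its arguments are hypothetical names chosen for illustration, not part of poppler:

```shell
# Sketch: render every page of a PDF to JPEG files inside a dedicated folder.
# extract_pages is a hypothetical helper; only pdftoppm itself comes from poppler-utils.
extract_pages() (
    pdf=$1          # input PDF
    outdir=$2       # folder that will receive the page images
    dpi=${3:-300}   # resolution in DPI; 300 matches the command above

    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; exit 1; }
    mkdir -p "$outdir" || exit 1
    # pdftoppm appends the page number and the .jpg extension to the prefix "page".
    pdftoppm -jpeg -r "$dpi" "$pdf" "$outdir/page"
)
```

Calling `extract_pages file.pdf pages` should leave one numbered JPEG per page in `pages/`. The body runs in a subshell, so its variables do not leak into the calling shell.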

2.2. Scantailor

Download ScanTailor Universal; this version of ScanTailor works best on Linux.

Move all the images into one folder and open a new project.

Add a margin of 10 mm to the output.

2.3. Create Ultra Small Files

Download and compile https://github.com/agl/jbig2enc

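sudo apt install git build-essential autoconf automake libtool libleptonica-dev python2
git clone https://github.com/agl/jbig2enc
cd jbig2enc
./autogen.sh
./configure
make
sudo make install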
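
After `sudo make install` it can help to confirm that the tools the next steps rely on are actually on the `PATH`. A small sketch; `require_tools` is a hypothetical helper name, not something jbig2enc provides:

```shell
# Sketch: report which of the given commands are missing from PATH.
# require_tools is a hypothetical helper; it returns non-zero if any tool is missing.
require_tools() (
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    exit "$missing"
)
```

For this section, `require_tools jbig2 python2` would be the relevant check.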

Prepare the files with ScanTailor, then change into its output folder, out.

wget https://raw.githubusercontent.com/agl/jbig2enc/master/pdf.py
jbig2 -s -p -v *.tif
python2 pdf.py output > small.pdf
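
In the commands above, `-s` selects symbol mode, `-p` produces PDF-ready data and `-v` is verbose; `output` is the default base name `jbig2` uses for the files that `pdf.py` then assembles. The same two steps can be sketched as one function that stops early when there is nothing to encode. `make_small_pdf` is a hypothetical name, and the sketch assumes `pdf.py` has already been downloaded into the folder:

```shell
# Sketch: encode all .tif pages in a folder with jbig2 and assemble them into one PDF.
# make_small_pdf is a hypothetical helper; it assumes pdf.py sits in the same folder.
make_small_pdf() (
    dir=$1              # folder with the .tif files produced by ScanTailor
    pdf=${2:-small.pdf} # result file, resolved relative to $dir

    cd "$dir" || exit 1
    [ -f pdf.py ] || { echo "pdf.py not found in $dir" >&2; exit 1; }

    # Expand the glob into the positional parameters; if nothing matches,
    # "$1" is the literal pattern and the existence test fails.
    set -- *.tif
    [ -e "$1" ] || { echo "no .tif files in $dir" >&2; exit 1; }

    jbig2 -s -p -v "$@" || exit 1
    python2 pdf.py output > "$pdf"
)
```

With the ScanTailor output in `out`, `make_small_pdf out small.pdf` would write `out/small.pdf`.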

2.4. Searchable PDF

Install OCRmyPDF https://github.com/jbarlow83/OCRmyPDF

sudo apt install ocrmypdf tesseract-ocr-deu tesseract-ocr-eng tesseract-ocr-rus
ocrmypdf --jbig2-lossy -l eng small.pdf ocr.pdf
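
The install line pulls in German, English and Russian language data, while the call itself only uses English. OCRmyPDF accepts several languages joined with `+`, so a thin wrapper with the language as a parameter could look like this; `ocr_pdf` is a hypothetical name and the default of `eng` simply mirrors the command above:

```shell
# Sketch: run OCRmyPDF with a selectable language list.
# ocr_pdf is a hypothetical helper; the flags are the ones used in the command above.
ocr_pdf() (
    src=$1            # input PDF, e.g. small.pdf
    dst=$2            # searchable output PDF
    langs=${3:-eng}   # Tesseract language codes, joined with "+", e.g. deu+eng

    [ -f "$src" ] || { echo "no such file: $src" >&2; exit 1; }
    ocrmypdf --jbig2-lossy -l "$langs" "$src" "$dst"
)
```

For a mixed German and English scan, `ocr_pdf small.pdf ocr.pdf deu+eng` would be the call.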

3. EPUB to PDF

With Calibre

Serif: Meta Serif Pro
Sans-serif: Gilroy
Monospace: Source Code Pro

Font size: 16px

Margin: 20px

Heuristic processing: on

Paper size: A4
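
The same settings can probably be applied without the GUI through `ebook-convert`, the command-line converter that ships with Calibre. The sketch below is an approximation rather than an exact export of the dialog: `epub_to_pdf` is a hypothetical name, and mapping the margin to the `--pdf-page-margin-*` options (which Calibre measures in points, not pixels) is an assumption:

```shell
# Sketch: convert an EPUB to an A4 PDF with the fonts and sizes listed above.
# epub_to_pdf is a hypothetical helper. The margin mapping is an assumption:
# Calibre's --pdf-page-margin-* options are in pt, while the notes say px.
epub_to_pdf() (
    src=$1   # input .epub
    dst=$2   # output file; ebook-convert picks the format from the .pdf extension

    [ -f "$src" ] || { echo "no such file: $src" >&2; exit 1; }
    ebook-convert "$src" "$dst" \
        --pdf-serif-family "Meta Serif Pro" \
        --pdf-sans-family "Gilroy" \
        --pdf-mono-family "Source Code Pro" \
        --pdf-default-font-size 16 \
        --pdf-page-margin-left 20 --pdf-page-margin-right 20 \
        --pdf-page-margin-top 20 --pdf-page-margin-bottom 20 \
        --enable-heuristics \
        --paper-size a4
)
```

A call would then be `epub_to_pdf book.epub book.pdf`; the fonts have to be installed on the system for Calibre to pick them up.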
