@user202729
Last active January 20, 2024 20:50
Commands to rasterize PDF

Resources for answers in https://unix.stackexchange.com/a/767380/296692 and https://superuser.com/a/1826641/577463 .

source_file.pdf

This file was generated by:

curl -o "CS3210 Reference.pdf" "https://raw.githubusercontent.com/btzy/homepage/master/nus/CS3210%20Reference.pdf"

pdftk "CS3210 Reference.pdf" cat 1 output a.pdf
pdfcrop --margin '15 15 15 15' a.pdf a.pdf
qpdf --rotate=90  a.pdf --replace-input
pdfjam  --paper a4paper a.pdf -o a.pdf

Reason: on an MS821 network printer, the generated a.pdf file does not print correctly: some of the cropped-out parts are still visible.

According to the printer's user guide, the maximum resolution supported by this printer is 1200 DPI.
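
To get a feel for what rasterizing a page at this resolution involves, the following is a minimal sketch that estimates the pixel dimensions and uncompressed size of one A4 page. It assumes an exact ISO A4 page (210 mm x 297 mm) and 3 bytes per pixel (24-bit RGB); the function name `raster_size` is hypothetical, and the figures are rough estimates rather than what any particular tool produces.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # ISO 216 A4 width and height in millimetres


def raster_size(dpi, page_mm=A4_MM, bytes_per_pixel=3):
    """Return (width_px, height_px, uncompressed_bytes) for one page."""
    width_px, height_px = (round(mm / MM_PER_INCH * dpi) for mm in page_mm)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


for dpi in (600, 1200, 2400):
    w, h, n = raster_size(dpi)
    print(f"{dpi} DPI: {w} x {h} px, {n / 2**20:.0f} MiB uncompressed")
```

Doubling the resolution quadruples the pixel count, which is worth keeping in mind when comparing the time and memory figures of the 1200 and 2400 DPI variants.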

Benchmark details

All commands are benchmarked with the time command (GNU time, not the shell builtin), whose output format is

3.90user 1.19system 0:05.13elapsed 99%CPU (0avgtext+0avgdata 36240maxresident)k
0inputs+0outputs (0major+6015minor)pagefaults 0swaps

The elapsed time and the maximum resident set size (in kilobytes) are reported.
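
As an illustration, here is a minimal sketch that extracts those two numbers from the sample output above. The helper name `parse_time_output` is hypothetical, and the pattern assumes the `m:ss.ss` elapsed format shown here; longer runs that GNU time prints as `h:mm:ss` are not handled.

```python
import re

SAMPLE = (
    "3.90user 1.19system 0:05.13elapsed 99%CPU (0avgtext+0avgdata 36240maxresident)k\n"
    "0inputs+0outputs (0major+6015minor)pagefaults 0swaps\n"
)


def parse_time_output(text: str) -> tuple[float, int]:
    """Return (elapsed seconds, maximum resident set size in KB)."""
    match = re.search(r"(\d+):(\d+\.\d+)elapsed .*?(\d+)maxresident\)k", text)
    if match is None:
        raise ValueError("unrecognized time output")
    seconds = int(match[1]) * 60 + float(match[2])
    return seconds, int(match[3])


print(parse_time_output(SAMPLE))  # → (5.13, 36240)
```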

data=r"""
`pdf2ps` + `ps2pdf` with pipe (*)
gs -sDEVICE=ps2write -dNOCACHE -sOutputFile=- -q -dBATCH -dNOPAUSE a.pdf -c quit | ps2pdf - c.pdf

`pdf2ps` + `ps2pdf` with temporary file ([source](https://unix.stackexchange.com/a/634170/296692)) (*)
gs -sDEVICE=ps2write -dNOCACHE -sOutputFile=c.ps -q -dBATCH -dNOPAUSE a.pdf
ps2pdf c.ps c.pdf

`pdfimage24` (1200/2)
#Print time: 9.691
gs -dNOPAUSE -dBATCH -sDEVICE=pdfimage24 -r1200 -dDownScaleFactor=2 -o c.pdf a.pdf

`pdfimage24` (1200)
gs -dNOPAUSE -dBATCH -sDEVICE=pdfimage24 -r1200 -o c.pdf a.pdf

`pdfimage24` (2400/2)
#Print time: 15.822
gs -dNOPAUSE -dBATCH -sDEVICE=pdfimage24 -r2400 -dDownScaleFactor=2 -o c.pdf a.pdf

`pdfimage8` (2400/2)
gs -dNOPAUSE -dBATCH -sDEVICE=pdfimage8 -r2400 -dDownScaleFactor=2 -o c.pdf a.pdf

`pdfimage8` (1200)
#Print time: 9.539
gs -dNOPAUSE -dBATCH -sDEVICE=pdfimage8 -r1200 -o c.pdf a.pdf

`convert` (600)
#Print time: 9.884
convert -density 600 a.pdf c.pdf

`convert` (600) + `gs` to optimize ([source](https://superuser.com/a/1588781/577463))
convert -density 600 a.pdf b.pdf
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=c.pdf b.pdf -q

`pdftoppm` (1200) (`.png`) + `img2pdf` ([source](https://superuser.com/a/1490924/577463)) (†)
pdftoppm -r 1200 -png a.pdf a
img2pdf a-1.png -o c.pdf

`pdftoppm` (1200) (`.jpg`) + `img2pdf` (†)
pdftoppm -r 1200 -jpeg a.pdf a
img2pdf a-1.jpg -o c.pdf

`pdftoppm` (1200) (`.tiff`) + `img2pdf` (†)
pdftoppm -r 1200 -tiff a.pdf a
img2pdf a-1.tif -o c.pdf
"""
# Each item is a header line followed by one or more command lines; items are separated by a blank line.
data_items=data.strip().split("\n\n")

import re
import functools
from pathlib import Path
import subprocess

@functools.lru_cache
def benchmark(commands: str):
    # Run the commands under time in /tmp (a.pdf must already be there) and
    # return (elapsed seconds, max resident KB, size of c.pdf in bytes).
    Path("/tmp/a.sh").write_text(commands)
    Path("/tmp/c.pdf").unlink(missing_ok=True)
    process=subprocess.run(["/bin/time", "bash", "a.sh"], cwd="/tmp/", stderr=subprocess.PIPE, stdout=subprocess.DEVNULL)
    assert Path("/tmp/c.pdf").is_file()
    match=re.search(rb" (\d+):(\d+\.\d+)elapsed .* (\d+)maxresident\)k", process.stderr)
    time_taken=int(match[1])*60+float(match[2])
    maxresident=int(match[3])
    file_size_bytes=Path("/tmp/c.pdf").stat().st_size
    return time_taken, maxresident, file_size_bytes

result=[]
for item in data_items:
    header,commands=item.strip().split('\n', maxsplit=1)
    # An optional "#Print time: ..." line holds a manually measured print time in seconds.
    match=re.fullmatch(r"\s*#\s*Print time:\s*([0-9.]+)\n(.*)", commands, flags=re.DOTALL)
    print_time=''
    if match:
        print_time=f"{float(match[1]):.01f}"
        commands=match[2]
    time_taken, maxresident, file_size_bytes=benchmark(commands)
    result.append(( header, commands, time_taken, maxresident, file_size_bytes, print_time ))

# Fastest first.
result.sort(key=lambda x: x[2])

# Summary table rows: header, elapsed time (s), max resident memory (KB), file size (KiB), print time (s).
for header, commands, time_taken, maxresident, file_size_bytes, print_time in result:
    print(f"| {header} | {time_taken:.3f} | {maxresident} | {file_size_bytes/1024:.1f} | {print_time} |")

# One markdown section per method, showing the commands used.
for header, commands, time_taken, maxresident, file_size_bytes, print_time in result:
    print(f"""
## {header}\n
```bash
{commands}
```
""".strip() + "\n")
# sort -t="|" -n -k2
# This is a bash script that is used to print a document and measure the time taken.
# Commands lpr and lpq must exist.
# The PDF file to be printed must be named a.pdf.
# It works by sending the job with lpr, then polling the queue with lpq repeatedly until it is empty.
exit
echo "Printing" >> log.txt
starttime=$(python -c "import time; print(time.time())")
date +%M:%S.%N >> log.txt
lpr -Ppsc011-nb a.pdf >> log.txt
while true; do
    echo :::::::: >> log.txt
    date +%M:%S.%N >> log.txt
    lpq -Ppsc011-nb |tee a.txt >> log.txt
    if [ "$(cat a.txt)" = "no entries" ]; then
        echo ======== done >> log.txt
        break
    fi
    sleep 0.2s
done
python -c "import time, datetime; t=time.time()-$starttime; print(f'{t:.3f} |', datetime.timedelta(seconds=t))" |tee -a log.txt
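
The polling approach used by the script above can also be sketched in Python. This is a rough sketch rather than a replacement for the script: the function names `queue_is_empty` and `time_until_empty` are hypothetical, and it assumes, as the script does, that `lpq` prints exactly `no entries` once the queue is empty.

```python
import subprocess
import time
from typing import Callable


def queue_is_empty(printer: str) -> bool:
    """Return True when `lpq -P<printer>` prints only "no entries"."""
    out = subprocess.run(
        ["lpq", f"-P{printer}"], capture_output=True, text=True, check=True
    ).stdout
    return out.strip() == "no entries"


def time_until_empty(is_empty: Callable[[], bool], interval: float = 0.2) -> float:
    """Poll is_empty() every `interval` seconds; return the elapsed seconds."""
    start = time.monotonic()
    while not is_empty():
        time.sleep(interval)
    return time.monotonic() - start


# Example usage (submits a real print job, so it is left commented out):
# subprocess.run(["lpr", "-Ppsc011-nb", "a.pdf"], check=True)
# print(f"{time_until_empty(lambda: queue_is_empty('psc011-nb')):.3f}")
```

Passing the check in as a callable keeps the timing loop independent of `lpq`, so it can be exercised without a printer.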