Performance comparison of simple text rendering between Python reportlab, Ruby Prawn and HexaPDF

The Python PDF generation library reportlab contains a demo/benchmarking application that takes the Project Gutenberg text of Homer's Odyssey and creates a PDF version from it. This text contains 10,437 lines and about 611,000 characters.

The PDF is generated by simply showing each line of the source text, without wrapping or any other advanced text facilities, once using the built-in standard PDF fonts and once using a TrueType font, creating PDF documents with 232 pages.

This makes it a nice test of raw text output performance, since no advanced text layout facilities are involved.
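
The 232-page figure can be cross-checked from the layout constants the benchmark scripts use: an A4 page height of 842pt, a 72pt margin plus an extra half inch at top and bottom, and 14pt leading. The Ruby sketch below is only an illustration of that arithmetic; `lines_per_page` and `page_count` are helper names made up for this example and are not part of any of the libraries.

```ruby
# Illustrative only: reproduces the page-break arithmetic of the benchmark scripts.
# The method names are hypothetical helpers, not library API.

# Number of lines shown on one page before the scripts start a new page.
def lines_per_page(page_height: 842, margin: 72, extra: 0.5 * 72, leading: 14)
  y = page_height - margin - extra  # position of the first line
  limit = margin + extra            # a new page starts once y drops below this
  count = 0
  loop do
    count += 1                      # one line is shown ...
    y -= leading                    # ... then the cursor moves down by the leading
    break if y < limit
  end
  count
end

# Pages needed for a given number of input lines.
def page_count(total_lines, per_page = lines_per_page)
  (total_lines.to_f / per_page).ceil
end

puts lines_per_page      # 45
puts page_count(10_437)  # 232
```

With 45 lines per page, the 10,437 lines of the source text fill 231 pages and leave 42 lines for page 232, which matches the page count given above.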

For a broader comparison I have ported the reportlab code to Ruby's HexaPDF and Prawn, Perl's PDF::API2 and PHP's TCPDF. Note that reportlab has a module implemented in C that replaces various CPU-intensive methods; the table has a separate entry (reportlab/C) for that version.

The file script.sh is a small wrapper script that calls the binaries and records runtime, memory use and the size of the created file. Here are the results using ruby 2.4.2p198 (if you use Ruby 2.3 you will get worse results, for both time and memory):

|-----------------------------------------------------------------|
|                           |     Time |     Memory |   File size |
|-----------------------------------------------------------------|
| hexapdf     1x            |    617ms |  22,468KiB |     395,530 |
| prawn       1x            |    570ms |  19,232KiB |     537,468 |
| reportlab   1x            |    446ms |  21,320KiB |     413,116 |
| reportlab/C 1x            |    351ms |  23,908KiB |     413,116 |
| tcpdf       1x            |    970ms |  33,016KiB |     546,968 |
| PDF::API2   1x            |    954ms |  28,256KiB |     402,794 |
|-----------------------------------------------------------------|
| hexapdf     5x            |  2,517ms |  36,912KiB |   1,973,094 |
| prawn       5x            |  2,361ms |  37,200KiB |   2,688,812 |
| reportlab   5x            |  2,014ms |  42,376KiB |   2,064,824 |
| reportlab/C 5x            |  1,359ms |  40,028KiB |   2,064,824 |
| tcpdf       5x            |  5,104ms |  44,032KiB |   2,714,482 |
| PDF::API2   5x            |  5,088ms |  41,060KiB |   2,001,755 |
|-----------------------------------------------------------------|
| hexapdf     10x           |  5,209ms |  58,608KiB |   3,945,926 |
| prawn       10x           |  4,793ms |  57,188KiB |   5,378,100 |
| reportlab   10x           |  4,327ms |  67,648KiB |   4,130,218 |
| reportlab/C 10x           |  2,822ms |  59,400KiB |   4,130,218 |
| tcpdf       10x           |  9,665ms |  60,348KiB |   5,424,797 |
| PDF::API2   10x           | 11,878ms |  56,552KiB |   4,001,642 |
|-----------------------------------------------------------------|
| hexapdf     1x ttf        |    781ms |  21,848KiB |     476,902 |
| prawn       1x ttf        |  1,386ms |  22,696KiB |     554,823 |
| reportlab   1x ttf        |    785ms |  27,240KiB |     543,167 |
| reportlab/C 1x ttf        |    663ms |  28,220KiB |     543,167 |
| tcpdf       1x ttf        |  1,316ms |  33,548KiB |     666,954 |
| PDF::API2   1x ttf        |  9,524ms |  48,504KiB |     588,768 |
|-----------------------------------------------------------------|
| hexapdf     5x ttf        |  2,890ms |  37,096KiB |   2,333,185 |
| prawn       5x ttf        |  6,078ms |  37,696KiB |   2,702,110 |
| reportlab   5x ttf        |  4,381ms |  54,352KiB |   2,644,870 |
| reportlab/C 5x ttf        |  4,126ms |  50,800KiB |   2,644,870 |
| tcpdf       5x ttf        |  5,876ms |  51,064KiB |   3,120,326 |
| PDF::API2   5x ttf        | 43,102ms |  69,460KiB |   2,694,564 |
|-----------------------------------------------------------------|
| hexapdf     10x ttf       |  5,905ms |  66,192KiB |   4,654,309 |
| prawn       10x ttf       | 11,875ms |  62,132KiB |   5,386,301 |
| reportlab   10x ttf       |  6,536ms |  87,668KiB |   5,272,797 |
| reportlab/C 10x ttf       |  5,078ms |  79,088KiB |   5,272,797 |
| tcpdf       10x ttf       | 12,357ms |  70,672KiB |   6,188,591 |
| PDF::API2   10x ttf       | 88,427ms |  96,004KiB |   5,327,987 |
|-----------------------------------------------------------------|

Some comments on the results:

  • Memory usage is broadly similar across the test scripts, with Prawn and HexaPDF consuming the least amount of memory most of the time.
  • HexaPDF produces the smallest files out of the box (no version has compression enabled).
  • Comparing Prawn with HexaPDF performance-wise, Prawn is slightly faster with the built-in fonts while HexaPDF is roughly twice as fast with TrueType fonts. The C-backed implementation of reportlab is the fastest in every run except the 5x TrueType one, where HexaPDF is faster; its lead is largest with the built-in fonts.
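
To put numbers on these comparisons, the timings can be expressed relative to HexaPDF. The Ruby sketch below does this for the 10x rows of the table; the constant `TIMES_10X_MS` and the method `relative_to` are names chosen for this example only, and only three of the six libraries are included to keep it short.

```ruby
# Illustrative only: timings in milliseconds, copied from the 10x rows of the table above.
TIMES_10X_MS = {
  'built-in' => { 'hexapdf' => 5209, 'prawn' => 4793, 'reportlab/C' => 2822 },
  'ttf'      => { 'hexapdf' => 5905, 'prawn' => 11_875, 'reportlab/C' => 5078 },
}.freeze

# Returns each library's time as a multiple of the baseline library's time
# (hypothetical helper, not part of any PDF library).
def relative_to(times, baseline)
  base = times.fetch(baseline).to_f
  times.map { |name, ms| [name, (ms / base).round(2)] }.to_h
end

TIMES_10X_MS.each do |fonts, times|
  puts "#{fonts}: #{relative_to(times, 'hexapdf')}"
end
```

For the 10x runs this gives Prawn about 0.92 times HexaPDF's runtime with the built-in fonts and about 2.01 times with the TrueType font, while reportlab/C comes out at about 0.54 and 0.86 respectively.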
# hexapdf.rb
$:.unshift(File.join(__dir__, '../../lib'))
require 'hexapdf'

a4 = HexaPDF::Type::Page::PAPER_SIZE[:A4]
top_margin = a4[3] - 72
bottom_margin = 72
first_offset = [72, top_margin - 0.5 * 72]
page_num = 1

started = Time.now
doc = HexaPDF::Document.new
canvas = doc.pages.add.canvas
canvas.font(ARGV[2] || 'Times', size: 12)
canvas.leading = 14
canvas.move_text_cursor(offset: first_offset)
y = first_offset[1]
font = canvas.font

File.foreach(ARGV[0], mode: 'r') do |line|
  # Strip in place; rstrip! returns nil when nothing was removed, so pass `line` itself
  line.rstrip!
  canvas.show_glyphs_only(font.decode_utf8(line))
  #canvas.text(line.rstrip)
  canvas.move_text_cursor
  y -= 14
  if y < bottom_margin + 0.5 * 72
    page_num += 1
    canvas.end_text
    # Drop references to the finished canvas so that it can be garbage collected
    doc.clear_cache(canvas.context.data)
    canvas.context.contents = canvas.context.contents
    canvas = doc.pages.add.canvas
    canvas.font(ARGV[2] || 'Times', size: 12)
    canvas.leading = 14
    canvas.move_text_cursor(offset: first_offset)
    y = first_offset[1]
    if page_num % 100 == 0
      puts('formatted page %d' % page_num)
    end
  end
end

doc.write(ARGV[1])
finished = Time.now
elapsed = finished - started
speed = page_num / elapsed
puts('%d pages in %0.2f seconds = %0.2f pages per second' % [page_num, elapsed, speed])
# pdfapi.pl
use strict;
use warnings;
use PDF::API2;

my $pdf = PDF::API2->new(-file => $ARGV[1]);
$pdf->mediabox('A4');
my $top_margin = 842 - 72 - 0.5 * 72;
my $bottom_margin = 72 + 0.5 * 72;
my $font;
if ($ARGV[2]) {
    $font = $pdf->ttfont($ARGV[2]);
} else {
    $font = $pdf->corefont('Times-Roman');
}

my $y = 0;
my $page;
my $content;
open(my $file, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (my $line = <$file>) {
    chomp $line;
    # $y starts at 0, so this branch also creates the first page
    if ($y < $bottom_margin) {
        $page = $pdf->page();
        $content = $page->text();
        $content->lead(14);
        $content->font($font, 12);
        $content->distance(72, $top_margin);
        $y = $top_margin;
    }
    $content->text($line);
    $content->cr();
    $y = $y - 14;
}
close($file);
$pdf->save();
# prawn.rb
require 'prawn'

started = Time.now
page_num = 0
Prawn::Document.generate(ARGV[1], page_size: "A4", compress: true, margin: 72) do |doc|
  doc.font(ARGV[2] ? ARGV[2] : 'Times-Roman')
  doc.font_size(12)
  y = doc.margin_box.absolute_top - 0.5 * 72
  File.foreach(ARGV[0], mode: 'r') do |line|
    #doc.text(line.rstrip, leading: 3)
    #doc.draw_text(line.rstrip, at: [0, y])
    # Strip in place; rstrip! returns nil when nothing was removed, so pass `line` itself
    line.rstrip!
    doc.add_text_content(line, doc.margin_box.absolute_left, y, {})
    y -= 14
    if y < 72 + 0.5 * 72
      doc.start_new_page
      y = doc.margin_box.absolute_top - 0.5 * 72
      if doc.page_number % 100 == 0
        puts('formatted page %d' % doc.page_number)
      end
    end
  end
  page_num = doc.page_number
end

finished = Time.now
elapsed = finished - started
speed = page_num / elapsed
puts('%d pages in %0.2f seconds = %0.2f pages per second' % [page_num, elapsed, speed])
#Copyright ReportLab Europe Ltd. 2000-2012
#####################################################################################
#
# Copyright (c) 2000-2014, ReportLab Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
# * Neither the name of the company nor the names of its contributors may be
# used to endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
# IN NO EVENT SHALL THE OFFICERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
# TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
# IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
#####################################################################################
# rlcli.py
from reportlab.pdfgen import canvas
from reportlab import rl_config
import time, os, sys
from reportlab.lib.units import inch, cm
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

rl_config.useA85 = 0
rl_config.ttfAsciiReadable = 0

font = 'Times-Roman'
if len(sys.argv) == 4:
    pdfmetrics.registerFont(TTFont('font', sys.argv[3]))
    font = 'font'

#precalculate some basics
top_margin = A4[1] - inch
bottom_margin = inch
left_margin = inch
right_margin = A4[0] - inch
frame_width = right_margin - left_margin

def run():
    started = time.time()
    canv = canvas.Canvas(sys.argv[2], invariant=0)
    canv.setPageCompression(1)
    canv.setFont(font, 12)
    tx = canv.beginText(left_margin, top_margin - 0.5*inch)
    tx.setLeading(14)
    data = open(sys.argv[1],'r').readlines()
    for line in data:
        #this just does it the fast way...
        tx.textLine(line.rstrip())
        #page breaking
        y = tx.getY()   #get y coordinate
        if y < bottom_margin + 0.5*inch:
            canv.drawText(tx)
            canv.showPage()
            canv.setFont(font, 12)
            tx = canv.beginText(left_margin, top_margin - 0.5*inch)
            tx.setLeading(14)
            #page
            pg = canv.getPageNumber()
            if pg % 100 == 0:
                print('formatted page %d' % canv.getPageNumber())
    if tx:
        canv.drawText(tx)
        canv.showPage()
    canv.save()
    finished = time.time()
    elapsed = finished - started
    pages = canv.getPageNumber()-1
    speed = pages / elapsed
    print('%d pages in %0.2f seconds = %0.2f pages per second' % (
        pages, elapsed, speed))

run()
#!/bin/bash
# script.sh

OUT_FILE=/tmp/bench-result.pdf
TXT_FILE=odyssey.txt
TXT_FILE_5X=/tmp/5odyssey.txt
TXT_FILE_10X=/tmp/10odyssey.txt
TTF=/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf

trap exit 2

# Runs a command and prints one table row: name, wall time, peak memory, output file size
function bench_cmd() {
  cmdname=$1
  FORMAT="| %-25s | %'6ims | %'7iKiB | %'11i |\n"
  shift
  time=$(date +%s%N)
  /usr/bin/time -f '%M' -o /tmp/bench-times "$@" &>/dev/null
  if [ $? -ne 0 ]; then
    cmdname="ERR ${cmdname}"
    time=0
    mem_usage=0
    file_size=0
  else
    time=$(( ($(date +%s%N)-time)/1000000 ))
    mem_usage=$(cat /tmp/bench-times)
    file_size=$(stat -c '%s' $OUT_FILE)
  fi
  printf "$FORMAT" "$cmdname" "$time" "$mem_usage" "$file_size"
}

cd "$(dirname "$0")"

# Build the 5x and 10x inputs by concatenating the source text
cat {,,,,}odyssey.txt > $TXT_FILE_5X
cat $TXT_FILE_5X $TXT_FILE_5X > $TXT_FILE_10X

declare -A inputs
inputs["1x"]=$TXT_FILE
inputs["5x"]=$TXT_FILE_5X
inputs["10x"]=$TXT_FILE_10X

if [[ $# -ge 1 ]]; then
  KEYS="$1"
  shift
else
  KEYS="1x 5x 10x"
fi

if [[ $# -ge 1 ]]; then
  TTFS="$1"
  shift
else
  TTFS="${TTF}"
fi

echo "|-----------------------------------------------------------------|"
echo "|                           |     Time |     Memory |   File size |"
echo "|-----------------------------------------------------------------|"

for ttf in "" $TTFS; do
  for key in $KEYS; do
    file=${inputs[$key]}
    bench_cmd "hexapdf     ${key} ${ttf: -3}" ruby -I../../lib hexapdf.rb $file ${OUT_FILE} $ttf
    bench_cmd "prawn       ${key} ${ttf: -3}" ruby prawn.rb $file ${OUT_FILE} $ttf
    bench_cmd "reportlab   ${key} ${ttf: -3}" python rlcli.py $file ${OUT_FILE} $ttf
    bench_cmd "reportlab/C ${key} ${ttf: -3}" python3 rlcli.py $file ${OUT_FILE} $ttf
    bench_cmd "tcpdf       ${key} ${ttf: -3}" php tcpdf.php $file ${OUT_FILE} $ttf
    bench_cmd "PDF::API2   ${key} ${ttf: -3}" perl pdfapi.pl $file ${OUT_FILE} $ttf
    echo "|-----------------------------------------------------------------|"
  done
done
echo
<?php
// tcpdf.php
require_once('tcpdf/tcpdf.php');

$pdf = new TCPDF('P', 'pt', 'A4', true, 'UTF-8', false);
$pdf->SetMargins(72, 72 + 0.5 * 36, 0);
$pdf->SetPrintHeader(false);
$pdf->SetPrintFooter(false);
$pdf->SetAutoPageBreak(TRUE, 72 + 0.5 * 36);

if ($argc == 4) {
    //Activate the following line, then run as root once to generate the needed files
    //$font_name = TCPDF_FONTS::addTTFfont($argv[3], '', '', 32);
    $font_name = 'dejavusans';
} else {
    $font_name = 'times';
}
$pdf->setFontSubsetting(true);
$pdf->SetFont($font_name, '', 12, '', true);
$pdf->AddPage();
$pdf->setCellHeightRatio(1.2);

$handle = fopen($argv[1], 'r');
while (($line = fgets($handle)) !== false) {
    $pdf->Cell(0, 0, $line, 0, 1, 'L', false, '', 0, false, 'T', 'T');
}
fclose($handle);

if (substr($argv[2], 0, 1) !== '/') {
    $file = __DIR__ . '/' . $argv[2];
} else {
    $file = $argv[2];
}
$pdf->Output($file, 'F');