@gettalong
Last active January 22, 2020 02:13
Performance comparison of line wrapping between Ruby Prawn and HexaPDF

This is a follow-up to the benchmark comparing the basic text output performance of HexaPDF, Ruby Prawn and other libraries.

This time the performance of line wrapping and simple general layout is tested. Again, the Project Gutenberg text of Homer's Odyssey is used for this purpose. The scripts used are attached below.

The text of the Odyssey is laid out on pages of 400x1000 and 200x1000 points, once with the standard PDF Type1 font Times-Roman and once with the TrueType font Times New Roman. On the 400x1000 pages no line wrapping is needed because every line is shorter than 400 points. On the 200x1000 pages lines actually have to be wrapped, and the resulting PDF has roughly twice the number of pages.
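
To make the difference between the two page widths concrete, here is a minimal sketch of greedy line wrapping. It is not how any of the benchmarked libraries measure text: the constant `CHAR_WIDTH` and the helper `wrap_line` are hypothetical, and a fixed width per character is an assumption (real fonts such as Times-Roman have per-glyph widths).

```ruby
# Hypothetical average glyph width in points at font size 10 (an assumption;
# real fonts take per-glyph widths from their metrics).
CHAR_WIDTH = 5.0

# Greedily splits one input line into output lines that fit into +width+ points.
def wrap_line(line, width)
  max_chars = (width / CHAR_WIDTH).floor
  lines = [""]
  line.split(" ").each do |word|
    candidate = lines.last.empty? ? word : "#{lines.last} #{word}"
    if candidate.length <= max_chars || lines.last.empty?
      lines[-1] = candidate   # the word still fits (or is the first word of the line)
    else
      lines << word           # start a new output line
    end
  end
  lines
end

line = "Tell me, O muse, of that ingenious hero who travelled far and wide"
puts wrap_line(line, 400).length  # line fits as is
puts wrap_line(line, 200).length  # line has to be broken
```

Under this assumed glyph width the sample line stays on one line at 400 points and needs two lines at 200 points, which matches the observation that the narrow variant produces roughly twice as many pages.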

Results:

|-------------------------------------------------------------------|
| Library     Width Font    |      Time |      Memory |   File size |
|-------------------------------------------------------------------|
| hexapdf     400           |   1,913ms |   72,584KiB |     390,619 |
| prawn       400           |  19,043ms |   49,324KiB |     460,898 |
| reportlab   400           |   3,389ms |   52,436KiB |     425,470 |
| tcpdf       400           |   2,716ms |  121,108KiB |     443,646 |
|-------------------------------------------------------------------|
| hexapdf     200           |   2,513ms |   70,064KiB |     495,017 |
| prawn       200           |  27,068ms |   48,600KiB |     585,932 |
| reportlab   200           |   3,449ms |   52,832KiB |     509,965 |
| tcpdf       200           | 181,253ms |  141,932KiB |     583,118 |
|-------------------------------------------------------------------|
| hexapdf     400 ttf       |   2,154ms |   72,984KiB |     462,051 |
| prawn       400 ttf       |  16,344ms |   47,816KiB |     490,402 |
| reportlab   400 ttf       |   3,071ms |   58,668KiB |     543,667 |
| tcpdf       400 ttf       |   3,321ms |  144,048KiB |     551,846 |
|-------------------------------------------------------------------|
| hexapdf     200 ttf       |   2,559ms |   71,700KiB |     583,832 |
| prawn       200 ttf       |  26,756ms |   54,440KiB |     628,400 |
| reportlab   200 ttf       |   3,535ms |   59,872KiB |     647,250 |
| tcpdf       200 ttf       | 197,758ms |  143,964KiB |     713,095 |
|-------------------------------------------------------------------|

Comments:

HexaPDF is much faster than Prawn in all cases (roughly 7.6 to 10.8 times) and produces smaller files, but uses more memory, about 1.3 to 1.5 times as much as Prawn.
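
The ratios behind this comparison can be recomputed from the table. The following sketch only copies the HexaPDF and Prawn time and memory figures from the results above; the names `RESULTS` and `ratios` are introduced here for illustration.

```ruby
# [time in ms, memory in KiB], copied from the results table.
RESULTS = {
  "400"     => {hexapdf: [1_913, 72_584], prawn: [19_043, 49_324]},
  "200"     => {hexapdf: [2_513, 70_064], prawn: [27_068, 48_600]},
  "400 ttf" => {hexapdf: [2_154, 72_984], prawn: [16_344, 47_816]},
  "200 ttf" => {hexapdf: [2_559, 71_700], prawn: [26_756, 54_440]},
}

# For each test case: how many times slower Prawn is, and how many times
# more memory HexaPDF uses.
def ratios(results)
  results.transform_values do |r|
    h_time, h_mem = r[:hexapdf]
    p_time, p_mem = r[:prawn]
    {speedup: (p_time.to_f / h_time).round(2), memory: (h_mem.to_f / p_mem).round(2)}
  end
end

ratios(RESULTS).each do |name, r|
  puts format("%-8s  Prawn/HexaPDF time: %5.2f  HexaPDF/Prawn memory: %4.2f",
              name, r[:speedup], r[:memory])
end
```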

However, the comparison is not completely fair because of the way HexaPDF handles text layout. When the HexaPDF::Layout::TextLayouter object is created, the Unicode text is converted into Glyph objects. When box.fit is called, these Glyph objects are first run through the text segmentation algorithm and then through the line wrapping algorithm. The pieces that do not fit are returned as rest in the script. Since the objects in rest have already been through the text segmentation algorithm, this step can be skipped the next time box.fit is called.

In contrast, Prawn returns the parts that don't fit into the text box as a String, which has to be run through the text segmentation algorithm again every time. I don't know whether this is the whole reason why Prawn is that much slower; I will have to look at its source code to see if I'm using a method that does much, much more than the current HexaPDF equivalent.
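
The difference between the two ways of handing over the remainder can be sketched in plain Ruby. This is a toy model, not HexaPDF's or Prawn's actual code: `CountingSegmenter`, `fit`, and both `paginate_*` methods are hypothetical, "segmentation" is reduced to splitting on whitespace, and a "page" simply holds a fixed number of words.

```ruby
# Toy segmenter that counts how many words pass through it.
class CountingSegmenter
  attr_reader :words_seen

  def initialize
    @words_seen = 0
  end

  def call(text)
    words = text.split
    @words_seen += words.length
    words
  end
end

# Returns [items that fit on the page, items that do not fit].
def fit(items, per_page)
  [items.first(per_page), items.drop(per_page)]
end

# The remainder stays a list of already segmented items.
def paginate_with_items(text, per_page, segmenter)
  rest = segmenter.call(text)
  pages = []
  until rest.empty?
    page, rest = fit(rest, per_page)
    pages << page
  end
  pages
end

# The remainder is turned back into a String and segmented again for every page.
def paginate_with_strings(text, per_page, segmenter)
  pages = []
  until text.empty?
    page, rest = fit(segmenter.call(text), per_page)
    pages << page
    text = rest.join(" ")
  end
  pages
end

text = (1..1000).map { |i| "w#{i}" }.join(" ")
[:paginate_with_items, :paginate_with_strings].each do |name|
  segmenter = CountingSegmenter.new
  pages = send(name, text, 100, segmenter)
  puts "#{name}: #{pages.length} pages, #{segmenter.words_seen} words segmented"
end
```

In this toy model both variants produce the same 10 pages, but the String variant segments 5,500 words instead of 1,000, and the gap grows with the number of pages. Whether this effect accounts for the measured difference is not something the benchmark establishes.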

For the ReportLab variant, it may be possible to use the basic Paragraph flowable and do the splitting manually, but I didn't get that to work.

For TCPDF, too, there may be more efficient ways of running this benchmark.

# HexaPDF script (Ruby). Arguments: input text file, page width, output PDF, optional font.
$:.unshift(File.join(__dir__, '../../lib'))
require 'hexapdf'

file = ARGV[0]
width = ARGV[1].to_i
height = 1000

doc = HexaPDF::Document.new
tl = HexaPDF::Layout::TextLayouter.create(File.read(file), width: width, height: height,
                                          font_features: {kern: false}, font_size: 10,
                                          font: doc.fonts.add(ARGV[3] || "Times"))
tl.style.line_spacing(:fixed, 11.16)

# Draw as much as fits on each new page; the first return value of draw
# becomes the layouter's items for the next page.
until tl.items.empty?
  canvas = doc.pages.add([0, 0, width, height]).canvas
  tl.items, = tl.draw(canvas, 0, height)
end
doc.write(ARGV[2])

# Prawn script (Ruby). Arguments: input text file, page width, output PDF, optional font.
require 'prawn'

file = ARGV[0]
width = ARGV[1].to_i
height = 1000

Prawn::Document.generate(ARGV[2], page_size: [width, height], compress: true, margin: 0) do |doc|
  doc.font(ARGV[3] || 'Times-Roman')
  doc.font_size(10)
  text = File.read(file)
  # text_box returns the part of the text that did not fit as a String.
  until text.empty?
    text = doc.text_box(text, at: [0, height], width: width, height: height, kerning: false)
    doc.start_new_page unless text.empty?
  end
end

#Copyright ReportLab Europe Ltd. 2000-2012
#see license.txt for license details
# ReportLab script (Python). Arguments: input text file, page width, output PDF,
# optional TrueType font file.
import sys, copy
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.enums import TA_LEFT
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
import reportlab.rl_config

reportlab.rl_config.invariant = 0
reportlab.rl_config.useA85 = 0
reportlab.rl_config.ttfAsciiReadable = 0

styles = getSampleStyleSheet()
Elements = []

font = 'Times-Roman'
if len(sys.argv) == 5:
    pdfmetrics.registerFont(TTFont('font', sys.argv[4]))
    font = 'font'

ParaStyle = copy.deepcopy(styles["Normal"])
ParaStyle.fontName = font
ParaStyle.fontSize = 10  # the attribute is fontSize; "fontsize" would be silently ignored
ParaStyle.leading = 11.16
ParaStyle.alignment = TA_LEFT
ParaStyle.allowOrphans = 1
ParaStyle.allowWidows = 1
ParaStyle.spaceBefore = 0
ParaStyle.spaceAfter = 0

height = 1000
width = int(sys.argv[2])

def myPage(canvas, doc):
    # Page callback that draws nothing.
    canvas.saveState()
    canvas.restoreState()

def go():
    doc = SimpleDocTemplate(sys.argv[3], pagesize=(width, height), leftMargin=0,
                            rightMargin=0, topMargin=0, bottomMargin=0)
    doc.build(Elements, myPage, myPage)

def p(txt, style=ParaStyle):
    Elements.append(Paragraph(txt, style))

def parseOdyssey(fn):
    with open(fn, 'r') as f:
        text = f.read()
    #p(text)
    L = list(map(str.strip, text.split('\n')))
    for P in L:
        if not P:
            # Placeholder for empty lines, presumably so that they still take up a line.
            P = ':'
        p(P)
    go()

parseOdyssey(sys.argv[1])

<?php
// TCPDF script (PHP). Arguments: input text file, page width, output PDF, optional font.
require_once('tcpdf/tcpdf.php');

$pdf = new TCPDF('P', 'pt', array($argv[2], 1000), true, 'UTF-8', false);
$pdf->SetMargins(0, 0, 0, 0);
$pdf->SetPrintHeader(false);
$pdf->SetPrintFooter(false);
$pdf->SetAutoPageBreak(TRUE);

if ($argc == 5) {
    //Activate the following line, then run as root once to generate the needed files
    //$font_name = TCPDF_FONTS::addTTFfont($argv[4], '', '', 32);
    $font_name = 'dejavusans';
} else {
    $font_name = 'times';
}
$pdf->setFontSubsetting(true);
$pdf->SetFont($font_name, '', 10, '', true);

$pdf->AddPage();
$pdf->setCellHeightRatio(1.12);
$utf8text = file_get_contents($argv[1], false);
$pdf->Write(2, $utf8text, '', 0, '', false, 0, false, false, 0);

// Resolve a relative output path against the directory of this script.
if (substr($argv[3], 0, 1) !== '/') {
    $file = __DIR__ . '/' . $argv[3];
} else {
    $file = $argv[3];
}
$pdf->Output($file, 'F');