gettalong/README.md

## README.md

      
This is a follow-up benchmark to the one comparing the basic text output performance of HexaPDF, Prawn and other libraries.
This time the performance of line wrapping and simple general layouting is tested. Again, the Project Gutenberg text of Homer's Odyssey is used for this purpose. The scripts used are attached below.
The text of the Odyssey is arranged on pages of size 400x1000 and 200x1000, once with the standard PDF Type1 font Times-Roman and once with the TrueType font Times New Roman. With pages of size 400x1000 no line wrapping is needed because each line is shorter than 400 points. With pages of size 200x1000 the lines actually have to be wrapped, and the resulting PDF has roughly twice the number of pages.
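
The sixteen runs described above (four scripts, two page widths, two fonts) could be enumerated with a small helper like the following. This is only a sketch: the interpreter names, the output file naming scheme, and the file names `odyssey.txt` and `times.ttf` are placeholders and not part of the original benchmark setup. The argument order (text file, page width, output file, optional font file) is taken from the attached scripts.

```ruby
# Sketch: enumerate the benchmark runs. Interpreter names, output names and
# the input file names used below are assumptions for illustration only.
INTERPRETERS = {
  'hexapdf.rb' => 'ruby',
  'prawn.rb'   => 'ruby',
  'rlcli.py'   => 'python',
  'tcpdf.php'  => 'php'
}.freeze

# All four scripts take the same positional arguments:
# input text file, page width, output PDF, optional TrueType font file.
def benchmark_command(script, text_file, width, output, font_file = nil)
  [INTERPRETERS.fetch(script), script, text_file, width.to_s, output, font_file].compact
end

# Returns one command (as an argument array) per run, in the order
# 400, 200, 400 ttf, 200 ttf.
def benchmark_matrix(text_file, font_file)
  [nil, font_file].flat_map do |font|      # Type1 font first, then TrueType
    [400, 200].flat_map do |width|         # no wrapping first, then wrapping
      INTERPRETERS.keys.map do |script|
        suffix = font ? '-ttf' : ''
        output = "#{File.basename(script, '.*')}-#{width}#{suffix}.pdf"
        benchmark_command(script, text_file, width, output, font)
      end
    end
  end
end

benchmark_matrix('odyssey.txt', 'times.ttf').each { |cmd| puts cmd.join(' ') }
```

The helper only builds the command lines; measuring time and memory is left to whatever tool wraps the commands.
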
Results:
| Library   | Page width | Font  | Time (ms) | Memory (KiB) | File size |
|-----------|-----------:|-------|----------:|-------------:|----------:|
| hexapdf   | 400        | Type1 |     1,913 |       72,584 |   390,619 |
| prawn     | 400        | Type1 |    19,043 |       49,324 |   460,898 |
| reportlab | 400        | Type1 |     3,389 |       52,436 |   425,470 |
| tcpdf     | 400        | Type1 |     2,716 |      121,108 |   443,646 |
| hexapdf   | 200        | Type1 |     2,513 |       70,064 |   495,017 |
| prawn     | 200        | Type1 |    27,068 |       48,600 |   585,932 |
| reportlab | 200        | Type1 |     3,449 |       52,832 |   509,965 |
| tcpdf     | 200        | Type1 |   181,253 |      141,932 |   583,118 |
| hexapdf   | 400        | TTF   |     2,154 |       72,984 |   462,051 |
| prawn     | 400        | TTF   |    16,344 |       47,816 |   490,402 |
| reportlab | 400        | TTF   |     3,071 |       58,668 |   543,667 |
| tcpdf     | 400        | TTF   |     3,321 |      144,048 |   551,846 |
| hexapdf   | 200        | TTF   |     2,559 |       71,700 |   583,832 |
| prawn     | 200        | TTF   |    26,756 |       54,440 |   628,400 |
| reportlab | 200        | TTF   |     3,535 |       59,872 |   647,250 |
| tcpdf     | 200        | TTF   |   197,758 |      143,964 |   713,095 |

Comments:
HexaPDF is much faster than Prawn in all cases (by a factor of roughly 7.6 to 10.8) and produces smaller files, but uses roughly 1.3 to 1.5 times the memory.
However, the comparison is not completely fair because of the way HexaPDF handles text layouting. When the HexaPDF::Layout::TextLayouter object is created, the Unicode text is converted into Glyph objects. Then box.fit is called and these Glyph objects are run first through the text segmentation algorithm and then through the line wrapping algorithm. The pieces that do not fit are returned as the rest (in the attached script they are assigned back to tl.items). Since the objects in the rest have already been run through the text segmentation algorithm, this step can be skipped the next time box.fit is called.
In contrast, Prawn returns the parts that don't fit into the text box as a String, which has to be run through the text segmentation algorithm again every time. I don't know whether this is the whole reason why Prawn is that much slower; I will have to look at its source code to see if I'm using a method that does much, much more than the current HexaPDF equivalent.
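
The difference between handing over already segmented items and handing over a String can be illustrated with a toy model. The sketch below is neither the HexaPDF nor the Prawn implementation: a whitespace split stands in for text segmentation, a fixed number of words stands in for a page, and all class and method names are made up for this example. It only counts how many words pass through segmentation under each strategy.

```ruby
# Toy model: "segmentation" is a whitespace split and a "page" holds a fixed
# number of words. All names here are hypothetical.
class CountingSegmenter
  attr_reader :words_segmented

  def initialize
    @words_segmented = 0
  end

  # Splits the text into words and records how many words were processed.
  def segment(text)
    words = text.split
    @words_segmented += words.size
    words
  end
end

# Strategy 1: segment once; the rest handed to the next page is already segmented.
def paginate_items(text, words_per_page, segmenter)
  items = segmenter.segment(text)
  pages = []
  until items.empty?
    pages << items.first(words_per_page)
    items = items.drop(words_per_page)
  end
  pages
end

# Strategy 2: the rest comes back as a String and is segmented again per page.
def paginate_string(text, words_per_page, segmenter)
  pages = []
  until text.empty?
    words = segmenter.segment(text)
    pages << words.first(words_per_page)
    text = words.drop(words_per_page).join(' ')
  end
  pages
end

text = (1..1000).map { |i| "w#{i}" }.join(' ')
once = CountingSegmenter.new
again = CountingSegmenter.new
pages_items = paginate_items(text, 100, once)
pages_string = paginate_string(text, 100, again)
puts "segmented once:     #{once.words_segmented} words"
puts "segmented per page: #{again.words_segmented} words"
```

Both strategies produce the same pages; in this toy model the second one simply pushes the remaining text through segmentation once per page. Whether this accounts for the measured difference is, as noted above, not established.
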
For the reportlab variant it may be possible to use the basic Paragraph flowable and do the splitting manually, but I didn't get that to work.
Likewise, there may be more optimized methods for doing this benchmark with TCPDF.
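
The speed and memory ratios between HexaPDF and Prawn can be re-derived from the results table. The following sketch copies the time and memory values of these two libraries from the table; the constant and method names are chosen for this example only.

```ruby
# Time (ms) and memory (KiB) for HexaPDF and Prawn, copied from the results table.
RESULTS = {
  '400'     => { hexapdf: [1_913, 72_584], prawn: [19_043, 49_324] },
  '200'     => { hexapdf: [2_513, 70_064], prawn: [27_068, 48_600] },
  '400 ttf' => { hexapdf: [2_154, 72_984], prawn: [16_344, 47_816] },
  '200 ttf' => { hexapdf: [2_559, 71_700], prawn: [26_756, 54_440] }
}.freeze

# For each configuration: how many times faster HexaPDF is than Prawn, and
# how many times more memory it uses.
def ratios(results)
  results.transform_values do |row|
    hexa_time, hexa_mem = row[:hexapdf]
    prawn_time, prawn_mem = row[:prawn]
    { speedup: (prawn_time.to_f / hexa_time).round(2),
      memory:  (hexa_mem.to_f / prawn_mem).round(2) }
  end
end

ratios(RESULTS).each do |config, r|
  puts format('%-8s HexaPDF is %.2fx faster, uses %.2fx the memory',
              config, r[:speedup], r[:memory])
end
```
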

  
## hexapdf.rb
$:.unshift(File.join(__dir__, '../../lib'))
require 'hexapdf'

file = ARGV[0]
width = ARGV[1].to_i
height = 1000

doc = HexaPDF::Document.new
tl = HexaPDF::Layout::TextLayouter.create(File.read(file), width: width, height: height,
                                          font_features: {kern: false}, font_size: 10,
                                          font: doc.fonts.add(ARGV[3] || "Times"))
tl.style.line_spacing(:fixed, 11.16)

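# Draw page by page; the first return value of draw is assigned back to
# tl.items, so the loop ends once nothing is left to lay out.
until tl.items.empty?
  canvas = doc.pages.add([0, 0, width, height]).canvas
  tl.items, = tl.draw(canvas, 0, height)
end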

doc.write(ARGV[2])

## prawn.rb
require 'prawn'

file = ARGV[0]
width = ARGV[1].to_i
height = 1000

Prawn::Document.generate(ARGV[2], page_size: [width, height], compress: true, margin: 0) do |doc|
  doc.font(ARGV[3] || 'Times-Roman')
  doc.font_size(10)

  text = File.read(file)
  # text_box returns the text that did not fit; continue with it on a new page.
  until text.empty?
    text = doc.text_box(text, at: [0, height], width: width, height: height, kerning: false)
    doc.start_new_page unless text.empty?
  end
end

## rlcli.py
#Copyright ReportLab Europe Ltd. 2000-2012
#see license.txt for license details

import sys, copy
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.enums import TA_LEFT
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

import reportlab.rl_config
reportlab.rl_config.invariant = 0
reportlab.rl_config.useA85 = 0
reportlab.rl_config.ttfAsciiReadable = 0

styles = getSampleStyleSheet()

Elements = []

font = 'Times-Roman'
if len(sys.argv) == 5:
    pdfmetrics.registerFont(TTFont('font', sys.argv[4]))
    font = 'font'

ParaStyle = copy.deepcopy(styles["Normal"])
ParaStyle.fontName = font
ParaStyle.fontSize = 10
ParaStyle.leading = 11.16
ParaStyle.alignment = TA_LEFT
ParaStyle.allowOrphans = 1
ParaStyle.allowWidows = 1
ParaStyle.spaceBefore = 0
ParaStyle.spaceAfter = 0

height = 1000
width = int(sys.argv[2])

def myPage(canvas, doc):
    canvas.saveState()
    canvas.restoreState()

def go():
    doc = SimpleDocTemplate(sys.argv[3], pagesize=(width, height),
                            leftMargin=0, rightMargin=0, topMargin=0, bottomMargin=0)
    doc.build(Elements, myPage, myPage)

def p(txt, style=ParaStyle):
    Elements.append(Paragraph(txt, style))

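def parseOdyssey(fn):
    # Read the whole text and turn every input line into its own Paragraph.
    with open(fn, 'r') as f:
        text = f.read()
    L = list(map(str.strip, text.split('\n')))
    for P in L:
        if not P:
            P = ':'  # empty lines are replaced by a ':' placeholder
        p(P)

    go()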

parseOdyssey(sys.argv[1])

## tcpdf.php
<?php

require_once('tcpdf/tcpdf.php');

$pdf = new TCPDF('P', 'pt', array($argv[2], 1000), true, 'UTF-8', false);
$pdf->SetMargins(0, 0, 0, 0);
$pdf->SetPrintHeader(false);
$pdf->SetPrintFooter(false);

$pdf->SetAutoPageBreak(TRUE);

if ($argc == 5) {
  //Activate the following line, then run as root once to generate the needed files
  //$font_name = TCPDF_FONTS::addTTFfont($argv[4], '', '', 32);
  $font_name = 'dejavusans';
} else {
  $font_name = 'times';
}
$pdf->setFontSubsetting(true);
$pdf->SetFont($font_name, '', 10, '', true);

$pdf->AddPage();

$pdf->setCellHeightRatio(1.12);
$utf8text = file_get_contents($argv[1], false);
$pdf->Write(2, $utf8text, '', 0, '', false, 0, false, false, 0);

if (substr($argv[3], 0, 1) !== '/') {
  $file = __DIR__ . '/' . $argv[3];
} else {
  $file = $argv[3];
}

$pdf->Output($file, 'F');