This is a follow-up to the benchmark that compared the basic text output performance of HexaPDF, Prawn and other libraries.
This time the performance of line wrapping and simple general layout is tested. Again, the Project Gutenberg text of Homer's Odyssey is used for this purpose. The Ruby scripts used are attached below.
The text of the Odyssey is arranged on pages with dimensions of 400x1000 and 200x1000, once with the standard PDF Type1 font Times-Roman and once with the TrueType font Times New Roman. With pages of size 400x1000 no line wrapping is needed because each line is shorter than 400 points. With pages of size 200x1000 the lines actually have to be wrapped, and the resulting PDF has roughly twice the number of pages.
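To make the difference between the two page widths concrete, here is a minimal sketch of greedy line wrapping. It is not taken from the benchmark scripts: the fixed glyph width CHAR_WIDTH, the helper wrap_line and the sample line are assumptions for illustration only, since real fonts such as Times-Roman have per-glyph widths.

```ruby
# Toy greedy line wrapper (illustration only, not any library's algorithm).
# CHAR_WIDTH is a hypothetical fixed glyph width in points.
CHAR_WIDTH = 5.0

def wrap_line(line, max_width)
  lines = []
  current = ""
  line.split(" ").each do |word|
    candidate = current.empty? ? word : "#{current} #{word}"
    if current.empty? || candidate.length * CHAR_WIDTH <= max_width
      current = candidate   # the word still fits on the current line
    else
      lines << current      # line is full, start a new one
      current = word
    end
  end
  lines << current unless current.empty?
  lines
end

# Sample line, assumed to be typical of the input text.
line = "Tell me, O muse, of that ingenious hero who travelled far and wide"
wide = wrap_line(line, 400)
narrow = wrap_line(line, 200)
puts "400 points wide: #{wide.size} line(s)"
puts "200 points wide: #{narrow.size} line(s)"
```

Under these assumptions the sample line fits unchanged into 400 points but has to be broken for 200 points, which is the extra work the 200x1000 runs measure.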
Results:
|-------------------------------------------------------------------|
|                    |       Time |      Memory | File size (bytes) |
|-------------------------------------------------------------------|
| hexapdf 400        |    1,913ms |   72,584KiB |           390,619 |
| prawn 400          |   19,043ms |   49,324KiB |           460,898 |
| reportlab 400      |    3,389ms |   52,436KiB |           425,470 |
| tcpdf 400          |    2,716ms |  121,108KiB |           443,646 |
|-------------------------------------------------------------------|
| hexapdf 200        |    2,513ms |   70,064KiB |           495,017 |
| prawn 200          |   27,068ms |   48,600KiB |           585,932 |
| reportlab 200      |    3,449ms |   52,832KiB |           509,965 |
| tcpdf 200          |  181,253ms |  141,932KiB |           583,118 |
|-------------------------------------------------------------------|
| hexapdf 400 ttf    |    2,154ms |   72,984KiB |           462,051 |
| prawn 400 ttf      |   16,344ms |   47,816KiB |           490,402 |
| reportlab 400 ttf  |    3,071ms |   58,668KiB |           543,667 |
| tcpdf 400 ttf      |    3,321ms |  144,048KiB |           551,846 |
|-------------------------------------------------------------------|
| hexapdf 200 ttf    |    2,559ms |   71,700KiB |           583,832 |
| prawn 200 ttf      |   26,756ms |   54,440KiB |           628,400 |
| reportlab 200 ttf  |    3,535ms |   59,872KiB |           647,250 |
| tcpdf 200 ttf      |  197,758ms |  143,964KiB |           713,095 |
|-------------------------------------------------------------------|
Comments:
HexaPDF is much faster than Prawn in all cases (roughly 7.5 to 11 times) and produces smaller files, but uses between about 1.3 and 1.5 times the memory.
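The ratios between HexaPDF and Prawn can be recomputed directly from the results table. The following sketch only does that arithmetic; the RESULTS constant and its layout are my own naming, while the numbers are copied from the table.

```ruby
# [time in ms, memory in KiB], copied from the results table.
RESULTS = {
  "400"     => { hexapdf: [1_913, 72_584], prawn: [19_043, 49_324] },
  "200"     => { hexapdf: [2_513, 70_064], prawn: [27_068, 48_600] },
  "400 ttf" => { hexapdf: [2_154, 72_984], prawn: [16_344, 47_816] },
  "200 ttf" => { hexapdf: [2_559, 71_700], prawn: [26_756, 54_440] },
}

ratios = RESULTS.map do |name, r|
  speedup = r[:prawn][0].to_f / r[:hexapdf][0]  # how much faster HexaPDF is
  memory  = r[:hexapdf][1].to_f / r[:prawn][1]  # how much more memory it uses
  [name, speedup, memory]
end

ratios.each do |name, speedup, memory|
  puts format("%-8s %5.1fx faster, %.2fx memory", name, speedup, memory)
end
```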
However, the comparison is not completely fair due to the way HexaPDF handles text layout. When the HexaPDF::Layout::TextLayouter object is created, the Unicode text is converted into Glyph objects. Then box.fit is called and these Glyph objects are run first through the text segmentation algorithm and then through the line wrapping algorithm. The pieces that do not fit are returned as rest in the script. Since the objects in rest have already been run through the text segmentation algorithm, this step can be skipped the next time box.fit is called.
In contrast, Prawn returns the parts that don't fit into the text box as a String, which has to be run through the text segmentation algorithm every time. I don't know whether this is the whole reason why Prawn is that much slower; I will have to look at its source code to see if I'm using a method that does much, much more than the current HexaPDF equivalent.
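The effect of reusing already segmented items versus re-segmenting the remaining string can be shown with a toy model. This is only a sketch under simplifying assumptions: neither method below is HexaPDF or Prawn API, "segmentation" is reduced to splitting on spaces, and every box simply holds a fixed number of words.

```ruby
# Toy model only: counts how many words get segmented in total
# while a text is distributed over boxes of a fixed capacity.

# Segment once and keep passing the already segmented rest along.
def fill_reusing_segments(text, per_box)
  items = text.split(" ")            # segmentation happens exactly once
  work = items.size
  boxes = 0
  until items.empty?
    items = items.drop(per_box)      # the rest stays segmented
    boxes += 1
  end
  [boxes, work]
end

# Hand the rest back as a String and segment it again for every box.
def fill_resegmenting(text, per_box)
  work = 0
  boxes = 0
  rest = text
  until rest.empty?
    items = rest.split(" ")          # the whole remaining text is segmented again
    work += items.size
    rest = items.drop(per_box).join(" ")
    boxes += 1
  end
  [boxes, work]
end

text = (["word"] * 1_000).join(" ")
reused = fill_reusing_segments(text, 100)
resegmented = fill_resegmenting(text, 100)
puts "reusing segments: #{reused[0]} boxes, #{reused[1]} words segmented"
puts "re-segmenting:    #{resegmented[0]} boxes, #{resegmented[1]} words segmented"
```

In this model the second variant does segmentation work that grows with the number of boxes times the length of the remaining text. Whether this accounts for the measured difference is exactly the open question above; the model says nothing about what either library actually does internally.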
For the ReportLab variant it may be possible to use the basic Paragraph flowable and do the splitting manually, but I didn't get that to work.
For TCPDF there may also be more optimized ways of doing this benchmark.