@gettalong
gettalong / README.md
Last active June 9, 2021 17:57
Using a TrueType font with HexaPDF

HexaPDF is now able to use a TrueType font to generate content. There are still some limitations, such as the missing support for subsets, but most things already work quite well. Complete integration into the Canvas and font selection API is also not done yet.

The attached script generates a PDF showcasing all available glyphs defined in a font, together with a sample text containing characters from the Unicode BMP and from other Unicode planes.

gettalong / README.md
Last active August 18, 2023 21:22
Performance comparison of simple text rendering between Python reportlab, Ruby Prawn and HexaPDF

The Python PDF generation library reportlab contains a demo/benchmarking application that takes the Project Gutenberg text of Homer's Odyssey and creates a PDF version from it. This text contains 10,437 lines and about 611,000 characters.

The PDF is generated by simply showing each line of the source text, without wrapping or any other advanced text facilities, once using the built-in standard PDF fonts and once using a TrueType font, creating PDF documents with 232 pages.

This is a nice test of raw text output performance and, as noted above, doesn't need any advanced text layout facilities.

In addition to reportlab, I have ported the code to Ruby's Prawn, Perl's PDF::API2 and PHP's TCPDF libraries to have a broader comparison. Note that reportlab has a module implemented in C that replaces various CPU-intensive methods. There is an extra entry for that version of reportlab.

The file script.sh is a small wrapper script that calls the binaries and records runtime, memory use and the size of the created

gettalong / standard_pdf_fonts.pdf
Last active July 22, 2016 21:36
HexaPDF examples showing off the standard 14 PDF fonts
gettalong / README.md
Last active June 28, 2016 06:05
HexaPDF show_boxes.rb example

This is a HexaPDF example for parsing content streams and working with the text parts.

HexaPDF provides the class HexaPDF::Content::Processor for processing the operators of content streams. By subclassing we can define custom behavior for each operator. This could, for instance, be used to render the contents of a page.

However, in this case we want to show how text can be handled. Since the text inside a content stream is encoded, we need to decode it before we can use it as a UTF-8 string. For this, HexaPDF provides two helper methods: #decode_text and #decode_text_with_positioning.

The first one just decodes the text and returns it as a string. This is useful when one just wants to get basic information out of a PDF. The second one, however, returns the text together with positioning information. This could be used, for example, to correctly show the text parts of a PDF page on the console or to convert a PDF into a text file with correct text runs.

The example uses the second method to draw r

def invoke(operator, *operands)
  @operators[operator].invoke(self, *operands)
  serialize(operator, *operands)
end
gettalong / strscan-part.c
Created October 22, 2015 20:30
Possible `StringScanner#scan_float` method
static VALUE
strscan_scan_float(VALUE self)
{
struct strscanner *p;
double retval;
char *start;
char *end;
GET_SCANNER(self, p);
if (EOS_P(p))
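
For comparison, the behaviour such a method would presumably have can be sketched in plain Ruby on top of the existing `StringScanner#scan`. This is only a sketch under assumptions: the free-standing helper `scan_float` and the `FLOAT_PATTERN` regular expression are illustrative names that do not come from the C code above, which is truncated here, and the C version may well accept a different float syntax.

```ruby
require 'strscan'

# Assumed float syntax: optional sign, digits with an optional fraction
# (or a bare fraction), and an optional exponent. This pattern is an
# illustration, not the grammar used by the C implementation.
FLOAT_PATTERN = /[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/

# Hypothetical helper: scans a float at the current scan position.
# Returns the Float and advances the scan pointer on success;
# returns nil and leaves the scan pointer untouched otherwise.
def scan_float(scanner)
  text = scanner.scan(FLOAT_PATTERN)
  text && text.to_f
end

scanner = StringScanner.new("3.14 rest")
value = scan_float(scanner)
puts value.inspect
puts scanner.rest.inspect
```

A native implementation would avoid creating the intermediate string that `scan` returns here, which is likely the main motivation for writing it in C.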
gettalong / README.md
Last active August 30, 2015 06:01
HexaPDF Graphics Primitives

This is a demo program showing the graphics primitives for drawing on PDF content streams or modifying the graphics state.

The following primitives are used:

  • 1. row: Coordinate system transformations (translate, scale, rotate, skew)
  • 2. row: Graphics state parameters for stroking (line width, line cap style, line join style, miter limit, line dash pattern)
  • 3. row: Basic shapes (line, polyline, rectangle, rounded rectangle, polygon, rounded polygon, circle, ellipse)
  • 4. row: Additional shapes (circular arc, elliptical arc without/with inclination, composite arcs)
  • 5./6. row: Path painting (first four columns) and clipping path (last column) operations
  • 7. row: A square with a corner radius equal to the length of its sides, a composite elliptical annulus, a pie chart, a picture and all of the previous encapsulated as form XObject and then drawn
gettalong / performance.md
Last active December 31, 2020 12:39
HexaPDF Performance Comparison

A short and very unscientific comparison of the performance of HexaPDF to other PDF utilities when reading, optionally optimizing and then writing a file.

When available, multiple compression modes are compared:

  • No indicator - no compression done
  • C - Compacting by removing unused and deleted objects
  • S - Usage of object and cross-reference streams
  • P - Recompression of page content streams

For the HexaPDF tests, the hexapdf binary was used with different options for the optimization command:

gettalong / webgen_performance.md
Created January 19, 2014 11:30
Performance optimizations for webgen

StackProf

I recently came across the stackprof gem for Ruby 2.1.0 and decided to give it a spin by analyzing a webgen run of the webgen website.

StackProf is a sampling call-stack profiler like Google perftools but built using only functionality available in Ruby 2.1 itself. It is very fast; the overhead is barely noticeable.

The webgen website is probably the most complex webgen website I currently maintain. It uses all (or nearly all) features of webgen and is therefore perfect for the task.

Pre-Optimization performance

gettalong / gist:2869794
Created June 4, 2012 17:46
Using Ruby in a Bash function to shorten the CWD
function shorten_pwd {
  ruby -e "puts Dir.pwd.sub(/^#{ENV['HOME']}/, '~').split('/').map {|l| l.length > 6 ? l[0,3] << '…' << l[-3,3] : l}.join('/')"
}
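
The shortening logic of the one-liner above can also be sketched as a standalone Ruby method, which makes it easier to try out on arbitrary paths. The method name `shorten_path` and its `home` parameter are hypothetical and not part of the original function. As a small deviation, the home directory is passed through `Regexp.escape` so that special characters in it are matched literally.

```ruby
# Hypothetical helper mirroring the one-liner: replaces a leading home
# directory with "~" and abbreviates every path component longer than
# six characters to its first three and last three characters.
def shorten_path(path, home)
  parts = path.sub(/\A#{Regexp.escape(home)}/, '~').split('/')
  parts.map do |part|
    part.length > 6 ? "#{part[0, 3]}…#{part[-3, 3]}" : part
  end.join('/')
end

puts shorten_path('/home/user/projects/hexapdf/lib', '/home/user')
puts shorten_path('/usr/local/bin', '/home/user')
```

Components of six characters or fewer are left alone because abbreviating them with an ellipsis would not save any space.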