gettalong/README.md

## README.md

      
Whether decomposed Unicode characters ("combining sequences") are positioned correctly in a PDF depends on the application that writes the PDF.

The basic approach, which most applications use, is to treat the separate Unicode characters as if they were ordinary characters. This positions combining marks incorrectly, because the glyph width of a combining mark cannot suit every character it can be combined with.

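
To make the width problem concrete, here is a toy calculation. All metrics are hypothetical and not taken from any real font: the combining mark is assumed to have zero advance width and to be drawn centred 250 units to the left of the pen position, which suits a base glyph that is 500 units wide and nothing else.

```ruby
# Hypothetical advance widths in font units (not taken from any real font).
ADVANCE = { "u" => 500, "i" => 250 }.freeze

# Hypothetical combining mark: zero advance width, drawn centred 250 units
# to the left of the pen, i.e. tuned for a 500-unit-wide base glyph.
MARK_CENTRE_OFFSET = -250

# Returns how far (in font units) the centre of the mark ends up from the
# centre of the base glyph when the mark is placed like an ordinary character.
def naive_mark_error(base)
  pen_after_base = ADVANCE.fetch(base)            # pen position after the base glyph
  mark_centre = pen_after_base + MARK_CENTRE_OFFSET
  base_centre = ADVANCE.fetch(base) / 2.0
  mark_centre - base_centre
end

puts naive_mark_error("u") # mark sits centred over the wide glyph
puts naive_mark_error("i") # mark is shifted left of the narrow glyph's centre
```

With these made-up numbers the mark is centred over "u" but lands 125 units left of the centre of "i", which is the kind of misplacement described above.
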
A better way would be to perform Unicode normalization (see http://unicode.org/reports/tr15/), specifically Normalization Form C (NFC), which composes characters where possible (in contrast to NFD, which decomposes them). However, this may change the meaning of some characters (see Figure 3 in the linked report).

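
As a minimal sketch of the two normalization forms, the following uses Ruby's built-in `String#unicode_normalize` to show the code points involved for "Müller". The Ohm sign example is an added illustration of how normalization can replace one character with another; it is not part of the sample script.

```ruby
# "Müller" written with the precomposed U+00FC (LATIN SMALL LETTER U WITH DIAERESIS).
composed = "M\u00FCller"

nfc = composed.unicode_normalize(:nfc)
nfd = composed.unicode_normalize(:nfd)

# NFC keeps the single code point; NFD splits it into "u" + U+0308
# (COMBINING DIAERESIS).
nfc_hex = nfc.codepoints.map { |cp| format("U+%04X", cp) }
nfd_hex = nfd.codepoints.map { |cp| format("U+%04X", cp) }

puts nfc_hex.join(" ")
puts nfd_hex.join(" ")

# Normalization can also swap one character for another: U+2126 (OHM SIGN)
# becomes U+03A9 (GREEK CAPITAL LETTER OMEGA) under NFC.
ohm_nfc = "\u2126".unicode_normalize(:nfc)
puts format("U+%04X", ohm_nfc.ord)
```

The NFD string is one code point longer than the NFC string, and it is that extra combining mark which a PDF writer has to position.
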
The best way would be to use fonts that contain all the information needed to position combining characters correctly. Many modern OpenType fonts include such information in internal structures (like the GPOS table). Note that the application writing the PDF must be able to handle this information, because in PDF glyph positioning is done by the writer, not the reader!

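
NFC only helps where a precomposed character exists, so some combining sequences survive normalization and still depend on font-based positioning. The check below is a sketch of how such strings could be flagged before they are handed to a PDF writer; the helper name `uncomposed_marks?` is hypothetical.

```ruby
# Hypothetical helper: true if the string still contains combining marks
# (Unicode general category M) after NFC normalization.
def uncomposed_marks?(text)
  text.unicode_normalize(:nfc).match?(/\p{M}/)
end

# "u" + U+0308 composes to U+00FC, so no mark is left over.
puts uncomposed_marks?("Mu\u0308ller")

# There is no precomposed "q with diaeresis", so the mark remains.
puts uncomposed_marks?("q\u0308")
```

A string for which this returns `true` cannot be fixed by normalization alone; the writer would need to apply the font's mark positioning data.
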
The script umlaut.rb uses HexaPDF to create a sample PDF that shows "Müller" twice, once in NFC form and once in NFD form. As the output shows, the diaeresis in the NFD form is positioned incorrectly (as expected, since HexaPDF's Canvas#text method simply outputs the given Unicode string). The font used is Linux Libertine, a free font with many typographic features.

Also see: https://en.wikipedia.org/wiki/Combining_character

  
## umlaut.rb
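	require 'hexapdf'

	doc = HexaPDF::Document.new
	# Map the font name "sans" to the Linux Libertine font file.
	doc.config['font.map'] = {'sans' => {none: "LinLibertine_Rah.ttf"}}

	# Add a single 300x200 page and get its canvas.
	canvas = doc.pages.add(doc.wrap(Type: :Page, MediaBox: [0, 0, 300, 200])).canvas
	canvas.font("sans", size: 100)
	# Upper line: composed form (NFC).
	canvas.text("Müller".unicode_normalize(:nfc), at: [10, 100])
	# Lower line: decomposed form (NFD), output as-is by Canvas#text.
	canvas.text("Müller".unicode_normalize(:nfd), at: [10, 0])

	doc.write('umlaut.pdf')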