I'm currently implementing support for AcroForm radio button fields. The resulting PDF objects look fine to me but the radio buttons won't work in Adobe Reader and Evince.
require 'hexapdf'
doc = HexaPDF::Document.open(ARGV[0])
outline = doc.catalog[:Outlines]
first_entry = outline[:First]
last_entry = outline[:Last]
current = first_entry
loop do
  puts current[:Title]
Edit: I think I found the problem: Acrobat needs the Catalog dictionary to be an indirect object that is not stored in an object stream.
- good3.pdf: PDF encrypted with (A)RC4 using V=4, PDF version 1.5, cross-reference and object streams, Catalog dictionary not in the object stream.
require 'benchmark-driver'
setup_code = <<EOF
require 'ramda'
def transduce(transformation, reducing_fn, initial, input)
  input.reduce(initial, &transformation.call(reducing_fn))
end
PUSHES = -> list, item { list.push(item) }
const readline = require('readline');
const fs = require('fs');
var PDFDocument = require('pdfkit');
var top_margin = 72 + 0.5 * 72;
var bottom_margin = 842 - 72 - 0.5 * 72;
var margins = {top: 0, bottom: 0, left: 72, right: 72};
var pdf = new PDFDocument({size: 'A4', autoFirstPage: false, margins: margins});
var y = 842;
var font = process.argv[4] || 'Times-Roman';
HexaPDF is a pure Ruby library with an accompanying application for working with PDF files. In short, it allows
- creating new PDF files,
- manipulating existing PDF files,
- merging multiple PDF files into one,
- extracting meta information, text, images and files from PDF files,
- securing PDF files by encrypting them and
- optimizing PDF files for smaller file size or other criteria.
This is a follow-up benchmark to the one comparing the basic text output performance of HexaPDF, Prawn and other libraries.
This time the performance of line wrapping and simple general layout is tested. Again, the Project Gutenberg text of Homer's Odyssey is used for this purpose. The Ruby scripts used are attached below.
The text of the Odyssey is laid out on pages of 400x1000 and 200x1000 points, once with the standard PDF Type1 font Times-Roman and once with the TrueType font Times New Roman. With 400x1000 pages no line wrapping is needed because each line is shorter than 400 points. With 200x1000 pages the lines actually have to be wrapped, and the resulting PDF has roughly twice the number of pages.
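To make the difference between the two page widths concrete, here is a minimal sketch of greedy line wrapping in plain Ruby. It is not the benchmark code and does not use any PDF library: `wrap_line` and the fixed `CHAR_WIDTH` of 5 points per character are hypothetical stand-ins for real per-glyph font metrics, and the sample sentence is only an illustrative line of text.

```ruby
# Hypothetical fixed advance width per character, in points.
# Real fonts have a different width for every glyph.
CHAR_WIDTH = 5.0

# Greedily packs words onto lines so that no line is wider than max_width.
# A single word that is wider than max_width is kept on its own line.
def wrap_line(line, max_width, char_width: CHAR_WIDTH)
  lines = []
  current = ""
  line.split(/\s+/).each do |word|
    next if word.empty?
    candidate = current.empty? ? word : "#{current} #{word}"
    if current.empty? || candidate.length * char_width <= max_width
      current = candidate
    else
      lines << current
      current = word
    end
  end
  lines << current unless current.empty?
  lines
end

sample = "Tell me, O Muse, of that ingenious hero who travelled far and wide"
wide   = wrap_line(sample, 400)  # fits on one line, nothing to wrap
narrow = wrap_line(sample, 200)  # has to be broken into two lines
puts wide.length
puts narrow.length
```

Under these assumed metrics the sample fits into 400 points unchanged but takes two lines at 200 points, which matches the observation that the narrow layout produces roughly twice as many pages.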
Results:
|-------------------------------------------------------------------|
This is the code for the HexaPDF post "Simple Text Metrics": https://hexapdf.gettalong.org/news/2017/simple-text-metrics.html
When creating a PDF it depends on the application writing the PDF whether decomposed Unicode characters ("combining sequences") are correctly positioned.
The basic way (that most applications use) is to just treat the separate Unicode characters as if they were normal characters. This leads to incorrectly positioned combining marks as the glyph width of the combining mark is not suitable for all characters it can be combined with.
A better way would be to perform Unicode normalization (see http://unicode.org/reports/tr15/), more specifically Normalization Form C (NFC) which composes characters if possible (in contrast to NFD which decomposes them). However, this may lead to changes in the meaning of some characters (see the link and scroll down to figure 3).
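As a small illustration of the normalization step, the following sketch uses Ruby's built-in `String#unicode_normalize`. It only shows what NFC and NFD do to the code points of a string and says nothing about how any particular PDF library handles them; the sample strings are chosen for illustration.

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT (a combining sequence).
decomposed = "e\u0301"

# NFC composes the sequence into the single code point U+00E9.
composed = decomposed.unicode_normalize(:nfc)

# NFD goes the other way and decomposes it again.
round_trip = composed.unicode_normalize(:nfd)

# "q" with an acute accent has no precomposed form, so NFC leaves
# the combining sequence as it is.
uncomposable = "q\u0301".unicode_normalize(:nfc)

# Helper for printing code points in U+XXXX notation.
show = ->(str) { str.codepoints.map { |c| format("U+%04X", c) }.join(" ") }
puts show.call(decomposed)
puts show.call(composed)
puts show.call(uncomposable)
```

The last case is why normalization alone is not a complete answer: sequences without a precomposed character still reach the PDF writer as separate base and combining characters.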
The best way would be to use fonts that contain all needed information to correctly position combining characters. Many modern OpenType fonts include such information in internal structures (like the GPOS table). Note that the application writing the P