@gettalong
gettalong / pdfkit.js
Created January 28, 2018 08:58
pdfkit.js
const readline = require('readline');
const fs = require('fs');
var PDFDocument = require('pdfkit');
var top_margin = 72 + 0.5 * 72;
var bottom_margin = 842 - 72 - 0.5 * 72;
var margins = {top: 0, bottom: 0, left: 72, right: 72};
var pdf = new PDFDocument({size: 'A4', autoFirstPage: false, margins: margins});
var y = 842;
var font = process.argv[4] || 'Times-Roman';
@gettalong
gettalong / README.md
Last active October 29, 2017 16:41
Unicode NFC/NFD differences in PDF

Whether decomposed Unicode characters ("combining sequences") are positioned correctly in a PDF depends on the application that writes the PDF.

The basic way (which most applications use) is to treat the separate Unicode characters as if they were normal, standalone characters. This leads to incorrectly positioned combining marks, because the glyph width of a combining mark is not suitable for every character it can be combined with.

A better way would be to perform Unicode normalization (see http://unicode.org/reports/tr15/), more specifically Normalization Form C (NFC), which composes characters where possible (in contrast to NFD, which decomposes them). However, this may change the meaning of some characters (see the link and scroll down to figure 3).
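As a rough illustration of the difference between the two forms, here is a minimal Ruby sketch using the core `String#unicode_normalize` method (available since Ruby 2.2). The sample strings are chosen only for illustration and are not taken from the gist; the second one shows a sequence for which Unicode defines no precomposed character, so NFC cannot compose it.

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT (the decomposed, NFD-style form)
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9
composed = decomposed.unicode_normalize(:nfc)

puts decomposed.length                                # two code points
puts composed.length                                  # one code point
puts composed == "\u00E9"                             # true
puts composed.unicode_normalize(:nfd) == decomposed   # NFD reverses the composition

# "q" + combining acute has no precomposed form, so NFC leaves it decomposed
uncomposable = "q\u0301".unicode_normalize(:nfc)
puts uncomposable.length                              # still two code points
```

The last case is why normalization alone cannot guarantee correct mark positioning: some combining sequences stay decomposed even after NFC.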

The best way would be to use fonts that contain all needed information to correctly position combining characters. Many modern OpenType fonts include such information in internal structures (like the GPOS table). Note that the application writing the P

@gettalong
gettalong / README.md
Created May 10, 2017 18:29
Simple Text Metrics
@gettalong
gettalong / standard_pdf_fonts.pdf
Last active July 22, 2016 21:36
HexaPDF examples showing off the standard 14 PDF fonts
@gettalong
gettalong / README.md
Last active June 28, 2016 06:05
HexaPDF show_boxes.rb example

This is a HexaPDF example for parsing content streams and working with the text parts.

HexaPDF provides the class HexaPDF::Content::Processor for processing the operators of content streams. By subclassing we can define custom behavior for each operator. This could, for instance, be used to render the contents of a page.
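The subclassing idea can be sketched in plain Ruby without HexaPDF. Everything below is hypothetical and only mirrors the pattern: `ToyProcessor`, `OPERATOR_HANDLERS`, `process`, and `TextCollector` are made-up names, not part of HexaPDF's API, and the operands are plain strings rather than encoded PDF text.

```ruby
# Toy stand-in for an operator processor; NOT HexaPDF's real class.
class ToyProcessor
  # Illustrative subset: content stream operator name => handler method name.
  OPERATOR_HANDLERS = { 'Tj' => :show_text, 'Td' => :move_text }.freeze

  # Dispatches one operator to its handler, if the (sub)class defines one.
  def process(operator, operands = [])
    handler = OPERATOR_HANDLERS[operator]
    send(handler, *operands) if handler && respond_to?(handler, true)
  end
end

# A subclass customizes behavior by defining only the handlers it cares about.
class TextCollector < ToyProcessor
  attr_reader :texts

  def initialize
    @texts = []
  end

  def show_text(str)
    @texts << str
  end
end

collector = TextCollector.new
collector.process('Td', [10, 20])   # ignored: TextCollector defines no move_text
collector.process('Tj', ['Hello'])
collector.process('Tj', ['World'])
puts collector.texts.join(' ')
```

Operators without a handler are skipped silently here, which keeps a subclass small when it only needs the text-showing operators.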

However, in this case we want to show how text can be handled. Since the text inside a content stream is encoded, we need to decode it before we can use it as a UTF-8 string. For this, HexaPDF provides two helper methods: #decode_text and #decode_text_with_positioning.

The first one just decodes the text and returns it as a string. This is useful when one only wants to get basic information out of a PDF. The second one, however, returns the text together with positioning information. This could be used, for example, to correctly show the text parts of a PDF page on the console or to convert a PDF into a text file with correct text runs.

The example uses the second method to draw r

def invoke(operator, *operands)
  @operators[operator].invoke(self, *operands)
  serialize(operator, *operands)
end
@gettalong
gettalong / webgen_performance.md
Created January 19, 2014 11:30
Performance optimizations for webgen

StackProf

I recently came across the stackprof gem for Ruby 2.1.0 and decided to give it a spin by analyzing a webgen run of the webgen website.

StackProf is a sampling call-stack profiler like Google perftools, but built using only functionality available in Ruby 2.1 itself. It is very fast; the overhead is barely noticeable.

The webgen website is probably the most complex webgen website I currently use. It uses all (or nearly all) features of webgen and is therefore perfect for the task.

Pre-Optimization performance

@gettalong
gettalong / strscan-part.c
Created October 22, 2015 20:30
Possible `StringScanner#scan_float` method
static VALUE
strscan_scan_float(VALUE self)
{
struct strscanner *p;
double retval;
char *start;
char *end;
GET_SCANNER(self, p);
if (EOS_P(p))
@gettalong
gettalong / gist:2869794
Created June 4, 2012 17:46
Using Ruby in a Bash function to shorten the CWD
function shorten_pwd {
  ruby -e "puts Dir.pwd.sub(/^#{ENV['HOME']}/, '~').split('/').map {|l| l.length > 6 ? l[0,3] << '…' << l[-3,3] : l}.join('/')"
}
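To make the one-liner easier to follow, here is a sketch of the same logic as a standalone Ruby method. The name `shorten_path` and the sample path are hypothetical. Unlike the one-liner, this version takes the path and home directory as arguments, anchors the match with `\A`, and escapes the home directory with `Regexp.escape`; those are small additions, not part of the original.

```ruby
# Hypothetical helper mirroring the one-liner: replace the home directory
# prefix with "~", then abbreviate every path component longer than six
# characters to its first three and last three characters joined by "…".
def shorten_path(path, home)
  path.sub(/\A#{Regexp.escape(home)}/, '~')
      .split('/')
      .map { |part| part.length > 6 ? "#{part[0, 3]}…#{part[-3, 3]}" : part }
      .join('/')
end

puts shorten_path('/home/alice/projects/documentation/src', '/home/alice')
```

Components of six characters or fewer, such as `src`, are left unchanged.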
@gettalong
gettalong / small_benchmarks.rb
Created June 2, 2012 06:11
Some small benchmarks used during the creation of kramdown
# -*- coding: utf-8 -*-
require 'benchmark'
class Test
CONST = 5
N = 1_000_000
def test_const
Benchmark.bm 20 do |results|
results.report 'one' do