@wipfli
Created December 3, 2023 17:34
Indexed Font Encoder

I would like to create an indexed font encoder. It should be an application that takes the following as input:

  • labels.json: a list of map labels to encode
  • .ttf fonts: a directory containing ttf font files used for the encoding
  • ignore_codepoints.json: a list of Unicode codepoints that should be ignored during encoding

And the output of the application should be:

  • labels_encoded.json: a list of the encoded map labels
  • MapLibre font pbfs: a directory of font pbf files which hold the indexed font

Indexed Font Concept

A standard MapLibre font stores one glyph per Unicode codepoint, so there is a one-to-one mapping from Unicode codepoint to glyph (see https://github.com/wipfli/about-text-rendering-in-maplibre for more info on standard MapLibre text rendering).

In an indexed font, we first shape the labels with a text shaping engine such as HarfBuzz. The result is a list of positioned glyphs. A positioned glyph consists of six numbers:

positioned_glyph = (index, x_offset, y_offset, x_advance, y_advance, cluster)

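The index is the index of the glyph in the .ttf file. The offset values shift the glyph relative to the current pen position, and the advance values say how far the pen moves after the glyph is drawn. The cluster maps the glyph back to the input text and is used for things like cursor placement; I think we can ignore it here. https://harfbuzz.github.io/ has more info on this.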
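To make the six numbers concrete, here is a minimal Python sketch of how offsets and advances turn a list of positioned glyphs into draw positions. The `PositionedGlyph` type, the `layout` function, and the sample values are all made up for illustration; they are not HarfBuzz output and not part of any existing tool.

```python
from typing import NamedTuple


class PositionedGlyph(NamedTuple):
    """One shaped glyph, with the same six fields as listed above."""
    index: int
    x_offset: int
    y_offset: int
    x_advance: int
    y_advance: int
    cluster: int


def layout(glyphs):
    """Return (index, x, y) for each glyph, starting with the pen at (0, 0)."""
    pen_x, pen_y = 0, 0
    placed = []
    for g in glyphs:
        # The offset shifts this glyph only; it does not move the pen.
        placed.append((g.index, pen_x + g.x_offset, pen_y + g.y_offset))
        # The advance moves the pen to where the next glyph starts.
        pen_x += g.x_advance
        pen_y += g.y_advance
    return placed


# Made-up values: two base glyphs followed by a zero-advance mark
# that is shifted back and up over the preceding glyph.
shaped = [
    PositionedGlyph(1, 0, 0, 600, 0, 0),
    PositionedGlyph(2, 0, 0, 650, 0, 1),
    PositionedGlyph(3, -50, 120, 0, 0, 1),
]
placed = layout(shaped)
```

The cluster field is carried along but not used by `layout`, which matches the guess above that it can be ignored for rendering.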

In any case, in an indexed font we store the positioned glyphs at indexed locations in the font files. In the tiles, we do not store the letters themselves, but rather references to the right glyphs in the font.

Example: Say we have a font which supports the three Unicode codepoints A, B, C, and their glyphs are stored at glyph indices 1, 2, 3. In an indexed font, we would store the glyphs for A, B, and C at codepoints 1, 2, 3, and the string "ABBC" would be encoded as "\u0001\u0002\u0002\u0003".
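The toy example can be written out as a few lines of Python. This is only a sketch of the A, B, C case: `encode_label` and `toy_glyph_indices` are hypothetical names, and the one-character-to-one-glyph lookup is a simplification that assumes no shaping is needed. For real scripts, the glyph indices would come from the shaping engine instead.

```python
def encode_label(label, glyph_index_of):
    """Replace each character with the codepoint equal to its glyph index."""
    return "".join(chr(glyph_index_of[ch]) for ch in label)


# Hypothetical toy font from the example: A, B, C at glyph indices 1, 2, 3.
toy_glyph_indices = {"A": 1, "B": 2, "C": 3}

encoded = encode_label("ABBC", toy_glyph_indices)
```

The encoded string has the same length as the input here, but in general it would have one character per output glyph rather than per input letter.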

You can find a demo for this with some Nepali map labels at https://github.com/wipfli/nepali-map-labels.

In principle this technique should work with any language and script without code modifications in MapLibre GL JS and Native. This is very attractive because it means we should just be able to make maps in Khmer, Hindi, or Burmese already today...

Inputs

labels.json should look like this:

[
  "Zürich",
  "Basel",
  "Bern",
  "Geneva"
]

.ttf fonts should
