@adrientetar
Last active December 26, 2019 14:13
Technical note on the design of a new UFO library.

Thoughts on fontTools.ufoLib

w.r.t. my experience with defcon and ufoLib. Some of the functionality I discuss (notifications, etc.) definitely shouldn't go into fontTools; however, we ought to make fontTools.ufoLib "compatible" with these extra features. I'm sure we can make it both simple and versatile.

IMO fontTools.ufoLib should basically be written from scratch (with copy-pasting here and there) since ufoLib/defcon have significant bloat and apparently we want to use lxml.

General comments

Avoid the check-and-use pattern

ufoLib does a lot of things that look like this:

if not os.path.exists(path):
    return
file = open(path)
file.read()

This is slow: the extra check becomes expensive in hot code paths, as I witnessed when flame-graphing ufoLib/defcon.

In this case it's also wrong. This is the well-known TOCTOU (time-of-check to time-of-use) race from C: the filesystem can be modified by other processes between the check and the open.

Exceptions let us drop pre-checks:

try:
    with open(path) as file:
        file.read()
except FileNotFoundError:
    return

Inline validators as much as possible and do minimal validations, "take it or leave it"

Take what's good, leave what's bad: for perf and complexity reasons, and so that a slightly broken file can still be opened.

Conceptually the reading process goes like this: file -parse-> tree (e.g. ElementTree) -structure-> font (e.g. Font) -validate-> blessed font

Validation should be optional: with e.g. fontTools you want to be able to open and modify the font even if it's semantically invalid, and with fontmake you want to optimize for speed (assuming 99% of fonts are valid), since compilation will fail on invalid data either way. For font editors, though, validation is generally desirable; otherwise you have to do many checks every time you use data from the font, which isn't practical for the developer and quickly becomes redundant.

In the majority of cases, though, validation is as simple as converting the data into its expected type (which can be done easily with attrs); more sophisticated forms of validation include uniqueness of identifiers and correctness of segments in the contours.
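A minimal sketch of "validation as type conversion". To keep it dependency-free it uses the standard library's dataclasses as a stand-in for attrs; the Anchor class and its fields are assumptions made for illustration, not an existing ufoLib or defcon API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anchor:
    # Hypothetical data class; field names mirror the GLIF anchor attributes.
    x: float
    y: float
    name: Optional[str] = None

    def __post_init__(self):
        # "Validation" is just coercion to the expected type:
        # float() raises ValueError on data such as "abc".
        self.x = float(self.x)
        self.y = float(self.y)
        if self.name is not None:
            self.name = str(self.name)


# Attribute values come out of an XML file as strings.
anchor = Anchor(x="120", y="-35.5", name="top")
```

Bad input fails at construction time with a ValueError, so code that uses the object later doesn't need to re-check types.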

Maintain a location, pass it when throwing

Just like fontTools.feaLib does – it maintains a location tuple that is passed to each exception being thrown and indicates the current filename/line/column.

The original ElementTree library doesn't maintain a location but lxml objects have a sourceline attribute.
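As a rough sketch of what passing a location could look like: the Location tuple, the UFOReadError class and the parse_number helper below are all hypothetical names, and the line number is hard-coded where a real reader would take it from the parser (e.g. lxml's sourceline).

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    # Hypothetical location record: file name plus line and optional column.
    filename: str
    line: int
    column: Optional[int] = None


class UFOReadError(Exception):
    # Hypothetical exception type; not part of ufoLib or fontTools.
    def __init__(self, message, location=None):
        super().__init__(message)
        self.location = location

    def __str__(self):
        message = super().__str__()
        if self.location is None:
            return message
        filename, line, column = self.location
        where = f"{filename}:{line}"
        if column is not None:
            where += f":{column}"
        return f"{where}: {message}"


def parse_number(text, location):
    # Convert an attribute value, reporting where it came from on failure.
    try:
        return float(text)
    except ValueError:
        raise UFOReadError(f"expected a number, got {text!r}", location) from None


try:
    parse_number("12x", Location("glyphs/a.glif", 7))
except UFOReadError as error:
    message = str(error)
```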

Retain data structures

Whatever data structure the reader reads into, it should be retained after the initial read (don't trash it or copy it). It should be the data that sticks around until the end of the program.

This ties into support for custom classes. I'll need to extend the base classes with extra attributes, so:

  1. Do we want to read into primitive data structures like dict (which are easy to swap/serialize/compose), with the Font, etc. classes "mounting" (storing) that data structure and mutating it internally? I like the KISS aspect of it: "this" file corresponds to "this" data structure. Or
  2. Do we want to pass classes to the reader, which it'll instantiate (in which case the ctor is part of the reader/writer API)?

What we should aim for is not needing to reconstruct classes (e.g. currently for glyph anchors, ufoLib gives a list of dicts and defcon takes each dict and creates an Anchor() from it; that second pass can be avoided).
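A sketch of the second option, under the assumption that the reader accepts the class to instantiate. read_anchors, its anchor_class parameter and the Anchor class are made up for illustration; the XML snippet follows the GLIF anchor element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anchor:
    # Hypothetical data class standing in for an app-level Anchor.
    x: float
    y: float
    name: Optional[str] = None


def read_anchors(glyph_element, anchor_class=Anchor):
    # Build the final objects in a single pass: no intermediate list of dicts.
    return [
        anchor_class(
            x=float(element.get("x")),
            y=float(element.get("y")),
            name=element.get("name"),
        )
        for element in glyph_element.findall("anchor")
    ]


GLIF = """<glyph name="a" format="2">
  <anchor x="250" y="700" name="top"/>
  <anchor x="250" y="0" name="bottom"/>
</glyph>"""

anchors = read_anchors(ET.fromstring(GLIF))
```

The cost of this design is the one raised above: the constructor signature becomes part of the reader/writer API.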

Use the attrs library

attrs lets us write data classes without boilerplate ctors, cmp etc. Showoff

A compelling candidate for writing the data structures mentioned in the previous point, which the reader could then instantiate directly.

Single file UFO

Would be nice to have UFOs as a single file.

Only Mac considers dotted folders as files ("packages"); file picker dialogs on Windows/Linux/etc. don't expect a folder, or make it impractical (e.g. on Save As, a folder picker can only choose a folder that already exists; you can't type a desired name and just click OK). Also, the filesystem isn't optimized for many small files afaict.

Many modern multi-platform apps (Microsoft Office, XD, Sketch etc.) use a zipped tree of files (zip with no compression).

Going forward it would be nice to move towards single-file becoming the default.

What I also like about this is that we can lock the whole file while we're reading it, whereas in a directory things can race. Directories cannot be locked on Windows.
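For illustration, here is roughly how a zipped tree with no compression could be written and read back with the standard zipfile module. The file contents are placeholders, the helper names are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import io
import zipfile

# Hypothetical miniature UFO tree: archive member name -> placeholder contents.
FILES = {
    "metainfo.plist": b"<plist/>",
    "glyphs/contents.plist": b"<plist/>",
    "glyphs/a.glif": b'<glyph name="a" format="2"/>',
}


def write_ufo_zip(target, files):
    # ZIP_STORED means "no compression": members are stored verbatim.
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)


def read_ufo_zip(source):
    # Read every member back into a name -> bytes mapping.
    with zipfile.ZipFile(source) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


buffer = io.BytesIO()  # stands in for a file on disk
write_ufo_zip(buffer, FILES)
buffer.seek(0)
roundtrip = read_ufo_zip(buffer)
```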

[Idea] Immutable data structures? (not sure if it's compelling here)

Clojure/FP-like data structures, cf. FB Immutable.js talk.

Hierarchy of Layer -> Glyph -> Contour etc. could form a deeply nested such structure inside the Font.

Takes more memory, but is fully versioned... easy to slice?

Makes it easy to save the font while still using it (since it's copy-on-write).
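To make the idea concrete, a small sketch with frozen dataclasses and tuples (an assumption; a dedicated persistent data structure library would be another route). All class names are illustrative. Each "edit" produces a new version that shares its untouched children with the previous one.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Contour:
    points: Tuple[Point, ...] = ()


@dataclass(frozen=True)
class Glyph:
    name: str
    contours: Tuple[Contour, ...] = ()


square = Contour((Point(0, 0), Point(100, 0), Point(100, 100), Point(0, 100)))
dot = Contour((Point(50, 50),))
v1 = Glyph("a", (square, dot))

# "Editing" builds a new version; untouched children are shared, not copied.
moved_dot = replace(dot, points=(Point(60, 50),))
v2 = replace(v1, contours=(square, moved_dot))
```

A writer can keep serializing v1 while the user goes on editing v2, since nothing ever mutates v1.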

[UFO] Layers stored in the glyphs would probably make more sense

And make renaming etc. easier. Layer colors?

[UFO] Conversion b/w points and segments is expensive

Could probably be made zero-cost

[UFO] Group membership could be a glyph attribute

[UFO] Image storage should prefer linked images not embedded

[UFO] Font info too tied to OT

I'd prefer a more generic set of infos and custom parameters that can override specific OT fields.

What I will need

Inject custom classes

Does not need to be dynamic.

Custom classes that ain't scrambling reader/writer data

What happens right now with the current stack is that ufoLib reads data with zealous checks and at times unnecessary copies, and then ufoLib and defcon are totally blind to each other. ufoLib sets attributes such as .anchors, .guidelines etc. on whatever Font object it's given. defcon treats the values as arbitrary data and re-processes each element one by one, sends notifications, makes asserts etc. while it's TOTALLY UNNEEDED in that case!

The custom classes need a privileged path that just swallows the supplied data, ideally with zero copy (i.e. setting data using that path is free).
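One possible shape for such a privileged path, as a sketch only: the Glyph class, its from_reader classmethod and the notification name are hypothetical and do not reflect defcon's actual API.

```python
class Glyph:
    # Hypothetical app-level class; names are illustrative, not defcon's API.
    def __init__(self):
        self._anchors = []
        self.notifications = []  # records what the regular path sends

    @classmethod
    def from_reader(cls, anchors):
        # Privileged path: adopt the reader's list as-is
        # (no copy, no per-item checks, no notifications).
        glyph = cls()
        glyph._anchors = anchors
        return glyph

    @property
    def anchors(self):
        return self._anchors

    @anchors.setter
    def anchors(self, value):
        # Regular path for edits made by the user or by scripts.
        self._anchors = list(value)
        self.notifications.append("anchorsChanged")


data = [{"x": 250, "y": 700, "name": "top"}]
loaded = Glyph.from_reader(data)  # zero copy, silent

edited = Glyph()
edited.anchors = data  # copies and notifies
```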

The alternative is to disable notifications, or to have a notification system that costs nothing when there are no subscribers, and let ufoLib set data normally.

Also, ctors should be as zero-cost as possible.

Efficient, expressive serialization system

Essentially for Copy and Paste.

  • When copying part of a glyph, currently I create another glyph, use a special pen to pass on the selection to it then serialize that glyph. Maybe that could be simplified? It would be nice to have a serialization pen but I don't know if the pickle library could work that way. Also if serializing all elements of a glyph that are selected can be automated, that's cool.

  • I should be able to deserialize into a Glyph without clearing its contents (for Paste, basically) – defcon doesn't currently allow it

Point pens

Will these be going into fontTools?

Misc stuff

defcon has extended unicodeData and some bezierMath (join segments, cut contour at position). Will these be going into fontTools?

Also there's the representations system (just a cache, pretty straightforward) and identifiers (unique persistent hash for a given point, I think). These are only useful in apps so I don't think they should be in fontTools.

The defcon modified? system

a.k.a. the "detect external modifications" thing.

Note: the stampGlyphDataState method in defcon is also wrong as it stamps before attempting to load the Glyph and thus is prone to data races.

Cons:

  • the os.path.getmtime() function can be expensive when in a hot path.

A notification system

The NSNotificationCenter-like system has several shortcomings that are well-documented (relevant: Deprecating the Observer pattern):

  • string notifications prone to typos (e.g.), no checking

  • you have to unregister and not fuck it up.. which is impractical for GUI environments. Otherwise you get errors like: "only one observer allowed for this notification" (with weakrefs, you don't even need to unsubscribe explicitly... if we make subscribers a set() then it's totally irrelevant!)

Related Q: does the order of notifications matter? can we make it so that it doesn't matter? (generally representations clearing [kill-cache] should have priority)

  • high overhead of the machinery (packing notifications, weakrefs, is it disabled?, send to all possible targets :: objects get spawned during that process, so the cost is high even with no observers); a slimmer, typed system such as delegates should be more efficient

Solution:

have each object store its listeners and call them directly

  • avoids having a big global notification handler that's expensive to work with
  • avoids having to deal with many weakrefs and a chain of getDispatcher calls (i.e. where's my font? :: for e.g. glyph guidelines that can take 6 stack frames and weakref unpacking)

NOTE: compared to the current system this won't allow e.g. subscribing to all instances of a given class or notification in bulk but I don't think that's needed (even if it turned out to be, we needn't optimize for that case).
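A minimal sketch of "each object stores its listeners and calls them directly". The Glyph class, the subscribe/unsubscribe methods and the callback signature are assumptions for illustration.

```python
class Glyph:
    # Hypothetical class: each instance owns its list of listeners.
    def __init__(self, name):
        self.name = name
        self._width = 0
        self._listeners = []  # callables taking (sender, what)

    def subscribe(self, callback):
        self._listeners.append(callback)

    def unsubscribe(self, callback):
        self._listeners.remove(callback)

    @property
    def width(self):
        return self._width

    @width.setter
    def width(self, value):
        self._width = value
        # With no subscribers the loop body never runs: no dispatcher
        # lookup, no notification object, no weakref unpacking.
        for callback in self._listeners:
            callback(self, "width")


events = []
glyph = Glyph("a")
glyph.width = 500  # nobody is listening: nothing is dispatched
glyph.subscribe(lambda sender, what: events.append((sender.name, what)))
glyph.width = 600
```

In this form the cost of a change with no subscribers is one iteration over an empty list.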

use an Enum type for notifications

  • that gives us typechecking
  • possible to add to that enum at runtime? otherwise store it as a class attr
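A sketch of Enum-typed notifications, with hypothetical event names and a hypothetical subscribe/post API. A misspelled member name fails immediately with an AttributeError instead of silently never firing.

```python
from enum import Enum, auto


class GlyphEvent(Enum):
    # Hypothetical event names for illustration.
    WIDTH_CHANGED = auto()
    CONTOURS_CHANGED = auto()


class Glyph:
    def __init__(self):
        # One listener list per known event.
        self._listeners = {event: [] for event in GlyphEvent}

    def subscribe(self, event, callback):
        if not isinstance(event, GlyphEvent):
            raise TypeError(f"expected a GlyphEvent, got {event!r}")
        self._listeners[event].append(callback)

    def post(self, event):
        for callback in self._listeners[event]:
            callback(self, event)


received = []
glyph = Glyph()
glyph.subscribe(GlyphEvent.WIDTH_CHANGED, lambda sender, event: received.append(event))
glyph.post(GlyphEvent.WIDTH_CHANGED)
glyph.post(GlyphEvent.CONTOURS_CHANGED)  # no subscriber, nothing happens
```

On the runtime question: an Enum class that already defines members cannot be subclassed, so app-specific events would likely need their own Enum, stored e.g. as a class attribute as suggested above.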

allow disarming notifications? have a _isLoading attr? or try to handle that all in ctor?

  • Note: if we nil the cost of notifications w. no subscribers, this becomes a non-issue
@madig
madig commented Nov 15, 2017

Re: Single file UFOs
One thing I like about UFOs now is that they lend themselves better to version control. I dislike how e.g. .glyphs files are just big text blobs you'd need tools to dissect. Granted, this might be a more ideological view.

@adrientetar
Author

@madig It is a legitimate issue, which comes up with modern Office files (e.g. docx). There's no built-in git feature that handles it, but one can configure a diff driver that unzips the file and displays the diff of its contents, or unzip before committing.

Arguably GitHub would also need visual diff functionality to be truly useful with UFOs.

@davelab6

Arguably GitHub would also need a visual diff functionality to be truly useful with UFOs

The GitHub API is rich enough that we can build our own :)