Last active May 3, 2024 22:38
The Problem Of UI (User Interfaces)


Note: a lot of programmers talk about UI without mentioning the user even once, as if it were entirely a programming problem. I wonder what we’re leaving off the table when we do that.

Evaluation Axes

  • input handling, and latency,
  • composition,
  • dataflow,
  • layout,
  • painting,
  • styling,
  • extension,
  • resource demands: power draw, CPU cycles

Caveats/blind spots of homegrown solutions

Accessibility is a big issue: how do you make your UI accessible? Usually platform vendors provide APIs to enumerate/navigate/query UI elements, extracting some metadata for screen readers and the like.

Multi-viewport UIs: support for multiple windows, multiple displays/monitors with mixed DPI.

Power draw. (When little of the screen is changing, power draw should be commensurate with the updated areas. In other words: partial rendering, partial presentation.)

Collaboration: support for more than one editing client for multi-user collaboration on the same data.

Responsive layouts and scalable/zoomable UIs help users adapt a UI to viewing distances or screen sizes.

Internationalized input methods (“IME” on Windows).

Adaptability to large teams (adding new controls, new assets, new kinds) without contention.

Multi-touch and touch interfaces raise many questions. They allow multiple items to be interacted with at a time (contrary to the mouse-keyboard pair); should this be constrained somehow (to only similar items? to only similar items supporting similar interactions? what if more than one person is touching the screen?). They also expect either bigger items or more lenient hotzones.

Hi-DPI requires a great many more pixels to fill. Also, what should the internal units be?

Plugin support: this exposes lots of problematic scenarios when plugins are allowed to hook into the underlying event loop and desktop APIs. For instance, imagine dealing with non-DPI-aware plugins within a DPI-aware process on Windows.

GPU-based rendering:

  • how does it affect latency?
  • does it fit the type of graphics shown in UI?

Multiple interactions in one: very often UIs hit a difficulty when we want to select multiple elements and change them all in parallel, i.e. increment/change more than one element at a time.




Consider each quad representing the bounding box of a UI control. The Minkowski sum of that quad and a round region of radius ‘tr’ (the tolerance radius) depends on the input device (mouse, touch, etc.) and represents the collision box between that input device’s center point and the control. In case n>1 controls match, the ambiguity must be resolved. @idea{separating axes test}
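As a minimal sketch of that collision box (all names here are illustrative, not from any particular codebase): testing a point against the Minkowski sum of an axis-aligned quad and a disk of radius tr is equivalent to testing whether the distance from the point to the quad is at most tr.

```c
#include <stdbool.h>

typedef struct { float x0, y0, x1, y1; } Quad;   /* control bounding box */

static float clampf(float v, float lo, float hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* true iff (px,py) lies within the quad inflated by a disk of radius tr,
   i.e. within the Minkowski sum of the quad and the tolerance disk */
bool hit_test(Quad q, float px, float py, float tr) {
    float cx = clampf(px, q.x0, q.x1);   /* closest point on the quad */
    float cy = clampf(py, q.y0, q.y1);
    float dx = px - cx, dy = py - cy;
    return dx * dx + dy * dy <= tr * tr;
}
```

With a per-device tr (small for mouse, larger for touch), every control whose hit_test returns true is a candidate, and ties among n>1 matches still need a resolution policy (e.g. smallest distance).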

Window Management

In a UI where moving windows or elements is offered to users, it’s nice to satisfy the need to pack those elements tightly. Two options:

  • snap (magnetic)
  • bump+friction (solids) { allows for putting elements next to each other without them touching each other }
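The snap option can be sketched in a few lines (hypothetical function, not from any toolkit): while dragging, an edge that comes within some snap distance of a target edge jumps onto it; otherwise it is left alone.

```c
#include <math.h>

/* snap a dragged edge coordinate onto a nearby target edge */
float snap_edge(float dragged, float target, float snap_dist) {
    return fabsf(dragged - target) <= snap_dist ? target : dragged;
}
```

The bump+friction option instead treats edges as solid: the dragged edge is clamped so it can approach but never cross the target, which lets users place elements next to each other without them touching.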

Layout algorithm (Flutter)

layout transformation :: min/max width, min/max height, fixed size elements, flexible size elements -> sizes per element

Constraints go in, traverse the tree, and sizes of the traversed elements come out. The tree has row nodes and column nodes. Go through fixed-size elements first, taking the dimension that matches the node type (row => heights, column => widths), then apply the remaining size in the chosen dimension to flexible elements.
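This pass can be sketched for a single row node as follows (a simplification with illustrative names; Flutter's real protocol passes min/max constraints in both dimensions down the whole widget tree):

```c
typedef struct {
    int   is_flexible;    /* 0: fixed size, 1: flexible */
    float fixed_width;    /* used when !is_flexible */
    float out_width;      /* filled in by the layout pass */
} Child;

/* constraints go in (max_width), sizes per element come out */
void layout_row(Child *children, int n, float max_width) {
    float used = 0.0f;
    int flex_count = 0;
    for (int i = 0; i < n; i++) {          /* fixed-size elements first */
        if (children[i].is_flexible) { flex_count++; continue; }
        children[i].out_width = children[i].fixed_width;
        used += children[i].fixed_width;
    }
    float leftover = max_width - used;     /* remaining space in the row */
    if (leftover < 0.0f) leftover = 0.0f;
    for (int i = 0; i < n; i++)            /* then flexible elements */
        if (children[i].is_flexible)
            children[i].out_width = flex_count ? leftover / flex_count : 0.0f;
}
```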


Sometimes people get confused between these two opposite points of view:

  • the product is the user interface,
  • the user interface surfaces, or allows the user to interact with, a deeper model or simulation

What remains true in both points of view is that data has to be shared across multiple systems, and that data ultimately has to be seen and interacted with by the user, and so belongs to the user interface.

Some surface level aspects of user interfaces (what people call design):

  • structure, sequence
  • 2d/3d layout
  • chrome, animations

The logic of interactions is difficult to express.

Example1: Firefox Web Render

@url: @title: The Whole Web At Maximum Speed

GPU based rendering.

Transformations used:

  • Page transformed into “stacking context” tree (Compositing tree?)
  • Early culling of display items to remove those that are not shown in the viewport.
  • Compositing tree turned into a render task tree, after optimizing it (reducing the number of intermediary textures)
  • Batching draw calls (maximal)
  • Assist pixel shaders by allowing Early Z-culling
    • Opaque pass where opaque objects are drawn front to back
    • Translucent shapes are drawn back to front

Pathfinder project: rendering font glyphs on GPU as well @url:


Reading notes for @url:


Confusing when it's more than one level deep. User going back to their UI: "How did I put my layout together?"

Window Management

  • Ability to move windows
  • Sizing vertically, horizontally, diagonally

However it gets tricky to lay things out. Solution: snapping extremities to the boundaries of other windows, with visible guide lines. Snapping extremities to the parent container also makes sense.

Auto-anchoring feature: whenever two windows are snapped together, moving the separator between them resizes each side. The windows are "connected." Pressing down for a while and then moving lets you release that restriction.

Resizing containers

Internal windows are moved and resized using a physics (spring-mass) system.
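One step of such a spring-mass system might look like this (a sketch with a unit mass and illustrative constants; the damping below is chosen near critical so edges settle without oscillating):

```c
typedef struct { float pos, vel; } Spring;   /* e.g. one window edge */

/* semi-implicit Euler step pulling pos toward target */
void spring_step(Spring *s, float target, float stiffness, float damping, float dt) {
    float force = stiffness * (target - s->pos) - damping * s->vel;
    s->vel += force * dt;    /* unit mass: acceleration == force */
    s->pos += s->vel * dt;
}
```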

We'll first describe the generic display time line, introducing some vocabulary. The display reads content from what's called a framebuffer, a buffer of pixels to be shown. It has its own timeline:

 0 1 2 3 4 5 6 7 8 9 a b c d e f 0 1 2 3 4 5 6 7 8 9 a b c d e f          time ->
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        
 . . . . . . . . . . . . . o . . . . . . . . . . . . . . . o . . .        vblank signal
 -------------------------)      -------------------------)               scan-out from front-buffer (presentation)

This is the case for a traditional non-freesync/g-sync display.

In the ideal case, we provide new content to the screen each frame. A frame is a group of pixels that are consistent as a sample in time:

 0 1 2 3 4 5 6 7 8 9 a b c d e f 0 1 2 3 4 5 6 7 8 9 a b c d e f 0        time ->
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . o----). . . . . . . . . . . . . o----).        vblank signal
 -------------------------)      -------------------------)               scan-out from front-buffer (presentation)
 . . . . . . . . . . . . . -----). . . . . . . . . . . . . -----).        swap front-buffer and back-buffer: depends on data being ready.
 a------------------------ ~ ~ ~ b------------------------ ~ ~ ~ a        front-buffer
 a-------------------------------b-------------------------------a        screen as seen by user
  • swapping the front-buffer and back-buffer needs to happen during the vblank period to prevent tearing (content changing while scan-out is active, making the screen show two frames at a single time)
  • freesync/g-sync and other adaptive-sync tech allow delivering content at arbitrary points, not only during the vblank period

Note that the presentation time (the beginning of scan-out) of a frame is often implicit to our programs, which usually can't know the real effective presentation time for their user.

Animations feel correct when their calculation time approximates the presentation time within some accuracy. (It depends on people's ability to detect jank in the speed of moving elements.)

Missing a frame looks like this (a repeated frame, noticeable by users):

 0 1 2 3 4 5 6 7 8 9 a b c d e f 0 1 2 3 4 5 6 7 8 9 a b c d e f          time ->
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . o----). . . . . . . . . . . . . o----).        vblank signal
 -------------------------)      -------------------------)               scan-out from front-buffer (presentation)
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -----).        swap front-buffer and back-buffer:
 a-------------------------------------------------------- ~ ~ ~ b        front-buffer
 a---------------------------------------------------------------b        screen

Unsynchronized front-buffer swap looks like this: (tearing, noticeable when the content changes quite a bit in the horizontal direction)

 0 1 2 3 4 5 6 7 8 9 a b c d e f 0 1 2 3 4 5 6 7 8 9 a b c d e f          time ->
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . o----). . . . . . . . . . . . . o----).        vblank signal
 -------------------------)      -------------------------)               scan-out from front-buffer (presentation)
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -----).        swap front-buffer and back-buffer:
 a---------------------------------------b------------------------        front-buffer
 a-------------------------------a-------b------------------------        screen: shows tearing at the line scanned out at time 4

Lowering UI latency

What's acceptable user latency? Can I make the argument that the lower the better, i.e. that the simulated world the UI shows is more usable, "feels more real," the lower the latency?

Components throughout the system add latency in unknown ways, and system builders add components between us and the scan-out in ways that aren't entirely transparent. (Examples: desktop compositors, GPU API buffering, in-screen or transport latency and buffering.)

An unacceptable demand on latency is one that cannot be achieved by the system: we can't expect latency that's smaller than what the screen speed allows.

Where is latency hiding? Can I simulate the feeling of laggy displays?

  • mouse cursor trailing behind the actual mouse position
  • clicking the wrong thing:
    • mouse position as input trails the actual screen position shown by cursor.
    • screen disagrees with simulation

How can we reduce input latency?

Our goal is to transform:

user inputs, autonomous processes -> frame content on the screen for the next presentation time

Given the following steps:

  • transport user inputs to program
  • map input to user interface
  • user interface interprets input into data model changes and new graphical content.
  • render: new graphical content results in frame buffer content
  • transport frame to screen

Some parts of the graphical content depend on user inputs, other parts do not, and this varies frame to frame. (Imagine resizing panels in a user interface.)

We define the input lag as follows: Input lag :: time(scan-out) - time(user inputs transport)

Reading user inputs as close as possible to the presentation time is what low latency means.

How can we push time(user inputs transport) closer to time(scan-out)?

General optimization: by reducing the durations spent mapping inputs to the user interface, rendering the UI, etc. This is our baseline; it gives us the lowest expectable latency.

Another angle of attack is to reduce the "slop", the time wasted waiting for scan-out to start, when the frame is already ready. If that time exists, then we can technically wait before reading and processing user input until the very last moment where rendering the frame would not result in a missed frame.

I.e. if there are "compressible" parts in the data pipeline leading to new buffer contents, we can compress them by waiting upfront just the right amount of time that guarantees the frame will still be shown.
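The bookkeeping for this can be sketched as follows, assuming a deliberately conservative estimator (worst render time observed so far, plus a safety margin); all names here are hypothetical:

```c
typedef struct {
    double worst_render;   /* worst render duration observed, in seconds */
    double margin;         /* safety margin, e.g. 0.002 s */
} PaceEstimator;

/* fold a measured render duration into the estimate */
void pace_observe(PaceEstimator *e, double render_seconds) {
    if (render_seconds > e->worst_render)
        e->worst_render = render_seconds;
}

/* latest time (same clock as deadline) at which input sampling plus
   rendering can still start without missing the next scan-out */
double pace_latest_start(const PaceEstimator *e, double deadline) {
    return deadline - (e->worst_render + e->margin);
}
```

A frame loop would sleep until pace_latest_start(&e, next_vblank), then read inputs, render, and feed the measured render duration back through pace_observe.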

How to estimate the time it takes to process an input? What's the probability distribution, and does it have an average? What's the risk of estimating badly? What should we do when we estimated badly?

At worst, our estimator can conclude that it does not matter, because rendering takes so long that we're in a regime where low latency isn't achievable at all. In that regime I'd hypothesize that slow but consistent times are enough, i.e. relying on reading inputs at every frame start.

If we render too fast, the estimator deals with it by using the extra capacity to produce lower latency. Normally it's the vblank synchronization that provides the back-pressure that avoids rendering too many frames. Note how disabling vblank effectively removes that back-pressure: more frames get rendered in exchange for lower input latency.

Strategies when missing a frame?

If possible, it might be acceptable to allow tearing when we just barely, unexpectedly missed the frame boundary. There's tearing and tearing: maybe tearing within the first lines of the screen isn't so bad/visible.

How sophisticated should the predictor be?

Can we do something like branch prediction in CPUs, that is, collect statistics about disparate events so as to predict clicks in various parts of the screen? Think about the difference between moving the mouse around (only a few elements change state and need to be redrawn, due to hover) and clicking the button that selects a tab in a panel, which redraws the full section.

If missing a prediction isn't so bad, then there's no need to get sophisticated.

Resources we care about:

  • CPU usage (time taken from computation, battery life)
  • Memory


A user interface takes the available input devices and interprets their continuous and discrete actions to trigger data transformations, computation, and communication.

A user interface presents itself and the application to output devices (displays).


For ergonomics, the position of the elements of a UI is stable in the coordinate system of that UI. Their position generally shifts only as the result of a user input. Exceptions: timeline editors, graphs.

Display elements generally do not occlude each other, except in windowing systems. Most apps today follow a tiling arrangement instead.

Modern displays are framebuffer based and therefore can be seen as local caches. Going further, graphics processing units (i.e. display accelerators) go beyond that and can store bitmap elements, textures and GPU programs.

Although modern GPUs can re-render most UIs within one display frame, to preserve resources (CPU resources, for computing/battery life) a UI can implement:

  • just-needed rendering: rather than rendering at the display frame rate (144 Hz, 60 Hz), the UI is rendered only when the "cache" is out of date
  • partial rendering: only render what has changed

This applies to CPU-bound computations. Computations done on the GPU would save on CPU computations. Implications however for battery life depend on whether the computation is more efficient on the GPU than on the CPU and whether the entire GPU can go back to IDLE quickly enough.

Just-needed rendering:

  • UI is called when input devices receive inputs,
  • UI is called on spontaneous state changes: timers, network or when a frame ceases to be valid.
  • UI elements define the validity of a rendered frame: for animation, there is a certain time-to-live attached to the rendered frame. For corner cases such as layout, where the UI needs to converge to a stable state, elements may opt to mark more than one frame as outdated. On these conditions the UI will trigger a re-render.

Partial rendering strategies:

  • by subregion (keep track of "dirty regions" and re-render only those)
  • overlays (dirty or fast-updating regions are rendered separately and blitted on top of unchanged areas). Goal: preserve resources. Failure mode: complete re-render.
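The subregion strategy can be sketched with a single accumulated dirty rect (hypothetical names; real systems typically keep a list of rects, since a single union over-approximates the changed area):

```c
typedef struct { int x0, y0, x1, y1; } Rect;   /* x1/y1 exclusive */

static Rect dirty = {0, 0, 0, 0};              /* empty: nothing to redraw */

static int rect_empty(Rect r) { return r.x0 >= r.x1 || r.y0 >= r.y1; }

/* grow the dirty region by union with a newly invalidated rect */
void invalidate(Rect r) {
    if (rect_empty(dirty)) { dirty = r; return; }
    if (r.x0 < dirty.x0) dirty.x0 = r.x0;
    if (r.y0 < dirty.y0) dirty.y0 = r.y0;
    if (r.x1 > dirty.x1) dirty.x1 = r.x1;
    if (r.y1 > dirty.y1) dirty.y1 = r.y1;
}

/* returns 1 and the region to re-render if anything changed, else 0
   (the "just-needed" part: no dirt, no rendering) */
int take_dirty(Rect *out) {
    if (rect_empty(dirty)) return 0;
    *out = dirty;
    dirty = (Rect){0, 0, 0, 0};
    return 1;
}
```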

Quality checklist:

  • Can I mark and copy text? Is it any text, or just specific things?
  • Can I enlarge the font or the entire view without breaking the app?
  • Can I resize the window without breaking the app?
  • Can I use the app with just the keyboard, or just the mouse (with an OS-provided on-screen keyboard)?
  • Does it work with a screen reader?
  • Does it play nicely with other OS accessibility features (high-contrast mode or DPI settings)?
  • Does it support localisation?
  • Does it have legible and high-quality text rendering at various sizes?
  • Does it have standard OS chrome (window icons, menu bar)?

Technical quality checklist:

  • Good efficiency (resources unused when the application is idling, minimal data retention)
  • Styling
  • Layout
  • Custom UI elements, canvas for custom drawing
  • Scalable UI elements that nevertheless keep sharp edges
  • Good platform sympathy: DPI settings, accessibility,


@author Mikko Mononen @url:

@comment { Mikko Mononen is referring to "RectCut" as defined in @url: by Martin Cohen.

Martin Cohen can also be found at @url:

The idea is to take axis-aligned bounding boxes (rects) and carry the layout as an input rect that gets mutated via rectangle-producing functions that reserve space within it:

    Rect layout = { 0, 0, 180, 16};
    Rect r1 = cut_left(&layout, 16); // carves out a rectangle of width 16 on the left
    Rect r2 = cut_left(&layout, 12);  // carves out another one
    Rect r3 = cut_right(&layout, 32); // carves out another one, this time on the right side.
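A possible implementation of those cut functions, assuming Rect stores {x, y, w, h} (Martin Cohen's write-up works with min/max coordinates, but the idea is identical): each call carves a piece off the layout rect and shrinks it in place.

```c
typedef struct { float x, y, w, h; } Rect;

/* carve a rect of width a off the left side of *layout */
Rect cut_left(Rect *layout, float a) {
    Rect r = { layout->x, layout->y, a, layout->h };
    layout->x += a;
    layout->w -= a;
    return r;
}

/* carve a rect of width a off the right side of *layout */
Rect cut_right(Rect *layout, float a) {
    Rect r = { layout->x + layout->w - a, layout->y, a, layout->h };
    layout->w -= a;
    return r;
}
```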


Inspired by @FlohOfWoe's Sokol, I've been tinkering with IMGUI layout based on C's struct initialization. The idea I'm exploring is rectCut + flexbox, where in the common case you can just get a slice of a rect, but you can also create small flexbox-like layouts to go with it.

// Menu Layout
MIbox itemSearchCont = { .layout.dir = MI_ROW, .layout.spacing = 4, };
MIbox itemSearchIcon = { .content = miMeasureIcon() };
MIbox itemSearchInput = {};
miBoxAddChildren(&itemSearchCont, &itemSearchIcon, &itemSearchInput);

MIitem item1 = { .text = "Item 1", .detail = "Alt+Shift+Space"};
MIitem item2 = { .flags = MI_ITEM_CHECKED, .icon = ICON_EMOJI_PEOPLE, .text="People", .detail = "Alt+P" };
MIitem item3 = { .flags= MI_ITEM_SUBMENU, .icon = ICON_EMAIL, .text="Email" };

MIbox menuCont = { .layout.dir = MI_COL, .layout.spacing = 4, .layout.pack = MI_START, .layout.pad.x = 6, .layout.pad.y = 6 };
MIbox itemBox1 = { .content = miMeasureItem(item1) };
MIbox itemBox2 = { .content = miMeasureItem(item2) };
MIbox itemBox3 = { .content = miMeasureItem(item3) };
miBoxAddChildren(&menuCont, &itemSearchCont, &itemBox1, &itemBox2, &itemBox3);

miBoxMoveTo(&menuCont, (MIpoint) {.x=500,.y=200});

// Menu logic
miIcon(itemSearchIcon.rect, ICON_SEARCH);
if (miChanged(miInput(itemSearchInput.rect, (MIinput){.text=text, .maxText=sizeof(text)}))) {
    printf("Search: %s\n", text);
}
if (miPressed(miItem(itemBox1.rect, item1))) {
    printf("Item1 pressed\n");
}
miItem(itemBox2.rect, item2);
miItem(itemBox3.rect, item3);

It's all one pass; you build the layout as you go. I have a few nasty-to-implement cases. Menus are one of those: variable-width content, each row having items aligned to both ends.

Another example of how the rectCut works together with the flexbox:

// Search box
char const* searchText = "Search";

MIbox searchCont = { .layout.dir = MI_ROW, .layout.spacing = 4 };
MIbox button = { .content = miMeasureButton((MIbutton){.label=searchText}) };
MIbox input = { .content = miMeasureInput((MIinput){}), .grow = 1 };
miBoxAddChildren(&searchCont, &input, &button);
miRectCutAndLayout(&windowCont, &searchCont, windowLayout);

miButton(button.rect, (MIbutton){.label=searchText, .variant=MI_FILLED});
miInput(input.rect, (MIinput){.text=text, .maxText=sizeof(text)});

// Slider
MIbox slider = { .content = miMeasureSlider((MIslider){}) };
miRectCutAndLayout(&windowCont, &slider, windowLayout);

miSlider(slider.rect, (MIslider){.value=&sliderValue, .vmin=0, .vmax=100});

When adding custom layouts to your layout, it kinda needs to do the measuring twice. The flexbox-like layout is super simple, so most time is likely spent measuring text.

UIAutomation + Accessibility module

  • describing a UI as a tree
  • element types:
    • content: textual, visual
    • container:
      • regions tagged with some purpose
      • to assist with navigation
  • focus:
    • there is a concept of focus connected with non-spatial control devices such as the keyboard

Any node may have 0..n children elements

Why a tree? Alternatives?

Linear sequences: focus management conflicts. Groups tagged with ids? (I.e. the tree exists somewhere; the ids may be nonsensical.)


APIs for defining trees:

  • serial form: s-exps / xml / json / iff
  • host language literal equivalent
  • element based:
    • construction of fragments
    • children list manipulation
    • re-rooting
  • iterator based (zipper?)
  • infix/postfix

Some Problems:

  • compactness
  • can lead to creation of graphs
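An element-based style from the list above could look like this minimal sketch (hypothetical names; note that nothing stops a caller from appending the same node under two parents, which is exactly how trees silently become graphs):

```c
#include <stdlib.h>

typedef struct Node {
    const char   *role;        /* e.g. "button", "region" */
    struct Node **children;
    int           child_count;
} Node;

/* construct a fragment */
Node *node_new(const char *role) {
    Node *n = calloc(1, sizeof *n);
    n->role = role;
    return n;
}

/* children list manipulation: append one child */
void node_append(Node *parent, Node *child) {
    parent->children = realloc(parent->children,
                               (parent->child_count + 1) * sizeof *parent->children);
    parent->children[parent->child_count++] = child;
}
```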




  • WAI-ARIA proposes some tags to inform about roles of UI elements
  • Microsoft's UIAutomation also has control types

General information about accessibility on multiple platforms: on Linux, ATK appears to be the standard.

uucidl commented Feb 17, 2023 link is broken. Working URL:

Thanks, I fixed the link
