A Theory of Compositional Tool-Assisted Speedrunning
Gamer jargon is italicized. Corresponding definitions are bolded.
A finite state machine (FSM) is a physically-implementable computer which consists of multiple states as well as rules for transitioning from one state to another. A controller is -- for now -- a discrete-time-varying complete atomic Boolean algebra (CABA) (WP, nLab); that is, a controller is a collection of Boolean flags which vary over a ticking clock. For our purposes, a video game, or game, is an FSM whose transitions are labeled by a controller. We will generalize controllers more later.
A speedrun is an athletic manipulation of a video game which transitions to a specified state; the aesthetics or swag of a speedrun are the intermediate transitions selected by the athlete during performance. A tool-assisted speedrun (TAS) is a pre-recorded performance of a speedrun.
An immediate and fundamental concept is the humanly possible, which concerns the physical difficulty of athletically manipulating the controller. For TAS authors, the humanly possible does not matter; only the FSM's transitions -- and the controller inputs which provoke them -- are important. There are several existing theories of TAS:
- Temporal TAS: Time is treated as a free variable, and the author modulates the controller, re-recording inputs
- Computational TAS: The FSM states are searched via emulation or abstract interpretation for desirable properties
- Splicing TAS: Certain FSM states are isolated as intermediate states, and then those intermediate states are treated as final states, recursing to smaller TAS segments
In all existing theories, time is an arrow which flows from start to finish, and there are two primary tools. A save state is an FSM state which has been highlighted by the author, and re-recording (also turbo controlling) is a superhuman sequence of controller inputs. The typical TAS today is produced by splicing several temporal TAS segments, each produced by re-recording from its own save state, along with some computational segments which are generated by reverse engineering of the FSM.
I humbly propose a new approach, Compositional TAS, which generalizes Splicing TAS. Briefly, we will generalize FSMs to categories, introducing the notion of state equivalence, and then we will allow intermediate states to be equated with one another. This will yield a time-weighted graph of states, and searches through this graph can be incrementalized and amortized.
A state projection is an equivalence class of states, or equivalently, a decidable property of states. A route is a finite sequence of state projections, along with the belief that there exists a finite sequence of transitions between each projection. For example, suppose that we can pick up a key and open a locked door; there is a projection for holding the key, another projection for unlocking the door, and there exists a sequence of controller inputs which transition from the former to the latter.
It is worth inspecting this further. Why might an author believe that a route exists? Often, the author has played the game as intended, and already determined that several routes exist by providing witness runs which exhibit those routes. The author might also notice that, if one route ends where another begins, then there is a longer route which concatenates them; similarly, they might imagine that there exists a projection which is intermediate to a route, and cleave it into two smaller routes, particularly when splicing.
This all motivates a graph of routes and category of routes. Given some FSM, its graph of routes is the directed graph whose vertices are its state projections and edges are all of the of finite sequences of controller configurations which transition from one projected state to another, given that for every state in the source vertex, there exists at least one transition to at least one state in the target vertex. For our key-and-door example, holding the key is a vertex, opening the door is a vertex, and the edge consists of all of the sequences of inputs which start with holding the key and end with opening the door.
Correspondingly, the free category (WP, nLab) on a graph of routes is a category of routes, which can be viewed as a deductive system. This is the system within which an author might deduce that, for example, if we have a route from the key to the door, and a route to the key, then we have a route to the door.
This is often where non-TAS speedrunners end their inquiry. They measure the cost of performing routes empirically, practice their controller manipulations, and construct an any% route, which is a route that terminates at a community-defined projection, traditionally called credits.
Optimality vs. Optimization
Speedrunners routinely misuse the concept of optimality. "Optimal" merely means "best," and speedrunners confuse what is best (ontic) with what is currently believed to be best (epistemic). We will try to disambiguate in order to better understand.
First, consider an optimal route. A runner constructing an any% route will want to shorten that route, and they will measure the cost of a route in terms of the typical expected length of that route. The expected value is exact when our starting and ending projections have only one element; that is, if we start at a particular state and end at a particular state, then the shortest paths between them have fixed length. A runner is often uncaring towards the massive state space of a game, and so declares any discovered fixed-length path to be "optimal." The runner knows that, in the future, a new path may be discovered which is "even more optimal."
We cannot be quite so fluid with beliefs when creating TAS. The thoughtful reader may have already noticed that state projections need to be relatively precise. In our running example, it is not enough to hold the key, but to hold the key and also be able to navigate to the door. If we find some glitch which grants the key early but we cannot reach the door, then this glitch does not belong to our desired equivalence class, so we should choose a projection which excludes that glitch.
Let's define a slightly tighter structure which reflects this principle. Given some FSM, its graph of states has a vertex for each state and an edge for every individual state transition. Its category of states is the free category on the graph of states. For clarity, two paths in a category of states are equivalent when they begin and end at the same source and target vertices. The cost of an edge in the graph is the number of ticks required to complete the transition, the cost of a path is the sum of the costs of its edges, and the cost of an arrow in the category is the minimum cost of its equivalence class of paths.
First, note that this gives us a formal ontic definition of optimality. Given starting and ending projections, an optimal run is any path in the graph of states with minimal cost such that its source vertex is in the startng projection and ending vertex is in the ending projection. Conversely, an optimal route is any route which includes an optimal run.
Second, note that we now have a basic approach to optimization. Given any run, we may ask whether there are any equivalent paths in the graph of states with lower cost. A direct search is infeasible, but we will explore several heuristic improvements. This is one of the keys of compositional TAS: a good run comes from optimization of an existing run.
Let me take an extended tangent. A 100% run is one which visits all items, which we will not characterize beyond being fancy state projections. Some games define 120% runs, 117% runs, 99% runs, etc. based on game-specific rules about which projections count as items; other games offer easy percentile arithmetic. In Super Metroid, there are 100 items to pick up, so 100% means visiting all 100 item projections in some order. (As we will discuss later, this is a partial order.) In extreme contrast, low% runs visit as few items as possible. Super Metroid speedrunners currently consider 14% to be the lowest humanly possible, and recognize a surprising 21 distinct routes! And there is a 13% TAS.
Let us draw some concrete information from Super Metroid. Each item triggers a cutscene, which is a sequence of projections ignoring controller inputs. Cutscenes are undesirable for any run because they consume time, and each Super Metroid item cutscene costs roughly eight seconds. By quick arithmetic, this means that 14% runs must spend 1m52s on cutscenes and 100% runs must spend 13m20s, an expensive consideration for the any% TAS author. What might not be obvious, but is definitely true in the case of Super Metroid, is that any% runs pick up extra items in order to save time! The best-known any% TAS is 21%, and the current humanly possible any% route is 19%.
This is a valuable lesson for us: state projections are generally not monotone in cost under any sort of ordering or catenation. Less obtusely, we sometimes need to spend time, e.g. in cutscenes, in order to gain time elsewhere, e.g. in a boss fight. This is already obvious from one of the ancient TAS principles of taking damage to save time, also known as damage abuse.
Also notice that low% TAS depends on what is proven possible, that is, a witnessed run of a route. For TAS, a witness is the sequence of controller inputs, and the community also often prepares videos demonstrating playback on a suitable platform.
That's the entire tangent. Now let's imagine how to shorten an existing witness.
Infima & Bottlenecks
We can take the skeleton (WP, nLab) of our category of states, yielding a poset of state projections. In a poset, we no longer care about exactly which sequence of controller inputs labels each transition; we only care about whether such a sequence exists. (The Axiom of Choice is not required in this case.) Note that we obtain equivalence classes of states -- projections -- when we skeletize our category of states. Unlike our graph of routes, our poset is acyclic; once we prove that we can reach a particular vertex, we definitionally cannot not move backward. We will call these vertices bottlenecks (also chokepoints), and when an equivalence class has only one member (e.g. when a game is first started) we will call it an infimum (WP, nLab).
There are several examples of bottlenecking in speedruns. The example I will use is often called a room. A room is a vertex in a poset of state projections, but informally, it is a region of a map, a ludic space that can be explored during gameplay. The distinguishing features of a room are that it may be entered and exited via controller inputs, only one room may be entered at a time, and often there are easily-projected portions of a game state which indicate which room is currently inhabited.
Our poset does not distinguish between adjacent rooms with no barriers to entry. In games like Zelda 3, the rooms are grouped into superrooms, which are four rooms connected by small cutscenes. Navigation between superrooms can be one-way, but navigation between rooms inside a superroom is usually bidirectional. Note also that rooms may have implicit versioning, in the sense that a map may allow navigation from a room to an altered version of that room which appears similar but is technically a distinct vertex.
Another version of bottlenecking comes from stages, also called levels, instances, or dungeons. A stage is a poset of rooms; navigating a stage will entail navigating many rooms in sequence. A skip is a route which does not visit every room in a stage. A strat is an algorithm for controller inputs which reliably navigates a portion of a stage. When speedrunners talk of strats and skips, they are talking about efficient approaches for completing stages.
We can propose the first part of Compositional TAS now. The compositional approach relies on collecting and organizing routes which efficiently traverse stages. This works as long as there is a large database of such routes. However, it has obvious shortcomings as phrased. In Zelda 3, there are multiple ways to traverse a room, depending on the items which the player has acquired; efficient traversal could rely on picking up specific items at specific times during the run. This also occurs in Super Metroid low%, where there are over a dozen approaches to some rooms. An infamous pathological example comes from Super Mario World 2, where each stage begins and ends with an ammunition counter in the range 0-6, and composition of routes across stages requires the counter to line up, leading to an exponential number of routes to consider. (Fans will note that Super Mario World 2 also has ammunition types in later stages, increasing the search space even more.)
Let us return to the concept of state projections. Without loss of generality, we will consider states as fixed-length bitstrings. A projection is simple if it can be decided by comparing a fixed portion of a state to a constant value. In many games, simple projections include which room a state is in, how much health or ammunition an avatar has, how many bottlenecks remain before credits, and the locations of various in-game objects in the playing field.
It happens to be the case that, in many games, particular simple projections are only visited once. For example, in Super Mario World 2, each stage is visited at most once; in Zelda 3, each boss room is visited at most once. This implies that, when searching for routes, we may discard routes which perform multiple visits. This does not help Zelda 3 much, but simplifies Super Mario World 2 to the point where most runners use a route which visits each stage once in linear order.
Simple projections also enable targeted searching for glitches. Suppose that we know, in our graph of projections, that there is a glitch which corrupts state in order to effect a particular edge; performing the glitch reliably prepares states. Then, a glitch strat is a collection of controller inputs which narrows down the projection until it is simple, then performs the glitch, reliably preparing the state during a live run. When such a strat is only dependent on the projections, then it may be called tech.
Strategies & Tactics
To a speedrunner, a strat is a tactic. So, what is a strategy? Strategically, a speedrunner wants to discover efficient routes. How are efficient routes found? With search. For runners, search is a ludic interactive experience where a human attempts to build an intuition for the underlying mechanics of an FSM by experimentally peturbing its states.
We will replace this with a more computational approach. Suppose that we limit ourselves to a single room, and search for efficient traversals of that one room. Each search starts from some simple projection which prepares our avatar, and ends with a corresponding simple projection which is exiting the room. Since we also have our category of states, we can identify equivalences within intermediate states; and since we also have our category of projections, we can identify mid-room projections which bottleneck parts of the room.
We also have some tactics which don't correspond to any common strat. For example, we may use A* search to walk across a room efficiently, using example traversals as admissible heuristics. To avoid explosion, we may use A* on simple projections only.
All together: The recipe
We can now explain how to avoid the explosion of possible state considerations when performing Compositional TAS. First, only optimize between bottlenecks; second, require example witnesses for each projection at each bottleneck.
Using Super Mario World 2 as an example, we can take a witness run which traverses all stages, and slice it up into segments which each only traverse a single stage. We then annotate each stage with the starting and ending ammunition information, which are simple projections. In order to improve a stage, we incrementally search within its rooms for improvements, ignoring changes in ammunition. Crucially, whenever an improvement would alter the starting or ending projections, we require the runner to provide an example witness which completes the adjacent stage with adjusted ammunition, and add that example witness to our list of segments which need to be optimized.
The upside of Compositional TAS is that it may search more aggressively than Computational TAS, which is usually brute-force over state space. The downside is that it requires witness runs, which may require Temporal TAS to produce in the case of superhuman tech.
Table of jargon
|game||FSM with CABA-labeled transitions|
|swag||asthetic choice of intermediate states|
|route||collection of inputs which transition between projected states|
|any%||a route which ends at credits|
|low%||a route which avoids certain state projections|
|cutscene||an uncontrollable sequence of state projections|
|proven possible||a route which has been run|
|room||vertex in poset of state projections|
|map||collection of connected rooms|
|stage||poset of rooms|
|strat||a route through a series of rooms|
|skip||a route through a stage not visiting every room|
|tech||a strat which only relies on simple projections|