atrick/2022-04-incremental-escape-analysis.md

## 2022-04-incremental-escape-analysis.md

      
    Raw
  

              2022-04-incremental-escape-analysis.md
            
          
    Incremental escape analysis for SIL

Introduction

This early design document is part of reviewing PR: Swift Optimizer: add EscapeInfo, a new utility for escape analysis #42242.
I'm still in the middle of writing this. Open questions or things I haven't finished explaining are marked with [REVIEW].
A proper design document would have more examples and illustrations.
Escape analysis design for programming languages

Classical escape analysis requires two basic concepts: objects and pointers. Memory is partitioned into objects. Objects are partitioned into fields. Pointers are directional connections between objects forming the object graph. Additionally, variables may identify object allocation or hold copies of pointers.
Escape information propagates forward along the object graph and into fields. Escape analysis is ultimately about knowing which objects are exposed such that they can be accessed outside of the analysis scope or accessed via arbitrary pointers. When an object escapes, all fields within the object escape. When a field escapes, all pointed-to objects escape transitively.
Escape information does not propagate backward in the object graph. A field may escape without its parent objects escaping. An object my escape without the fields pointing to that object escaping.
Assigning a field or variable to a pointer creates a directional aliasing relation between the location and the pointer. Escape information propagates from the field or variable to any object that the pointer on the right hand of the assignment may point to.
Designing escape analysis for a programming language requires mapping these concepts to the source language.
What is a Swift object?

This document uses the term "object" as shorthand for "reference-counted object". Other SIL documentation may use "object" to refer to values in non-reference-counted storage. In this discussion, when not referring to reference-counted objects, we use the more general terms "values" or "addressible memory locations". A value may or may not reside in an addressible memory location. An addressible memory locations may or may not be inside an object. Storage classes can be ordered from least to most information about their location as follows:
value -> addressible location -> object

In general, an addressible location might refer to an object's stored property. An object's header, however, is never addressible. And its stored properties are never directly addressible. All addresses into an object are ultimately derived from an object reference.
SIL field identification

In SIL, a projection path is a series of projections that identify fields within a composite value.
Address projections project from an composite address to a field address:
%sAdr = alloc_stack $Outer
%fAdr = struct_element_addr %sAdr : $*Outer, #f
%gAdr = struct_element_addr %fAdr : $*Inner, #g

Value projections project from a composite value to a field value:
%sVal = struct $Outer
%f0Val = struct_extract %sVal : $Outer, #f0
%g1Val = struct_extract %fVal : $Inner, #g1

Composition combines values in the opposite direction. This is not projection, but is also relevant for propagating field information:
%gVal = ...
%fVal = struct $Inner (%gVal)
%sVal = struct $Outer (%fVal)

SSA Traversal

A symbolic projection path is a string of dot-separations projection symbols: "s0.s1" is field #1 of an inner struct projected from field #0 of an outer struct. The same string represents both address and value projections. See SmallProjectionPath for details.
Symbolic projection paths can be constructed during use-def (upward) SSA traversal. Projections push a symbol onto the front of a path. Compositions pop a symbol from the front of the current path. The popped symbol selects which composite operand "reaches" the original use.
For example:

A upward struct_extract is a push-to-front

    %sVal = struct $Outer (...)                 -        "f0.g1"
    %f0Val = struct_extract %sVal : $Outer, #f0 -   "g1" ⤴
    %g1Val = struct_extract %fVal : $Inner, #g1 - ""⤴


A upward struct is a pop-from-front that selects the instruction operand

    %gVal = ...                        -              ""
    %fVal = struct $Inner (..., %gVal) -         "g1" ⤴
    %sVal = struct $Outer (%fVal)      - "f0.g1" ⤴

Symbolic projection paths can be also be constructed during def-use (downward) SSA traversal. Projections pop a symbol from the current path. Compositions push a symbol onto the path. The symbol for a composition corresponds to the projection that would extract the original value out of the composed value
For example

A downward struct_extract is a pop-from-back

    %sVal = struct $Outer (...)                 - "f0.g1" ⤵
    %f0Val = struct_extract %sVal : $Outer, #f0 -         "g1" ⤵
    %g1Val = struct_extract %fVal : $Inner, #g1 -              ""


A downward struct is a push-to-back

    %gVal = ...                        - "" ⤵
    %fVal = struct $Inner (..., %gVal) -    "g1" ⤵
    %sVal = struct $Outer (%fVal)      -         "f0.g1" 

A symbolic projection path can be fed into either SSA traversal as an initial pattern to match. This selects which uses to propagate information into or which component values to propagate information from.
The SILValue -> isEscaping relation

Escape analysis creates a relation between either a SIL value or a SIL address and a projection path to escape information. Where SILValue denotes either a value or an address, we have the relation:
(SILValue, path) -> isEscaping

If path is empty, then isEscaping is true only if all fields escape. If path is any number of any fields ("**"), then isEscaping is true if any field escapes.
When SILValue is a SIL addresses, this escape information refers to any value potentially stored at the addressible location. The path is independent of whether SILValue refers to an address or value.
%oAdr = alloc_stack $Outer
%oVal = load $*Outer

isEscaping(%oAdr, "**") is identical to isEscaping(%oVal, "**").
[REVIEW] The name of EscapeInfo indicates that it is the result of escape analysis. But it's the opposite. It's the internal state. It's sometimes useful to clarify the difference between an algorithm's internal state vs. its result.
Walking up and down: chaining SSA Traversals

The isEscaping relation is independent of whether a SILValue is a value or the address of a value. Consequently, escape analysis can chain together the projection paths from address and value traversals without changing the information represented by the path. Escape analysis stitches together SSA projection traversals to merge escape information about an addressible location with information about values loaded from that address. This is referred to as "walking up" and "walking down".
%oAdr = alloc_stack $Outer
///...
%f0Adr = struct_element_addr %oAdr: $*Outer, #f0
%f0Val = load %f0Adr : $*Inner
%g1Val = struct_extract %fVal : $Inner, #g1

A use-def (upward) traversal of SIL values starting at %g1Val reaches %f0Val with a single-element path: 's1' (struct field "g1")
A use-def (upward) traversal of SIL addresses starting at %f0Adr reaches %oAdr with a single-element path: 's0' (struct field "f0")
Walking up from %g1Val first traverses the value projections. It handles the load by initiating a traversal of address projections with an initial path built during the first traversal. Address traversal prepends to the same projection path. Walking up returns the address or value at which no more upward traversals are possible along with the projection path taken to reach that address or value. Here it returns: (%oAdr, "s0.s1").
Consider alternative SIL that instead projects an address into g1 before loading g1Val directly:
%oAdr = alloc_stack $Outer
///...
%f0Adr = struct_element_addr %oAdr: $*Outer, #f0
%g1Adr = struct_element_adr %f0Adr : $*Inner, #g1 // project into Inner instead of loading
%g1Val = load %f0Adr : $*Field

Walking up produces the same result in each case, regardless of where the load occurs: %oAdr, "s0.s1".
Walking down similarly chains together address and value traversals at loads.
Example: Outer-store 1/2

%iAdr = alloc_stack $Inner
%iVal = load %iAdr : $*Inner
%oVal = struct $Outer (%iVal: $Inner, ...)
store %oVal to %otherAdr

Walking down from %iAdr returns the store operand and projection path:
(store(%oVal), "s0")

Walking up and down: aliasing stored references

Escape analysis alternates upward and downward walks to propagate information across assignment. When an object references is stored to a location, it aliases all references that may be loaded from the location.
The store of %oVal shown above could conservatively be considered an escape. Analyzing the storage location, however, allows more precision. Recall that a store propagates escape information from the store's addressible location into the value stored at that location. Escape analysis does this by alternating upward and downward walks.
Consider these SIL instructions in addition to the example "Outer-store 1/2":
Example: Outer-store 2/2

bb0(%otherAdr):
  //...
  store %oVal to %otherAdr
  %oVal2 = load %otherAdr : $*Outer
  %f0Val2 = struct_extract %oVal2 : $Outer, #f0
  apply(%f0Val2)

When a downward walk reaches a store, it triggers an upward walk with an initial projection path:
(%g1Val, "s0")

The upward walk stops at (%otherAdr, "s0), triggering another downward walk:
During this second downward walk, another load is visited, followed by a struct_extract. The field matches the back of the projection path, "s0", so the downward walk continues to the apply. Since the apply is an escape, the origin of the traversal, iAdr, also escapes.
[REVIEW]: this is confusing because the upward walk from a store needs to determine two things: (1) whether the location is exposed, and (2) not whether a value stored in that location can escape within this scope. Those are not the same thing. Presumably, when an address traversal reaches an exposed address, it does something conservative to all the values stored at the address.
Projection chains

By definition, a projection path identifies components within a single object. Escape analysis, however, propagates information across multiple objects.
When propagating across a load, information about an addressible location is transferred to information about values loaded from that location. Escape analysis summarizes information about both the addressible location and the loaded values as if they refer to the same value. The projection path makes no distinction.
Escape analysis, however, also propagates information from an object reference into the object's addressible locations (its stored properties). It does this by chaining together multiple projection paths. Projection chains are represented using a single symbolic projection path, where each link in the chain is a "class projection", denoted with the symbol 'c'. Such as:
"s0.c0.s1"

Corresponding to SIL of the form:
%ref = struct_extract %wrapper : $Wrapper, #f0
%pAdr = ref_element_adr %ref : $Class, #p1
%gAdr = struct_element_adr %pAdr : $*Property, #g1
%gVal = load %gAdr : $*Inner

A class projection corresponds to ref_element_addr, which converts a reference into a SIL address. Escape analysis views a reference as a value that points to another object. Those pointee objects may escape without the reference escaping.
isEscaping(%ref, "s0") != isEscaping(%ref, "s0.c0")

Contrast that with the way that escape analysis views an address as a location holding the relevant value rather than as a pointer to the value. If a value stored at the address escapes, it is equivalent to the address escaping.
isEscaping(%ref, "s0.c0.s1") == isEscaping(%gAdr, "") == isEscaping(%gVal, "")

Converting a reference to an address, therefore, implies that escape information now refers to the pointee object rather than the pointer value.
Limiting SIL escape analysis to object references

SIL has three kinds of pointers: addresses, raw pointers, and references. Escape analysis collapses the escape information of addresses and values because there is no need to model a SIL address as a pointer.


SIL addresses can't be stored, so they never directly escape.


In most situations, the compiler can assume that addresses into stack allocated objects do not escape--an earlier analysis must be responsible for checking that. As long as this escape analysis algorithm is never used by a pass that diagnoses escaping pointers to stack-allocated objects, then it does not need to model stack addresses as pointers.


This analysis also deliberately ignores the presence of raw pointers. A cast from address to pointer is treated like an escape of the addressible location. A cast from a reference to a pointer is treated like an escape of the reference.
Consequently, escape analysis is only concerned with whether references escape.
For example, this escape analysis views these two SIL sequences the same way:
%struct = struct $S (%ref)

vs.
%stack = alloc_stack
%stackAddr = struct_element_addr %stack, #f
store %ref to %stackAddr

The escape information on %struct is equivalent to the escape information on %structAddr. In both cases, we only care whether %ref may escape, exposing the object pointed to by %ref.
Three forms of escape analysis

Incremental escape analysis is designed to support three higher-level forms of analysis:


uniquely identified reference lifetime


function argument side effect summaries


address exposure for alias analysis


Until now, this discussion has led directly to the first form: uniquely identified reference lifetime. Let's see how that form uses the analysis, then look at how the analysis can be adapted for different forms.
Escape analysis for uniquely identified references

isEscaping(%ref, "") determines whether the reference itself escapes, regardless of whether anything the reference points to escapes.
The StackPromotion pass uses this to determine whether to stack-allocate objects.
Escape analysis for function argument side effects

For precision, escape analysis must compose across function boundaries. This is done by summarizing the escape information of each argument as a function side effect:
@_effects(noescape <argument>.<path>)

The effect is produced by escape analysis before module serialization. It may also be manually specified by the programmer, in which case escape analysis verifies the attribute at compile time.
To bypass verification, the programmer may disable verification:
@_effects(unsafe_noescape <argument>.<path>)

The programmer can override side effect analysis by specifying an unchecked attribute. This requires "unsafe" in front of the attribute. For example @_effects(unsafe_noread).
[REVIEW]: The current isEscaping API does not return function effects information. What's the intended API for function effects discovery and enforcement? Will it return a single projection path covering all escaping components?
[REVIEW]: This shows how we handle escape information about fields within the current value. Clarify how we handle escape information for compositions of the current value. A function summary needs to say that only certain fields escape!
[REVIEW]: Currently, @_effects(notEscaping ...) is being used as an unchecked assertion. Is this intentional? Do we always want to treat @_effects as unsafe? If so, we need to decide on the syntax for the regular, safe annotation. How can we make it clear that both have the same meaning but one is unchecked? The same syntax should be used for both the analysis summary and for the checked assertion, but not for the unchecked assertion.
[REVIEW]: Eventually, we'll want a public @nonescaping parameter annotation. This will be a mandatory analysis, checked in debug mode. Do we want to make a distinction between a mandatory non-escaping checked assertion, vs. an optimistic assertion? Does the mandatory annotation also need a path component?
[REVIEW]: Granular effects could standardize on the form "<no|may|must>". Such as "noescape", "noread", "nowrite", "norelease", "nosynchronize" etc. This is not all that important. We might never serialize the "may" or "must" forms. It's just nice to have consistency with the internal representation and with debug output.
Escape analysis for alias analysis

Alias analysis must determine whether an address is exposed such that the same location could be accessed via a different "storage identifier". To do so, alias analysis associates each address with a storage identifier and projection path. This is represented elsewhere in the compiler as AccessStorage and AccessPath. If the AccessStorage has a class type, then the address is exposed unless the class is both locally instantiated and non-escaping. Escape analysis is only relevant for class storage.
[REVIEW] The AccessStorage abstraction already does exactly this for one level in the object graph. It directly provides the uniquely identified storage of the accessed address. It's API indicates whether the storage is disjoint from any other storage. When extended by AccessPath, it also tells you whether fields within that storage are disjoint. This is an abstraction that is already used pervasively. It is straighforward to extend the alias analysis logic that uses AccessPath to consult escape analysis when necessary. For class storage, it would simply invoke escape analysis on reference root to determine whether the reference is uniquely identified. This is far simpler than adding flags and modes to escape analysis callbacks, which makes the escape analysis logic incomprehensible.
Verification

[REVIEW] For correctness, the upward and downard SSA traversals need to be round-trip verified, as done with AccessPath. Otherwise, alias analysis may consider a reference non-escaping, and thus uniquely identified, but fail to identify other accesses to the same object. An upward traversal from a different address may reach a different reference root, which is assumed not to alias. Furthermore, if alias analysis is composed from the address' AccessPath along with escape information, then both AccessPath construction and escape analysis must share the same SSA traversal logic.
Decomposing the current design

Most of the SSA traversal concepts are fundamental to the rest of the compiler. Only a few aspects are specific to escape analysis.
Address projections

Use-def traversal of address projections is central to SIL. Functionally, it does this:
(SIL address, projection path) -> (access base, projection path)

Access base identifies the storage. It either
(a) identifies some addressible variable in the current scope (argument, stack or global)
(b) derived from a reference, the stored property of a class
(c) derived from a raw pointer, which may be the tail storage of a class
The visited projections are appended to the incoming projection path.
Traversal visitors

A visitor can perform additional book-keeping, but almost never needs to check specific opcodes. Instead, the visitor callback can use a "covered switch" over few SIL abstractions.

For example:


AddressProjectionInstruction: handle address-to-address projection


AddressCastInstruction: handle "address/pointer casts"


To keep the visitor decoupled from the traversal, it's only choice is whether to stop or continue traversal.
A top-level API, like findAccessBase or isEscaping can wrap the visitor. Passes almost never need to define a visitor.
Address uses

Def-use traversal of address projections functionally does this:
(access base, projection path) -> (operand, projection path)

Customization points have the same covered set of cases as address projections.
Reference roots

Use-def traversal of references is also central to SIL.
Functionally it does this:
(nontrivial SIL value, projection path) -> (roots, projection path)

These are the typical customization points:


Phi


requirement: type == RawPointer


allows multiple bases


requirement: all bases are the same type: global_addr, or pointer_to_address


StorageCastInstruction

does this cast affect ownership?


Reference uses

Def-use traversal of reference functionally does thi:
(nontrivial SIL value, projection path) -> (operand, access base)

The address use traversal could reasonably be combined with the reference use traversal. But when access markers are preserved, most analyses will never need to traverse beyond the access base, which indicates the type of access, read or modify.
Walking up and down

Implementing the upward and downward walk for escape analysis only requires SSA traversal visitor with few customization points:


upward load: trigger an address use-def traversal


upward reference root: trigger a downward def-use reference traversal


downward load: trigger a reference def-use traversal


downward store: trigger an address use-def traversal


All visitors would share the same context, which would contain the single SmallProjectionPath.