CTMacUser/ArrayManifesto.md

## ArrayManifesto.md

      
    Raw
  

              ArrayManifesto.md
            
          
    Fixed-Size, Scoped-Storage Array Manifesto


Authors: Daryle Walker
Implementation: (none yet)
Previous attempts: NNNN-fixed-size-arrays.md

Introduction

This manifesto is for collecting the ideas needed to introduce the classic fixed-size array family of types to Swift.
Basic Outline

(I'm still learning about all the steps of the compiler, so I don't know how a new primitive type could be added.  Hint: trying to read the stuff on SIL still makes my eyes glaze over.)
The model of adding a new array type has to feature:

Declaring such types
Initializing instances of such types
Dereferencing elements
Looping over elements
Getting the parameters (element type, number of extents, value of each extent, etc.) of an array type
Determining the user-level support functions
Determining the SIL-level support functions
Figuring out the argument and/or return passing mechanisms, metadata, witness tables, etc.
Figuring out the IRGen interpretation (I don't even know if this is a valid concern.)

Motivation

I've always thought arrays are a neat idea, even the fixed-sized ones.  I was disappointed with how C handled them, especially after figuring out it was a downgrade from some older languages.  Then the incomplete features C arrays had, their chumminess with pointers, and the resulting numerous complaints over those issues lead to the hip languages starting from the 1990s dropping the feature altogether (with those designers figuring that std::vector equivalents were all you need).
The new languages for the past decade have been bringing back the feature.  Swift didn't, but that ended up being a good thing.  Those other languages, like Rust and Go, followed a path close to C but with saner semantics.  With Swift is going in fresh, we can break away from that mold (in multiple senses of the word).
Another interest point with me for C was how it handled compound types.  Originally, C only passed around singular types: numeric (including enumeration types as quirky integers) or pointer, anything that could be passed via a register.  Compound types, structures and arrays, had to be passed by a pointer to such.  This was changed for structures but never for arrays.  Back then, compound types couldn't be passed via a register (besides a pointer), but now compound types can be passed through registers (like SIMD representations), and I want to allow the freedom for that.
Proposed solution

Grid arrays are an anonymous homogenous compound product type where each instantiation has a fixed shape (and size) and each instance stores its elements inline within the scope of its owning code block or outer object.  Not only will concrete types be allowed, but existential forms can be used as generic parameter constraints.
(Future feature: let the existential forms be used as objects that can contain any value of a conforming non-existential type.  But this should be held off until generalized existentials are added across all types.)
Detailed design

Note: I don't know much about the deeper levels of the compilation process.  There'll be a lot of implementation questions.
Declaration

A grid array differs from a standard array by the addition of a grid specifier in the former:

GRAMMAR OF AN ARRAY TYPE
array-type → [ type ]

array-type → locked_opt [ grid-specifier type ]
grid-specifier → grid-extents_opt ;
grid-extents → ...

grid-extents → grid-extent-list

grid-extents → grid-extent-list , ..._opt

grid-extent-list → grid-extent | grid-extent , grid-extent-list
grid-extent → _

grid-extent → private | open

grid-extent → expression

A grid array type declaration is a grid array existential if either the element type is an opaque type, at least one extent is existential, or both.  An extent is existential if it's the ellipsis punctuator or the wildcard.  A grid array type that isn't an existential is a definitive grid array type.  A fully definitive grid array type uses expressions for all the extents (if any), with none being private or open.
Using private or open for extents is forbidden if there is also an existential extent and/or an opaque element type.
The locked specifier is allowed only when the rest of the type is a grid array existential.  Grid array existential types that differ only in the presence/absence of the locked specifier are distinct types.
Interpretation

A definitive grid array sets up a fixed-size block of scoped storage for the given count (which is the product of all the extent sizes) of the given element type.  When not laid out in a SIMD register, it should take a block with the same alignment as the element type and have a stride of the element type's stride multiplied by the element count.  An empty (i.e. one with zero elements) array should take zero space if it's a sub-object, and minimal (but still aligned) space otherwise.
A private extent means that it will be either implied by the initializer (if given) or specified in a later statement.  The evaluation must be done by the time types' structures are laid out; a compile-time error occurs if a specified expression is run-time based or otherwise can't be evaluated in time.  It's a compile-time error if multiple initialization paths exist and their values for the extent differ.
Since the value of a private extent must be realized by type-layout time, it has the same affect on layout as an expression-based extent.
If a grid array with a private extent does not appear with an initializer, including contexts where one can't ever be added, then it can't be (part of) an ABI-public declaration.  Otherwise, it can appear in a code block (including a file's top-level block), a function parameter or return type, a stored property of a class or struct, or part of an enum case's payload.  Such grid arrays can also be the element type of another definitive grid array, a tuple member, or a nested tuple member, as long as the outermost array or tuple is under the same restrictions as stand-alone grid arrays with private extents.  An alias to a grid array type cannot have the type (or a nested array type) use a private extent.
An open extent means that it will be either implied by the initializer (if given) or specified in a later statement.  But this time the specified value can be run-time based, and can differ among separate initialization paths.  (The extent still can't be resized once initialized.)
Since the size and shape of a grid array with an open extent has its full layout delayed until runtime, it'll probably need a different implementation.  This flexibility prevents struct, enum, and tuple types from supporting such grid arrays.  I guess the code block or class's internals will keep accounting information on the open grid array's shape, and the memory offsets of later sibling properties.
A grid array type with an open extent can appear in a code block (including a file's top-level block), a function parameter (but not return type), as a stored property of a class, or the element type of another definitive grid array.  When such grid arrays are another's element type, the outermost array is under the same restrictions as stand-alone grid arrays with open extents.
Expression-based extents are non-negative Int values.  But if there is exactly one extent, it may be a one-dimensional Int grid array instead; such an extent expression works as if its elements were exploded to the base declaration.
An ABI-public expression-based extent is either a literal or must have its components also be ABI-public.
An empty extent list acts as a zero-dimensional grid array, which have exactly one element.
A grid array existential with the locked specifier means its matching definitive grid array types must have their size locked at layout time.  This means that a matching definitive array must not have an open extent and the element type must be either not a grid array type or a grid array type that itself would be permitted under a locked specifier.
A grid array, definitive or existential, is a subtype of a given grid array existential if the source's shape equals or is a subtype of the existential's shape and the source's type equals or is a subtype of the existential's type.

An opaque type matches element types that conform to or subtype the constraint.
An ellipsis matches all shapes, regardless of count or value, including another ellipsis.
A wildcard matches all values, including private, open, or another wildcard.
An expression matches the same value.  Extents that are private get resolved before match testing, while an open extent always fails to match.
A locked existential is a subtype of the unlocked existential with the same extents and element type.
A type with an open extent, or has a nested grid array with an open extent as its element type, can never match a locked existential.

By these rules, [... ; some Any] is the super type of all grid array types.
Note: ABI-public declarations are ones that are open, public, or @usuableFromInline internal.
Note: Extents that are private or open have to resolve to non-negative Int values too.  A runtime error occurs if an open bound is set to a negative value.
Query: Should matching an open extent to an expression-based existential bound be allowed with runtime checks?  Maybe with as/as?/as!?
Query: For a grid array with an open extent in a class, do we allow any number of such stored properties, anywhere in the declaration, and anywhere in the hierarchy?  Or should it be restricted to one instance, only for final classes, and only as the last stored property?
Note: A grid array (possibly nested) chain of private and open extents results in a type with both restrictions; it can only be used in a code block, class ABI-hidden stored property, or a parameter for an ABI-hidden function.
Opaque Types

The list of possible constraints for an opaque type shall be extended to include grid array existentials.
Initialization

An instance of a grid array is definitively sized if its declaration has an initializer or its type is a fully definitive grid array.  Declared instances that are not definitively sized must be either initialized by a single expression, which automatically sizes them, or reified to a shape and element type before its elements can be separately initialized.
Grid array reification is encapsulated with a reification statement:

GRAMMAR OF A REIFICATION STATEMENT
reification-statement → reify expression as type

Where the expression must name a non-nested definitive grid array in scope or a definitive grid array member of a (possibly nested) tuple in scope.  A qualifying instance-level stored property can only be reified in its owning (nominal) type's initializers.  The destination type must be a fully definitive grid array.  When the grid array's element type is itself a definitive grid array, that nested type must be fully definitive (i.e. gets reified) within its wrapping type's reification.
If a grid array is reified more than once, the subsequent statements must use the same destination type as the first statement.
A reified grid array can then be initialized:

In whole by an assignment
Each element individually with assignments to static dereferences.
Each element in a collective with a for-inout statement.

Query: Should we allow individual element or subset definitive initializations via dynamic (i.e. subscript) dereferences?  Can the DI system be made to track this?
Query: What should happen if a for-inout loop is doesn't initialize every element, but in such a way that the declared array is still around?  (A for-inout loop that ends with a throw that that tears down the loop and the matching array's scope wouldn't count, for example.)  That means that there will still be uninitialized elements.  Can the DI system be made to track this?  I think the recent feature list needed to support SwiftUI forbids loops that could end with break/continue; maybe we could specify something similar.
Literal Assignment & Storage

Besides expressions evaluating to a grid array type, an array literal can be used as the source of values.

For a one-dimensional array, terms fill from the one for index 0 upward.
For a zero-dimensional array, there can only be one term, which is assigned to the sole element.
For multiple inline dimensions, the terms fill in a row-major order.  The first term is assigned to the element with all zero-valued indices.  Then the rightmost index is increased by one, except it resets to zero if that index was already at its maximum and the term to its left increases one step (which may cascade if multiple indices were at their maximum).

All the terms for an inline-multidimensional grid array are flat without nesting, since a nested syntax could cause conflicts when the element type itself supports array-literal syntax.
If a literal is used as a declaration initializer, and the grid array uses exactly one private or open extent, then the compiler can use the term count to determine the free extent.
// The extent is implied to be 5.
let myArray1: [private ; Int] = [1, 2, 3, 4, 5]

// The first extent is implied to be 2.
let myArray2: [open, 3 ; Int] = [1, 2, 3, 4, 5, 6]
If the number of terms cannot match the specified shape, then an error occurs.
let myArray3: [3 ; Int] = [1, 2, 3, 4]  // Error
let myArray4: [3 ; Int] = [1, 2]  // Error (The remaining elements are not default-initialized!)

let myArray5: [private, 3 ; Int] = [1, 2, 3, 4, 5]  // Error (5 % 3 != 0, so the first extent cannot be figured out.)
For a grid array of grid arrays, if the inner type uses private or open extents, then at the outer array level, those extents for each element must resolve to the same values.
let myArray6: [2 ; [open ; Int]] = [[1, 2, 3], [4, 5]]  // Error
let myArray7: [2 ; [private ; Int]] = [[1, 2], [3, 4]]  // OK
Note: This was already required for private extents, but here we're specifying it for open extents.  In other words, open cannot be used to create ragged nested arrays.
When a grid array uses a memory block, its elements are stored in the same row-major order above.  Storage when a SIMD register is used is unspecified.
An initialized grid array can have its elements read (and possibly written) as a flattened buffer with the following global functions:
func withFlattening<T: [... ; some Any], R>(of: T, body: (UnsafeBufferPointer<T.Element>) throws -> R) rethrows -> R

func withMutableFlattening<T: [... ; some Any], R>(of: inout T, body: (UnsafeMutableBufferPointer<T.Element>) throws -> R) rethrows -> R
The elements in the buffer are in row-major storage order.
A grid array implemented with a SIMD register is converted forth and back with use of the flattening functions.  Use of a SIMD register should be an optimization when the compiler can figure out that a grid array won't escape its code block (not counting the use of the flattening functions above).
Query: I don't know if I should impose any restrictions of when the compiler can layout a grid array as a SIMD type.  Or if I'm doing the right restrictions.
Dereference

An element of a grid array can be accessed with a static dereference to its tuple member number.  For non-empty arrays, the numbers start from 0 going upward to one below the element count, which each number representing its element's (row-major) storage offset.  An element can be initialized through this kind of reference.
let myArray8: [; Int]
let myArray9: [2, 2 ; Int]
myArray8.0 = 6
myArray9.0 = 1
myArray9.1 = 2
myArray9.2 = 3
myArray9.3 = 4
The compiler can check if a tuple number is in range for arrays without an open extent.  Otherwise, the check has to occur at run-time and causes a run-time error on violation.
The subscript operator can be used for dynamic dereference.  It can't be used for initialization, but can still allow reading and writing.  A grid array has two forms, one subscript form takes Int arguments with the same count as the array's extents.
var myArray10: [private ; Int] = [1, 2, 3, 4, 5]
print(myArray10[0])  // Prints "1"
myArray10[0] = myArray9[1, 1]
print(myArray10[0])  // Prints "4"
There is a second subscript form that takes a one-dimensional Int grid array, where the index array has the length equal to the source array's extent count.
myArray10[[1]] = myArray8[[]]
print(myArray10[[1]] - myArray9[[1, 0]])  // Prints 6 - 3 i.e. "3"
Note that a subscript call cannot use zero arguments, so a zero-dimensional array can only have its sole element referenced either by myArray.0 or myArray[[]].
Looping

With flattening, a grid array can be treated as a Collection and used with a regular for-in loop:
let myArray11 = //...
withFlattening(of: myArray11) {
    for e in $0 {
        //...
    }
}
That doesn't give you the corresponding element indices, though.  There is an instance property that will, as some Sequence of the indices in the row-major storage order used in flattening.
And so you can iterate together:
var myArray11Iterator = myArray11.indices.makeIterator()
withFlattening(of: myArray11) {
    for e in $0 {
        let i = myArray11Iterator.next()!
        //...
    }
}

// Or, better:
for i in myArray11.indices {
    // do something with myArray11[i]...
}
All of these assume the grid array is already initialized.  Otherwise, the new for-inout loop can be used to initialize all the array's elements.
let myArray12: [3 ; Int]
for e inout myArray12 {
    e = 2
}
The index of the current element can be probed by the new #indexOf primary expression.
let myArray13: [4 ; [3, 5; Int]]
for f inout myArray13 {
    for g inout f {
        g = #indexOf(f).0 + #indexOf(g).toArray().reduce(1, *)
    }
}
The argument to #indexOf is any in-scope for-inout element placeholder.  (If two loops in a nesting use the same identifier for their placeholder, #indexOf will reference the innermost one.)  It has a return type of [T.extentCount ; Int], where T is the grid array's type.  A successfully initialized element is recognized as such by Swift's definitive initialization system.
A mutable grid array can go through a for-inout loop multiple times, where any initialized elements get reassigned and ones that skipped initialization finally get initialized.
var myArray14: [6 ; Int]
for e in myArray14 {
    guard #indexOf(e).0 < 3 else { break }
    
    e = #indexOf(e).0 * 2
}
for f in myArray14 {
    guard #indexOf(f).0 >= 3 else { continue }

    f = #indexOf(f).0 * 15
}
But you should probably avoid writing loops that could skip elements during initialization but still keep the elements around.
An immutable array can go through a for-inout loop after its initialization, whether that initialization happened through a previous for-inout loop or not, but the placeholder is immutable too.  This means that a for-inout loop that initializes an immutable array but skips some elements can only have the array's initialization completed by individual element initializations (using static dereferences).

GRAMMAR OF A FOR-INOUT STATEMENT
for-inout-statement → for identifier inout expression code-block


GRAMMAR OF AN ITERATION INDEX EXPRESSION
iteration-index-expression → #indexOf ( identifier )

Note: The indices property expresses a some Sequence that is a value type that vends [T.extentCount ; Int] values, where T is the receiver's type.  Since a given grid array instance has a fixed shape, possibly determined only at runtime, we could make indices be an opaque Collection instead.
Note: List for-inout statements under loop-statement.  List iteration-index expressions under primary-expression.
User-Level Interface

New statements for supporting grid arrays are the already mentioned reification and for-inout statements.
The already mentioned iteration index expression is the only new expression for supporting grid arrays.
For new global functions in the standard library supporting grid arrays, withFlattening(of: body:) and withMutableFlattening(of: body:) have already been mentioned.  Other functions include:

If variadic generics get supported: func makeArray<T, variadic U>(_ t: (T, U...)) -> [1 + #length(U) ; T] where U == T.  This converts the given homogenous tuple to a one-dimensional grid array of the same length.
If macros get supported: func makeTuple<T: locked [@constexprVisible _, some Any]>(_ a: T) -> ( #dup(T.staticCount ; T.Element) ).  This converts the given one-dimensional grid array to a homogenous tuple.
One that takes two one-dimensional locked grid arrays of the same element type and concatenates them to a longer one-dimensional grid array.  When variadic generic parameters are supported, change the function to take any positive count of one-dimensional locked grid arrays with the same element type.

Besides the already mentioned indices instance property, static dereference tuple number members, and dynamic dereference subscript operators, there are other interface bits attached to each grid array type or instance:

Element is a type-level associated type that aliases the grid array's element type.
DeepElement is the same as Element if the latter is not a grid array, but equals Element's DeepElement otherwise.
Optional: If strong type-aliases get supported, and/or we recognize current tricks that have that effect (like a struct with one stored instance property), RawElement is the same as DeepElement if the latter isn't a poser for a grid array, but equals DeepElement's RawElement if it is.
extentCount is a type-level Int property equaling the number of extents for the array type.
extents is an instance-level [Self.extentCount ; Int] property that is a list of the instance's extents.  (It's instance-level due to open extents requiring runtime support.)
count is an instance-level Int property with the count of receiver's elements.
isEmpty is an instance-level Bool property checking if count is zero.
For lockable grid array types (i.e. ones that can conform to a locked existential), staticExtents, staticCount, and staticIsEmpty are type-level properties of types [Self.extentCount ; Int], Int, and Bool respectively that equal their instance-level analogues.
Only for zero- and one-dimensional arrays, toArray() is an instance-level method returning the elements of the array, in order, as a standard array (in other words, [Element]).
For grid array types that can be locked, a map method that takes a conversion closure and returns an array of the same shape but with its element type the same as the closure's return type.
Versions of the reduce(_: _:) and reduce(into: _:) methods from Sequence.

To-Do: Once constexpr and value-based generic parameters are supported, add a global function to stitch a given list of lockable grid arrays with the same element type, number of extents, and values of those extents except at a given index of T.staticExtents.  The generic value argument is that index.  The stitching occurs along that given extent axis.
To-Do: Add function that collapses a lockable array of arrays to a grid array with the two sets of extents joined together.  Add a variant that fuses all nesting levels.
To-Do: Add function to transform an inline multi-dimensional lockable array to nested one-dimensional arrays.  Add a variant the de-layers a nested grid array type first.
To-Do: Add variants of extents, count, isEmpty, staticExtents, staticCount, and staticIsEmpty for deeper elements (like going from Element to DeepElement).
SIL-Level Interface

???
Other Behind-the-Scene Mechanisms

The witness table for grid arrays would have to at least include:

pointer to the element's witness table entry
a cache copy of the number of total elements (synthesized from the product of all the extents)
the number of extents
a flexible array of the extents, probably as the last part of the entry

For a non-nested grid array with an open extent, the owning scope probably will have to keep something like a witness table entry for that instance's runtime type.  A nested grid array with an open extent would still need a custom with table entry, but the pointers may cross between the owning scope's accounting information and the binary's witness table.
Otherwise, like other parts of the witness table, other metadata, argument and/or return-value passing mechanisms, etc.: ???
IR Generation

The implementation should use whatever optimized but still workable mechanisms to (eventually) emit to byte code, buffer pointer and length, array primitive, SIMD primitive, etc.
Besides that: ???  (I don't even know if this should be a separate section from the previous behind-the-scene mechanisms.)
C Conversions

C only supports one-dimensional arrays, and the extent must be a literal.  C++ allows extents to be constexpr expressions.  These types are trivial to convert between C(++) and Swift.
Their "multidimensional" arrays are just nested arrays, like Swift copied for our standard arrays.  These can be translated trivially, but there's another option.  C's design prevents the recognition of partitions between groups of extents.  (For example, a 2x3 array of a 4x5 int would be int[2][3][4][5], throwing away the partition point between the 3 and 4.)  A custom translation from C to Swift can restore partitions between extents with inline-multidimensional arrays.  Conversely, a translation from Swift to C loses partition information unless the would-be inner array type is replaced with a same-representation struct.
A C array with unknown bound/size can be translated back & forth with a Swift array with an open bound.  C arrays with variable bound, like for a function parameter, can also be translated to a Swift array with an open bound.  This may replace the current adaptation of such function arguments to Array.  Due to reification, a C function with variable length array parameters can be implemented in Swift.
constexpr and the Like

In C, array extents had to be literals (possibly hidden behind a macro) and therefore had their values visible early.  C++ added constexpr support for extents and could delay knowledge of an array type using such an extent until the type's layout was needed.  The Swift model described above also delays need of an array types extents until layout time, but right now that excludes anything like C++'s constexpr, since Swift doesn't have that in general.  I tried to design the grid array feature to not require constexpr.  However, if constexpr/consteval support gets added and a parameter can be determined early, like AST/SIL-time, then grid arrays of possible-constexpr types can themselves be used in further constexpr contexts.
constexpr contexts can't be done for all arrays due to runtime support for variable-length arrays, i.e. grid arrays with open extents.
Any other part of the process I missed

???
Source compatibility

Since grid arrays are a new feature, the only source compatibility concerns are the identifiers that will be pulled to be (conditional) keywords.  And like other conditional keywords, use of back ticks can force a keyword to be acknowledged into an identifier context.  Current use of grid array syntax should results in errors.
Array literals should still match to instances of the Array nominal type by default before matching grid arrays, unless the context demands the destination is a grid array.  (This is the similar to other literals in code having default matching types, like Int for non-fractional numbers, Double for other numbers, etc.)
Effect on ABI stability

Since this is a new feature, it shouldn't have any effect on existing code.  We will need to discuss what kind of backwards compatibility we want.  Could a function change from having a parameter taking something from the Unsafe* pointer family to a grid array and have the array's address silently convert to a pointer?  If not for completely shaped-locked grid arrays, what if we restrict the array to having at least one open extent; would it be OK then?
But this being a new primitive with new witness table types and support code means grid arrays have to be premiered in a new major version of Swift.  This would probably restricted to an (at least) semi-major version of macOS and other Apple platforms.
Effect on API resilience

Since existing code doesn't reference grid arrays, grid arrays shouldn't break their ABI compatibility.  Adding a reference to a grid array to existing ABI-visible items would break API and ABI.
Alternatives considered

Describe alternative approaches to addressing the same problem, and
why you chose this approach instead.