Skip to content

Instantly share code, notes, and snippets.

@CTMacUser
Last active April 28, 2021 17:37
Show Gist options
  • Save CTMacUser/cfffa526b971d0e1f3a079f53c6819bb to your computer and use it in GitHub Desktop.
Save CTMacUser/cfffa526b971d0e1f3a079f53c6819bb to your computer and use it in GitHub Desktop.
Swift proposal for fixed-size array types

Fixed-Size Arrays

During the review process, add the following fields as needed:

Introduction

This proposal adds fixed-size arrays to the language.

Swift-evolution threads:

Motivation

Statically sized arrays are a classic programming type not currently in Swift. Parts of this feature can be simulated with the current arrays (which are collections) and tuples (with homogenous members). Current arrays have the array interface, but have their elements in remote storage and can't be optimized with low-level features like vector-processing types. Homogenous tuples have the reverse trade-offs; they store their elements in local storage but can't use the array interface. Also with tuples, any array optimizations would have to be a (semi-)secret "array mode" and would forfeit any possible tuple optimizations. The user would have no control which set of optimizations happens, since it may be non-obvious if tuple members have types hidden behind aliases.

This lack of statically sized arrays also makes interfacing with code from other languages that have and use their version of arrays awkward.

Proposed solution

The solution is to add fixed-size arrays as a separate set of types at the user (and ABI) level. These arrays are compound types parameterized on their element type, the number of coordinates needed to reference an element, and number of valid values for each coordinate. [Since we currently have neither variadic generic parameters nor value-based generic parameters, many manipulations on fixed-size array indexes aren't possible. So we should add both as soon as possible.] The count of coordinates and length per coordinate of an array are that array's dimensions. A particular array's set of dimensions is that array's shape.

A fixed-size array type is written as the shorthand array syntax, for what we now call flexible-size arrays, but with the bound(s) preceding the element type:

[6; Int]

[4, 5; Double]

Each bound is a nonnegative integer, but zero is allowed only for empty arrays. There must be at least one bound, as a zero-dimensional array of type T can be represented as type T directly. A semicolon separates the dimensions from the element type; a comma separates the dimensions. An element is referenced with the subscript operator:

a[3]

b[1, 4]

Valid values of a subscript index are at least zero to one less than the length of the corresponding dimension. The number of indexes must match the number of dimensions; the indexes are co-equal and partial specification has no meaning.

The indexes of a subscript can be passed in individually or as a one-dimensional integer array:

a[ [;3] ]

b[ [;1, 4] ]

This helps when the index set is passed as a single object (like in for-in loops).

Accessing an element for a non-existent index triggers the same run-time error doing it for a flexible-size array does.

An element can be statically indexed with a single number, like tuple fields. The number can range from zero to one less than the number of elements. The static index refers to an element's storage order, where index zero is the element at the same address as the start of the array proceeding further into memory. For a six-element array x, its elements can be statically indexed as:

x.0 x.1 x.2 x.3 x.4 x.5

Storage order ignores the shape of an array, so the above element indexing set works for [6; T], [2, 3; T], [3, 2; T], and arrays extending these types with extra bounds of value 1 (to keep the product at 6). For multi-dimensional arrays, mappings to storage order work as if the dimensions are in row-major order (i.e. the first dimension has the widest span between consecutive index values and the last dimension has elements with consecutive index values adjacent in memory).

A subscripted or statically-indexed element can be read and, if the array object is a variable instead of a constant, written. The element participates in this via assignment, inout parameter passing, or any other operation that element type supports. The array object as a whole can also participate in assignment or inout parameter passing.

Nested Arrays

Almost any type can be a fixed-size array's element type, including another fixed-size array type. This is referred to as nesting:

[3; [7; Int]]

An object of this type is a three-element array. Each element is itself a seven-element integer array. A fixed-size array has a nesting of at least one. Precisely, it's one plus the nesting of the element type, with types that are not fixed-size arrays having a nesting level of zero. For any fixed-size array type, the element type found after removing all nesting is the inner non-array type. (Flexible-size arrays are not considered here, so [4; [Int]] has a nesting level of 1 instead of 2, and [[4; Int]] has a nesting level of 0.)

Empty Arrays

An empty fixed-size array is one that specifies no elements. This may only be done with a one-dimensional array with a dimension-length of zero. Objects of these types take zero space (but still keep their alignment requirements).

Empty arrays are the only fixed-size array types that cannot be an element type of another fixed-size array type.

Array Values

A sized array literal starts and ends with the square brackets, like dictionary and standard array literals, but is partitioned with a semicolon within. The semicolon separates the (possibly implied) array bounds on the left and any possible initialization terms on the right.

An initialization term can be:

  • A plain expression, like the terms in a standard array literal. The expression's value initializes the element whose storage order is at the same position as the term is in the literal.
  • A single number (compile-time integer expression) followed by a colon and then a value. The number to the left of the colon is the index of the element to receive the value on the right of the colon.
  • A one-dimensional fixed-size array of integers (compile-time integer expression(s)) followed by a colon and then a value. The integer array to the left of the colon is the index set of the element to receive the value on the right of the colon.
  • The keyword default followed by a colon and then a value. Any element of the array not covered by any other initialization term receives the value on the right of the colon.
  • The keyword func followed by a colon and then a closure. The closure takes a one-dimensional fixed-size array of integers and returns a value. The closure is called for each element, passed that element's index (set), and the element is set to the return value.

Elements must be covered by at most one term. Elements not covered by any term are uninitialized.

There can be at most one function term, default term, array-index term with a specific index set value, or single-number term with a specific index value in a sized array literal, while considering an array-index term with an index of length one to be equivalent to a single-number term with an index of the same value. A function term cannot co-exist with any other term. A default term, if present, must be the last term in the literal. All array-index terms in a literal must have the same index set length. Single-number and array-index terms with index sets of length one can co-exist only with each other and themselves (and a possible default term). Plain-expression terms can co-exist only with themselves and a possible default term.

The bounds of the literal work the same as the bounds on a fixed-size array type declaration. However, the bounds (but not the separating semicolon) may be omitted, meaning the array literal's shape is a one-dimensional array of a length equal to the number of initialization terms. A literal with an implied length can only have plain-expression terms; any other kind of term can only be present in a literal with an explicit bound (set). Array-index and single-number terms must have their index be a valid index (set) value relative to the bounds.

The number of plain-expression terms cannot exceed the number of elements in the array.

The element type of the array is determined by either the return type of a function term's closure, or by the type consolidated from all the term values (like in dictionary and standard-array literals).

A literal with no terms indicates an empty array. Such a literal can have either an implied bound or an explicit bound consisting of a single zero value. Like empty dictionary and standard-array literals, the element type cannot be inferred without additional context.

Reshaping Arrays

If two fixed-size array types share a inner non-array type and are of the same size (i.e. count of sub-objects of the inner non-array type), then an object of one type may be cast to the other type with unconditional as. The corresponding elements match in storage order.

let a = [;1, 2, 3, 4]
assert(a[0] == 1)
assert(a[1] == 2)
assert(a[2] == 3)
assert(a[3] == 4)
let b = a as [2, 2; Int]
assert(b[0, 0] == 1)
assert(b[0, 1] == 2)
assert(b[1, 0] == 3)
assert(b[1, 1] == 4)
let c = a as [2; [2; Int]]
assert(c[0][0] == 1)
assert(c[0][1] == 2)
assert(c[1][0] == 3)
assert(c[1][1] == 4)

Uninitialized Values

An object's set of initializable sub-objects is the union of the following:

  • The object itself
  • If the object is of a tuple type, the union of the sets of initializable sub-objects for each member
  • If the object is of a fixed-size array type, the union of the sets of initializable sub-objects for each element

Objects of named or function types are completely initialized in one shot, by either an initializer call (except for function types) or a possibly delayed initialization-assignment. Tuple and fixed-size array objects are initialized when all of their inner neither-tuple-nor-array sub-objects are initialized. A compound object's innermost sub-objects can be initialized individually and/or in larger enclosing groupings (including the entire tuple or array at once). The existing deterministic initialization (DI) checker makes sure each (sub-)object is initialized before it's read for the first time.

In other words, DI effectively ignores tuple and fixed-size array objects proper for tracking their innermost sub-objects instead. Maintaining DI discipline is not a problem for tuples, but may be for fixed-size arrays since a common interaction mode:

var test: [5; Int]
for ii in 0..<5 {
    test[ii] = ii
}

uses indexes determined at run-time, so DI cannot detect when an element accessed this way is initialized. Work-arounds are:

  • Move the loop to be the closure of a function term of a sized array literal initializing the array. This requires all of the data needed to initialize any and every element (that can't be generated within the loop) to be prepared before initialization.
  • Before the loop, initialize the array with dummy value(s). Since these dummies are immediately replaced with the real initial data, the time generating the dummies is wasted.
  • Use static indexing to initialize the elements. This forgoes using a loop.

[Future Directions: Under restricted circumstances (local arrays within functions, and stored instance array properties within designated initializers), allow run-time DI. This would be something like a Boolean array of the same shape as the target array that indicates which elements of the target are initialized. This could be associated with a type attribute for function arguments that would pass the target inout by reference and secretly bring the initialization map along (also by reference), so the code to initialize an array's elements doesn't have to be in one chunk.]

Element Traversal

Besides addressing each element individually, traversal can be automated. The for-in loop works for any fixed-size array type:

var total = 0
let data = [2, 2; 1, 2, 4, 8]  // [2, 2; Int]
for x in data {
    total += x
}
print(total)  // Prints "15"

The order the elements are visited is unspecified, and visitations may overlap. Individual dimensions need to be looped over to control the order:

precondition(ArrayStats.dimensions(ofValue: data) == [2, 2])
for i in 0..<2 {
    for j in 0..<2 {
        print("data[\(i), \(j)] == \(data[i, j])")
    }
}

/*
    Prints:
        data[0, 0] == 1
        data[0, 1] == 2
        data[1, 0] == 4
        data[1, 1] == 8
 */

The ArrayStats generic type provides static members to give parameters of a fixed-size array's shape and elements.

If the index (set) of an iterated element is needed, but without any traversal constraints, the current index can be returned with the #indexOf operator as a fixed-size integer array of the number of dimensions of the targeted loop's source. The argument to the operator is a label to a current for-in loop that goes over a fixed-size array. For example:

import Foundation

let offset = [4, 5, 6; func: {
    Double($0.0 * $0.0 + $0.1 * $0.1 + $0.2 * $0.2)  // dot product of the index vector with itself
}]
var moreData: [3, 2; [4, 5, 6; Double]] = [3, 2; func: { outerIndex in
    let exponent = outerIndex.0 + outerIndex.1
    return [4, 5, 6; func: { innerIndex in
        let base = innerIndex.0 * innerIndex.1 * innerIndex.2
        return pow(base, exponent) - offset[innerIndex]
    }]
}]
outer: for x in moreData {
    inner: for y in x {
        var outerIndexRaw = #indexOf(outer)  // Normally, this would be in the outer loop...
        var innerIndexRaw = #indexOf(inner)
        let outerIndex = withUnsafeFlattening(of: &outerIndexRaw) { Array($0) }  // ...and this too.
        let innerIndex = withUnsafeFlattening(of: &innerIndexRaw) { Array($0) }
        print("moreData\(outerIndex)\(innerIndex) == \(y)")
    }
}

Nested loops can access the index sets of enclosing array loops.

The library provides two functions, withUnsafeMutableFlattening(of: _:) and withUnsafeFlattening(of: _:), to go over a fixed-size array as a buffer. These functions can be used to pass along an array to an algorithm that takes a block of objects with a specific element type (or non-specific if the algorithm itself is generic) but any element count and/or any array shape. A non-fixed-size array object counts as a zero-dimensional array here and can be passed along as a buffer with a singular element.

let sumOfInteriorDoubles = withUnsafeFlattening(of: &moreData) {
    $0.map { a in withUnsafeFlattening(of: &a) { b in b.reduce(0, +) } }.reduce(0, +)
}

Multi-dimensional arrays are flattened to a single index per element. An element type that itself is a fixed-size array type is not de-structured. (Like the preceding example, a nested call to a flattening function is needed per nested array type.) If you need the specific index an element has relative to the original array, you need to call enumerated() on the buffer and then reverse-engineer the original index set from the storage offset.

[Should overloads of these functions be provided for flexible-size arrays?]

Element-wise Cast

If there exists a cast between two types, then a similar cast exists between two fixed-size arrays with those original types as their respective element types, as long as the array-level types are of the same shape. If the two element types cast with unconditional as, so do the array types. If the element types cast with as?, so do the array types, but the cast fails with nil only if at least one element-level cast fails. If the element types cast with as!, then act as a forced unwrapping of the result of an array-level as? cast.

let d = [4, 3; 1, 2, 3, default: 0]
let e = d as [4, 3; Double]

Changing the element type and array shape requires two casts in a row. The implementation should elide these to a single operation if possible.

let f = d as [4, 3; Double] as [3; [2, 2; Double]]
let g = d as [3; [2, 2; Int]] as [3; [2, 2; Double]]

[Future Directions: The shape of the source array has to be manually repeated in the destination type. If variadic generic parameters and value-based generic parameters are added, there should be global functions that perform these conversions, corresponding to as, as?, and as!. They would take a function argument of the source array type and an explicit generic argument of the destination element type (and implicit generic arguments of the shape and source element type), returning an instance of the destination array type or an (implicitly unwrapped) optional thereof. Maybe there should be overloads for flexible-size arrays (needing only the destination element type explicitly given).]

Tuple Conversion

Any empty fixed-size array type can be cast to or from the empty tuple type using unconditional as. A non-empty array type can be cast to or from tuple types with members that are of type T, tuple types that match this rule with the same T, and/or fixed-size array types that match this rule with the same T as a direct or nested element type, where T is one of the first array type's nested types between its direct element type and its inner non-array type (inclusive). The array and tuple types must have the same count of sub-objects of type T. Correspondences between sub-objects are storage order for the array type and (recursively-defined) declaration order in the tuple.

[For a tuple type, declaration order currently matches storage order, but changing that is a theoretical possibility given in the Swift ABI documentation.]

let h = d as (Int, Int, Int, Int, Int, Int, [2; (Int, Int, Int)])
assert(h.0 == 1)
assert(h.1 == 2)
assert(h.2 == 3)
assert(h.3 == 4)
assert(h.4 == 0)
assert(h.5 == 0)
assert(h.6.0.0 == 0)
assert(h.6.0.1 == 0)
assert(h.6.0.2 == 0)
assert(h.6.1.0 == 0)
assert(h.6[1].1 == 0)
assert(h.6.1.2 == 0)
let i = h as [2; [3, 2; Int]]

Detailed design

Bit-Code Considerations

Arrays should use the bit-code translator's array primitive. For example, an array of eight 16-bit integers would use the [8 x i16] primitive in LLVM.

A multi-dimensional array would be translated as a series of nested array primitives, with the first bound being the outermost length and subsequent bounds going inward. (This is row-major order.) For example, a three-by-four array of 16-bit integers would use the [3 x [4 x i16]] primitive. (There will be more nesting if the element type is also an array type.)

Vector-Mode Implementation

If the vector attribute is applied to the type of an array object, its data layout is computed like the attribute was absent, then replacement proceeds as:

  1. If the array is empty, vector attribution is not applied.
  2. If there is a vector primitive matching the innermost array type's element type and count, use that primitive.

[Future Direction: Expand the search to include all array nesting levels. Exact multiples of a vector primitive's length are acceptable. But the boundaries between vector primitives have to match boundaries of the lower-nested array; no straddling boundaries. However, doing this would require more of the index computation happen at higher layers.]

For example, we translate [4; [3, 4; Int16]] while the bit-code translator supports both 4- and 16-count of a 16-bit integer in vector mode.

  • Non-vectorized bit-code representation: [4 x [3 x [4 x i16]]].
  • The [4 x i16] is translated to <4 x i16>.
  • The result is [4 x [3 x <4 x i16>]].

There are 48 total 16-bit integer elements, so a [3 x <16 x i16>] could have been possible. But such a layout would have the boundaries of the vector primitive not line up with the boundaries of the user-level view. That is not allowed since it may screw up element-block slicing (by straddling vectors).

ABI Considerations

An array is a block with the same alignment as its element type and a stride of its element type's stride multiplied by the element count. When addressing each element by its layout order, a given element's byte offset is the layout index multiplied by the element type's stride.

For a multi-dimensional array, the layout index of an element is determined by the dot product of the element's index vector and the array type's stride vector. The stride vector is determined by:

  • Have a copy of the dimension-length vector.
  • Set the last element of the stride vector to 1.
  • For each element of the stride vector from last to first:
    • Multiply the current element of the stride vector by the corresponding element of the dimension-length vector.
    • Set the resulting product as the value of the previous element of the stride vector. (Drop the product from the first element, which equals the array's element count, to oblivion.)

(The stride vector of a one-dimensional array is just [1].)

An array is trivial if its element type is. An array is bitwise-movable if its element type is. Zero-sized arrays are allowed in structs and tuples, but do not take any storage in their enclosing aggregates.

The common metadata layout needs two new kinds for arrays. The kinds differ in that one uses a vector-unit for data implementation and the other does not. The two kinds transparently interact at the user level, using ABI-level functions to facilitate conversions. Specific data for either kind of fixed-size arrays should be the dimension-length vector, cached copies of the element count and the stride vector (so they won't have to be re-derived from the dimension-length vector), and a reference to the element type's metadata.

[I'm not sure how much mangling information is needed for arrays. Tuples only got one mention in that part of the ABI document.]

Declaration Attributes

vector

Apply this attribute to the type of a fixed-size array to implement the elements of the innermost nested array layer [Future Direction: innermost layers] as a processor-level vector-unit. The substitution will not occur if the platform does not support any type/length combinations usable for implementation.

[Future Directions: Add a parallel attribute to the declaration of a scoped property or object to mandate parallel processing of elements during initialization and for-in loops. Add a fragmentary attribute to the declaration of a function parameter or a scoped property or object to suspend static DI for the array's sub-objects and use dynamic DI to track which sub-objects have been initialized.]

Grammar

Change the only current production of "Grammar of an Array Type":

array-type[ array-bound-specifier_opt type ]

Add these productions to that same grammar:

array-bound-specifierarray-bound-list ;

array-bound-listarray-bound | array-bound , array-bound-list

array-bounddecimal-literal

If compiler constant expressions are added, an array bound can be a nonnegative integer constant expression.

Add to the "Grammar of a Primary Expression":

primary-expressioniteration-index-expression

Add to "Grammar of a Literal Expression":

array-literal-headerarray-literal-item , array-literal-header_opt

defaulted-array-literal-itemdefault : array-literal-item

sized-array-literal[ array-bound-list_opt ; array-literal-header_opt array-literal-item_opt ]

sized-array-literal[ array-bound-specifier array-literal-header_opt defaulted-array-literal-item ]

sized-array-literal[ array-bound-specifier dictionary-literal-header_opt dictionary-literal-item ]

sized-array-literal[ array-bound-specifier dictionary-literal-header defaulted-array-literal-item_opt ]

sized-array-literal[ array-bound-specifier func : closure-expression ]

dictionary-literal-headerdictionary-literal-item , dictionary-literal-header_opt

Change these current productions in the same grammar:

array-literal[ array-literal-header_opt array-literal-item_opt ]

dictionary-literal[ dictionary-literal-header_opt dictionary-literal-item ] | [ dictionary-literal-header ] | [ : ]

Remove the "array-literal-items" and "dictionary-literal-items" productions.

Add a new "Grammar of an Iteration Index Expression":

iteration-index-expression#indexOf ( label-name )

Library Support

/**
    The array-related statistics of a type, including its element count and type.
 
    You can use `ArrayStats` to find the bounds of a fixed-size array.  This type considers a flexible-size array to be a non-array type.
 
    - parameter T: the type to be analyzed.
 */
enum ArrayStats<T> {

    /// If the analyzed type is a fixed-size array type, then an alias to the array's subscript return type.  Else, the analyzed type itself.
    typealias Element
    /// The (non-inclusive) upper bound for each index shaping the array.  Empty for non-fixed-size array types.
    static var dimensions: [???; Int] { get }
    /// The count of elements for the array.  Same as the product of the elements of `dimensions`.
    static var elementCount: Int { get }
    /// The count of dimensions of the array.  Same as the length of `dimensions`.
    static var cardinality: Int { get }

    /// The count of consecutive fixed-size array subscripts that can be applied to an object of this type.  Zero for non-array types (including flexible-size arrays).
    static var layers: Int { get }
    /// If the analyzed type is a fixed-size array type, then an alias to `ArrayStats<Element>.CoreElement` (i.e. the inner non-array type).  Else, the analyzed type itself.
    typealias CoreElement

    /**
        The (non-inclusive) upper bounds for each index shaping the given object.

        Unlike the property version, a flexible array is returned.  This eases the enabling of further analysis.  For instance, functions equivalent to `elementCount` and `cardinality` are not provided since `dimensions(a).reduce(1, *)` and `dimensions(a).count` work, respectively.

        - parameter value: A value representative of the type to describe.

        - returns: The list of bounds shaping the given value's type.
     */
    static func dimensions(ofValue value: T) -> [Int]
    /**
        The number of times the subscript operation for fixed-size array can be applied to the given object before encountering a flexible-size array or a non-array object, i.e. the nesting layer count.
     
        - parameter value: A value representative of the type to describe.
     
        - returns: The length of extents the given value's type can receive (stopping at non-fixed-size array types).
     */
    static func layers(ofValue value: T) -> Int

}

/**
 Invokes the given closure with a mutable buffer pointer to the given argument.

 The `withUnsafeMutableFlattening(of:_:)` function is useful for accessing every element of the input fixed-size array as a `RandomAccessCollection` and `MutableCollection` instead of using a `for-in` loop.  Since a collection has its elements on a single level, multi-dimensionality is "flattened" away.  If `arg` is not a fixed-size array, then the argument is treated as a "zero-dimensional" array with a reference to `arg` as its sole element.

 The pointer argument to `body` is valid only for the lifetime of the closure. Do not escape it from the closure for later use.

 - Parameters:
   - arg: An instance to temporarily use via pointer.
   - body: A closure that takes a mutable buffer pointer to the elements of `arg` as its sole argument. If the closure has a return value, it is used as the return value of the `withUnsafeMutableFlattening(of:_:)` function. The pointer argument is valid only for the duration of the closure's execution.
 - Returns: The return value of the `body` closure, if any.

 - SeeAlso: `withUnsafeFlattening(of:_:)`
 */
func withUnsafeMutableFlattening<T, Result>(of arg: inout T, _ body: (UnsafeMutableBufferPointer<ArrayStats<T>.Element>) throws -> Result) rethrows -> Result

/**
 Invokes the given closure with a buffer pointer to the given argument.

 The `withUnsafeFlattening(of:_:)` function is useful for accessing every element of the input fixed-size array as a `RandomAccessCollection` instead of using a `for-in` loop.  Since a collection has its elements on a single level, multi-dimensionality is "flattened" away.  If `arg` is not a fixed-size array, then the argument is treated as a "zero-dimensional" array with `arg` as its sole element.

 The pointer argument to `body` is valid only for the lifetime of the closure. Do not escape it from the closure for later use.

 - Parameters:
   - arg: An instance to temporarily use via pointer.
   - body: A closure that takes a buffer pointer to the elements `arg` as its sole argument. If the closure has a return value, it is used as the return value of the `withUnsafeFlattening(of:_:)` function. The pointer argument is valid only for the duration of the closure's execution.
 - Returns: The return value of the `body` closure, if any.

 - SeeAlso: `withUnsafeMutableFlattening(of:_:)`
*/
func withUnsafeFlattening<T, Result>(of arg: inout T, _ body: (UnsafeBufferPointer<ArrayStats<T>.Element>) throws -> Result) rethrows -> Result

[We should have overloads of withUnsafeMutableFlattening(of: _:) and withUnsafeFlattening(of: _:) that take a flexible-size array [T] as the first argument and a closure that takes a Unsafe(Mutable)BufferPointer<T> as its argument for the second argument. This lets flexible-size arrays be traversed the same way as fixed-size arrays, instead of being treated as a singular object.]

Interoperability

The translation of C code with arrays is changed from homogenous tuples to fixed-size arrays with the original bounds as numbers:

MyCType[6] → [6; MyCTypeTranslated]
MyCType2[5][2] → [5; [2; MyCType2Translated]]

(Multi-dimensional bounds, i.e. [5, 2; MyCType2Translated], cannot be created since there is no way to mechanically determine the partitions between dimension sets.) The casting rules between tuples and arrays serve as a transition with code using older C translations.

A fixed-size array with element type Type can be passed as an argument for a function with a UnsafePointer<Type> or UnsafeMutablePointer<Type> parameter, just like flexible arrays. The function argument is passed a pointer to the start of the array. For mutable access, the fixed-size array can only have a single (top-level) dimension.

An unsafe pointer to a fixed-size array type is compatible with unsafe pointers to the inner non-array type and with unsafe pointers to other fixed-size array types that share that inner non-array type.

Source compatibility

Since this feature is additive, older code is mostly unaffected. The exception is imports from C-language files that use arrays. Now that such C-arrays will now be translated to fixed-size arrays, there may be incompatibilities with older translated code that used manually homogenous tuples as the C-array equivalent. Either the older translation needs to be re-translated to use fixed-size arrays, or casts need to be used to cushion two pieces of code that use the different interfaces.

It may be advantageous to offer migration of homogenous tuples longer than two elements to fixed-size arrays.

Effect on ABI stability

This feature is additive to the ABI. But, similar to source compatibility, any translated C-language code that has its C-array translation updated from homogenous tuples to fixed-size arrays would represent an ABI change. This includes the standard library if it has any applicable homogenous tuple references.

Effect on API resilience

If there are any instances of long homogenous tuples in the standard library, then they could be changed to fixed-size arrays, which would change the API and its corresponding ABI. Otherwise, fixed-size arrays wouldn't change the API except the new type and functions and through the additions it enables.

Alternatives considered

The main alternative is to do nothing. Homogenous tuples would still be the substitute. That includes the main problem that the count of elements has to be manually expanded out instead of using a number, and subsequent similar problems of unwieldiness during usage. And stuffing an array-mode into tuples means that either no array-oriented optimizations can be applied at the bit-code level, or said optimizations are always applied to homogenous tuples, even when undesired (either by matching by coincidence, which can happen when the field types are behind aliases, or when wanting tuple-oriented optimizations instead).

Another alternative is adding array syntax at the user level, but still using homogenous tuples as the bit-code level. Although users are helped by accessing array programming, the conflict between array- versus tuple-oriented optimizations still occurs. This costs performance if the compiler guesses wrong.

Syntax Alternatives

Why is the syntax the reverse of Rust?

Swift uses [Element-Type] for (flexible) arrays while Rust uses [Element-Type ; Element-Count] for its fixed-size arrays. When nested arrays are dereferenced in Rust, the left-to-right indices go from the outermost layer to the innermost, while declaration order is innermost to outermost. This is unlike C, which uses left-to-right for outermost-to-innermost in both declaration and dereference. To maintain that advantage in Swift, the count lexically precedes the type. (Go uses a variant of the C array declaration syntax that also puts the count before the type.)

[A previous version of this proposal used the colon as the separator between the bounds and element type. But a revised sized array literal syntax uses the semicolon, colon and comma as separators, using the semicolon to partition the bounds from the initialization terms. Now the separator after the bounds in fixed-size array type declarations is the semicolon to match.]

Why use a semicolon instead of "x," "*," "of," etc. as the separator between the bounds and element type?

(All these suggestions involved only single-dimension support.)

Alternate suggestions were using "x" or "*" to multiply the element type by the length (instead of "adding" each explicit mention of the element type) or to use "of" as a real-life-grammatically more-accurate introduction of the length and element type. However, they are all unreserved tokens, and using one of them for array syntax would change it to semi-reserved (at least) for a mechanical purpose that is already covered by the semicolon. This is worsened by that token not being used as a separator anywhere else in the grammar.

Why use a variant of the existing array syntax instead of the existing tuple syntax?

  • Principle of least surprise. Explain to new learners why adding fixed bounds to an array means not tweaking the flexible array syntax, but abandoning it for tweaking tuple syntax. Making a separate, third syntax has similar issues.
  • I took a quick look at some programming languages' syntax. Almost all of them had distinct syntaxes between structure and array declarations and structure and array usage. (Lua used similar usage syntax for arrays and records, but one didn't copy the other; they both used dictionary syntax.) So using a variant of tuple syntax for arrays would go against a half-century of prior art.
  • It's been argued that parentheses are over-overloaded already, possibly causing cross-parsing problems, so let's not add another purpose.

Some of your older proposals allowed overriding of the storage order. What happened?

Languages that have only single-dimension arrays, but allow nested arrays, use row-major order for element storage. This means that one-unit changes in the first index have the widest spans in memory compared to a similar change in a later index (the last index deals with consecutive elements). C and similarly restricted languages have to follow this model, plus any language with multi-dimensional arrays but also with C compatibility.

Fortran, which always(?) had multi-dimensional arrays, reverses the priority of its indices, called column-major order. The first index deals with consecutive elements while the last one has the widest span in memory for one-unit changes. A mixed C and Fortran program has to take account which way an array stores its elements. When dealing with an array of the non-native type, either the elements can be shuffled around to match the expected native order, or the list of indexes can be reversed before dereferencing. The latter is usually easier.

Row- and column-major are the only two options for two-dimensional arrays. For N dimensions, there are factorial(N) ways of prioritizing each dimension with the array's storage. For the overrides, I realized that practically no one shuffles their data around for indexing, and the ones that do only do so for Fortran compatibility. And that last ability can be added to the library once value-based generic parameters and variadic generic parameters are added to the language, using a generic structure that reverses the bounds during allocation.

Some of your older proposals allowed (sub-ranges of) arbitrary types for array indexes. What happened?

Like the previous question, once value-based generic parameters are added to the language, and the types supported as parameters extends to any value type that doesn't use remote storage, then arbitrary indexes can be represented with a generic structure that takes countable ranges.

Some of your older proposals had both compound and named array types. What happened?

I realized that, like tuples, the natural interface for an array is sort-of limited. Having a nominal version of arrays isn't much of a value-add. The situation of nominal arrays was slightly better when they have exclusive access to storage order and arbitrary index support, which were later transferred to compound-type arrays, then dropped altogether. The "wrapper around a basic array" idea was moved to a more generalized concept, as part of a separate "strong typedef" proposal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment