beccadax/NNNN-for-borrow.md Secret

## NNNN-for-borrow.md

      
    Raw
  

              NNNN-for-borrow.md
            
          
    Introduce for borrow and for inout to provide non-copying collection iteration


Proposal: SE-NNNN
Authors: Becca Royal-Gordon
Review Manager: TBD
Status: Awaiting implementation

During the review process, add the following fields as needed:

Implementation: apple/swift#NNNNN or apple/swift-evolution-staging#NNNNN
Decision Notes: Rationale, Additional Commentary
Bugs: SR-NNNN, SR-MMMM
Previous Revision: 1
Previous Proposal: SE-XXXX

Introduction

We propose adding for borrow and for inout loops which iterate a Collection or MutableCollection by its indices, providing direct access to its elements.
This pitch builds on the previous borrow and inout declaration keywords pitch, which would have to be accepted first.
Swift-evolution thread: Pitch: Introduce for borrow and for inout to provide non-copying collection iteration
Motivation

In Swift, for loops are statements which apply a piece of code to each element of an array or other sequence of elements. A for loop attempts to bind its pattern to each element of its collection; if the pattern matches, the loop's body is executed. Typically the pattern is just a single let binding which any element will match, so most for loops execute the body for every element of the collection.
A typical for loop uses the requirements of the Sequence protocol, and particularly its Iterator associated type, to extract the elements from its collection. That is, code like this:
// `n` is the pattern; `numbers` is the collection
for n in numbers {
    print(n)
}
Desugars into something like this:
do {
    // In this proposal's examples, `$` variables are implicit and not user-accessible.
    var $iterator = numbers.makeIterator()
    
    while let n = $iterator.next() {
    
        // Loop body goes here
        print(n)
    }
}
You can access the collection during the loop

There is nothing stopping the loop's body or the code it calls from accessing or even mutating its collection directly:
for n in numbers {
    numbers.append(n)       // Perfectly legal
}
Sequence does not actually require that this code do anything specific as long as it behaves safely, but in practice, most types store a copy of themselves in their iterators; this is especially common for types with copy-on-write semantics, since copying them is so cheap. This means that the for loop usually iterates over each element in the original collection, ignoring any mutations that occur during the loop.
Some programs intentionally use and rely on this behavior; others pass around the collection incidentally (say, because the loop calls a mutating method while iterating over a stored property of self) but nothing happens to mutate it during the loop. Because of that, this behavior must be preserved for source compatibility. However, it interacts poorly with several upcoming Swift features, and it makes for loops awkward for a very common use case.
C++ collections may have expensive copy operations

For Swift's native copy-on-write collections like Array and String, the "copy" made by the iterator is not very expensive unless the collection is mutated—it's just a retain and release of the underlying storage object. But many common C++ types, like std::vector or std::string, do not use copy-on-write and their iterators will have to eagerly copy their contents. This copying behavior will make for loops prohibitively slow with C++ types; we should provide a better alternative to use for them.
Making an iterator for a non-copyable collection consumes it

If a collection is non-copyable (or contains non-copyable elements, which usually amounts to the same thing), it will be impossible for the iterator to copy its contents; makeIterator() is a consuming method, so it will have to consume the collection, making it unusable after the for loop. While some for loops will probably want to consume their collections, clearly others will not.
It's difficult to mutate elements with a for loop

Swift has never had a great way to loop over a mutable collection and modify its elements. Instead, developers have always had to loop over the collection's indices and subscript the collection to access and mutate the elements—and this is surprisingly difficult to do correctly. For instance, they might write something like this:
// square all `numbers`
for i in numbers.indices {
    numbers[i] *= numbers[i]
}
Except that there's actually a subtle performance trap in that code: numbers.indices makes a copy of numbers, so the first time the loop body mutates numbers, it does not have a unique reference to its storage and must copy it. Developers who have noticed this problem use indirect patterns like:
// square all `numbers`
for i in numbers.startIndex..<numbers.endIndex {
    numbers[i] *= numbers[i]
}
Sometimes substituting 0 for numbers.startIndex and/or numbers.count for numbers.endIndex, which only works correctly for Array. For other collections, for cannot iterate over a Range of their indices, so even more complex code patterns must be employed, possibly abandoning for entirely. It's kind of a mess.
This is a lousy solution to a problem as basic as mutating a collection, and it will only get lousier when collections with non-copyable elements arrive. It's high time we provided a better solution in the language.
Proposed solution

We propose extending SE-NNNN's introduction of borrow and inout bindings by adding support for borrow and inout bindings to for loops:
// sum the contents of a C++ vector that's too expensive to copy
var sum = 0
for borrow n in cpp_number_vector {
    sum += n
}

// square all `numbers`
for inout n in &numbers {
    n *= n
}
These features require the loop's collection to be borrowable or inout-able and to conform to Collection or MutableCollection, respectively. The loop accesses its collection's elements by index; it desugars to something like this:
// sum the contents of a C++ vector that's too expensive to copy
var sum = 0

do {
    borrow $collection = cpp_number_vector
    var $i = $collection.startIndex
    while $i < $collection.endIndex {
        defer { $collection.formIndex(after: &$i) }
        borrow n = $collection[$i]
        
        // Loop body goes here
        sum += n
    }
}

// square all `numbers`
do {
    inout $collection = &numbers
    var $i = $collection.startIndex
    while $i < $collection.endIndex {
        defer { $collection.formIndex(after: &$i) }
        inout n = &$collection[$i]
        
        // Loop body goes here
        n *= n
    }
}
Detailed design

A for loop may now use inout or borrow bindings in its pattern. These may be at the top level of the pattern:
for inout n in &numbers { ... }
for borrow n in numbers { ... }
Or they may be nested inside a more complicated pattern:
for (key, inout value) in &pairs { ... }
for case (42, (borrow value)?) in pairs { .... }
A for loop which uses await or try await must not use inout or borrow bindings in its pattern. Doing so is an error.
Background

A for loop accesses its collection's elements using a certain "iteration strategy". Each iteration strategy requires the loop's collection to conform to a specific protocol, and generates a specific pattern of code which uses that conformance to extract each element in turn. Which iteration strategy a for loop uses can be determined syntactically.
Currently, for loops support two iteration strategies:

If the await or try await keywords appear between for and the pattern, it uses the AsyncSequence iteration strategy.
Otherwise, it uses the Sequence iteration strategy.

The Sequence iteration strategy treats the collection as an expression which is required to conform to the Sequence protocol; it calls makeIterator() on the collection, binds the resulting iterator to a temporary var, and then uses the iterator's next() method to retrieve each element in turn. The AsyncSequence strategy follows the same basic pattern but requires a conformance to AsyncSequence and uses its requirements and associated types instead of Sequence.
New iteration strategies

To support this feature, we introduce two new iteration strategies: MutableCollection and Collection. A for loop's iteration strategy is now chosen as follows (new steps bolded):

If the await or try await keywords appear between for and the pattern, it uses the AsyncSequence iteration strategy.
Otherwise, if its pattern uses any inout bindings, it uses the MutableCollection iteration strategy.
Otherwise, if its pattern uses any borrow bindings, it uses the Collection iteration strategy.
Otherwise, it uses the Sequence iteration strategy.

When the MutableCollection strategy is used, the for loop's collection must be valid as the right-hand side of an inout binding, and when the Collection strategy is used, its collection must be valid as the right-hand side of a borrow binding. In particular, this means that a for inout loop's collection must start with the prefix & operator.
Exclusivity and scope

When a for loop uses the MutableCollection or Collection strategy, its collection is accessed through an inout or borrow binding, respectively, for the duration of the loop. That is:


When the MutableCollection strategy is used, the loop's collection cannot be acccessed through any other binding for the duration of the loop.


When the Collection strategy is used, the loop's collection cannot be accessed through any other mutable binding for the duration of the loop.


Exclusive access is necessary to ensure that the for loop iterates over its collection safely. If the loop's collection could be mutated while it is executing, then the current element's index could become invalid and then be used in a subsequent access or index computation. At best this could cause bounds checking violations; at worst it could cause memory or type safety bugs. Enforcing exclusivity prevents these safety bugs. (Types conforming to MutableCollection guarantee that the subscript setter does not invalidate the index passed to it, so the mutations allowed by the MutableCollection strategy are guaranteed to be safe.)
Code generation

The MutableCollection and Collection iteration strategies work as follows:


The for loop's collection is bound using an inout binding (MutableCollection) or borrow binding (Collection) for the duration of the loop. We will call this binding $collection.


$collection.startIndex is extracted and stored in a temporary var. We will call this binding $i.


A loop is created with the condition $i < $collection.endIndex.


Inside the loop:


$collection[$i] (with a prefix & for the MutableCollection strategy) is bound to the loop's pattern. (If its pattern is refutable, the next step will only be performed if it matches the element.)


The where clause, if present, is evaluated. If there was no where clause or it evaluated to true, then the body of the loop is executed.


The lifetime of any bindings in the pattern end.


If the loop condition will be tested, the operation $collection.formIndex(after: &$i) is performed. (It is a compiler implementation detail whether this call is executed when you use break or other control flow features that terminate the loop early.)


The lifetime of $i ends.


The lifetime of $collection ends.


Source compatibility

This proposal is additive and should not change the interpretation of existing code.
Effect on ABI stability

This proposal will not introduce any ABI compatibility issues, even for code which adopts the features it proposes. for borrow and for inout loops use only protocols and requirements which have existed for as long as Swift has been ABI-stable.
Effect on API resilience

Resilient modules will have to take care not to use for borrow or for inout in inlinable code that is expected to be understood by previous Swift compilers. This proposal otherwise has no effect on resilience.
Alternatives considered

Don't use inout, or don't use &

In the current design, requiring both the inout keyword and the & operator is technically redundant; the compiler only needs one marker to tell it to use an inout binding. However, the same is true of inout in if and switch, and the same logic for why those use both indicators applies equally to the for loop.
Use the new iteration strategies more aggressively

If the Collection iteration strategy is so great, why not use it for everything? It turns out that it has three important limitations:


It requires the for loop's collection to conform to Collection. Some types only conform to Sequence; supporting only Collection would be a source-breaking change.


It requires exclusive access to the for loop's collection. Some for loops are not compatible with these exclusivity requirements; that is, they mutate the collection while the loop is executing. Although code which modifies the collection while (a copy of) it is being iterated is perhaps confusing or unwise, it is nonetheless safe to do with the current for loop, and there is code in the wild that does it. Breaking compatibility with this code would be a source-breaking change.


It is actually less efficient for certain collection types. Some Collections provide custom iterators which are more efficient than using their indices. For instance, Set and String have custom iterators that are faster than incrementing indices would be, because the iterator can assume that its private state is valid but the collection has to validate the indices it's passed. These needless performance regressions would not be acceptable.


We considered three specific alternatives that fall into this category:
Make for borrow the default in Swift 6

In Swift 6 mode, we could make plain for use a borrow binding by default and require something like for let to use an iterator. This would protect Swift 5 code from the source breaks.
However, just because Swift 6 can break source compatibility doesn't mean it should. We don't think these source breaks are sufficiently justified to motivate a change of defaults, or at least we don't think it's obvious enough that they are to be worth including in this proposal. A follow-up proposal could always add for let and change the default.
Automatically convert plain for to for borrow when possible

We could have the compiler automatically select the Collection pattern for for loops which are not affected by the issues above. For example, if the compiler determines that a for loop's collection:


Conforms to Collection as well as Sequence,


Cannot be or is not accessed in a way that could violate exclusivity (e.g. its collection is in a local variable and the loop body doesn't call any mutating members), and


Uses a standard library Iterator that's known to be slow (such as IndexingIterator, which is known to have at least as much overhead as the Collection strategy, or CxxIterator, which is known to be frequently used with types that are slow to copy),


Then it could choose to use the Collection pattern even though the Sequence pattern would work just fine.
This could provide free performance wins in some code, especially code which uses C++ interop. However, it seems like an unnecessary complication which could be added later, perhaps without even needing an Evolution proposal (since it would not cause an observbale change in behavior).
Automatically convert plain for to for inout when the collection is mutable

In addition to running afoul of the second and third points above, this would also be a strange thing to do in a language that encourages the use of immutable variables whenever possible. Just because you can mutate a collection doesn't mean you want to; methods on objects, or mutating methods on structs, are particularly likely to loop over mutable collections without modifying them.
Future directions

ReferenceMutableCollection protocol

Some types act as references to mutable collections; that is, even when the instance is immutable, you can still mutate the elements it provides access to. One example is the standard library's UnsafeMutableBufferPointer type: mutating its elements never changes the baseAddress or count of the buffer pointer itself, so its subscript has a nonmutating setter and its elements can be mutated even if it's used through a let or borrow binding. However, the MutableCollection protocol requires a mutating subscript setter, so Swift does not currently have a way to model this situation generically, and for inout will unnecessarily require the instance to be a var or inout.
We could address this by providing a ReferenceMutableCollection protocol which refines MutableCollection by making the subscript's setter nonmutating. With that protocol available, for inout could be modified so that when the collection conformed to ReferenceMutableCollection, it would only require the for loop's collection to be usable with a borrow binding. We might have users omit the collection's & in this situation, or we might decide the difference is too subtle to be surfaced in that way.
This problem may become more urgent in the future if we provide a mutable non-copyable BufferView. Such a type would probably contain an inout binding but would be read-only itself, and supporting it in for inout may become important.
New protocols for borrowing or mutating iterators

This proposal requires any type which wants to be usable with for borrow or for inout to support all of the features of the Collection protocol. This forces them to adopt certain properties which might not be desirable, such as multi-pass iteration and finite lengths.
We could instead design new protocols which don't require these properties and extend for borrow and for inout to use them when available. However, there is an important complication to consider: library evolution does not allow us to modify an existing protocol to add new refinements, so it would be impossible to make all MutableCollections support MutableSequence or whatever it's called. This may mean we need to type-check the loop's collection and determine what protocols it conforms to before we can chose an iteration strategy for it.
Acknowledgments

(Incomplete, but should definitely include Karoy and Zoe)