- Proposal: SE-NNNN
- Authors: Becca Royal-Gordon
- Review Manager: TBD
- Status: Awaiting implementation
During the review process, add the following fields as needed:
- Implementation: apple/swift#NNNNN or apple/swift-evolution-staging#NNNNN
- Decision Notes: Rationale, Additional Commentary
- Bugs: SR-NNNN, SR-MMMM
- Previous Revision: 1
- Previous Proposal: SE-XXXX
We propose adding for borrow
and for inout
loops which iterate a Collection
or MutableCollection
by its indices, providing direct access to its elements.
This pitch builds on the previous borrow
and inout
declaration keywords pitch, which would have to be accepted first.
Swift-evolution thread: Pitch: Introduce for borrow
and for inout
to provide non-copying collection iteration
In Swift, for
loops are statements which apply a piece of code to each element of an array or other sequence of elements. A for
loop attempts to bind its pattern to each element of its collection; if the pattern matches, the loop's body is executed. Typically the pattern is just a single let
binding which any element will match, so most for
loops execute the body for every element of the collection.
A typical for
loop uses the requirements of the Sequence
protocol, and particularly its Iterator
associated type, to extract the elements from its collection. That is, code like this:
// `n` is the pattern; `numbers` is the collection
for n in numbers {
print(n)
}
Desugars into something like this:
do {
// In this proposal's examples, `$` variables are implicit and not user-accessible.
var $iterator = numbers.makeIterator()
while let n = $iterator.next() {
// Loop body goes here
print(n)
}
}
There is nothing stopping the loop's body or the code it calls from accessing or even mutating its collection directly:
for n in numbers {
numbers.append(n) // Perfectly legal
}
Sequence
does not actually require that this code do anything specific as long as it behaves safely, but in practice, most types store a copy of themselves in their iterators; this is especially common for types with copy-on-write semantics, since copying them is so cheap. This means that the for
loop usually iterates over each element in the original collection, ignoring any mutations that occur during the loop.
Some programs intentionally use and rely on this behavior; others pass around the collection incidentally (say, because the loop calls a mutating
method while iterating over a stored property of self
) but nothing happens to mutate it during the loop. Because of that, this behavior must be preserved for source compatibility. However, it interacts poorly with several upcoming Swift features, and it makes for
loops awkward for a very common use case.
For Swift's native copy-on-write collections like Array
and String
, the "copy" made by the iterator is not very expensive unless the collection is mutated—it's just a retain and release of the underlying storage object. But many common C++ types, like std::vector
or std::string
, do not use copy-on-write and their iterators will have to eagerly copy their contents. This copying behavior will make for
loops prohibitively slow with C++ types; we should provide a better alternative to use for them.
If a collection is non-copyable (or contains non-copyable elements, which usually amounts to the same thing), it will be impossible for the iterator to copy its contents; makeIterator()
is a consuming
method, so it will have to consume the collection, making it unusable after the for
loop. While some for
loops will probably want to consume their collections, clearly others will not.
Swift has never had a great way to loop over a mutable collection and modify its elements. Instead, developers have always had to loop over the collection's indices and subscript the collection to access and mutate the elements—and this is surprisingly difficult to do correctly. For instance, they might write something like this:
// square all `numbers`
for i in numbers.indices {
numbers[i] *= numbers[i]
}
Except that there's actually a subtle performance trap in that code: numbers.indices
makes a copy of numbers
, so the first time the loop body mutates numbers
, it does not have a unique reference to its storage and must copy it. Developers who have noticed this problem use indirect patterns like:
// square all `numbers`
for i in numbers.startIndex..<numbers.endIndex {
numbers[i] *= numbers[i]
}
Sometimes substituting 0
for numbers.startIndex
and/or numbers.count
for numbers.endIndex
, which only works correctly for Array
. For other collections, for
cannot iterate over a Range
of their indices, so even more complex code patterns must be employed, possibly abandoning for
entirely. It's kind of a mess.
This is a lousy solution to a problem as basic as mutating a collection, and it will only get lousier when collections with non-copyable elements arrive. It's high time we provided a better solution in the language.
We propose extending SE-NNNN's introduction of borrow
and inout
bindings by adding support for borrow
and inout
bindings to for
loops:
// sum the contents of a C++ vector that's too expensive to copy
var sum = 0
for borrow n in cpp_number_vector {
sum += n
}
// square all `numbers`
for inout n in &numbers {
n *= n
}
These features require the loop's collection to be borrowable or inout-able and to conform to Collection
or MutableCollection
, respectively. The loop accesses its collection's elements by index; it desugars to something like this:
// sum the contents of a C++ vector that's too expensive to copy
var sum = 0
do {
borrow $collection = cpp_number_vector
var $i = $collection.startIndex
while $i < $collection.endIndex {
defer { $collection.formIndex(after: &$i) }
borrow n = $collection[$i]
// Loop body goes here
sum += n
}
}
// square all `numbers`
do {
inout $collection = &numbers
var $i = $collection.startIndex
while $i < $collection.endIndex {
defer { $collection.formIndex(after: &$i) }
inout n = &$collection[$i]
// Loop body goes here
n *= n
}
}
A for
loop may now use inout
or borrow
bindings in its pattern. These may be at the top level of the pattern:
for inout n in &numbers { ... }
for borrow n in numbers { ... }
Or they may be nested inside a more complicated pattern:
for (key, inout value) in &pairs { ... }
for case (42, (borrow value)?) in pairs { .... }
A for
loop which uses await
or try await
must not use inout
or borrow
bindings in its pattern. Doing so is an error.
A for
loop accesses its collection's elements using a certain "iteration strategy". Each iteration strategy requires the loop's collection to conform to a specific protocol, and generates a specific pattern of code which uses that conformance to extract each element in turn. Which iteration strategy a for
loop uses can be determined syntactically.
Currently, for
loops support two iteration strategies:
- If the
await
ortry await
keywords appear betweenfor
and the pattern, it uses theAsyncSequence
iteration strategy. - Otherwise, it uses the
Sequence
iteration strategy.
The Sequence
iteration strategy treats the collection as an expression which is required to conform to the Sequence
protocol; it calls makeIterator()
on the collection, binds the resulting iterator to a temporary var
, and then uses the iterator's next()
method to retrieve each element in turn. The AsyncSequence
strategy follows the same basic pattern but requires a conformance to AsyncSequence
and uses its requirements and associated types instead of Sequence
.
To support this feature, we introduce two new iteration strategies: MutableCollection
and Collection
. A for
loop's iteration strategy is now chosen as follows (new steps bolded):
- If the
await
ortry await
keywords appear betweenfor
and the pattern, it uses theAsyncSequence
iteration strategy. - Otherwise, if its pattern uses any
inout
bindings, it uses theMutableCollection
iteration strategy. - Otherwise, if its pattern uses any
borrow
bindings, it uses theCollection
iteration strategy. - Otherwise, it uses the
Sequence
iteration strategy.
When the MutableCollection
strategy is used, the for
loop's collection must be valid as the right-hand side of an inout
binding, and when the Collection
strategy is used, its collection must be valid as the right-hand side of a borrow
binding. In particular, this means that a for inout
loop's collection must start with the prefix &
operator.
When a for
loop uses the MutableCollection
or Collection
strategy, its collection is accessed through an inout
or borrow
binding, respectively, for the duration of the loop. That is:
-
When the
MutableCollection
strategy is used, the loop's collection cannot be acccessed through any other binding for the duration of the loop. -
When the
Collection
strategy is used, the loop's collection cannot be accessed through any other mutable binding for the duration of the loop.
Exclusive access is necessary to ensure that the for
loop iterates over its collection safely. If the loop's collection could be mutated while it is executing, then the current element's index could become invalid and then be used in a subsequent access or index computation. At best this could cause bounds checking violations; at worst it could cause memory or type safety bugs. Enforcing exclusivity prevents these safety bugs. (Types conforming to MutableCollection
guarantee that the subscript setter does not invalidate the index passed to it, so the mutations allowed by the MutableCollection
strategy are guaranteed to be safe.)
The MutableCollection
and Collection
iteration strategies work as follows:
-
The
for
loop's collection is bound using aninout
binding (MutableCollection
) orborrow
binding (Collection
) for the duration of the loop. We will call this binding$collection
. -
$collection.startIndex
is extracted and stored in a temporaryvar
. We will call this binding$i
. -
A loop is created with the condition
$i < $collection.endIndex
. -
Inside the loop:
-
$collection[$i]
(with a prefix&
for theMutableCollection
strategy) is bound to the loop's pattern. (If its pattern is refutable, the next step will only be performed if it matches the element.) -
The
where
clause, if present, is evaluated. If there was nowhere
clause or it evaluated totrue
, then the body of the loop is executed. -
The lifetime of any bindings in the pattern end.
-
If the loop condition will be tested, the operation
$collection.formIndex(after: &$i)
is performed. (It is a compiler implementation detail whether this call is executed when you usebreak
or other control flow features that terminate the loop early.)
-
-
The lifetime of
$i
ends. -
The lifetime of
$collection
ends.
This proposal is additive and should not change the interpretation of existing code.
This proposal will not introduce any ABI compatibility issues, even for code which adopts the features it proposes. for borrow
and for inout
loops use only protocols and requirements which have existed for as long as Swift has been ABI-stable.
Resilient modules will have to take care not to use for borrow
or for inout
in inlinable code that is expected to be understood by previous Swift compilers. This proposal otherwise has no effect on resilience.
In the current design, requiring both the inout
keyword and the &
operator is technically redundant; the compiler only needs one marker to tell it to use an inout
binding. However, the same is true of inout
in if
and switch
, and the same logic for why those use both indicators applies equally to the for
loop.
If the Collection
iteration strategy is so great, why not use it for everything? It turns out that it has three important limitations:
-
It requires the
for
loop's collection to conform toCollection
. Some types only conform toSequence
; supporting onlyCollection
would be a source-breaking change. -
It requires exclusive access to the
for
loop's collection. Somefor
loops are not compatible with these exclusivity requirements; that is, they mutate the collection while the loop is executing. Although code which modifies the collection while (a copy of) it is being iterated is perhaps confusing or unwise, it is nonetheless safe to do with the currentfor
loop, and there is code in the wild that does it. Breaking compatibility with this code would be a source-breaking change. -
It is actually less efficient for certain collection types. Some
Collection
s provide custom iterators which are more efficient than using their indices. For instance,Set
andString
have custom iterators that are faster than incrementing indices would be, because the iterator can assume that its private state is valid but the collection has to validate the indices it's passed. These needless performance regressions would not be acceptable.
We considered three specific alternatives that fall into this category:
In Swift 6 mode, we could make plain for
use a borrow
binding by default and require something like for let
to use an iterator. This would protect Swift 5 code from the source breaks.
However, just because Swift 6 can break source compatibility doesn't mean it should. We don't think these source breaks are sufficiently justified to motivate a change of defaults, or at least we don't think it's obvious enough that they are to be worth including in this proposal. A follow-up proposal could always add for let
and change the default.
We could have the compiler automatically select the Collection
pattern for for
loops which are not affected by the issues above. For example, if the compiler determines that a for
loop's collection:
-
Conforms to
Collection
as well asSequence
, -
Cannot be or is not accessed in a way that could violate exclusivity (e.g. its collection is in a local variable and the loop body doesn't call any mutating members), and
-
Uses a standard library
Iterator
that's known to be slow (such asIndexingIterator
, which is known to have at least as much overhead as theCollection
strategy, orCxxIterator
, which is known to be frequently used with types that are slow to copy),
Then it could choose to use the Collection
pattern even though the Sequence
pattern would work just fine.
This could provide free performance wins in some code, especially code which uses C++ interop. However, it seems like an unnecessary complication which could be added later, perhaps without even needing an Evolution proposal (since it would not cause an observbale change in behavior).
In addition to running afoul of the second and third points above, this would also be a strange thing to do in a language that encourages the use of immutable variables whenever possible. Just because you can mutate a collection doesn't mean you want to; methods on objects, or mutating
methods on structs, are particularly likely to loop over mutable collections without modifying them.
Some types act as references to mutable collections; that is, even when the instance is immutable, you can still mutate the elements it provides access to. One example is the standard library's UnsafeMutableBufferPointer
type: mutating its elements never changes the baseAddress
or count
of the buffer pointer itself, so its subscript has a nonmutating
setter and its elements can be mutated even if it's used through a let
or borrow
binding. However, the MutableCollection
protocol requires a mutating subscript setter, so Swift does not currently have a way to model this situation generically, and for inout
will unnecessarily require the instance to be a var
or inout
.
We could address this by providing a ReferenceMutableCollection
protocol which refines MutableCollection
by making the subscript's setter nonmutating
. With that protocol available, for inout
could be modified so that when the collection conformed to ReferenceMutableCollection
, it would only require the for
loop's collection to be usable with a borrow
binding. We might have users omit the collection's &
in this situation, or we might decide the difference is too subtle to be surfaced in that way.
This problem may become more urgent in the future if we provide a mutable non-copyable BufferView
. Such a type would probably contain an inout
binding but would be read-only itself, and supporting it in for inout
may become important.
This proposal requires any type which wants to be usable with for borrow
or for inout
to support all of the features of the Collection
protocol. This forces them to adopt certain properties which might not be desirable, such as multi-pass iteration and finite lengths.
We could instead design new protocols which don't require these properties and extend for borrow
and for inout
to use them when available. However, there is an important complication to consider: library evolution does not allow us to modify an existing protocol to add new refinements, so it would be impossible to make all MutableCollection
s support MutableSequence
or whatever it's called. This may mean we need to type-check the loop's collection and determine what protocols it conforms to before we can chose an iteration strategy for it.
(Incomplete, but should definitely include Karoy and Zoe)