DougGregor/preventing-data-races.md

Preventing Data Races in the Swift Concurrency Model

One of the goals of the concurrency effort is to prevent data races. This document describes the approach taken to preventing data races overall, by categorizing the sources of data races and describing how they are addressed with other proposals in the Swift Concurrency effort.
Data races

A data race occurs when two threads access the same memory concurrently and at least one of the accesses can change the value. Within the safe subset of Swift (e.g., ignoring the use of UnsafeMutablePointer and related types), the memory in question is always a stored property. There are several different categories of stored properties that need to be considered for data races:


Global and static stored properties:
var globalCounter: Int = 0

struct MyStruct {
  static var instanceCounter: Int = 0
}
Global and static stored properties can be accessed from nearly anywhere (subject to normal access control rules), and therefore data races can occur with effectively any code that accesses them.


Stored properties in a class instance:
class BankAccount {
  var balance: Double = 0.0
}
The stored properties of a class instance can only be accessed through that instance. Because classes are reference types, however, the instance can be referenced from any stored property in the program that has a compatible type. To reason about data races on the stored properties of a class instance, we need to reason about where the references to that instance can be stored.


Local stored properties:
func f() {
  var array = [1, 2, 3]
  doSomething { i in
    array.append(i)
  }
}
Local stored properties can be referenced in the scope of the function in which they are introduced and scopes nested within it. This includes closures and local functions that capture the property. To reason about data races on local stored properties, one must consider whether the body of the function in which it is declared, or the local functions or closures it is captured in, can execute concurrently.


Not mentioned above are the stored properties in structs and the associated values of enums. In both cases, the memory associated with the stored properties and associated values is stored "inline" with the struct or enum; it cannot be addressed except through the struct or enum as a whole, so it suffices to reason about the stored property that encloses the instance of the struct or enum.
let stored properties

A stored property may be introduced with either var or let. A var may be mutated arbitrarily, so accessing var properties concurrently is prone to data races. The memory associated with a let cannot be mutated once it has been initialized. This immutability means that the memory associated with the let can be safely accessed concurrently. However, this does not mean that all accesses through the let are free of data races: a let may end up referring to a class instance, which could be independently modified:
let account: BankAccount = lookupAccountReference(...)
// 'account.balance' can be mutated even though the 'account' reference cannot
The class instance may be stored within a struct or enum, so the data race could come from accesses to a reference-type member of a struct instance. For example:
struct Transaction {
  var id: Int
  var amount: Double
  var fromAccount: BankAccount
  var toAccount: BankAccount
}

let transaction: Transaction = lookupTransaction(...)
// 'transaction.fromAccount.balance' can be mutated even though 'transaction'
// and 'transaction.fromAccount' cannot be
The immutability of the memory directly associated with a let is important for allowing safe concurrent accesses to certain stored properties. However, as the examples above illustrate, it is not sufficient to make such accesses safe.
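The point can be made concrete with a minimal, runnable sketch. The `BankAccount` and `Transaction` definitions mirror the examples above; the initializer, the concrete amounts, and the `mutateThroughLet` function are hypothetical additions made only so the snippet is self-contained.

```swift
// BankAccount and Transaction mirror the examples above; the initializer
// and mutateThroughLet are hypothetical additions for illustration.
final class BankAccount {
    var balance: Double
    init(balance: Double) { self.balance = balance }
}

struct Transaction {
    var id: Int
    var amount: Double
    var fromAccount: BankAccount
    var toAccount: BankAccount
}

func mutateThroughLet() -> Double {
    let transaction = Transaction(
        id: 1,
        amount: 25.0,
        fromAccount: BankAccount(balance: 100.0),
        toAccount: BankAccount(balance: 0.0)
    )
    // Rejected by the compiler, because 'transaction' is a let:
    // transaction.amount = 50.0
    // transaction.fromAccount = BankAccount(balance: 0.0)

    // Accepted: the let fixes the reference, not the object it refers to.
    transaction.fromAccount.balance -= transaction.amount
    return transaction.fromAccount.balance
}
```

The last assignment is the one that matters for data races: any other code holding the same `BankAccount` reference could perform it at the same time.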
Shareable types

A shareable type is one for which separate copies of the same value can be used concurrently. There are several general classes of such types:

Value-semantic types provide value semantics, which means that a copy of a value acts completely independently from the original value: a modification to either the original or the copy will not affect the other. Structs and enums that store only other value-semantic types are value-semantic types. Class types are not value-semantic types, but one can wrap a class type in a struct or enum type using Copy-on-Write techniques to make that struct or enum a value-semantic type. This is how the Standard Library implements Array, String, Dictionary, and Set, for example.
Immutable types are such that once an instance of such a type is created, it cannot be modified at all. A struct, class, or enum type containing only let stored properties of immutable types is an immutable type.
Synchronized types protect their state by introducing some form of synchronization, which could be a lock or a serial queue. Swift currently has no way to guarantee the correctness of such types, although the Swift Concurrency effort introduces synchronized types in the form of actors.

A let stored property of shareable type is safe to access concurrently, regardless of whether it is a local property, global property, or class instance property.
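The Copy-on-Write technique mentioned for value-semantic types can be sketched as follows. `IntStorage`, `IntBag`, and `copyThenMutate` are hypothetical names invented for this illustration; `isKnownUniquelyReferenced` is the Standard Library function commonly used for this purpose. This is a simplified sketch of the general technique, not a description of how the Standard Library collections are actually written.

```swift
// Hypothetical types for illustration: a class wrapped in a struct with
// copy-on-write, so that the struct behaves as a value-semantic type.
final class IntStorage {
    var values: [Int]
    init(values: [Int]) { self.values = values }
}

struct IntBag {
    private var storage = IntStorage(values: [])

    init() {}

    var values: [Int] { storage.values }

    mutating func insert(_ value: Int) {
        // Copy the storage first if another IntBag still refers to it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = IntStorage(values: storage.values)
        }
        storage.values.append(value)
    }
}

func copyThenMutate() -> (original: [Int], copy: [Int]) {
    var original = IntBag()
    original.insert(1)
    var copy = original   // both values share one IntStorage at this point
    copy.insert(2)        // triggers the copy; 'original' is left untouched
    return (original.values, copy.values)
}
```

Because the mutation of `copy` never becomes visible through `original`, each copy can be handed to a different thread without the two interfering.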
Modelling shareable types

The protocol-based actor isolation pitch proposes the use of a protocol (ActorSendable) to describe types that are safe to transfer across actor boundaries. It is a promising model; we're using the term "shareable" here because the problem isn't limited to crossing actor boundaries. For example, capturing a non-shareable value in a closure that executes concurrently will have the same problem without crossing an actor boundary.
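For comparison, the concurrency model that later shipped in Swift 5.5 spells this kind of marker protocol `Sendable` rather than `ActorSendable`. The sketch below uses the shipped spelling, which is an assumption relative to this document. `Currency`, `Money`, `MutableBox`, `share`, and `shareMoney` are hypothetical names chosen for illustration.

```swift
// Immutable type: a final class with only let properties of Sendable type.
final class Currency: Sendable {
    let code: String
    init(code: String) { self.code = code }
}

// Value-semantic type: a struct that stores only Sendable values.
struct Money: Sendable {
    var amount: Int
    var currency: Currency
}

// Accepts only values whose type has been declared shareable.
func share<T: Sendable>(_ value: T) -> T { value }

// A class with unsynchronized mutable state does not qualify. Uncommenting
// the call below is expected to be diagnosed under strict concurrency
// checking (as a warning or an error, depending on the language mode).
final class MutableBox { var value = 0 }
// _ = share(MutableBox())

func shareMoney() -> String {
    let price = share(Money(amount: 5, currency: Currency(code: "USD")))
    return "\(price.amount) \(price.currency.code)"
}
```

The constraint on `share` is not tied to actors at all, which matches the observation that the problem is broader than crossing actor boundaries.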
var stored properties

A var stored property of shareable type will be free from data races so long as access to the stored property itself is guaranteed to not occur on two threads simultaneously. This can be achieved by ensuring one of two things at compile time:

That every access to the stored property goes through a mechanism that ensures serial execution, or
That the stored property is guaranteed to only ever be visible to a single thread of execution.

This section describes how the Swift Concurrency model uses the above approaches to protect var stored properties.
Actor instance properties

Actors introduce a new kind of class that protects its stored instance properties. An actor class is a synchronized type, and it maintains that synchronization by serializing access to its stored instance properties: any use of an actor instance member must either be on self or must be performed asynchronously. The non-self accesses will be scheduled using a serial executor, which ensures that only one thread will execute code on a given actor instance at any one time. Therefore, the model prevents data races on actor instance properties.
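A minimal sketch of this serialization, written in the syntax that later shipped in Swift 5.5 (the `actor` keyword rather than an "actor class"). The `deposit` method and the `depositConcurrently` function are hypothetical additions for illustration, and this `BankAccount` is an actor variant of the earlier class example.

```swift
actor BankAccount {
    private(set) var balance: Double = 0.0

    func deposit(_ amount: Double) {
        balance += amount   // access on self: synchronous inside the actor
    }
}

func depositConcurrently(count: Int) async -> Double {
    let account = BankAccount()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<count {
            group.addTask {
                // Non-self access: must be asynchronous, so it is awaited.
                await account.deposit(1.0)
            }
        }
    }
    // The group has finished all child tasks by the time it returns.
    return await account.balance
}
```

Although the child tasks may run on different threads, every `deposit` call is executed one at a time by the actor, so no update is lost.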
Global and static properties

Eliminating data races on global and static stored properties requires a mechanism that ensures serial execution of accesses to those properties. Global actors provide that mechanism. Any stored property can be specified to be part of the isolated state of a particular global actor by naming the global actor in an attribute. For example, a global counter might be annotated as being part of the global actor (call it MainActor) representing the main thread by adding the attribute @MainActor:
@MainActor var globalCounter: Int = 0
A declaration that is annotated with a global actor is treated as if it were an instance member of a singleton instance of that global actor. Therefore, any access to a declaration that is annotated with a global actor must either come directly from another declaration with the same global-actor annotation or must be performed asynchronously. For example:
@MainActor func bumpGlobalCounter() {
  globalCounter = globalCounter + 1  // okay: bumpGlobalCounter is in the same global actor
}

func queryGlobalCounter() -> Int {
  return globalCounter  // error: globalCounter is isolated to the global actor `MainActor`
}
A global or static stored property annotated with a global actor is protected by the global actor, so it will always be accessed from only a single thread at a time.
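The same rules apply to static stored properties. The sketch below annotates the `instanceCounter` property from the earlier `MyStruct` example and shows the asynchronous form of access from code outside the global actor. It assumes the `@MainActor` attribute as it later shipped in Swift 5.5; `bumpInstanceCounter` and `bumpAndQuery` are hypothetical names.

```swift
struct MyStruct {
    @MainActor static var instanceCounter: Int = 0
}

// Same global actor as the property, so the access is synchronous.
@MainActor func bumpInstanceCounter() {
    MyStruct.instanceCounter += 1
}

// Not part of the global actor: each access must be asynchronous.
func bumpAndQuery() async -> Int {
    await bumpInstanceCounter()
    return await MyStruct.instanceCounter
}
```

Each `await` marks a point where execution may hop to the main actor's serial executor, which is what keeps accesses to the counter from overlapping.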
Local stored properties

Local stored properties differ from the other kinds of stored properties discussed thus far because they can only be referenced from within a very narrow slice of the source code: the function (including closures) in which they are introduced, and any functions that are nested inside that function body. This allows us to determine whether they will be accessed concurrently by inspecting all of the code that references the local stored property.
Consider our earlier example:
func f() {
  var array = [1, 2, 3]
  doSomething { i in
    array.append(i)
  }
}
The closure passed to doSomething captures the local stored property array. Whether this is a data race or not depends on what doSomething, in fact, does with its closure parameter. If it synchronously calls the closure, there is no data race. If it passes the closure over to some concurrently-executing thread to execute, there will be a data race. Therefore, we need to determine whether the closure may execute concurrently.
In general, we cannot look at the implementation of doSomething to determine whether it will make the closure execute concurrently with its caller, so we must use its declaration to determine the behavior. We propose a specific heuristic to determine whether a given closure will execute concurrently, which is based on an observation:

To call a closure on another thread, one must escape the value of the closure to another thread.

Swift already provides semantics for non-escaping closures, ensuring that they can only be passed "down" the stack. Any attempt to save a non-escaping closure elsewhere (e.g., into a local or global variable, or pass it to a function requiring an escaping closure) is already an error. Therefore, we consider a closure to execute concurrently if it is an escaping closure.
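The distinction is visible purely in the declarations, as the following sketch illustrates. `doSomethingSync` and `doSomethingLater` are hypothetical stand-ins for `doSomething`, invented to show the two possible shapes of its signature.

```swift
// Non-escaping parameter (the default): the closure can only be called
// while doSomethingSync itself is running, on the caller's stack.
func doSomethingSync(_ body: (Int) -> Void) {
    for i in 4...6 { body(i) }
}

// Escaping parameter: the closure outlives the call, so it could be handed
// to another thread. Under the heuristic above it is treated as concurrent.
func doSomethingLater(_ body: @escaping (Int) -> Void) -> () -> Void {
    return { body(7) }
}

func appendSynchronously() -> [Int] {
    var array = [1, 2, 3]
    // Fine under the heuristic: the closure cannot run concurrently with
    // this function, so mutating the captured 'array' is safe.
    doSomethingSync { i in
        array.append(i)
    }
    return array
}
```

Passing the same closure to `doSomethingLater` is accepted by the compiler as of this writing, but it is the case the heuristic would flag, because nothing in the declaration rules out concurrent execution.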

Note: The presence of withoutActuallyEscaping in the language undercuts this argument. In fact, the DispatchQueue.concurrentPerform operation does exactly this. We can introduce additional type system features, such as "concurrent" function types, to address such problems.

A local function is always assumed to execute concurrently, because it is always assumed to escape.
That a closure or local function executes concurrently is not a problem by itself. We need to establish what other code it is executing concurrently with. We say that a given closure or local function g executes concurrently with another function or closure (call it f) if that closure or local function or any function enclosing it, up to but not including f, executes concurrently. For example:
func f() {
  var array = [1, 2, 3]
  doSomethingOther {           // A
    var otherArray = [1, 2, 3]
    doSomethingMiddle {        // B
      doSomething { i in       // C
        array.append(i)
        otherArray.append(i)
      }
    }  
  }
}
The innermost closure (marked C) executes concurrently with the function f if any of the closures marked C, B, or A executes concurrently. Similarly, that same closure (marked C) executes concurrently with the closure marked A if either C or B executes concurrently.
If a closure or local function executes concurrently with the function that declares a given local var stored property that it captures, the program is ill-formed. In our example above, the access to array is ill-formed if closure C executes concurrently with f, while the access to otherArray is ill-formed if the closure C executes concurrently with the closure A.
Swift's bias toward non-escaping closures means that a lot of existing, correct code will already satisfy the constraints above.
Non-shareable types

Non-shareable types are prolific sources of data races, because if any instance of the type crosses the boundary from one thread to another, it can introduce a data race. Preventing this form of data race requires limiting the transfer of values of non-shareable type.
The protocol-based actor isolation pitch requires that an actor function used from outside the actor have parameter and result types that conform to the ActorSendable protocol (which aligns with our notion of shareable types here). This approach, generalized to also account for global-actor-qualified entities, would prevent a large class of data races.
In addition, we would need to prevent the capture or return of non-shareable values in any function or closure that executes concurrently with the location the value was produced:
func f(numbers: [Int]) async -> [Int] {
  let ns = NonShareable()
  let results = numbers.concurrentMap { // assume closure executes concurrently
    ns.computeNewValue($0)
  }
  return results
}
In general, the problem here is one of escape analysis: we need to determine whether a given non-shareable value can "escape" out of the current context and into a context where it might be used concurrently. The suggestions above likely do not account for all of the ways in which a value of non-shareable type can escape.
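One way to stay within such a restriction is to copy the shareable data out of the non-shareable value before any concurrent work starts. The sketch below is one possible rewrite of the example under stated assumptions: `NonShareable` is given a hypothetical `factor` property, `scaleAll` is a hypothetical name, and a task group (from the concurrency library as it later shipped) stands in for `concurrentMap`, which is not defined in this document.

```swift
// NonShareable is a hypothetical class with unsynchronized mutable state.
final class NonShareable {
    var factor = 2
    func computeNewValue(_ x: Int) -> Int { x * factor }
}

func scaleAll(numbers: [Int]) async -> [Int] {
    let ns = NonShareable()
    // Copy out the Int (a value-semantic type) instead of capturing 'ns'
    // in the closures that execute concurrently.
    let factor = ns.factor
    return await withTaskGroup(of: (Int, Int).self) { group in
        for (index, number) in numbers.enumerated() {
            group.addTask { (index, number * factor) }
        }
        // Child tasks may finish in any order, so place results by index.
        var results = Array(repeating: 0, count: numbers.count)
        for await (index, value) in group {
            results[index] = value
        }
        return results
    }
}
```

The non-shareable instance never leaves the function that created it; only copies of `Int` values cross into the concurrently-executing closures.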
Eliminating data races completely

To completely eliminate data races within the safe subset of Swift, we need to enforce a few rules on a Swift program:

All global and static var stored properties must be protected by a global actor.
Captured local var stored properties must not be accessed from any function or closure that may execute concurrently with their declaring context.
Values of non-shareable type must never be shared with a context that may execute concurrently with their declaring context, nor cross actor boundaries.

A phased approach to eliminating data races

A lot of existing Swift code would be ill-formed under the rules above, even if it has been carefully created to be free of data races, because the tools to communicate the concurrency behavior don’t exist yet. The Swift Concurrency effort introduces those tools, but we can’t refactor the world overnight.
Instead, we suggest taking a phased approach, where code that uses the new concurrency features (primarily async and actors) will be subject to the restrictions described above. However, existing code will continue to compile and run as it always has. This will allow Swift developers to adopt concurrency over time, gaining the safety benefits for new and migrated code that make use of the concurrency features.
Phase 1 of Swift Concurrency enforces the restrictions described above on any code that:

Uses a declaration that is part of an actor (whether an actor class or a global actor), or
Is lexically in an async function or closure, actor class, or global actor.

This approach leaves existing code unaffected, but when uses of the new concurrency features are staged in, the part of the code base that uses those entities gains safety from data races. For example, existing Swift code could have numerous data races on a global variable that is not annotated with a global actor. Adding a global actor annotation forces every use of that global variable to access it safely, which generally means opting into the concurrency model, e.g., by becoming part of the global actor or by becoming async to go through the actor's serial execution mechanism.
Introducing an actor class is similar: it is new code that’s required to stay within the constraints of the concurrency model. All code that interacts with the actor must use concurrency features to do so safely, extending the “bubble” of safety from data races.
Phase 2 introduces the full set of restrictions everywhere: global and static variables must be annotated, all captured locals are subject to data-race checks, and so on. This phase must be associated with a major language version bump, because it will break a significant amount of existing code.
The goal of the phased rollout is to allow code bases to evolve throughout phase 1 to move into the Swift Concurrency model incrementally. This should make the jump to “phase 2” a much smaller change for developers.