Property Behaviors


Proposal: SE-NNNN
Author(s): Joe Groff
Status: Review
Review manager: TBD

Introduction

There are property implementation patterns that come up repeatedly.
Rather than hardcode a fixed set of patterns into the compiler,
we should provide a general "property behavior" mechanism to allow
these patterns to be defined as libraries.
Motivation

We've tried to accommodate several important property patterns with targeted
language support, but this support has been narrow in scope and utility.
For instance, Swift 1 and 2 provide lazy properties as a primitive language
feature, since lazy initialization is common and is often necessary to avoid
having properties be exposed as Optional. Without this language support, it
takes a lot of boilerplate to get the same effect:
class Foo {
  // lazy var foo = 1738
  private var _foo: Int?
  var foo: Int {
    get {
      if let value = _foo { return value }
      let initialValue = 1738
      _foo = initialValue
      return initialValue
    }
    set {
      _foo = newValue
    }
  }
}
Building lazy into the language has several disadvantages. It makes the language
and compiler more complex and less orthogonal. It's also inflexible; there are many
variations on lazy initialization that make sense, but we wouldn't want to hardcode
language support for all of them. For instance, some applications may want the lazy
initialization to be synchronized, but lazy only provides single-threaded
initialization. The standard implementation of lazy is also problematic for value types. A lazy
getter must be mutating, which means it can't be accessed from an immutable value.
Inline storage is also suboptimal for many memoization tasks, since the cache cannot
be reused across copies of the value. A value-oriented memoized property implementation
might look very different:
class MemoizationBox<T> {
  var value: T? = nil
  init() {}
  func getOrEvaluate(fn: () -> T) -> T {
    if let value = value { return value }
    // Perform initialization in a thread-safe way.
    // Implementation of `sync` not shown here
    return sync {
      let initialValue = fn()
      value = initialValue
      return initialValue
    }
  }
}

struct Person {
  let firstName: String
  let lastName: String

  let _cachedFullName = MemoizationBox<String>()

  var fullName: String {
    return _cachedFullName.getOrEvaluate { "\(firstName) \(lastName)" }
  }
}
Lazy properties are also unable to surface any additional operations over a regular
property. It would be useful to be able to reset a lazy property's storage to be
recomputed again, for instance, but this isn't possible with lazy.
There are important property patterns outside of lazy initialization.
It often makes sense to have "delayed",
once-assignable-then-immutable properties to support multi-phase initialization:
class Foo {
  let immediatelyInitialized = "foo"
  var _initializedLater: String?

  // We want initializedLater to present like a non-optional 'let' to user code;
  // it can only be assigned once, and can't be accessed before being assigned.
  var initializedLater: String {
    get { return _initializedLater! }
    set {
      assert(_initializedLater == nil)
      _initializedLater = newValue
    }
  }
}
Implicitly-unwrapped optionals allow this in a pinch, but using an IUO for
multi-phase initialization gives up both the immutability and the nil-safety
of a non-optional 'let'.
We also have other application-specific property features like didSet/willSet and
array addressors that add language complexity for limited functionality. Beyond
what we've baked into the language already, there's
a seemingly endless set of common property behaviors, including resetting,
synchronized access, and various kinds of proxying, all begging for language attention
to eliminate their boilerplate.
Proposed solution

I suggest we allow for property behaviors to be implemented within the language.
A var or let declaration can specify its behavior in parens after the
keyword:
var (lazy) foo = 1738
which acts as sugar for something like this:
var `foo.lazy` = lazy(var: Int.self, initializer: { 1738 })
var foo: Int {
  get {
    return `foo.lazy`[varIn: self,
                      initializer: { 1738 }]
  }
  set {
    `foo.lazy`[varIn: self,
               initializer: { 1738 }] = newValue
  }
}
Furthermore, the behavior can provide additional operations, such as clearing
a lazy property, which are accessed with property.behavior syntax:
foo.lazy.clear()
(The syntax for declaring and accessing the behavior is up for grabs; I'm offering
these only as a starting point.)
Property behaviors obviate the need for special language support for lazy,
observers, addressors, and other special-case property behavior, letting us
move their functionality into libraries and support new behaviors as well.
Examples

Before describing the detailed design, I'll run through some examples of potential
applications for behaviors.
Lazy

The current lazy property feature can be reimplemented as a property behavior:
public struct Lazy<Value> {
  var value: Value?

  public init() {
    value = nil
  }

  public subscript<Container>(varIn _: Container,
                              initializer initial: () -> Value) -> Value {
    mutating get {
      if let existingValue = value {
        return existingValue
      }
      let initialValue = initial()
      value = initialValue
      return initialValue
    }
    set {
      value = newValue
    }
  }
}

public func lazy<Value>(var type: Value.Type, initializer _: () -> Value)
    -> Lazy<Value> {
  return Lazy()
}
As mentioned above, lazy in Swift 2 doesn't provide a way to reset
a lazy value to reclaim memory and let it be recomputed later. A behavior can
provide additional operations on properties that use the behavior; for instance,
to clear a lazy property:
extension Lazy {
  public mutating func clear() {
    value = nil
  }
}

var (lazy) x = somethingThatEatsMemory()
use(x)
x.lazy.clear()
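For comparison, the same pieces can be wired together by hand without the proposed sugar. The following is a minimal sketch, assuming a recent Swift toolchain that supports generic subscripts; `LazyStorage`, `Widget`, `fooLazy`, and `evaluations` are hypothetical names introduced for illustration and are not part of the proposal.

```swift
// Hand-written approximation of what `var (lazy) foo = 1738` might expand to.
// All names here are hypothetical stand-ins, not proposed API.
struct LazyStorage<Value> {
    private var value: Value?

    init() {}

    // Stand-in for the proposed subscript(varIn:initializer:).
    subscript<Container>(varIn _: Container,
                         initializer initial: () -> Value) -> Value {
        mutating get {
            if let existing = value { return existing }
            let fresh = initial()
            value = fresh
            return fresh
        }
        set { value = newValue }
    }

    // Extra operation exposed by the behavior.
    mutating func clear() { value = nil }
}

final class Widget {
    // Counts how often the initializer closure actually runs.
    var evaluations = 0

    // Plays the role of the backing property `foo.lazy`.
    var fooLazy = LazyStorage<Int>()

    var foo: Int {
        get {
            return fooLazy[varIn: self, initializer: {
                self.evaluations += 1
                return 1738
            }]
        }
        set { fooLazy[varIn: self, initializer: { 1738 }] = newValue }
    }
}

let widget = Widget()
print(widget.foo)        // first read runs the initializer
widget.fooLazy.clear()   // hand-written equivalent of foo.lazy.clear()
print(widget.foo)        // recomputed after clear()
```

The forwarding accessors in `Widget` are exactly the boilerplate the proposed `var (lazy)` syntax is meant to generate.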
Memoization

Variations of lazy can be implemented that are more appropriate for certain
situations. For instance, here's a memoized behavior that stores the cached
value indirectly, making it suitable for immutable value types:
public class MemoizationBox<Value> {
  var value: Value? = nil
  init() {}
  func getOrEvaluate(fn: () -> Value) -> Value {
    if let value = value { return value }
    // Perform the initialization in a thread-safe way.
    // Implementation of 'sync' not shown here.
    return sync {
      let initialValue = fn()
      value = initialValue
      return initialValue
    }
  }
  func clear() {
    value = nil
  }

  public subscript<Container>(letIn _: Container,
                              initializer value: () -> Value) -> Value {
    return getOrEvaluate(value)
  }
}

public func memoized<Value>(let type: Value.Type, initializer: () -> Value)
    -> MemoizationBox<Value> {
  return MemoizationBox()
}
Which can then be used like this:
struct Location {
  let street, city, postalCode: String

  let (memoized) address = "\(street)\n\(city) \(postalCode)"
}
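As a point of reference, the memoization pattern can be approximated by hand in a self-contained form. This is only a sketch under stated assumptions: `NSLock` from Foundation is swapped in for the unspecified `sync` helper, and `MemoBox`, `addressBox`, and `evaluations` are hypothetical names that do not appear in the proposal.

```swift
import Foundation

// Reference-typed cache, so every copy of the owning struct shares it.
// NSLock is an assumed replacement for the `sync` helper left unspecified above.
final class MemoBox<Value> {
    private var value: Value?
    private let lock = NSLock()
    // Counts how many times the value was actually computed.
    private(set) var evaluations = 0

    func getOrEvaluate(_ compute: () -> Value) -> Value {
        lock.lock()
        defer { lock.unlock() }
        if let cached = value { return cached }
        let fresh = compute()
        evaluations += 1
        value = fresh
        return fresh
    }
}

struct Location {
    let street, city, postalCode: String

    // Hand-written stand-in for the backing property `address.memoized`.
    let addressBox = MemoBox<String>()

    // Non-mutating getter: usable even on an immutable Location.
    var address: String {
        return addressBox.getOrEvaluate { "\(street)\n\(city) \(postalCode)" }
    }
}

let home = Location(street: "1 Main St", city: "Springfield", postalCode: "00000")
let copy = home
print(home.address)
print(copy.address)  // served from the cache shared with `home`
```

Because the box is a class instance, the cache survives copying, which is the property that inline storage cannot offer.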
Delayed Initialization

A property behavior can model "delayed" initialization behavior, where the DI rules
for var and let properties are enforced dynamically rather than at compile time:
public func delayed<Value>(let type: Value.Type) -> Delayed<Value> {
  return Delayed()
}
public func delayed<Value>(var type: Value.Type) -> Delayed<Value> {
  return Delayed()
}

public struct Delayed<Value> {
  var value: Value? = nil

  /// DI rules for vars:
  /// - Must be assigned before being read
  public subscript<Container>(varIn container: Container) -> Value {
    get {
      if let value = value {
        return value
      }
      fatalError("delayed var used before being initialized")
    }
    set {
      value = newValue
    }
  }

  /// DI rules for lets:
  /// - Must be initialized once before being read
  /// - Cannot be reassigned
  public subscript<Container>(letIn container: Container) -> Value {
    get {
      if let value = value {
        return value
      }
      fatalError("delayed let used before being initialized")
    }
  }

  /// Behavior operation to initialize a delayed variable
  /// or constant.
  public mutating func initialize(value: Value) {
    // Check the stored value, not the (non-optional) parameter that shadows it.
    if self.value != nil {
      fatalError("delayed property already initialized")
    }
    self.value = value
  }
}
which can be used like this:
class Foo {
  let (delayed) x: Int

  init() {
    // We don't know "x" yet, and we don't have to set it
  }

  func initializeX(x: Int) {
    self.x.delayed.initialize(x) // Will crash if 'self.x' is already initialized
  }

  func getX() -> Int {
    return x // Will crash if 'self.x' wasn't initialized
  }
}
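The dynamic checks can also be exercised without the proposed syntax. The sketch below is a hand-written approximation; `DelayedStorage`, `Connection`, `portStorage`, and `isInitialized` are hypothetical names, and only the successful path is run because the failure paths trap.

```swift
// Hand-written approximation of a delayed, assign-once property.
struct DelayedStorage<Value> {
    private var value: Value?

    init() {}

    var isInitialized: Bool { return value != nil }

    // Traps if read before initialization, mirroring the dynamic DI rule.
    func get() -> Value {
        guard let value = value else {
            fatalError("delayed property used before being initialized")
        }
        return value
    }

    // Traps on a second initialization, mirroring the assign-once rule.
    mutating func initialize(_ newValue: Value) {
        precondition(value == nil, "delayed property already initialized")
        value = newValue
    }
}

final class Connection {
    // Plays the role of the backing property `port.delayed`.
    var portStorage = DelayedStorage<Int>()

    // Read-only from the outside, like a 'let'.
    var port: Int { return portStorage.get() }

    func configure(port: Int) {
        portStorage.initialize(port)
    }
}

let connection = Connection()
connection.configure(port: 8080)
print(connection.port)
```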
Resettable properties

There's a common pattern in Cocoa where properties are used as optional
customization points, but can be reset to nil to fall back to a non-public default
value. In Swift, properties that follow this pattern currently must be imported
as ImplicitlyUnwrappedOptional, even though nil is only ever assigned to the
property and never read back from it. If expressed as a behavior, the reset
operation can be decoupled from the type, allowing the property to be exported
as non-optional:
public func resettable<Value>(var type: Value.Type,
                      initializer fallback: () -> Value) -> Resettable<Value> {
  return Resettable(value: fallback())
}
public struct Resettable<Value> {
  var value: Value?

  public subscript<Container>(varIn container: Container,
                              initializer fallback: () -> Value) -> Value {
    get {
      if let value = value { return value }
      return fallback()
    }
    set {
      value = newValue
    }
  }

  public mutating func reset() {
    value = nil
  }
}

var (resettable) foo: Int = 22
print(foo) // => 22
foo = 44
print(foo) // => 44
foo.resettable.reset()
print(foo) // => 22
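The same flow can be reproduced by hand to see how the reset operation stays out of the property's type. This is a minimal sketch with hypothetical names (`ResettableStorage`, `Settings`, `fontSizeStorage`, and the `fallback` label); it simplifies the proposed subscript by dropping the container argument.

```swift
// Hand-written approximation of a resettable property.
struct ResettableStorage<Value> {
    private var value: Value?

    init() {}

    // The fallback closure is supplied at each access instead of being stored.
    subscript(fallback fallback: () -> Value) -> Value {
        get {
            if let value = value { return value }
            return fallback()
        }
        set { value = newValue }
    }

    mutating func reset() { value = nil }
}

struct Settings {
    // Plays the role of the backing property `fontSize.resettable`.
    var fontSizeStorage = ResettableStorage<Int>()

    // The public face of the property is a plain, non-optional Int.
    var fontSize: Int {
        get { return fontSizeStorage[fallback: { 22 }] }
        set { fontSizeStorage[fallback: { 22 }] = newValue }
    }
}

var settings = Settings()
settings.fontSize = 44
settings.fontSizeStorage.reset()  // back to the fallback value
print(settings.fontSize)
```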
Synchronized Property Access

Objective-C supports atomic properties, which take a lock on get and set to
synchronize accesses to a property. This is occasionally useful, and it can be
brought to Swift as a behavior:
// A class that owns a mutex that can be used to synchronize access to its
// properties.
//
// `NSObject` could theoretically be extended to implement this using the
// object's `@synchronized` lock.
public protocol Synchronizable: class {
  func withLock<R>(@noescape body: () -> R) -> R
}

public func synchronized<Value>(var _: Value.Type,
                                initializer initial: () -> Value)
    -> Synchronized<Value> {
  return Synchronized(value: initial())
}

public struct Synchronized<Value> {
  var value: Value

  public subscript<Container: Synchronizable>(varIn container: Container,
                                              initializer _: () -> Value)
      -> Value {
    get {
      return container.withLock {
        return value
      }
    }
    set {
      container.withLock {
        value = newValue
      }
    }
  }
}
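A hand-written version shows how the container constraint does its work. The sketch below assumes Foundation's `NSLock` as the mutex; `SynchronizedStorage`, `Tally`, and `countStorage` are hypothetical names, and the subscript omits the unused initializer argument for brevity.

```swift
import Foundation

// A container that owns a lock usable for synchronizing its properties.
protocol Synchronizable: AnyObject {
    func withLock<R>(_ body: () -> R) -> R
}

struct SynchronizedStorage<Value> {
    private var value: Value

    init(_ initial: Value) { value = initial }

    // Only containers that conform to Synchronizable may host this storage.
    subscript<Container: Synchronizable>(varIn container: Container) -> Value {
        get { return container.withLock { value } }
        set { container.withLock { value = newValue } }
    }
}

final class Tally: Synchronizable {
    private let lock = NSLock()

    // Plays the role of the backing property `count.synchronized`.
    var countStorage = SynchronizedStorage(0)

    func withLock<R>(_ body: () -> R) -> R {
        lock.lock()
        defer { lock.unlock() }
        return body()
    }

    var count: Int {
        get { return countStorage[varIn: self] }
        set { countStorage[varIn: self] = newValue }
    }
}

let tally = Tally()
tally.count = 5
print(tally.count)
```

As with Objective-C atomic properties, each get and each set is individually locked; a read-modify-write such as `tally.count += 1` is still two separate locked operations.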
NSCopying

Many Cocoa classes implement value-like objects that require explicit copying.
Swift currently provides an @NSCopying attribute for properties to give
them behavior like Objective-C's @property(copy), invoking the copy method
on new objects when the property is set. We can turn this into a behavior:
public func copying<Value: NSCopying>(var _: Value.Type,
                                      initializer initial: () -> Value)
    -> Copying<Value> {
  return Copying(value: initial().copy() as! Value)
}

public struct Copying<Value: NSCopying> {
  var value: Value

  public subscript<Container>(varIn container: Container,
                              initializer _: () -> Value)
      -> Value {
    get {
      return value
    }
    set {
      // copy() returns an untyped object, so cast back to Value.
      value = newValue.copy() as! Value
    }
  }
}
Referencing Properties with Pointers

We provide some affordances for interfacing properties with pointers for C interop
and performance reasons, such as withUnsafePointer and implicit argument
conversions. These affordances come with a lot of caveats and limitations.
A property behavior can be defined that implements properties with manually-allocated
memory, guaranteeing that pointers to the property can be freely taken and used:
public func pointable<Value>(var _: Value.Type,
                             initializer initial: () -> Value)
    -> Pointable<Value> {
  return Pointable(value: initial())
}

public class Pointable<Value> {
  public let pointer: UnsafeMutablePointer<Value>

  init(value: Value) {
    pointer = .alloc(1)
    pointer.initialize(value)
  }

  deinit {
    pointer.destroy()
    pointer.dealloc(1)
  }

  public subscript<Container>(varIn _: Container,
                              initializer _: () -> Value)
      -> Value {
    get {
      return pointer.memory
    }
    set {
      pointer.memory = newValue
    }
  }
}

var (pointable) x = 22
var (pointable) y = 44

memcpy(x.pointable.pointer, y.pointable.pointer, sizeof(Int.self))
print(x) // => 44
(Manually allocating and deallocating a pointer in a class is obviously not ideal,
but is shown as an example. A production-quality stdlib implementation could use
compiler magic to ensure the property is stored in-line in an addressable way.)
Property Observers

A property behavior can also replicate the built-in behavior of didSet/willSet
observers:
typealias ObservingAccessor = (oldValue: Value, newValue: Value) -> ()

public func observed<Value>(var _: Value.Type,
                            initializer initial: () -> Value,
                            didSet _: ObservingAccessor = {},
                            willSet _: ObservingAccessor = {})
    -> Observed<Value> {
  return Observed(value: initial())
}

public struct Observed<Value> {
  var value: Value

  public subscript<Container>(varIn _: Container,
                              initializer _: () -> Value,
                              didSet didSet: ObservingAccessor = {},
                              willSet willSet: ObservingAccessor = {})
      -> Value {
    get { return value }
    set {
      let oldValue = value
      willSet(oldValue, newValue)
      value = newValue
      didSet(oldValue, newValue)
    }
  }
}
A common complaint with didSet/willSet is that the observers fire on
every write, not only ones that cause a real change. A didChange accessor,
which only gets invoked if the property is set to a value not equal to the
old value, can be provided by a new behavior:
public func changeObserved<Value: Equatable>(var _: Value.Type,
                                             initializer initial: () -> Value,
                                             didChange _: ObservingAccessor = {})
    -> ChangeObserved<Value> {
  return ChangeObserved(value: initial())
}

public struct ChangeObserved<Value: Equatable> {
  var value: Value

  public subscript<Container>(varIn _: Container,
                              initializer _: () -> Value,
                              didChange didChange: ObservingAccessor = {})
      -> Value {
    get { return value }
    set {
      if value == newValue { return }
      let oldValue = value
      value = newValue
      didChange(oldValue, newValue)
    }
  }
}
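The change-only observer can be tried out by hand as well. This is a sketch with hypothetical names (`ChangeObservedStorage`, `Model`, `nameStorage`, `log`); it drops the container and initializer arguments of the proposed subscript to keep the example short.

```swift
// Hand-written approximation of a change-observed property.
struct ChangeObservedStorage<Value: Equatable> {
    private var value: Value

    init(_ initial: Value) { value = initial }

    // The observer is passed at each access rather than stored.
    subscript(didChange didChange: (Value, Value) -> Void) -> Value {
        get { return value }
        set {
            // Writes that do not change the value are ignored entirely.
            guard newValue != value else { return }
            let oldValue = value
            value = newValue
            didChange(oldValue, newValue)
        }
    }
}

final class Model {
    var log: [String] = []

    // Plays the role of the backing property `name.changeObserved`.
    var nameStorage = ChangeObservedStorage("a")

    var name: String {
        get { return nameStorage[didChange: { _, _ in }] }
        set {
            nameStorage[didChange: { before, after in
                self.log.append("\(before) -> \(after)")
            }] = newValue
        }
    }
}

let model = Model()
model.name = "a"  // same value: observer does not fire
model.name = "b"  // real change: observer fires once
print(model.log)
```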
This is a small sampling of the possibilities of behaviors. Let's look at how they
can be implemented.
Detailed design

A property declaration can declare a behavior after the var or let keyword
in parens:
var (runcible) foo: Int
(Possible alternatives to var (behavior) are discussed later.) Inside the parens
is a dotted declaration reference that must refer to a behavior function
that accepts the property attributes (such as its name, type, initial value (if
any), and accessor methods) as parameters. How attributes map to parameters is
discussed below.
When a property declares a behavior, the compiler expands this into a
backing property, which is initialized by invoking the behavior function
with the property's attributes as arguments. The backing property takes on
whatever type is returned by the behavior function. The declared property forwards
to the accessors of the backing property's
subscript(varIn:...) (or subscript(letIn:...)) member, with self as the
first argument (or () for a free variable declaration). The subscript may
also accept any or all of the property's attributes as arguments. Approximately, the
expansion looks like this:
var `foo.runcible` = runcible(var: Int.self)
var foo: Int {
  return `foo.runcible`[varIn: self]
}
with the fine print that the property directly receives the get,
set, materializeForSet, etc. accessors from the behavior's
subscript declaration.  By forwarding to a subscript instead of separate get and
set methods, property behaviors preserve all of the mutable property optimizations
we support now and in the future for free. The subscript also determines the mutability
of the declared property.
The behavior function is resolved by building a call with the following
keyword arguments, based on the property declaration:

- The metatype of the declared property's type is passed as an argument labeled
  var for a var, or labeled let for a let.
- If the declared property provides an initial value, the initial value expression
  is passed as a () -> T closure to an argument labeled initializer.
- If the property is declared with accessors, their bodies are passed by named
  parameters corresponding to their names. Accessor names can be arbitrary identifiers.

For example, a property with a behavior and initial value:
var (runcible) foo = 1738
gets its backing property initialized as follows:
var `foo.runcible` = runcible(var: Int.self, initializer: { 1738 })
A property that declares accessor methods:
var (runcible) foo: Int {
  bar { print("bar") }
  bas(x) { print("bas \(x)") }
}
passes those accessors on to its behavior function:
private func `foo.bar`() { print("bar") }
private func `foo.bas`(x: T) { print("bas \(x)") }

var `foo.runcible` = runcible(var: Int.self,
                              bar: self.`foo.bar`,
                              bas: self.`foo.bas`)
Contextual types from the selected behavior function can be used to infer types
for the accessors' parameters as well as their default names. For example, if the
behavior function is declared as:
func runcible<T>(var type: T.Type, bar: (newValue: T) -> ())
  -> RuncibleProperty<T>
then a bar accessor using this behavior can implicitly receive newValue as a
parameter:
var (runcible) x: Int {
  bar { print("\(newValue.dynamicType)") } // prints Int
}
Once the behavior function has been resolved, its return type is searched for a
matching subscript member with labeled index arguments:

- The self value that contains the property is passed to a labeled
  varIn argument for a var, or a letIn argument for a let.
  This may be the metatype for a static property, or () for a global or local
  property.
- After these arguments, the subscript must take the same labeled initializer
  and/or accessor closure arguments as the behavior function.

It is an error if a matching subscript can't be found on the type. By constraining
what types are allowed to be passed to the varIn or letIn parameter
of the subscript, a behavior can constrain what kinds of container it is
allowed to appear in.
By passing the initializer and accessor bodies to both the behavior function and
subscript, the backing property can avoid requiring storage for closures it
doesn't need immediately at initialization time. It would be unacceptable if
every lazy property needed to store its initialization closure in-line, for
instance. The tradeoff is that there is potentially redundant work done forming
these closures at both initialization and access time, and many of the arguments
are not needed by both. However, if the behavior function and subscript are both
inlineable, the optimizer ought to be able to eliminate dead arguments and simplify
closures. For most applications, the attribute closures ought to be able to be
@noescape as well.
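The storage concern can be made concrete with `MemoryLayout`. The following sketch compares a lazy backing type that receives its initializer at access time with one that stores the closure in-line; both struct names are hypothetical, and exact sizes are platform-dependent, so none are stated here.

```swift
// Backing storage that receives the initializer closure at access time.
struct LazyWithoutClosure<Value> {
    var value: Value?
}

// Backing storage that keeps the initializer closure in-line.
struct LazyWithStoredClosure<Value> {
    var value: Value?
    var initializer: () -> Value
}

let sizeWithout = MemoryLayout<LazyWithoutClosure<Int>>.size
let sizeWith = MemoryLayout<LazyWithStoredClosure<Int>>.size

// The stored closure adds to the in-line size of every such property.
print("without closure: \(sizeWithout) bytes, with closure: \(sizeWith) bytes")
```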
Some behaviors may have special operations associated with them; for instance,
a lazy property may provide a way to clear itself to reclaim memory and allow the
value to be recomputed later when needed. The underlying backing property may be
accessed by referencing it as property.behavior.
var (lazy) x = somethingThatEatsMemory()

use(x)
x.lazy.clear() // free the memory
The backing property has internal visibility by default (or private if the
declared property is private). If the backing property should have higher
visibility, the visibility can be declared next to the behavior:
public var (public lazy) x = somethingThatEatsMemory()
However, the backing property cannot have higher visibility than the declared property.
The backing property is always a stored var property.
It is the responsibility of a let property behavior's implementation to provide the
expected behavior of an immutable property over it. A well-behaved let should
produce an identical value every time it is loaded, or die trying, as in the
case of an uninitialized delayed let. A let should be safe to read concurrently
from multiple threads. (In the fullness of time, an effects system might be
able to enforce this, with escape hatches for internally-impure things like
memoization of course.)
Impact on existing code

By itself, this is an additive feature that doesn't impact existing code. However,
it potentially obsoletes lazy, willSet/didSet, and @NSCopying as
hardcoded language features.  We could grandfather these in, but my preference
would be to phase them out by migrating them to library-based property behavior
implementations. (Removing them should be its own separate proposal, though.)
It's also worth exploring whether property behaviors could replace the "addressor"
mechanism used by the standard library to implement Array efficiently. It'd be
great if the language only needed to expose the core conservative access pattern
(get/set/materializeForSet) and let all variations be implemented as library
features. Note that superseding didSet/willSet and addressors completely would
require being able to apply behaviors to subscripts in addition to properties, which
seems like a reasonable generalization.
Alternatives considered/to consider

Declaration syntax

Alternatives to the proposed var (behavior) propertyName syntax include:

- An attribute, such as @behavior(lazy) or behavior(lazy) var.
  This is the most conservative answer, but is clunky.
- Use the behavior function name directly as an attribute, so that e.g. @lazy
  works. This injects functions into the attribute namespace, which is
  problematic (but maybe not as much if the function itself also has to be
  marked with a @behavior_function attribute too).
- Use a new keyword, as in var x: T by behavior.
- Something on the right side of the colon, such as var x: lazy(T). To me
  this reads like lazy(T) is a type of some kind, which it really isn't.
- Something following the property name, such as var x«lazy»: T or
  var x¶lazy: T (picking your favorite ASCII characters to replace «»¶).
  One nice thing about this approach is that it suggests self.x«lazy» as a
  declaration-follows-use way of accessing the backing property.

Syntax for accessing the backing property

The proposal suggests x.behaviorName for accessing the underlying backing property
of var (behaviorName) x. The main disadvantage of this is that it complicates name
lookup, which must be aware of the behavior in order to resolve the name,
and is potentially ambiguous, since the behavior name could of course also be the
name of a member of the property's type. Some alternatives to consider:

- Reserving a keyword and syntactic form to refer to the backing property, such as
  foo.x.behavior or foo.behavior(x). The problems with this are that reserving
  a keyword is undesirable, and that behavior is a vague term that requires
  more context for a reader to understand what's going on. If we support multiple
  behaviors on a property, it also doesn't provide a mechanism to distinguish between
  behaviors.
- Something following the property name, such as foo.x«lazy» or foo.x¶lazy (choosing
  your favorite ASCII substitution for «»¶, again), to match the similar
  proposed declaration syntax above.
- "Overloading" the property name to refer to both the declared property and its
  backing property, and doing member lookup in both (favoring the declared property
  when there are conflicts). If foo.x is known to be
  lazy, it's attractive for foo.x.clear() to Just Work without annotation.
  This has the usual ambiguity problems of overloading, of course; if the behavior's
  members are shadowed by the fronting type, something inconvenient like
  (foo.x as Lazy).clear() would be necessary to disambiguate.

Defining behavior requirements using a protocol

It's reasonable to ask why the behavior interface proposed here is ad-hoc rather than
modeled as a formal protocol. It's my feeling that a protocol would be too
constraining:

- Different behaviors need the flexibility to require different sets of
  property attributes. Some kinds of property support initializers; some kinds
  of property have special accessors; some kinds of property support many different
  configurations. Allowing overloading (and adding new functionality via extensions
  and overloading) is important expressivity.
- Different behaviors place different constraints on what containers are
  allowed to contain properties using the behavior, meaning that subscript
  needs the freedom to impose different generic constraints on its varIn/
  letIn parameter for different behaviors.

It's true that there are type system features we could theoretically add to support
these features in a protocol, but increasing the complexity of the type system has
its own tradeoffs. I think it's unlikely that behaviors would be useful in generics
either.
A behavior declaration

Instead of relying entirely on an informal protocol, we could add a new declaration
to the language to declare a behavior, something like this:
behavior lazy<T> {
  func lazy(...) -> Lazy { ... }
  struct Lazy { var value: T; ... }
}
Doing this has some potential advantages:

- It provides clear namespacing for things that are intended to be behaviors.
  If the functions and types that implement the behavior can be nested under the
  behavior declaration somehow, then they don't need to pollute the global
  function/type namespace.
- The behavior declaration can explicitly provide metadata about the behavior,
  such as what container and value types it supports and what kinds of accessors
  properties can provide to it, all of which can only be discovered by overload
  resolution in this proposal. It'd also be a natural home for extensions, such
  as how a behavior interacts with overriding, what behaviors it can or can't
  compose with, etc.

Naming convention for behaviors

This proposal doesn't discuss the naming convention that behaviors should follow.
Should they be random adjectives like lazy? Should we try to follow an -ing
or -able suffix convention? Does it matter, if behaviors have their own syntax
namespace?
TODO

When do properties with behaviors get included in the memberwise initializer of
structs or classes, if ever? Can properties with behaviors be initialized from
init rather than with inline initializers?
Can behaviors be composed, e.g. (lazy, observed), or (lazy, atomic)? How?
Composition necessarily has to have an ordering, and some orderings will be wrong;
e.g. one of (lazy, atomic) or (atomic, lazy) will be broken.
To be able to fully supplant didSet/willSet (and addressors), we'd need to be able
to give behaviors to subscripts as well. The special override behavior of
didSet/willSet in subclasses needs to be accounted for as well.
It's worth considering what the "primitive" interface for properties is; after all,
theoretically even computed properties could be considered a behavior if you unstack
enough turtles. One key thing to support that I don't think our current special-case
accessors handle is conditional physical access. For instance, a behavior might
want to pass through to its physical property, unless some form of transactionality
is enabled. As a strawman, if there were an inout accessor, which
received the continuation of the property access as an (inout T) -> Void parameter,
that might be expressed like this:
var _x = 0
var x: Int {
  inout(continuation) {
    // If we're not logging, short-circuit to a physical access of `x`.
    if !logging {
      continuation(&_x)
      return
    }
    // Otherwise, save the oldValue and log before and after.
    // Read the physical storage `_x`; reading `x` here would re-enter this accessor.
    let oldValue = _x
    var newValue = _x
    continuation(&newValue)
    print("--- changing _x from \(oldValue) to \(newValue)")
    _x = newValue
    print("--- changed! _x from \(oldValue) to \(newValue)")
  }
}

An inout accessor like the one sketched here could be lowered into a
materializeForSet implementation using a SIL state machine
transform, similar to what one would do to implement yield or await. The
transform would check that continuation always gets called exactly once on all
paths, and would capture the control flow after the continuation call in the
materializeForSet continuation.
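The control flow of the strawman can be approximated today with an ordinary method that takes the continuation as an `(inout Int) -> Void` closure. This is only a sketch of the access pattern, not of the proposed accessor or of any SIL transform; `LoggedCounter`, `accessX`, and `log` are hypothetical names, and log lines are collected in an array instead of printed so they can be inspected.

```swift
final class LoggedCounter {
    var logging = false
    private(set) var log: [String] = []
    private var _x = 0

    var x: Int { return _x }

    // Stand-in for the strawman `inout(continuation)` accessor.
    func accessX(_ continuation: (inout Int) -> Void) {
        // If we're not logging, hand out the physical storage directly.
        if !logging {
            continuation(&_x)
            return
        }
        // Otherwise, run the access against a temporary and write it back.
        let oldValue = _x
        var newValue = _x
        continuation(&newValue)
        log.append("changing _x from \(oldValue) to \(newValue)")
        _x = newValue
    }
}

let counter = LoggedCounter()
counter.accessX { $0 += 1 }   // physical access, nothing logged
counter.logging = true
counter.accessX { $0 += 10 }  // logged access through a temporary
print(counter.x, counter.log)
```

Here nothing enforces that the continuation runs exactly once on every path; that check is what the transform described above would add.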