reuschj/memoization_swift_properties.md

## memoization_swift_properties.md

      
    Raw
  

              memoization_swift_properties.md
            
          
    Memoization of Swift properties


Authors: Justin Reusch

Introduction

Swift has a focus on being fast and efficient. To this goal, the memoization of computed properties would help to speed up some Swift programs relying on computed values (especially expensive ones) by only re-calculating the results when one of the properties they depend on has changed.
Anyone familiar with React and React hooks will know one of the most useful hooks is useMemo, which provides this exact functionality for rendering UIs. While memoization would be broadly applicable to any Swift program, it may be especially helpful for use in SwiftUI, where preventing unnecessary re-renders can help optimize app performance.
Motivation

Swift's use of computed properties is extremely convenient for dynamic properties that are derived from other properties and/or subject to change. That said, they can have a performance drawback if the programmer is not careful.
As an example, let's look at a simple struct for a Box. The height and width are stored Double values, but the area is calculated.
struct Box {
    var height: Double
    var width: Double
    var area: Double { width * height }
}
While the calculation of area is pretty trivial, we will use it as a proxy for more expensive calculations. The risk of this calculated property is that it's quite tempting for code using the Box to grab the area property at will without understanding or remembering that each use prompts a fresh evaluation.
var box = Box(height: 2, width: 4)
if box.area > 8 { // Area calculated
    print(box.area * box.area)    
} else if box.area > 4 { // Area calculated
    print(box.area * box.area + 5)  // Area calculated twice
}
The code above will print 69.0 to the console. However, the same area calculation was repeated four times. Perhaps this is not an issue here, but could be an issue as the computation gets more expensive or the program gets more complex.
A wise programmer might write the code more efficiently like this:
var box = Box(height: 2, width: 4)
let area = box.area // Area calculated and stored to `area` constant
if area > 8 { // Area constant read
    print(area * area)
} else if area > 4 { // Area constant read
    print(area * area + 5) // Area constant read twice
}
Now, the area is only calculated once (three calculations saved). However, it's not always quite this simple and this isn't always intuitive to do. The nature of a property feels like something that's there for me to grab as needed. Especially, if we didn't write Box ourselves. It may not be obvious that area is calculated and not stored.
Lazy properties

For situations where we need to calculate a value once, but would rather not do so again, Swift has the lazy keyword. This is quite nice for some situations, but if we try to create a LazyBox, it has a fatal flaw making it pretty worthless for this situation.
struct LazyBox {
    var height: Double
    var width: Double
    lazy var area: Double = { height * width }()
}
var box = LazyBox(height: 2, width: 4)
print(box.area) // Prints 8.0
print(box.area) // Prints 8.0
box.height = 4
print(box.area) // Prints 8.0 (❌ WRONG‼️)
So, lazy is great when the value is fixed after calculation, but unsuitable for more dynamic variables.
Memoization and React

What's the happy medium between lazy properties and computed properties? Memoization is the answer (Thanks, Dan!).
For this, let's take some inspiration from React. As mentioned, one of the most useful hooks for React is useMemo, which allows you to set a value based on a calculated factory function. Here's the brilliant part: you give it an array of dependencies. Anytime you call for the value, if none of those dependencies have changed since the last calculation, you get the memoized result (no computation necessary). If, however, any value does change, it recomputes as needed. So, it's the happy medium between flexibility and efficiency.
Here's an example (Let's stick to our box example to keep it simple. Again, just imagine that area is a far more expensive computation):
const Box: FunctionalComponent<BoxProps> = ({ height, width }: BoxProps) => {
    const area = useMemo(() => {
        return height * width;
    }, [height, width]); // <- This is the list of dependencies. If once of these is changed, area recomputes.
    return (
        <div style={{ height, width }}>
            <p>`I am a box with area: ${area}`</p>
        </div>
    );
};
Memoization and Swift

Now, let's look at how we can make something like this work in Swift.
One tempting thought might be to use property wrappers. Long story short, they aren't particularly helpful here. The reason is that the property wrapper struct needs to be initialized before the struct or class that uses it has fully initialized and is ready to pass self. So, property wrappers really can't capture and monitor other properties of it's parent. It's so tempting to think it could work, but it doesn't.
As I can see, there are currently two ways to implement this in Swift. Both are workable (for different scenarios), but with quite a bit of tedious boilerplate.
Non-lazy Memoizing

First, let's look at a box with memoized area. It stores a private area property that's first calculated on initialization (not lazy). The didSet observers for height and width will actively recompute the area after they are set.
struct Box {
    var height: Double {
        didSet { // On setting height, the updated area is actively recalculated
            setArea()
        }
    }
    var width: Double {
        didSet { // On setting width, the updated area is actively recalculated
            setArea()
        }
    }
    
    private(set) var area: Double = 0
    private mutating func setArea() {
        area = width * height
    }

    init(height: Double, width: Double) {
        self.height = height
        self.width = width
        setArea()
    }
}
Lazy Memoizing

Next, let's look at a box with memoized and lazy area. It stores a private area property that's only calculated on first use. The didSet observers for height and width don't actively recompute area. Rather, they invalidate the stored area by setting it to nil. When the getter for area finds a value, it uses it. When it finds nil, it recomputes the area and stores it in the private property for later use.
struct Box {
    var height: Double {
        didSet { // On setting height, the memoized area is invalidated
            _area = nil
        }
    }
    var width: Double {
        didSet { // On setting width, the memoized area is invalidated
            _area = nil
        }
    }

    // Private var to store memoized value
    private var _area: Double? = nil

    var area: Double {
        mutating get { // Area is calculated lazily, only as needed (though this implementation means it can't be used on `let` constants)
            guard let area = _area else {
                let newArea = width * height
                _area = newArea
                return newArea
            }
            return area
        }
    }

    init(height: Double, width: Double) {
        self.height = height
        self.width = width
    }
}
This option is ideal for types that will always be var variables, but unsuitable for use with types that could be declared as let constants. For example:
let box = Box(height: 2, width: 4)
print(box.area) // Error: Cannot use mutating getter on immutable value: 'box' is a 'let' constant
So, both implementations are useful for different situations. The non-lazy version would always be used where the type could be declared as a let constant. The lazy version would require that the type was always declared as a var variable (same limitation as types with lazy properties).
Proposed solution

The proposal is to create a simplified syntax to tell the compiler to synthesize the boilerplate outlined above. Let's take a look at what the structs above might look like with memo  and lazy memo keywords (this syntax is just my initial proposal, but there are other options I will discuss below):
struct Box {
    var height: Double
    var width: Double
    memo var area: Double { |width, height| in width * height }
}
... and the lazy version:
struct Box {
    var height: Double
    var width: Double
    lazy memo var area: Double = { |width, height| in width * height }()
}
I'll break down this proposed syntax. There are a lot of options which can be thoroughly bikeshedded with due time, but the basics are fairly strait-forward.
Keywords

We would need some keyword or attribute. For properties that are only memoized (non-lazy), I'm proposing a memo keyword. It's concise but descriptive enough that its function can be easily understood. It could be expanded to memoized, but I don't think the extra characters buy much additional clarity.
For memoized and lazy properties, I'm proposing a lazy memo keyword. The compiler would look for the combination of these two keywords and treat separately from either on its own. If that's problematic, it could be a single compound keyword like lazymemo.
Another option is to make them attributes: @memo and @lazyMemo, though this is less aligned with the existing lazy keyword.
Both of these would, of course, require declaration of the property as var. So, we could consider removing var entirely, since it could be implied. In that case, memo or lazy memo would simply go in place of the usual var/let.
Value declaration

I am proposing that value declaration look like a computed property:
memo var area: Double { |width, height| in width * height }
Another option would be to align with existing lazy syntax (perhaps doing so only for the lazy version):
lazy memo var area: Double = { |width, height| in width * height }()
Dependency declaration

This one is the trickiest. We need some way to explicitly tell the compiler what properties to observe before recomputing. This should be explicit, since it gives the user precise control over what triggers that calculation. We would not want to recalculate based on all captured values as some could be irrelevant to the result. For example:
struct Box {
    var label: String
    var height: Double
    var width: Double
    lazy memo var area: Double = {
        print("Updating area for \(label)")
        return width * height
    }()
}
Here, if we recalculated based on a change to any captured value, we'd also recalculate based on a change to the label property. While we captured it to do a print to console, it's meaningless to the result. So, it's necessary to point out the compiler: only watch width and height.
My proposal is to add this dependency list as a comma-separated list surrounded by a | on either side: |width, height|. The reasoning is to add within the closure itself, but in a way that's differentiated from a capture list or other existing closure elements. For example adding as [width, height] conflicts with a capture list. A user may want to define both a capture list and dependency list that are different. Surrounding with {} or <> would be another option, but these already have other associated meanings in Swift syntax.
Another option would be to add as params to an attribute (this may only be possible if using attributes vs. keywords):
@memo(width, height) var area: Double { width * height }
or
@lazyMemo(width, height) var area: Double { width * height }
Along these lines, another option would be to use attributes, but move the entire attribute to the closure body:
var area: Double { @memo(width, height) in width * height }
or
var area: Double { @lazyMemo(width, height) in width * height }
A caveat

As envisioned above, this proposal will work as long as all observed property types are value types, but not for reference types. Updating a memoized value based on a change to a reference type property may be more involved or not possible. As written, I don't have a proposal for how to do this, but it's possible someone else does. For the moment, I'll say that being able to add a reference type to the dependency list is a nice-to-have, but requiring all dependencies to be value types is acceptable.
The reason this doesn't work is that when a value type (including Array or Dictionary is modified, it will trigger the didSet observer. However, if a reference type is modified, didSet is not triggered.
As an example:
// Value type
struct Foo {
    var value: Int = 0
}

// Reference type
class Bar {
    var value: Int = 0  
    init(value: Int) {
        self.value = value
    }
}

struct MemoList {
    // Value type
    var list: [Int] {
        didSet {
            setSum()
        }
    }
    // Value type
    var foo: Foo {
        didSet {
            setSum()
        }
    }
    // Reference type
    var bar: Bar {
        didSet {
            setSum()
        }
    }
    private(set) var sum: Int = 0
    private mutating func setSum() {
        sum = list.reduce(0) { (acc, current) in acc + current } + foo.value + bar.value
    }
    
    init(_ list: [Int] = []) {
        self.list = list
        self.foo = Foo(value: 0)
        self.bar = Bar(value: 0)
        setSum()
    }
}

var memoList = MemoList([1,2,3])
print(memoList.sum) // Prints 6 ✅
memoList.list.append(4)
print(memoList.sum) // Prints 10 ✅ (list is a value type)
memoList.foo.value = 3
print(memoList.sum) // Prints 13 ✅ (foo is a value type)
memoList.bar.value = 10
print(memoList.sum) // Prints 13 ❌ (bar is a reference type, so sum did not update)
Source compatibility

As long as the final syntax is vetted to not interfere with any existing syntax, this should have no impact on source compatibility. As the underlying functionality is already possible, the synthesized result would just be valid Swift code today.
Conclusion

There is certainly much discussion to be had about the best way to achieve this and the best syntax. However, I hope to have convinced you that adding this capability is a useful addition to the language.