cloutiertyler/Value Semantics.md

## Value Semantics.md

      
    Raw
  

              Value Semantics.md
            
          
    Thoughts on Value Semantics

Value semantics is obviously a super deep topic and I’m certain a lot smarter people have thought about it more deeply than I can, but I just need to get a couple thoughts off my chest. I’ve been thinking about it on and off since Swift introduced me to value types/semantics!
Since this is a short document about a huge topic, I’m just going to restrict it to thinking about two problems that seem related and possible solutions.

Arbitrary value semantic data structures
Relocatable (trivially-serializable) data structures (https://www.youtube.com/watch?v=FPUBjPYBsGI)

(I also think that looking at value semantics through the lens of these two problems also leads to a more general way of looking at value semantics.)
Defining Value Semantics

For simplicity I’m also going to restrict the thought experiment only to a “pure” environment (i.e. no IO, shared memory, any external side effects etc).
In such an environment I propose that value semantics should be defined in the following way:
A value semantic variable represents the name of a value and not the location of the value. Thus, a variable x is value semantic if for every nested-variable that is reachable through it cannot be changed except by changing the nested-variables through x.
e.g.
var x = DataStructure()
x.a = 3
x.root.next.next.next.a = 3

var y = x
y.a = 2
y.root.next.next.next.a = 2

assert(x.a != y.a) // x.a is unchanged
assert(x.root.next.next.next.a != y.root.next.next.next.a) // x.root.next.next.next.a is unchanged
I also assert that this definition is semantically equivalent to saying that all value semantic variables are effectively recursively immutable (all nested variables that can be reached through the variable are also immutable), with the caveat that there is syntactic sugar that allows you to replace the value of a variable with a new value if you edit one of its nested variables.
e.g.
var x = DataStructure()
x.a = 2
is just syntactic sugar for
let x' = DataStructure()
let x = x'.copyingAndSetting(a: 2)
If I understand correctly this also essentially what Joe Groff was getting at in a Twitter post the other day. Essentially this is the equivalent of saying that a variable is value semantic if and only if there is no logically shared mutable state, reachable from the variable.

I grant that this is may be a controvertial definition for "value semantic" since some would argue that it's not necessarily true that all reachable state should contribute to its "value" (for example can two variables with different state have equivalent (x.equals(y)) values?). I argue that that is a different problem entirely, which is also useful to think about, but is unrelated to value semantics as defined as syntatic sugar for immutability.

Arbitrary Value Semantic Data Structures

If we take this definition for "value semantic", then just having immutability in functional languages makes code easier to reason about, it seems natural that I would want to be able to extend value semantics to arbitrary parts of my code. In particular, I would like to create arbitrary value semantic data structures.
Right now in Swift this isn't possible in a straightforward way. For example, consider the most obvious way to create a linked-list:
struct LinkedList<T> {
    struct Node<T> {
        value: T
        next?: Node<T>
    }
    var root: Node<T>
    // ... etc.
}
This is impossible in Swift because a value type cannot contain a value of it's own type (because in Swift value types contain all of their subvalues).

I'm aware that you can implement something like this with recursive enums in Swift, but these don't actually have the semantics I'm looking for.

I'd like to take a closer look at why exactly it is impossible though. If you could only use value types in Swift, is it possible to implement something conceptually similar to a linked list? Sure:
struct LinkedList<T> {

    struct Node<T> {
        value: T
        nextKey?: Int
    }
    
    var root: Node<T>
    var nextIndex: [Int:Node<T>] = [:]    

}
This is very similar in concept, except everything here is a pure value type. You can implement the all of the standard list operations of normal linked list, except instead of jumping directly to the next memory address you first have to move indirectly through the nextIndex (which is slower and not as nice).
However! You do get some really nice properties from with this version. The most important is that it's now a completely value semantic data structure, in the same sense that array is a value type:
var x = LinkedList<Float>()
var y = x
y.root.nextKey = 3
x.root.nextKey = 4
assert(y.root.nextKey == 3)
The other awesome thing about this data structure is one that causes much handwringing in essentially every nontrivial programming project: serialization.
Relocatable Data Structures

The above data structure can be trivilially serialized automatically by simply dumping out all of the values in the structure. In fact, the only thing (maybe) that makes this difficult for the vast majority of data structures in most code is the fact that some values are obscured from the programmer: memory addresses.
Memory addresses are secret values in every data structure that hold it together. They aren't surfaced to the programmer because they only have meaning in the context on the particular heap in which the program is running. The have no value semantic meaning in the context of the data structure itself.
In the above value semantic linked list, I replaced (only kind of I grant) the memory addresses with "semantic pointers" which formed part of the data structure itself (the keys of the dictionary).
To illustrate my point more concretely, I built a working version by hand. Below is a quick and dirty implementation of a value semantic linked list. Although it's a quick implementation, it is still quite efficient because I rely on Swift's copy-on-write implementation for arrays.
protocol Serializable {
    var serialized: String { get}
}

struct PrivateHeap<T> {

    var heap: [T?] = []

    mutating func alloc(value: T) -> Int {
        for (i, memory) in heap.enumerated() {
            if memory == nil {
                heap[i] = value
                return i
            }
        }
        heap.append(value)
        return heap.count - 1
    }

    mutating func free(_ key: Int) {
        heap[key] = nil
    }

    subscript(key: Int) -> T {
        get {
            return heap[key]!
        }
        set {
            heap[key] = newValue
        }
    }

}

struct LinkedList<T>: CustomStringConvertible, Serializable where T: CustomStringConvertible {

    var nextKey = 0

    struct Node<T>: Serializable where T: CustomStringConvertible {
        var value: T?
        var nextKey: Int?

        init(value: T?) {
            self.value = value
        }

        var serialized: String {
            if nextKey != nil {
                return """
                { value: \(value!.description), next: \(nextKey!) }
                """
            }
            return """
            { value: \(value!.description) }
            """
        }
    }

    // Initialize with a sentinel node
    var heap: PrivateHeap<Node<T>>

    var rootKey: Int = 0
    var length: Int = 0

    init () {
        heap = PrivateHeap()
        rootKey = heap.alloc(value: Node(value: nil))
    }

    var isEmpty: Bool {
        return length == 0
    }

    var serialized: String {
        var nextKey: Int? = heap[rootKey].nextKey
        var s = "[\n"
        while nextKey != nil {
            let n = nextKey!
            let node = heap[n]
            s += "  "
            s += node.serialized
            if (node.nextKey != nil) {
                s += ",\n"
            } else {
                s += "\n"
            }
            nextKey = node.nextKey
        }
        s += "]"
        return s
    }

    var description: String {
        get {
            var description = "["
            var nextKey: Int? = heap[rootKey].nextKey
            while nextKey != nil {
                let node = heap[nextKey!]
                description += node.value!.description
                nextKey = node.nextKey
                if (nextKey != nil) {
                    description += ","
                }
            }
            description += "]"
            return description
        }
    }

    mutating func prepend(value: T) {
        var node = Node(value: value)
        node.nextKey = heap[rootKey].nextKey
        heap[rootKey].nextKey = heap.alloc(value: node)
    }

    mutating func append(value: T) {
        var nextKey: Int? = rootKey
        while nextKey != nil {
            let n = nextKey!
            let node = heap[n]
            if (node.nextKey == nil) {
                let node = Node(value: value)
                heap[n].nextKey = heap.alloc(value: node)
                return
            }
            nextKey = node.nextKey
        }
    }

    mutating func removeAll(where shouldBeRemoved: (T) -> Bool) {
        var key: Int = rootKey
        repeat {
            let nextNode = heap[heap[key].nextKey!]
            if shouldBeRemoved(nextNode.value!) {
                let nextNextKey = nextNode.nextKey
                heap.free(heap[key].nextKey!)
                heap[key].nextKey = nextNextKey
            } else {
                key = heap[key].nextKey!
            }
        } while heap[key].nextKey != nil
    }

}

var list1 = LinkedList<Int>()

// append
list1.append(value: 1)
list1.append(value: 2)
list1.append(value: 3)
list1.append(value: 4)

print("Lists are the same:")
var list2 = list1 // Assigning list1 to list2 logically copies the value of list1
print(list1) // list1 contains 1,2,3,4
print(list2) // list2 contains the same elements as list1
print()

print("Appending 8 to list1 but not list2:")
list1.append(value: 8) // mutate list1
print(list1) // list1 is changed
print(list2) // list2 remains unchanged
print()

print("Prepending 0 to list1 but not list2:")
list1.prepend(value: 0) // mutate list1
print(list1) // list1 is changed
print(list2) // list2 remains unchanged
print()

print("Removing odd from list1 but not list2:")
list1.removeAll { $0 % 2 == 1 }
print(list1)
print(list2)
print()

print("Serializing lists 1 and 2:")
// We also get serialization for free
print(list1.serialized)
print(list2.serialized)
The question I have is: Can, or more importantly, should a "namespaced" or "private" heap be integrated into the language in such a way that allows the programmer to create value semantic data structures like the above without having to implement a private heap of there own (also can the keys be made "safe" by the compiler). I really don't know if it's something that would make programming a better experience, but the above shows that value semantics can be achieved efficiently even for data structures that typically rely on using references.
For example it might be cool if a programmer could implement their linked list (almost) exactly as they would if they were using reference types and have it all just work. The only detail is that the "keys" or "pointers" are in the user's data now, rather than hidden behind reference types.
Value type vs Value Context


TODO

Rather than having value types vs reference types, what about value context vs reference context: pure functions with value semantic parameters vs proceedures with reference (or essentially inout) semantic parameters