Skip to content

Instantly share code, notes, and snippets.

@cfilipov
Last active July 21, 2016 00:49
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cfilipov/52828c2c0ea70193f50b to your computer and use it in GitHub Desktop.
Save cfilipov/52828c2c0ea70193f50b to your computer and use it in GitHub Desktop.
Generic-Key Trie in Swift

Generic Trie in Swift

This is an attempt to implement a trie in Swift with a generic key. Instead of using just strings as keys, this trie can be keyed by any object that can generate its own prefixes. This is accomplished by using a protocol to define a trie key that can return a generator of Hashables. We need Hashables because the prefixes will be stored in a Dictionary.

Defining the Key Type

The TrieNode class will be a private implementation detail. It helps to separate the implementation into TrieNode and the wrapper for a few reasons:

  1. If we want to extend the trie to implement some common Swift collection protocols, many of those don't make sense for every node.
  2. We want references for nodes but the wrapper struct can help us simulate value semantics.
  3. This will become useful later when we implement Indexable below.
public protocol TrieKey: Equatable {
    typealias PrefixType: Hashable
    func generatePrefixes() -> AnyGenerator<PrefixType>
}

private final class TrieNode<Key: TrieKey, Value> {
    private var nodes: [Key.PrefixType: TrieNode] = [:]
    private var element: (Key, Value)?
}

Trie Implementation

The actual Trie struct implements all the desired behavior of a trie. It contains the root node and some functions for mutating and searching the trie. The methods updateValue(forKey:) and removeValueForKey() have the same semantics as their eponymous Dictionary counterparts. The comment docs are pretty much copied from Dictionary.

Note that find() is private. We could have made this method return a Value? instead and make it public, and we'd be done. But later on we'd like to implement some collection type and other protocols, and having find() return the node instead of just the value will be useful.

public struct Trie<Key: TrieKey, Value> {
    private var rootNode = TrieNode<Key, Value>()

    /// Update the value stored in the trie for the given key, or, if they
    /// key does not exist, add a new key-value pair to the trie.
    ///
    /// Returns the value that was replaced, or `nil` if a new key-value pair
    /// was added.
    public mutating func updateValue(value: Value, forKey key: Key) -> Value? {
        var cur = self.rootNode
        for c in key.generatePrefixes() {
            if let next = cur.nodes[c] {
                cur = next
            } else {
                let new = TrieNode<Key, Value>()
                cur.nodes[c] = new
                cur = new
            }
        }
        defer { cur.element = (key, value) }
        return cur.element?.1
    }

    /// Remove a given key and the associated value from the trie.
    /// Returns the value that was removed, or `nil` if the key was not present
    /// in the trie.
    public mutating func removeValueForKey(key: Key) -> Value? {
        guard let node = find(key), (_, value) = node.element else {
            return nil
        }
        defer { node.element = nil }
        return value
    }

    /// Returns the value for the given key, or `nil` if the key is not
    /// present in the trie.
    private func find(key: Key) -> TrieNode<Key, Value>? {
        var cur = rootNode
        for c in key.generatePrefixes() {
            if let next = cur.nodes[c] {
                cur = next
            }
        }
        if let (k, _) = cur.element where k == key {
            return cur
        }
        return nil
    }

    // This implements `DictionaryLiteralConvertible`. For some reason, putting this implementation in an extension crashes the Swift compiler.
    public init(dictionaryLiteral elements: (Key, Value)...) {
        for (k,v) in elements {
            self.updateValue(v, forKey: k)
        }
    }
}

Conforming String to TrieKey

That's it! One last thing to do before we can use the trie, we need something that implements the TrieKey protocol. Let's say we want to key off String:

extension String: TrieKey {
    public func generatePrefixes() -> AnyGenerator<Character> {
        return anyGenerator(self.characters.generate())
    }
}

Using the Trie

Note the lines below that use find() will only work when in the same source file as Trie because of the private membership.

var trie1 = Trie<String, Int>()
assert( trie1.updateValue(1, forKey: "foo") == nil )
assert( trie1.updateValue(3, forKey: "bar") == nil )
trie1.find("foo")
trie1.find("bar")
assert( trie1.updateValue(2, forKey: "foo") == 1 )
assert( trie1.removeValueForKey("bar") == 3 )
assert( trie1.find("bar") == nil )

Make it Indexable

To really get the most out of a collection we'd like to implement some of Swift's built in collection protocols. This will give us subscripts, literals and all the good stuff that comes for free for collections. It's natural to make a trie indexable so we'll implement that protocol.

An index into a Trie is an opaque object which hides a reference to a node as well as a queue of future nodes to visit. Think of TrieIndex as an encapsulation of the state of a breadth-first search. Each TrieIndex instance represents the state for a particular iteration of the search. For each node we visit we would add the child nodes to a queue, then dequeue and repeat. TrieIndex carries the queue with it, along with a reference to the current node. Succeeding indexes are generated by creating a new index with the next state of the queue.

public struct TrieIndex<Key: TrieKey, Value> {
    private let node: TrieNode<Key, Value>
    private let queue: [TrieNode<Key, Value>]
}

private extension TrieIndex {
    private init(node: TrieNode<Key, Value>) {
        self.node = node
        self.queue = node.nodes.map { $0.1 }
    }
}

The end index is represented by an index that has no element and an empty queue of future nodes. We use identity to equate all other nodes.

extension TrieIndex: Equatable {}
public func == <Key, Value>(lhs: TrieIndex<Key, Value>, rhs: TrieIndex<Key, Value>) -> Bool {
    if (lhs.node.element == nil && rhs.node.element == nil)
        && (lhs.queue.count == 0 && rhs.queue.count == 0) {
        return true
    }
    return (lhs.node === rhs.node)
}

successorByNode() is a private implementation detail of the index. When we implement successor(), we want each index returned to be an index of a node with a key and value. But not all nodes in the prefix tree have values. successorByNode() is similar to successor() as defined by ForwardIndexType, but it simply returns the index for the next node regardless of if it has a value.

private extension TrieIndex {
    private func successorByNode() -> TrieIndex<Key, Value>? {
        guard let next = queue.first else { return nil }
        let car = queue.dropFirst()
        let cdr = next.nodes.map { $0.1 }
        return TrieIndex(node: next, queue: car + cdr)
    }
}

We'll make TrieIndex conform to ForwardIndexType because that seems to be a natural fit for how we would traverse a tree (would it even make sense if it were RandomAccessIndexType?).

successor() recursively calls successor() on the index returned from successorByNode() until it finds an index with a value or there are no more nodes.

extension TrieIndex: ForwardIndexType {
    public func successor() -> TrieIndex<Key, Value> {
        guard let suc = self.successorByNode() else {
            return TrieIndex(node: TrieNode(), queue: [])
        }
        if let _ = suc.node.element {
            return suc
        }
        return suc.successor()
    }
}

Note we haven't explicitly conformed to Indexable, we've just created the type we need for that protocol. The next section we make Trie conform to CollectionType which extends Indexable.

Conforming to CollectionType

Now that we have something to represent an index we can conform to CollectionType. We define the endIndex and startIndex by creating TrieIndex instances with an empty node and root node respectively. Note the root node may not actually contain a value, in which case we get the successor() which we have defined to return the index of the next node with a value.

subscript(position:) contains the only occurrence of the force-unwrap operator throughout the implementation of this data structure. We're following the same pattern as Dictionary here by not returning an optional for index subscripting and consider it a fatal error to use an invalid subscript.

extension Trie: CollectionType {
    public typealias Index = TrieIndex<Key, Value>
    public typealias Element = (Key, Value)
    public var endIndex: Index { return TrieIndex(node: TrieNode()) }

    public var startIndex: Index {
        let index = TrieIndex(node: rootNode)
        guard let _ = index.node.element else {
            return index.successor()
        }
        return index
    }

    public subscript (position: Index) -> Element {
        return position.node.element!
    }

    public subscript (key: Key) -> Value? {
        get {
            return find(key)?.element?.1
        }
        set(newVal) {
            if let newVal = newVal {
                updateValue(newVal, forKey: key)
            } else {
                removeValueForKey(key)
            }
        }
    }
}

As mentioned previously, putting the implementation for DictionaryLiteralConvertible in an extension crashes the Swift compiler so we left that in the struct definition itself.

extension Trie: DictionaryLiteralConvertible {}

Using the Dictionary syntax

Now we're ready to use the trie with richer syntax.

var trie2 = Trie<String, Int>()
assert( trie2["foo"] == nil )
trie2["foo"] = 1
assert( trie2["foo"] == 1 )
trie2["foo"] = nil
assert( trie2["foo"] == nil )
trie2["bar"] = 2
trie2["baz"] = 12
trie2["qux"] = 64

You can iterate over all the key and values in the trie.

for (key, value) in trie2 {
    print("key: \(key), value: \(value)")
}

Using indexing:

for i in trie2.startIndex..<trie2.endIndex {
    let (key, value) = trie2[i]
    print("key: \(key), value: \(value)")
}

We can use dictionary literals to create an immutable trie.

let trie3 = ["foo": 1, "bar": 12, "baz": 64] as Trie

assert( trie3["foo"] == 1 )
assert( trie3["bar"] == 12 )
assert( trie3["baz"] == 64 )

So you can't change the values.

//trie3["foo"] = 2 // can't modify trie3, it's defined with `let`

Using Int as a Key

But we didn't go through all this trouble to define a generic trie just to use it with strings! Let's use Int as a key now. I'll cheat a little bit and consider the "prefix" of the Int to be the least significant digit.

extension Int: TrieKey {
    public func generatePrefixes() -> AnyGenerator<Int> {
        var tail = self
        return anyGenerator {
            defer { tail /= 10 }
            let dig = tail % 10
            return (dig == 0) ? nil : dig
        }
    }
}

let trie = [1:"Foo", 42:"Bar"] as Trie

assert( trie[1] == "Foo" )
assert( trie[2] == nil )
assert( trie[42] == "Bar" )

Making the actual generic trie data structure was easy, most of the hard work was conforming to the protocols necessary for the nice syntax.

@cfilipov
Copy link
Author

It should be noted that removeValueForKey is incomplete because it does not prune the tree if its children do not contain values, so this tree will always grow. Easy to fix, but this should serve as a warning to anyone stumbling across this code from a google search or something, don't use random code on the internet unless you understand exactly what it does. The main point of this gist was to play around with Swift's type system and built-in collection types using a well known data structure.

@cfilipov
Copy link
Author

We can also very easily have this trie work with any collection as a key:

public extension TrieKey where Self: CollectionType, Self.Generator.Element == PrefixType {
    func generatePrefixes() -> AnyGenerator<PrefixType> {
        return anyGenerator(self.generate())
    }
}

Note that I intentionally didn't do this for SequenceType even though you could. SequenceType doesn't guarantee that iterating over items isn't destructive, so it seems CollectionType is more appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment