Skip to content

Instantly share code, notes, and snippets.

@regexident
Last active October 12, 2017 09:00
Show Gist options
  • Star 5 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save regexident/1b8e84974da2243e5199e760508d2d25 to your computer and use it in GitHub Desktop.
Save regexident/1b8e84974da2243e5199e760508d2d25 to your computer and use it in GitHub Desktop.
Swift Evolution Proposal Draft for replacing `Hashable` with `Hasher` and `HashVisitable`.

HashVisitable

During the review process, add the following fields as needed:

Introduction

Replace Hashable with a more modern, versatile and easier to learn HashVisitable protocol.

Swift-evolution thread: Discussion thread topic for that proposal

Motivation

Implementing Hashable for a custom type is a rather error prone endeavor for anybody not accustomed to the peculiarities of the math behind and requirements of hash functions. At the same time the current state of Hashable is somewhat limiting in its capabilities.

  1. Application developers should not have to embark on the task of implementing hashing algorithms. We have experts for that. For good reasons. There are probably hundreds of implementations of hashValue that couldn’t even be considered useful or at least valid general-purpose hashes, let alone cryptographic ones.
  2. Many types (Float, Double, String, …) require deep domain-knowledge in order to correctly implement Hashable for them. Traps everywhere.
  3. Badly implementing hashValue cripples performance of Set and Dictionary and opens the doors for DDOS attacks when being exposed to the web, as it happens when using Swift on the server.
  4. One cannot exchange the hashing algorithm used by a type one does not own. And even for the types one does indeed own exchanging the hashing algorithm is far more involved than passing it a different implementation.
  5. One cannot implement more than one hashing algorithm at a time for a given type.

Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type we should provide a hashing API based on the visitor-pattern.

This would:

  1. Free application developers of the burden of having to implement var hashValue.
  2. Not require any hashing-related domain knowledge. The type-system would tell you when you're on the wrong track.
  3. Allow one to write web-exposed APIs without having to worry about opening the door for DDOS attacks.
  4. Exchange the hashing function based on the use-case and security/performance requirements.
  5. Allow one to use for more than one hash function for a type (or even a single instance). This is essential for being able to apply optimizations from 4) selectively within a code base (secure where it matters, fast where it doesn't). As well as for implementing things like Bloom Filters.

The stdlib would then provide a set of hashing implementations, batteries included.

A possible choice for hashing algorithms would be the reasonably fast SipHash-1-3 or SipHash-2-4, and the resonably secure SipHash-4-8.

Detailed design

User-facing API

One would add the following to the stdlib:

import Foundation

// A hashable type.
protocol HashVisitable {
    // Passes relevant members of `self` into `hasher`
    func hash<H: HasherProtocol>(_ hasher: H)
}

// A hasher for arbitrary bytes slices.
protocol HasherProtocol: class {
    // Generates final hash value.
    // Note: A hasher must not be re-used after calling `finish()` on it.
    func finish() -> Int
    
    // Passes a given byte slice into the hasher
    func write(bytes: UnsafeRawBufferPointer)
}

// A factory for pre-configured hashers.
protocol HasherBuilder {
    // Type of the hasher that `self` is capable of generating.
    associatedtype Hasher: HasherProtocol
    
    // Creates a new hasher builder.
    init()
    
    // Builds a new pre-configured hasher.
    // Note: Repeated calls of `build()` must return equally
    //       configured, yet non-identical instances of Hasher.
    func build() -> Hasher
}

// A factory for pre-configured instances of `DefaultHasher`.
struct DefaultHasherBuilder: HasherBuilder {
    typealias Hasher = DefaultHasher
    
    // The configuration used by hashes returned from `build()`.
    private let keys: (UInt, UInt) = {
        func randomKey() -> UInt {
            let lower = UInt(arc4random())
            let upper = UInt(arc4random())
            return lower | (upper << 32)
        }
        return (randomKey(), randomKey())
    }()
    
    static let shared: DefaultHasherBuilder = .init
    
    func build() -> Hasher {
        return DefaultHasher(keys: self.keys)
    }
}

// An implementation of SipHash 1-3 provided by the stdlib.
final class SipHasher13: HasherProtocol {
    init(keys: (UInt, UInt)) {  /* ... */ }
    func finish() -> Int { /* ... */ }
    func write(bytes: UnsafeRawBufferPointer) { /* ... */ }
}

// An implementation of SipHash 2-4 provided by the stdlib.
final class SipHasher24: HasherProtocol {
    init(keys: (UInt, UInt)) {  /* ... */ }
    func finish() -> Int { /* ... */ }
    func write(bytes: UnsafeRawBufferPointer) { /* ... */ }
}

// An implementation of SipHash 4-8 provided by the stdlib.
final class SipHasher48: HasherProtocol {
    init(keys: (UInt, UInt)) {  /* ... */ }
    func finish() -> Int { /* ... */ }
    func write(bytes: UnsafeRawBufferPointer) { /* ... */ }
}

// The default hasher of stdlib which hides its internal logic.
final class DefaultHasher: HasherProtocol {
    // Choice of inner hasher is an implementation detail
    // and should thus not be relied upon:
    private var inner: SipHasher24
    
    internal init(keys: (UInt, UInt)) {
        self.inner = SipHasher24(keys: keys)
    }
    
    func finish() -> Int {
        return self.inner.finish()
    }
    
    func write(bytes: UnsafeRawBufferPointer) {
        return self.inner.write(bytes: bytes)
    }
}

// Implementation of `HashVisitable` for UInt8.
// Note: An equivalent implementation would get added to:
//       - `UInt8/16/32/64`,
//       - `Int8/16/3/64`,
//       - `UInt`, `Int`,
//       - `Float`, `Double`
//       - `Bool`
//       - …
extension UInt8: HashVisitable {
    func hash<H: HasherProtocol>(_ hasher: H) {
        var mutableSelf = self
        try! Swift.withUnsafeBytes(of: &mutableSelf) {
            hasher.write(bytes: $0)
        }
    }
}

Implementing HashVisitable for a custom type

Given a type …

struct GridPoint {
    var x: Int
    var y: Int
}

extension GridPoint: Equatable {
    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y
    }
}

… as found in the documentation for Hashable, one would implement HashVisitable like this:

extension GridPoint : HashVisitable {
    func hash<H: HasherProtocol>(_ hasher: H) {
        self.x.hash(hasher)
        self.y.hash(hasher)
    }
}

No delicate domain-knowledge about hashing necessary. Fool-proof.

Implementing HasherProtocol for a custom hasher

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which could be implemented as follows:

class Fnv1aHash {
    fileprivate var state: UInt
    init(seed: UInt) {
        self.state = seed &+ 14695981039346656037
    }
}

extension Fnv1aHash: HasherProtocol {
    func write(bytes: UnsafeRawBufferPointer) {
        for byte in bytes {
            self.state = (self.state ^ UInt(byte)) &* 1099511628211
        }
    }
    func finish() -> Int {
        return unsafeBitCast(self.state, to: Int.self)
    }
}

Changes to Set<E>, Dictionary<K, V>

In order to reap the benefits of this proposal both Set<E> and Dictionary<K, V> would have to be changed from:

public struct Set<Element> : SetAlgebra, Hashable, Collection, ExpressibleByArrayLiteral where Element : Hashable { /* … */ }
public struct Dictionary<Key, Value> : Collection, ExpressibleByDictionaryLiteral where Key : Hashable { /* … */ }

… to:

public struct Set<Element, Hasher = DefaultHashBuilder> : SetAlgebra, Hashable, Collection, ExpressibleByArrayLiteral where Element : Hashable, Hasher: HasherBuilder { /* … */ }
public struct Dictionary<Key, Value, Hasher = DefaultHashBuilder> : Collection, ExpressibleByDictionaryLiteral where Key : Hashable, Hasher: HasherBuilder { /* … */ }

Making HashSet/HashMap generic over a Hasher is a pattern also found in C++ and Rust and has proven to be quite versatile.

General use

On a normal day one would only ever get in contact with HashVisitable.

The only work/knowledge required would for implementing HashVisitable for a custom type would involve calling .hash() on relevant members of self:

func hash<H: HasherProtocol>(_ hasher: H) {
    self.foo.hash(hasher)
    self.bar.hash(hasher)
    // …
}

Everything else in this proposal just forms the foundation for making all of this flexibility possible.

A proper Swift IDE (such as Xcode) could even easily provide one-click implementation scaffolding of hash(_:) for custom types based on its stored properties. Or going a step further Swift itself could provide some kind of Auto-Impl for HashVisitable based on introspection, which could then be overriden by a custom implementation.

Source compatibility

While migrating Set and Dictionary from Hashable to HashVisitable would be a breaking change there are several possible ways to provide backwards compatible backports to make the migration less bumpy.

a) A backport implementation of HashVisitable in terms of Hashable:

extension Hashable: HashVisitable {
    func hash<H: HasherProtocol>(_ hasher: H) {
        self.hashValue.hash(hasher)
    }
}

This would only allow for backwards compatibility, while bringing none of the benefits of HashVisitable. It should emit a compiler warning for existing implementations, urging developers to remove existing implementations of Hashable and properly implement HashVisitable instead.

b) A backport implementation of Hashable in terms of HashVisitable:

extension HashVisitable: Hashable {
    var hashValue: Int {
        let builder = DefaultHashBuilder().shared
        let hasher = builder.build()
        self.hashValue.hash(hasher)
        return hasher.finish()
    }
}

Effect on ABI stability

‼️ MISSING Please feel free to give input!

Effect on API resilience

‼️ MISSING Please feel free to give input!

Alternatives considered

value.hash(hasher) vs. hasher.hash(value)

While hasher.hash(self.member) would have a far more intuitive, than self.member.hash(hasher), the logic of HashVisitable in the end has to be implemented on the type being hashed, to allow for the inclusion of private members.

Thus a convenience method on HasherProtocol would have to call value.hash(hasher):

extension HasherProtocol {
    func hash<H: HashVisitable>(_ value: H) {
        value.hash(self)
    }
}

Which would open up the possibility of infinite call cycles when a programmer mistakenly implements HashVisitable like this:

extension Foo: HashVisitable {
    func hash<H: HasherProtocol>(_ hasher: H) {
        hasher.hash(self)
    }
}

class Hasher vs. struct Hasher + inout

Due to Swift's current lack of ergonomic and safe referencing of structs a design for HashVisitable is either forced to implement HasherProtocol as reference type (class) or to pass hashers to hash(_:) as inout.

While the latter has the additional benefit of clear value-semantics (avoiding unintential shared use of a single hasher) it led to quite a bit of concerns on the Swift-Evolution mailing list with regards to beginner-friendlyness.

Single default Hasher vs. Generics

Instead of making HashVisitable generic over Hashable one could have the stdlib provide a single default hasher type.

While this would address points 1), 2), 3) from the Motivation, it would not address points 4) and 5) at all.

Point 4) however is crucial for being able to customize bottlenecks or security-critical APIs for both, best possible performance and security.

Point 5) is crucial for being able to apply optimizations from 4) selectively within a code base (secure where it matters, fast where it doesn't).

Proven design?

The design of this very proposal has in fact been strongly inspired by Rust's use of Hash, Hasher & BuildHash. Graydon Hoare and Huon Wilson from the Swift Core team should be able to provide an educated verdict on how successful the design has proven itself for Rust. From my own experience I can tell that it hardly ever gets in the way and one hardly ever has to interact with Hasher & BuildHash beyond fn hash<H: Hasher>(&self, state: &mut H).

Howard E. Hinnant, member of the C++ standards committee submitted a paper of similar design in 2014, which, as far as I know, is still in active discussion for inclusion in C++17 or later:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment