- Proposal: SE-NNNN
- Authors: Vincent Esche
- Review Manager: TBD
- Status: Awaiting review
During the review process, add the following fields as needed:
Replace Hashable
with a more modern, versatile and easier to learn HashVisitable
protocol.
Swift-evolution thread: Discussion thread topic for that proposal
Implementing Hashable
for a custom type is a rather error prone endeavor for anybody not accustomed to the peculiarities of the math behind and requirements of hash functions. At the same time the current state of Hashable
is somewhat limiting in its capabilities.
- Application developers should not have to embark on the task of implementing hashing algorithms. We have experts for that. For good reasons. There are probably hundreds of implementations of
hashValue
that couldn’t even be considered useful or at least valid general-purpose hashes, let alone cryptographic ones. - Many types (
Float
,Double
,String
, …) require deep domain-knowledge in order to correctly implementHashable
for them. Traps everywhere. - Badly implementing
hashValue
cripples performance ofSet
andDictionary
and opens the doors for DDOS attacks when being exposed to the web, as it happens when using Swift on the server. - One cannot exchange the hashing algorithm used by a type one does not own. And even for the types one does indeed own exchanging the hashing algorithm is far more involved than passing it a different implementation.
- One cannot implement more than one hashing algorithm at a time for a given type.
Instead of coupling the hashing algorithm with each and every Swift type we should provide a hashing API based on the visitor-pattern.
This would:
- Free application developers of the burden of having to implement
var hashValue
. - Not require any hashing-related domain knowledge. The type-system would tell you when you're on the wrong track.
- Allow one to write web-exposed APIs without having to worry about opening the door for DDOS attacks.
- Exchange the hashing function based on the use-case and security/performance requirements.
- Allow one to use for more than one hash function for a type (or even a single instance). This is essential for being able to apply optimizations from 4) selectively within a code base (secure where it matters, fast where it doesn't). As well as for implementing things like Bloom Filters.
The stdlib would then provide a set of hashing implementations, batteries included.
A possible choice for hashing algorithms would be the reasonably fast SipHash-1-3 or SipHash-2-4, and the resonably secure SipHash-4-8.
One would add the following to the stdlib:
import Foundation
// A hashable type.
protocol HashVisitable {
// Passes relevant members of `self` into `hasher`
func hash<H: HasherProtocol>(_ hasher: H)
}
// A hasher for arbitrary bytes slices.
protocol HasherProtocol: class {
// Generates final hash value.
// Note: A hasher must not be re-used after calling `finish()` on it.
func finish() -> Int
// Passes a given byte slice into the hasher
func write(bytes: UnsafeRawBufferPointer)
}
// A factory for pre-configured hashers.
protocol HasherBuilder {
// Type of the hasher that `self` is capable of generating.
associatedtype Hasher: HasherProtocol
// Creates a new hasher builder.
init()
// Builds a new pre-configured hasher.
// Note: Repeated calls of `build()` must return equally
// configured, yet non-identical instances of Hasher.
func build() -> Hasher
}
// A factory for pre-configured instances of `DefaultHasher`.
struct DefaultHasherBuilder: HasherBuilder {
typealias Hasher = DefaultHasher
// The configuration used by hashes returned from `build()`.
private let keys: (UInt, UInt) = {
func randomKey() -> UInt {
let lower = UInt(arc4random())
let upper = UInt(arc4random())
return lower | (upper << 32)
}
return (randomKey(), randomKey())
}()
static let shared: DefaultHasherBuilder = .init
func build() -> Hasher {
return DefaultHasher(keys: self.keys)
}
}
// An implementation of SipHash 1-3 provided by the stdlib.
final class SipHasher13: HasherProtocol {
init(keys: (UInt, UInt)) { /* ... */ }
func finish() -> Int { /* ... */ }
func write(bytes: UnsafeRawBufferPointer) { /* ... */ }
}
// An implementation of SipHash 2-4 provided by the stdlib.
final class SipHasher24: HasherProtocol {
init(keys: (UInt, UInt)) { /* ... */ }
func finish() -> Int { /* ... */ }
func write(bytes: UnsafeRawBufferPointer) { /* ... */ }
}
// An implementation of SipHash 4-8 provided by the stdlib.
final class SipHasher48: HasherProtocol {
init(keys: (UInt, UInt)) { /* ... */ }
func finish() -> Int { /* ... */ }
func write(bytes: UnsafeRawBufferPointer) { /* ... */ }
}
// The default hasher of stdlib which hides its internal logic.
final class DefaultHasher: HasherProtocol {
// Choice of inner hasher is an implementation detail
// and should thus not be relied upon:
private var inner: SipHasher24
internal init(keys: (UInt, UInt)) {
self.inner = SipHasher24(keys: keys)
}
func finish() -> Int {
return self.inner.finish()
}
func write(bytes: UnsafeRawBufferPointer) {
return self.inner.write(bytes: bytes)
}
}
// Implementation of `HashVisitable` for UInt8.
// Note: An equivalent implementation would get added to:
// - `UInt8/16/32/64`,
// - `Int8/16/3/64`,
// - `UInt`, `Int`,
// - `Float`, `Double`
// - `Bool`
// - …
extension UInt8: HashVisitable {
func hash<H: HasherProtocol>(_ hasher: H) {
var mutableSelf = self
try! Swift.withUnsafeBytes(of: &mutableSelf) {
hasher.write(bytes: $0)
}
}
}
Given a type …
struct GridPoint {
var x: Int
var y: Int
}
extension GridPoint: Equatable {
static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
return lhs.x == rhs.x && lhs.y == rhs.y
}
}
… as found in the documentation for Hashable
, one would implement HashVisitable
like this:
extension GridPoint : HashVisitable {
func hash<H: HasherProtocol>(_ hasher: H) {
self.x.hash(hasher)
self.y.hash(hasher)
}
}
No delicate domain-knowledge about hashing necessary. Fool-proof.
FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which could be implemented as follows:
class Fnv1aHash {
fileprivate var state: UInt
init(seed: UInt) {
self.state = seed &+ 14695981039346656037
}
}
extension Fnv1aHash: HasherProtocol {
func write(bytes: UnsafeRawBufferPointer) {
for byte in bytes {
self.state = (self.state ^ UInt(byte)) &* 1099511628211
}
}
func finish() -> Int {
return unsafeBitCast(self.state, to: Int.self)
}
}
In order to reap the benefits of this proposal both Set<E>
and Dictionary<K, V>
would have to be changed from:
public struct Set<Element> : SetAlgebra, Hashable, Collection, ExpressibleByArrayLiteral where Element : Hashable { /* … */ }
public struct Dictionary<Key, Value> : Collection, ExpressibleByDictionaryLiteral where Key : Hashable { /* … */ }
… to:
public struct Set<Element, Hasher = DefaultHashBuilder> : SetAlgebra, Hashable, Collection, ExpressibleByArrayLiteral where Element : Hashable, Hasher: HasherBuilder { /* … */ }
public struct Dictionary<Key, Value, Hasher = DefaultHashBuilder> : Collection, ExpressibleByDictionaryLiteral where Key : Hashable, Hasher: HasherBuilder { /* … */ }
Making HashSet
/HashMap
generic over a Hasher
is a pattern also found in C++ and Rust and has proven to be quite versatile.
On a normal day one would only ever get in contact with HashVisitable
.
The only work/knowledge required would for implementing HashVisitable
for a custom type would involve calling .hash()
on relevant members of self
:
func hash<H: HasherProtocol>(_ hasher: H) {
self.foo.hash(hasher)
self.bar.hash(hasher)
// …
}
Everything else in this proposal just forms the foundation for making all of this flexibility possible.
A proper Swift IDE (such as Xcode) could even easily provide one-click implementation scaffolding of hash(_:)
for custom types based on its stored properties. Or going a step further Swift itself could provide some kind of Auto-Impl for HashVisitable
based on introspection, which could then be overriden by a custom implementation.
While migrating Set
and Dictionary
from Hashable
to HashVisitable
would be a breaking change there are several possible ways to provide backwards compatible backports to make the migration less bumpy.
a) A backport implementation of HashVisitable
in terms of Hashable
:
extension Hashable: HashVisitable {
func hash<H: HasherProtocol>(_ hasher: H) {
self.hashValue.hash(hasher)
}
}
This would only allow for backwards compatibility,
while bringing none of the benefits of HashVisitable
.
It should emit a compiler warning for existing implementations,
urging developers to remove existing implementations of Hashable
and properly implement HashVisitable
instead.
b) A backport implementation of Hashable
in terms of HashVisitable
:
extension HashVisitable: Hashable {
var hashValue: Int {
let builder = DefaultHashBuilder().shared
let hasher = builder.build()
self.hashValue.hash(hasher)
return hasher.finish()
}
}
While hasher.hash(self.member)
would have a far more intuitive, than self.member.hash(hasher)
, the logic of HashVisitable
in the end has to be implemented on the type being hashed, to allow for the inclusion of private members.
Thus a convenience method on HasherProtocol
would have to call value.hash(hasher)
:
extension HasherProtocol {
func hash<H: HashVisitable>(_ value: H) {
value.hash(self)
}
}
Which would open up the possibility of infinite call cycles when a programmer mistakenly implements HashVisitable
like this:
extension Foo: HashVisitable {
func hash<H: HasherProtocol>(_ hasher: H) {
hasher.hash(self)
}
}
Due to Swift's current lack of ergonomic and safe referencing of structs a design for HashVisitable
is either forced to implement HasherProtocol
as reference type (class
) or to pass hashers to hash(_:)
as inout
.
While the latter has the additional benefit of clear value-semantics (avoiding unintential shared use of a single hasher) it led to quite a bit of concerns on the Swift-Evolution mailing list with regards to beginner-friendlyness.
Instead of making HashVisitable
generic over Hashable
one could have the stdlib provide a single default hasher type.
While this would address points 1), 2), 3) from the Motivation, it would not address points 4) and 5) at all.
Point 4) however is crucial for being able to customize bottlenecks or security-critical APIs for both, best possible performance and security.
Point 5) is crucial for being able to apply optimizations from 4) selectively within a code base (secure where it matters, fast where it doesn't).
The design of this very proposal has in fact been strongly inspired by Rust's use of Hash
, Hasher
& BuildHash
.
Graydon Hoare and Huon Wilson from the Swift Core team should be able to provide an educated verdict on how successful the design has proven itself for Rust. From my own experience I can tell that it hardly ever gets in the way and one hardly ever has to interact with Hasher
& BuildHash
beyond fn hash<H: Hasher>(&self, state: &mut H)
.
Howard E. Hinnant, member of the C++ standards committee submitted a paper of similar design in 2014, which, as far as I know, is still in active discussion for inclusion in C++17 or later: