There is a danger with typeclass bounds that appear only on functions: they permit a lack of coherence. This danger is particularly acute with collections, where the definition of a particular interface, such as hashable or ord, may be used in structuring the data itself. The data structure is effectively corrupted when it is later used with a different implementation.
To see the problem, consider a hashtable defined something like this:
mod ht {
    iface hash {
        fn hash() -> uint;
        fn eq(other: self) -> bool;
    }

    type t<K,V> = { buckets: [bucket<K,V>] };

    fn create<K:hash,V>() -> @t<K,V> {
        ...
    }
    fn put<K:hash,V>(ht: @t<K,V>, k: K, v: V) {
        ...
    }
    fn get<K:hash,V>(ht: @t<K,V>, k: K) -> V {
        ... // fails if not present
    }
}
This looks reasonable. Now we create two modules, Module1 and Module2, which both share a reference to the same hashtable, but which have different implementations of the hash interface in scope:
mod Module1 {
    impl hash for uint {
        fn hash() -> uint { ret self; }
        fn eq(other: uint) -> bool { ret self == other; }
    }
    ...
    fn foo() -> str {
        let h = ht::create();
        ht::put(h, 3u, "hi"); // 3u.hash() == 3u here
        ret Module2::bar(h);
    }
}
mod Module2 {
    impl hash for uint {
        fn hash() -> uint { ret self / 2u; }
        fn eq(other: uint) -> bool { ret self == other; }
    }
    ...
    fn bar(h: @ht::t<uint, str>) -> str {
        ret ht::get(h, 3u); // fails because 3u.hash() == 1u here
    }
}
Here, Module2::bar() will fail to find any value for 3u because it is using an incompatible hash implementation. The annoying problem is that both modules are correct in and of themselves; they are just using mutually inconsistent definitions of hash for uint.
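The failure mode can be reproduced concretely by passing the hash function around explicitly, which is effectively what the two conflicting impls amount to. Below is a minimal sketch in modern Rust (the names `put`, `get`, `hash1`, and `hash2` are illustrative, not from the original): a toy bucketed table is filled using one hash function and probed using another, and the lookup misses even though the key is present.

```rust
type Buckets = Vec<Vec<(u32, &'static str)>>;

// Insert `k` using the caller-supplied hash function.
fn put(buckets: &mut Buckets, hash: fn(u32) -> usize, k: u32, v: &'static str) {
    let i = hash(k) % buckets.len();
    buckets[i].push((k, v));
}

// Look `k` up using the caller-supplied hash function.
fn get(buckets: &Buckets, hash: fn(u32) -> usize, k: u32) -> Option<&'static str> {
    let i = hash(k) % buckets.len();
    buckets[i].iter().find(|&&(k2, _)| k2 == k).map(|&(_, v)| v)
}

fn hash1(k: u32) -> usize { k as usize }       // Module1's hash: identity
fn hash2(k: u32) -> usize { (k / 2) as usize } // Module2's hash: k / 2

fn main() {
    let mut buckets: Buckets = vec![Vec::new(); 4];
    put(&mut buckets, hash1, 3, "hi");               // stored in bucket 3 % 4 == 3
    assert_eq!(get(&buckets, hash1, 3), Some("hi")); // same hash: found
    assert_eq!(get(&buckets, hash2, 3), None);       // different hash: probes bucket 1, misses
}
```

The data in the table is not wrong in any local sense; it is the mismatch between the writer's and the reader's hash functions that corrupts the structure.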
We cannot completely rule out the possibility of writing code like this. I believe it is inherent in any solution that involves both scoped implementations (i.e., not global, as in Haskell) and open types (i.e., methods can be declared even after the type has been defined, unlike Java or C#).
However, we can make it possible to write code that detects such mismatches. The idea has two parts. First, we allow type variables in types to have interfaces associated with them:
type HT<K : hash,V> = { ... }
Next, we say that the value for a type variable is not just a type, like uint, but rather a tuple type : impl* of a type and zero or more implementations, one for each interface bound. Therefore, the full type of the hashtable h in our example would not be HT<uint,str> but rather HT<uint : Module1::hash, str>. However, the user does not (usually) have to write the implementations by hand; they are supplied by the compiler based on what is in scope for the given type.
Now, armed with this new idea, the fully specified form of the function bar() from Module2 would be:
fn bar(h: @ht::t<uint: Module2::hash, str>) { ... }
As a result, the compiler will report a type error at the call Module2::bar(h) in Module1::foo(), because the type of the parameter h in bar() does not match the type of the value being provided (the implementation modules do not match).
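As it happens, modern Rust adopted something close to this for hashtables specifically: the hasher is a type parameter of `HashMap`, so maps built with different hashers have different types, and passing one where the other is expected is a compile-time error. A sketch (the hasher names `H1` and `H2` are illustrative):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Two deliberately different hashers, standing in for the two impls.
#[derive(Default)]
struct H1(u64);
impl Hasher for H1 {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes { self.0 = self.0.wrapping_mul(31).wrapping_add(b as u64); }
    }
    fn finish(&self) -> u64 { self.0 }
}

#[derive(Default)]
struct H2(u64);
impl Hasher for H2 {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes { self.0 ^= b as u64; }
    }
    fn finish(&self) -> u64 { self.0 }
}

// Like the fully specified bar() above: the hasher is pinned down
// in the parameter's type.
fn bar(m: &HashMap<u32, &'static str, BuildHasherDefault<H2>>) -> Option<&'static str> {
    m.get(&3).copied()
}

fn main() {
    let mut m1: HashMap<u32, &str, BuildHasherDefault<H1>> = HashMap::default();
    m1.insert(3, "hi");
    // bar(&m1); // type error: expected BuildHasherDefault<H2>, found BuildHasherDefault<H1>

    let mut m2: HashMap<u32, &str, BuildHasherDefault<H2>> = HashMap::default();
    m2.insert(3, "hi");
    assert_eq!(bar(&m2), Some("hi"));
}
```

The difference is that Rust threads the implementation through an ordinary type parameter rather than attaching it to the key's type variable, but the effect, mismatched hashers caught statically, is the same.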
When the value of a type variable is written explicitly, it can be written either as ty:impl1,impl2 (explicit implementations for each interface) or just ty. If only a type is specified, the compiler will look for imported implementations and insert the default value.
Note that when the value for a type argument is itself a type variable, as with the variable K here:
fn put<K:hash,V>(ht: @t<K,V>, k: K, v: V);
Defaults do not come into play here. Remember that the "value" of a type variable is now the tuple ty:impl*, so the type @t<K,V> is actually fully specified. (One caveat: we allow an implicit upcast from ty:impl1,impl2 to ty:impl1 or ty:impl2 or just ty. In other words, you are allowed to use K in a location where fewer interfaces, or just a plain type, are required.)
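This implicit upcast to fewer bounds has a direct analogue in modern Rust trait bounds, where a type parameter constrained by several traits can always be passed along to a function that demands fewer of them. A small sketch (function names are illustrative):

```rust
use std::fmt::Debug;

// Requires only Clone.
fn needs_clone<T: Clone>(t: &T) -> T {
    t.clone()
}

// Requires Debug + Clone, but T may freely be "upcast" to the
// weaker bound when calling needs_clone.
fn needs_both<T: Debug + Clone>(t: &T) -> (String, T) {
    (format!("{:?}", t), needs_clone(t))
}

fn main() {
    let (s, n) = needs_both(&42u32);
    assert_eq!(s, "42");
    assert_eq!(n, 42u32);
}
```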
Now, you might argue that it is still weird that the imports in Module2::bar() affect what hash function is used for a hashtable that is being passed in as a parameter, for which the hash function has already (presumably) been specified when the hashtable was created. There are pluses and minuses to this.
The plus is that this full specification would allow for a very high degree of optimization if we did monomorphizing: there would be no dynamic calls at all involved in using the hashtable. Also, so long as you use a consistent set of implementations across most of your modules, there is no annotation overhead, since the default implementations that the compiler supplies will all match.
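To make the monomorphization point concrete, here is a sketch in modern Rust of a generic bucket-index function: each key type gets its own compiled copy, and the hash call is resolved statically, with no vtable lookup. (The function name is illustrative; `DefaultHasher`'s output is deterministic within a process but not guaranteed stable across releases, so the assertions only check range and determinism.)

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Under monomorphization, a separate copy of this function is
// compiled per key type K, and `k.hash(...)` is a direct,
// statically dispatched call.
fn bucket_of<K: Hash>(k: &K, n_buckets: u64) -> u64 {
    let mut h = DefaultHasher::new();
    k.hash(&mut h);
    h.finish() % n_buckets
}

fn main() {
    let b = bucket_of(&3u32, 4);
    assert!(b < 4);
    // Deterministic within one process: same key, same bucket.
    assert_eq!(b, bucket_of(&3u32, 4));
    // A distinct copy of bucket_of is instantiated for &str keys.
    assert!(bucket_of(&"hi", 4) < 4);
}
```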
However, it would still be nice to be able to write a function that operates over any hashtable that uses uint as its key type, regardless of what hash function is in use. This is basically polymorphism. Thankfully, we have an answer to that: interfaces. So we define an interface for maps from K to V, one that is not necessarily implemented by a hashtable (after all, this interface could also be fulfilled by a binary tree or some other kind of map):
iface map<K,V> {
    fn put(k: K, v: V);
    fn get(k: K) -> V;
}
And now we have to implement the interface for the type ht::t:
impl<K:hash,V> map<K,V> for ht::t<K,V> {
    fn put(k: K, v: V) { ret ht::put(self, k, v); }
    fn get(k: K) -> V { ret ht::get(self, k); }
}
This interface hides the precise hash implementation, encapsulating it so to speak. Now the Module2::bar() function can be rewritten:
fn bar(m: map<uint,str>) -> str {
    ret m.get(3u);
}
And the foo function can be written:
fn foo() -> str {
    let h = ht::create();
    ht::put(h, 3u, "hi");
    ret Module2::bar(h as map);
}
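Modern Rust expresses the same encapsulation with trait objects: the `h as map` cast corresponds to coercing a concrete map to `&dyn Map`, behind which the hashing strategy is invisible to the callee. A sketch under that assumption (the `Map` trait and function names are illustrative):

```rust
use std::collections::HashMap;

// Analogue of the `map` iface: key and value types fixed for brevity.
trait Map {
    fn put(&mut self, k: u32, v: String);
    fn get(&self, k: u32) -> Option<String>;
}

impl Map for HashMap<u32, String> {
    fn put(&mut self, k: u32, v: String) { self.insert(k, v); }
    fn get(&self, k: u32) -> Option<String> { HashMap::get(self, &k).cloned() }
}

// Works for any Map, regardless of its hashing strategy;
// dispatch goes through the trait object's vtable.
fn bar(m: &dyn Map) -> Option<String> {
    m.get(3)
}

fn main() {
    let mut h: HashMap<u32, String> = HashMap::new();
    Map::put(&mut h, 3, "hi".to_string());
    assert_eq!(bar(&h), Some("hi".to_string())); // &HashMap coerces to &dyn Map
}
```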
There are some niggling implementation details to be worked out, as the compiler will (in a dynamic setting) have to stash the hash implementation pointer somewhere so it is available for use in the put() and get() functions. In a monomorphic setting, customized variants of put() and get() would be generated in any case.
I love the solution you proposed here. It is both simpler and more general than the current orphan rules. Was it ever considered for inclusion? Were there difficulties in implementing it?