Skip to content

Instantly share code, notes, and snippets.

@nakshay
Forked from Kimundi/java_rust_generic.md
Created March 6, 2020 03:55
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nakshay/93148ee8079658bf7d74294a547f822a to your computer and use it in GitHub Desktop.
Save nakshay/93148ee8079658bf7d74294a547f822a to your computer and use it in GitHub Desktop.
A light comparison between Rust and Java generics and type system features.

Introduction

If you are familiar with Java's generics, and are coming to Rust, you might be lead to assume that its generics are working the same way.

However, due to the different type systems, and different implementation details, there are quite a few differences between generic code in both languages.

This document tries to give a short summary about those differences:

Core functionality

Java

In Java, generics are implemented using type erasure - that means that if you have code like this:

<T> void generic(T t) { ... }

void main() {
    generic<Integer>(5);
    generic<String>("...");
}

Then what happens is that the compiler checks the types of the generic instantiations at compile time, and then forgets about the specific types, using Object for the compiled bytecode. In other words, at runtime the code just looks like this:

void generic(Object t) { ... }

void main() {
    generic((Object) 5);
    generic((Object) "...");
}

This was done for backwards compability (code using generics compiles to the same as old code using Object), but introduces performance overhead, because the compiler for example has to introduce typecasts from and to Object in the compiled code, and because every unique instantiation of a generic methods goes through the Object-using code, allowing not much room for type-individual optimizations.

Rust

Rust on the other hand uses monomorphization for generic code. It means that if you have code like this:

fn generic<T>(t: T) { ... }

void main() {
    generic::<int>(5);
    generic::<~str>(~"...");
}

Then what happens is that the compiler checks at compile time that all types match, and then emits a specialized variant of the function for each type it got instanciated with. Which means, in the resulting binary the code looks more like this:

fn generic_int(t: int) { ... }
fn generic_owned_str(t: ~str) { ... }

void main() {
    generic_int(5);
    generic_owned_str(~"...");
}

This has the advantage that the compiler can then optimize each specialized function independently for optimal performance, but has the disadvantage of more code being emitted. However, usually the speed advantage way outweighs the size disadvantage of the binary, so that's not really bad.

Optimization differences

Java

Java's type system is build on Objects and inheritance - you pass around references to instances of classes, can do casting between sub and superclasses, etc.

On a low level, this flexibility is gained by making all object references the same size, which is done by making them only pointers to the content of the individual class instances. Explaining the details of that is beyond the scope of this document, but for example this site gives a bit more information: http://www.programcreek.com/2011/11/what-do-java-objects-look-like-in-memory/

Now, what matters for a useful comparison to Rust is the fact that accessing a field or calling a method goes through pointers at all: It means the CPU has to load data or code from a arbitrary address in memory, which can be slow. Either because it might not be in the CPU cache, or because in the case of calling a method the compiler can't do compile time optimizations, as the actual code to call is only known at run time (virtual call). In other words, all usage of Java objects, inclusive generics, usually involves a lot of slow memory accesses.

This is a simplified view though - the Java compiler actually does a lot of runtime optimization and dynamic recompilation to transform the code to faster one on the fly - however it's usually not clear to programmer when such a optimization applies and when not.

Rust

Now, in Rust values can be unboxed. This basically means that unlike a object reference in Java, there is no uniform pointer to some memory involved - the data is just right there in the local variable.

This of course means that unlike in Java, local variables can be all different sizes, depending on the type stored in it. It also means that you have to emitt different versions of functions for each generic type instantiation - you can't tread them all the same way like Java does, because different types have different sizes.

This has an immense advantage though - because there is a specific version of a function for each type, all methods and functions you apply to those types are already exactly known at compiletime - which means the compiler can emitt optimal code for calling them, and doesn't have to look them up at runtime.

Bounds on generics

In both Rust and Java, you have the ability to restrict generic types to a certain subset of types, in order to be able to call specific methods on them.

Java

In Java, you can bound a type parameter to only accept instances of a class or interface (or their subclasses) like this:

<T extends MyInterfaceOrClass> void generic(T t) { ... }

By restricting the type parameter in that way, you are now allowed to call all methods of MyClass on T, because no matter what type ends up in it, it will implement the necessary methods.

This is again possible by having a uniform representation of T as a Object at runtime, and casting it around.

Rust

In Rust, restricting a generic type works similar. However, Rust doesn't have objects - it only has Traits, which can basically be though of like an interface in Java:

fn generic<T: MyTrait>(t: T) { ... }

This function would only accept types that implement the trait Mytrait, similar to the Java function above.

Now, the difference to Java's interfaces is that you define a type and the implementation of a trait seperately - which means that your type doesn't care if there are zero or a hundredth different traits implemented for it, as it's none of his direct concern.

Because of this freedom, any type could theoretically implement any trait without any conflict, as every trait is basically just a list of functions. This also means that Rust generics are free to demand arbitrary trait bounds for their generic paramters:

fn generic<T: MyTrait + MyOtherTrait + SomeStandardTrait>(t: T) { ... }

This generic function takes a T by value, and can call any method defined in MyTrait, MyOtherTrait or SomeStandardTrait on it. And because T is a unboxed value, and Rust generates an individual version of the function for each specific T, all those methods are exactly know at compiletime and can be optimized for each individual type.

Incidentally, Rust also supports casting a specific type into a trait object, which behaves similarly to an interface reference in Java. In trait objects, methods are looked up at run time, and different specific types can be treated uniformly - so you can actually choose to some extend which style of generics to use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment