josevalim/maps_and_structs.md Secret

## maps_and_structs.md

      
    Raw
  

              maps_and_structs.md
            
          
    Proposal for maps and structs

This is a new proposal for maps. This is completely different from the previous proposals as we are not trying to integrate "Erlang maps" into Elixir but come up with Elixir's own definition of maps. The root of this proposal was two questions asked in the mailing list:

What you would do if you were designing Elixir today? How would maps look like?
Do we need to make maps and records look like?

This proposal provides different answers to those questions.
Why do we need records?

I have realized that, if I had maps when first designing Elixir, the maps implementation wouldn't change much than the ones being proposed so far. However, the records implementation would likely be very different!
For this reason, it is important to ask: why do we need records? Records bring:

A way to organize data by fields
Efficient in memory and opereations
Compile-time errors
In Elixir they provide "types" which are used on protocol dispatch

Maps address 1) and 2). Could we address 3) and 4) with maps too? Let's find out.
Maps

Before we explore how we can add compile-time errors to maps. Let's see how the basic record syntax is going to look like. Creating a map can be done as:
user = %{ :name => "josé", :age => 27 }
Maps will also allow the keywords shortcut syntax for atom keys:
user = %{ name: "josé", age: 27 }
Maps can be used in pattern match and they also provide a update syntax:
user = %{ user | name: "josé", age: 27 }
Updates the user record only if it has the fields :name and :age. Not having those fields will lead to an error. Furthermore, this syntax has been inspired by the records syntax in order functional languages.
This definitely addresses 1 and 2. For the reasons why Maps are efficient, I would recommend reading Joe's blog post.
Structs (annotated maps)

The question is: can we add compile-time guarantees (about field names) and also allow annotations to be used in protocols dispatch to maps? Yes, we can! Here is the proposed syntax:
defmodule User do
  def __struct__ do
    %{ name: "default", age: 0 }
  end
end

%User{} == %{ __struct__: User, name: "default", age: 0 }
By defining a __struct__ function in a module, it allows the following syntax to be used: %User{}. The map created will have all the default values returned by User.__struct__/0.
Now we can do compile-time checks. The following will fail during compilation time with unknown field foo:
%User{ foo: 1 }
We can also provide the same guarantees for updates by simply adding the annotation:
%User{ user | name: "josé", age: 27 }
The following will raise an unknown field error as well:
%User{ user | foo: 1 }
At runtime, the update operation will check user has a __struct__: User field and guarantee all fields being updated already exist. Furthermore, since some maps now have a __struct__ field, we can actually implement polymorphism via protocols too.
Polymorphic structs

Calling Enumerable.count/1 with a struct will now dispatch to the struct implementation. For example, Enumerable.count(%User{}) dispatches to Enumerable.User.count(%User{}).
This is similar to the runtime polymorphism we have for records but without the drawbacks. Since records are simply tuples, the only requirement for a polymorphic dispatch is that the first element of the tuple to be an atom. This triggers a lot of false positives which slows down protocol dispatching.
Since there are so many false positives, protocols need to have a rather complicate fallback mechanism. If you invoke Enumerable.count(User[]), it will first try Enumerable.User.count/1, then Enumerable.Tuple.count/1 and then Enumerable.Any.count/1. The rate of false positives and the complicated fallbacks is what makes the inspect implementation slow (unless you consolidate protocols).
For maps/structs, we can use the __struct__ key with an atom value as a clearer indication, avoiding false positives and improving the performance of some protocols. The fallback mechanism can be simpler too: Enumerable.count(%User[]) will invoke Enumerable.User.count/1 and then Enumerable.Any.count/1.
That said, structs not only bring the same benefits as records, but they also provide a much simpler, faster and better foundation for polymorphism!
defstruct

So far we have defined our struct as a simple function in a module:
defmodule User do
  def __struct__ do
    %{ name: "default", age: 0 }
  end
end

%User{} == %{ __struct__: User, name: "default", age: 0 }
This is great because it shows the simplicity in defining structs compared to records. Defining a record requires a whole module definition behind the scenes with many accessor functions. In particular, this aspect of records has always caused confusion and has been a topic of frequent discussions in the mailing list: should we define extra functions in a record or not? Structs completely put an end to this discussion. Structs are only data, there is no way to attach behaviour to them.
We also propose the introduction of the defstruct macro:
defmodule User do
  defstruct %{name: "default", age: 0}
end
The defstruct macro simply:

defines the __struct__ function
defines a type t
allows deriving definitions

The deriving definitions is one of the features that are hard to implement for records, due to their rigidness, but easy to implement for structs. For example, a User struct does not implement Enumerable by default, since it dispatches to Enumerable.User. However, there already exists one implementation of Enumerable for maps, at Enumerable.Map, and we can simply use it in our structs too:
defmodule User do
  @derive [Enumerable, Access]
  defstruct %{name: "default", age: 0}
end
Note the @derive attribute is being used here as an example, its name and/or shape will likely change. Also, note that you can only define structs inside modules and only one per module.
Benchmarks

For those interested, very simple benchmarks comparing records and maps (i.e. with only atom keys) can be found here:
https://gist.github.com/josevalim/a2c71a6dfd93e2acb938
Results show that:

Creating maps are twice slower than creating records
Matching on maps is slightly slower than records
Updating maps are slightly faster than records

For 2) and 3) the difference is mostly insignificant. Note thought that those benchmarks measure maps with compile-time constructs for records. In Elixir, most people access their records using the polymorphic user.name syntax. In such cases, maps are about 30% faster for access and 40% faster for updates!
Also this is the just the release candidate implementation of maps. More improvements are bound to come as the map implementation gets more mature.
Other considerations

Maps will be defined with their own special form %{} and they won't be valid literals in the AST. The update operator % seen in the previous proposal has been completely dropped and will be used for structs.
As in the previous proposal, the guards is_map/1 and map_size/1 will be added to Kernel while a whole new Map module will be added to work with Dict. ListDict will be deprecated and completely removed.
Elixir records on the way out

After maps and structs are introduced, records as seen today will be phased out. We will still have small conveniences for loading and working with Erlang records, but records backed by modules will be removed from the language.
Of course, this deprecation will take a long time. In particular, we will stick with our plan where short after Erlang R17 RC is released, a new Elixir v0.12.x release will be made. This release, and therefore the v0.12 branch will continue to be maintained for those already using Elixir production. Meanwhile work on the v0.13 series will start for those eager to play with maps.
No backwards incompatible change will be introduced in the v0.13 series, with the exception that support to previous Erlang versions will be dropped. The goal is to make a v0.13.0 release once R17B is out, allowing developers to jump from the latest version in v0.12 to v0.13.0 without any issues.
Once such migration is ready, we will slowly deprecate existing functionality. First starting with the obvious ones, like ListDict, and then slowly changing new code to be based on maps and structs. For example, Macro.Env will become a struct, and so on. Records will be effectively phased out just in a later stage after the ecossystem has migrated to structs.
Summary

This proposal introduces maps and structs. Structs allows us to bring the compile time and polymorphic dispatch features of records while addressing many of the downsides and complications brought by records today.
Records will be slowly phased out but they will still exist, with much less important than seen today, existing mostly for interacting with Erlang applications.