Instantly share code, notes, and snippets.

@josevalim /maps_and_structs.md Secret
Last active Aug 29, 2015

Embed
What would you like to do?
Proposal for maps and structs

Proposal for maps and structs

This is a new proposal for maps. This is completely different from the previous proposals as we are not trying to integrate "Erlang maps" into Elixir but come up with Elixir's own definition of maps. The root of this proposal was two questions asked in the mailing list:

  1. What you would do if you were designing Elixir today? How would maps look like?
  2. Do we need to make maps and records look like?

This proposal provides different answers to those questions.

Why do we need records?

I have realized that, if I had maps when first designing Elixir, the maps implementation wouldn't change much than the ones being proposed so far. However, the records implementation would likely be very different!

For this reason, it is important to ask: why do we need records? Records bring:

  1. A way to organize data by fields
  2. Efficient in memory and opereations
  3. Compile-time errors
  4. In Elixir they provide "types" which are used on protocol dispatch

Maps address 1) and 2). Could we address 3) and 4) with maps too? Let's find out.

Maps

Before we explore how we can add compile-time errors to maps. Let's see how the basic record syntax is going to look like. Creating a map can be done as:

user = %{ :name => "josé", :age => 27 }

Maps will also allow the keywords shortcut syntax for atom keys:

user = %{ name: "josé", age: 27 }

Maps can be used in pattern match and they also provide a update syntax:

user = %{ user | name: "josé", age: 27 }

Updates the user record only if it has the fields :name and :age. Not having those fields will lead to an error. Furthermore, this syntax has been inspired by the records syntax in order functional languages.

This definitely addresses 1 and 2. For the reasons why Maps are efficient, I would recommend reading Joe's blog post.

Structs (annotated maps)

The question is: can we add compile-time guarantees (about field names) and also allow annotations to be used in protocols dispatch to maps? Yes, we can! Here is the proposed syntax:

defmodule User do
  def __struct__ do
    %{ name: "default", age: 0 }
  end
end

%User{} == %{ __struct__: User, name: "default", age: 0 }

By defining a __struct__ function in a module, it allows the following syntax to be used: %User{}. The map created will have all the default values returned by User.__struct__/0.

Now we can do compile-time checks. The following will fail during compilation time with unknown field foo:

%User{ foo: 1 }

We can also provide the same guarantees for updates by simply adding the annotation:

%User{ user | name: "josé", age: 27 }

The following will raise an unknown field error as well:

%User{ user | foo: 1 }

At runtime, the update operation will check user has a __struct__: User field and guarantee all fields being updated already exist. Furthermore, since some maps now have a __struct__ field, we can actually implement polymorphism via protocols too.

Polymorphic structs

Calling Enumerable.count/1 with a struct will now dispatch to the struct implementation. For example, Enumerable.count(%User{}) dispatches to Enumerable.User.count(%User{}).

This is similar to the runtime polymorphism we have for records but without the drawbacks. Since records are simply tuples, the only requirement for a polymorphic dispatch is that the first element of the tuple to be an atom. This triggers a lot of false positives which slows down protocol dispatching.

Since there are so many false positives, protocols need to have a rather complicate fallback mechanism. If you invoke Enumerable.count(User[]), it will first try Enumerable.User.count/1, then Enumerable.Tuple.count/1 and then Enumerable.Any.count/1. The rate of false positives and the complicated fallbacks is what makes the inspect implementation slow (unless you consolidate protocols).

For maps/structs, we can use the __struct__ key with an atom value as a clearer indication, avoiding false positives and improving the performance of some protocols. The fallback mechanism can be simpler too: Enumerable.count(%User[]) will invoke Enumerable.User.count/1 and then Enumerable.Any.count/1.

That said, structs not only bring the same benefits as records, but they also provide a much simpler, faster and better foundation for polymorphism!

defstruct

So far we have defined our struct as a simple function in a module:

defmodule User do
  def __struct__ do
    %{ name: "default", age: 0 }
  end
end

%User{} == %{ __struct__: User, name: "default", age: 0 }

This is great because it shows the simplicity in defining structs compared to records. Defining a record requires a whole module definition behind the scenes with many accessor functions. In particular, this aspect of records has always caused confusion and has been a topic of frequent discussions in the mailing list: should we define extra functions in a record or not? Structs completely put an end to this discussion. Structs are only data, there is no way to attach behaviour to them.

We also propose the introduction of the defstruct macro:

defmodule User do
  defstruct %{name: "default", age: 0}
end

The defstruct macro simply:

  1. defines the __struct__ function
  2. defines a type t
  3. allows deriving definitions

The deriving definitions is one of the features that are hard to implement for records, due to their rigidness, but easy to implement for structs. For example, a User struct does not implement Enumerable by default, since it dispatches to Enumerable.User. However, there already exists one implementation of Enumerable for maps, at Enumerable.Map, and we can simply use it in our structs too:

defmodule User do
  @derive [Enumerable, Access]
  defstruct %{name: "default", age: 0}
end

Note the @derive attribute is being used here as an example, its name and/or shape will likely change. Also, note that you can only define structs inside modules and only one per module.

Benchmarks

For those interested, very simple benchmarks comparing records and maps (i.e. with only atom keys) can be found here:

https://gist.github.com/josevalim/a2c71a6dfd93e2acb938

Results show that:

  1. Creating maps are twice slower than creating records
  2. Matching on maps is slightly slower than records
  3. Updating maps are slightly faster than records

For 2) and 3) the difference is mostly insignificant. Note thought that those benchmarks measure maps with compile-time constructs for records. In Elixir, most people access their records using the polymorphic user.name syntax. In such cases, maps are about 30% faster for access and 40% faster for updates!

Also this is the just the release candidate implementation of maps. More improvements are bound to come as the map implementation gets more mature.

Other considerations

Maps will be defined with their own special form %{} and they won't be valid literals in the AST. The update operator % seen in the previous proposal has been completely dropped and will be used for structs.

As in the previous proposal, the guards is_map/1 and map_size/1 will be added to Kernel while a whole new Map module will be added to work with Dict. ListDict will be deprecated and completely removed.

Elixir records on the way out

After maps and structs are introduced, records as seen today will be phased out. We will still have small conveniences for loading and working with Erlang records, but records backed by modules will be removed from the language.

Of course, this deprecation will take a long time. In particular, we will stick with our plan where short after Erlang R17 RC is released, a new Elixir v0.12.x release will be made. This release, and therefore the v0.12 branch will continue to be maintained for those already using Elixir production. Meanwhile work on the v0.13 series will start for those eager to play with maps.

No backwards incompatible change will be introduced in the v0.13 series, with the exception that support to previous Erlang versions will be dropped. The goal is to make a v0.13.0 release once R17B is out, allowing developers to jump from the latest version in v0.12 to v0.13.0 without any issues.

Once such migration is ready, we will slowly deprecate existing functionality. First starting with the obvious ones, like ListDict, and then slowly changing new code to be based on maps and structs. For example, Macro.Env will become a struct, and so on. Records will be effectively phased out just in a later stage after the ecossystem has migrated to structs.

Summary

This proposal introduces maps and structs. Structs allows us to bring the compile time and polymorphic dispatch features of records while addressing many of the downsides and complications brought by records today.

Records will be slowly phased out but they will still exist, with much less important than seen today, existing mostly for interacting with Erlang applications.

@orenbenkiki

This comment has been minimized.

Show comment
Hide comment
@orenbenkiki

orenbenkiki Feb 2, 2014

Intriguing... Some questions:

  • It seems as though Elixir is going in the direction of of "provide constructs that are easy to code with, even at the cost of decreased efficiency", sort of the same as Ruby did (in comparison with, say, Python). Is this an explicit decision?
  • Specifically, would this mean that Elixir would be giving up on heavy usage of the magical dispatch mechanism provided by the VM, or is there still a way to use it in the context of maps?
  • Does this mean that Elixir would only need non-polymorphic records (for interacting with Erlang code), or would it still require polymorphic records? It seems as though restricting records to non-polymorphic ones would greatly simplify them and would remove a lot of the questions around defrecord vs. defrecordp.

orenbenkiki commented Feb 2, 2014

Intriguing... Some questions:

  • It seems as though Elixir is going in the direction of of "provide constructs that are easy to code with, even at the cost of decreased efficiency", sort of the same as Ruby did (in comparison with, say, Python). Is this an explicit decision?
  • Specifically, would this mean that Elixir would be giving up on heavy usage of the magical dispatch mechanism provided by the VM, or is there still a way to use it in the context of maps?
  • Does this mean that Elixir would only need non-polymorphic records (for interacting with Erlang code), or would it still require polymorphic records? It seems as though restricting records to non-polymorphic ones would greatly simplify them and would remove a lot of the questions around defrecord vs. defrecordp.
@josevalim

This comment has been minimized.

Show comment
Hide comment
@josevalim

josevalim Feb 2, 2014

Hello @orenbenkiki!

It seems as though Elixir is going in the direction of of "provide constructs that are easy to code with, even at the cost of decreased efficiency", sort of the same as Ruby did (in comparison with, say, Python). Is this an explicit decision?

I wouldn't say so. Initialization is bound to get faster once the OTP team support map literals as they do with record. Matching is a bit slower, but update is a bit faster. However, polymorphic maps/structs are much faster than polymorphic records. The general case is that it will make code faster rather than slower.

Specifically, would this mean that Elixir would be giving up on heavy usage of the magical dispatch mechanism provided by the VM, or is there still a way to use it in the context of maps?

The dispatch mechanism will still be there... since it is provided by the VM.

Does this mean that Elixir would only need non-polymorphic records (for interacting with Erlang code), or would it still require polymorphic records? It seems as though restricting records to non-polymorphic ones would greatly simplify them and would remove a lot of the questions around defrecord vs. defrecordp.

Exactly, we will have only non-polymorphic records, for interacting with Erlang code. It will definitely remove all of the questions around defrecord and defrecordp as we will only have one type of records now.

Owner

josevalim commented Feb 2, 2014

Hello @orenbenkiki!

It seems as though Elixir is going in the direction of of "provide constructs that are easy to code with, even at the cost of decreased efficiency", sort of the same as Ruby did (in comparison with, say, Python). Is this an explicit decision?

I wouldn't say so. Initialization is bound to get faster once the OTP team support map literals as they do with record. Matching is a bit slower, but update is a bit faster. However, polymorphic maps/structs are much faster than polymorphic records. The general case is that it will make code faster rather than slower.

Specifically, would this mean that Elixir would be giving up on heavy usage of the magical dispatch mechanism provided by the VM, or is there still a way to use it in the context of maps?

The dispatch mechanism will still be there... since it is provided by the VM.

Does this mean that Elixir would only need non-polymorphic records (for interacting with Erlang code), or would it still require polymorphic records? It seems as though restricting records to non-polymorphic ones would greatly simplify them and would remove a lot of the questions around defrecord vs. defrecordp.

Exactly, we will have only non-polymorphic records, for interacting with Erlang code. It will definitely remove all of the questions around defrecord and defrecordp as we will only have one type of records now.

@ToJans

This comment has been minimized.

Show comment
Hide comment
@ToJans

ToJans Feb 4, 2014

I like it a lot; might I add the following suggestion - you can probably already guess it:

defmodule User, do: defstruct %{name: "default", age: 0}
defmodule Organisation, do: defstruct %{name: "default", vat_number:"unknown"}
defmodule Contact, do: defstruct [User, Organisation]

def print_name(%Contact{name: name}), do: IO.puts name

@match_all
def print_details(org=%Organisation{vat_number: vat_number}) do
   print_name(org)
   IO.puts(vat_number)
end

# throws a compilation error because there is no 
# `print_details(%Contact{})` or `print_details(%User{})`

ToJans commented Feb 4, 2014

I like it a lot; might I add the following suggestion - you can probably already guess it:

defmodule User, do: defstruct %{name: "default", age: 0}
defmodule Organisation, do: defstruct %{name: "default", vat_number:"unknown"}
defmodule Contact, do: defstruct [User, Organisation]

def print_name(%Contact{name: name}), do: IO.puts name

@match_all
def print_details(org=%Organisation{vat_number: vat_number}) do
   print_name(org)
   IO.puts(vat_number)
end

# throws a compilation error because there is no 
# `print_details(%Contact{})` or `print_details(%User{})`
@khia

This comment has been minimized.

Show comment
Hide comment
@khia

khia Feb 6, 2014

If private structs are implemented as proposed i.e.

defmodule User do
  defstructp %{name: "default", age: 0}
end

It means that there is no way to define multiple private structs within one module due to the fact that even private structure requires module. Would it be better to do something like this:

defmodule MyModule do
  defstructp :user, %{name: "default", age: 0}
  defstructp :role, %{name: "default", permissions: nil}
  def check do
     # Use user and role structures here
  end
end

defstruct macro would expand it into

def __struct__(:user) do
    %{ name: "default", age: 0 }
end
def __struct__(:role) do
    %{ name: "default", permissions: nil }
end

khia commented Feb 6, 2014

If private structs are implemented as proposed i.e.

defmodule User do
  defstructp %{name: "default", age: 0}
end

It means that there is no way to define multiple private structs within one module due to the fact that even private structure requires module. Would it be better to do something like this:

defmodule MyModule do
  defstructp :user, %{name: "default", age: 0}
  defstructp :role, %{name: "default", permissions: nil}
  def check do
     # Use user and role structures here
  end
end

defstruct macro would expand it into

def __struct__(:user) do
    %{ name: "default", age: 0 }
end
def __struct__(:role) do
    %{ name: "default", permissions: nil }
end
@khia

This comment has been minimized.

Show comment
Hide comment
@khia

khia Feb 6, 2014

To obtain list of structures defined in a module we could use

def __struct__ do
   [user: :private, role: :private]
end

khia commented Feb 6, 2014

To obtain list of structures defined in a module we could use

def __struct__ do
   [user: :private, role: :private]
end
@dch

This comment has been minimized.

Show comment
Hide comment
@dch

dch Feb 28, 2014

This is a great writeup, and I'd love to see it on the main elixir website. I'm especially concerned about interop, as I use a lot of erlang modules already, and whether in their gen_server state loops, or elsewhere in the code, records are used heavily. The interop story is really a important part of Elixir wrt to module reuse, and I hope we can make this clear for people.

dch commented Feb 28, 2014

This is a great writeup, and I'd love to see it on the main elixir website. I'm especially concerned about interop, as I use a lot of erlang modules already, and whether in their gen_server state loops, or elsewhere in the code, records are used heavily. The interop story is really a important part of Elixir wrt to module reuse, and I hope we can make this clear for people.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment