@itsderek23
Last active December 7, 2016 17:43
Elixir foundations for Ruby Devs: transforming data


Have you ever reached for a drink of water, then realized, halfway through your first sip, that it was Sprite? I like Sprite - but that first sip of a similar-looking, yet very different liquid is a shock.

That's how Elixir can feel if you are coming from Ruby. The syntax looks similar, but Elixir is different from the first sip. For a smooth transition, you need to (a) learn some functional programming patterns and (b) unlearn some object-oriented habits.

A core design pattern of Elixir is the focus on data transformations: you'll see it in libraries like Ecto.Changeset, Ecto.Multi, Plug.Conn and built-ins like Enum. Let's dive in.

Data Transformations

The data transformation pattern uses one data structure (DS) that is your single source of truth and many small functions operating on it. It makes your programs:

  1. Easy to compose (which translates to "easy to write")
  2. Easy to extend (customize)
  3. And easy to test.

Almost every time it looks exactly the same:

  1. Choose a data structure
  2. Write many functions that take the data structure as the first argument and return that data structure
  3. Make sure most of those functions don't have side effects.

The examples will start simple. Let's start with lists and the Enum module.

Simple data transformers with Enum

You'll often see code like this:

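# Integer.is_even/1 is a macro, so the Integer module must be required first.
require Integer

sorted_even_numbers =
  list
  |> Enum.filter(&Integer.is_even/1)
  |> Enum.sort()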

You can chain many operations from the Enum module using the pipe operator |>. This is possible because functions like filter and sort take a list (strictly, any enumerable) as their first argument and return a new list.
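
To see why the chain works, it may help to remember that the pipe operator simply passes the value on its left as the first argument of the call on its right. The sketch below shows the same computation written both ways; the module name PipeDemo and its function names are made up for illustration.

```elixir
defmodule PipeDemo do
  # Integer.is_even/1 is a macro, so the module has to be required first.
  require Integer

  # Piped form: each step receives the previous result as its first argument.
  def piped(list) do
    list
    |> Enum.filter(&Integer.is_even/1)
    |> Enum.sort()
  end

  # The same computation as nested calls, which has to be read inside out.
  def nested(list) do
    Enum.sort(Enum.filter(list, &Integer.is_even/1))
  end
end

IO.inspect(PipeDemo.piped([5, 2, 8, 3, 4]))
IO.inspect(PipeDemo.nested([5, 2, 8, 3, 4]))
```

For a Ruby developer, the piped form reads much like list.select(&:even?).sort, except that the steps are plain functions rather than methods on the list.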

If you need any other operations that you would like to compose in the same way, just write a function that takes a list as its first argument and returns a new list. Let's encapsulate those three lines in their own function:

# Assumes `require Integer` at the top of the enclosing module.
def sorted_even_numbers(list) do
  list
  |> Enum.filter(&Integer.is_even/1)
  |> Enum.sort()
end

Without any modifications, we captured the logic and put it into a function that we can later use in a chain like this:

def two_smallest_even_numbers(list) do
  list
  |> sorted_even_numbers()
  |> Enum.take(2)
end

This may look obvious, but more complicated libraries use the same pattern.

The three observations are:

  1. Transformations on lists are easy to compose, because a set of transformations is also a transformation (composability).
  2. You can write functions that do crazy things as long as they return a list at the end (extensibility).
  3. As long as your functions don't introduce side effects testing is easy.
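
As a rough illustration of the second and third observations, the sketch below adds a custom step to a chain and checks the result with a plain pattern match. The module ListSteps and the function squares/1 are hypothetical names chosen for this example.

```elixir
defmodule ListSteps do
  require Integer

  # Keeps the even integers and sorts them in ascending order.
  def sorted_even_numbers(list) do
    list
    |> Enum.filter(&Integer.is_even/1)
    |> Enum.sort()
  end

  # A custom step: list in, list out, no side effects.
  def squares(list), do: Enum.map(list, fn n -> n * n end)

  # Custom and built-in steps mix freely in one chain.
  def first_two_even_squares(list) do
    list
    |> sorted_even_numbers()
    |> squares()
    |> Enum.take(2)
  end
end

# A pure function needs no setup or mocks: a plain match is the whole test.
[4, 16] = ListSteps.first_two_even_squares([5, 4, 3, 2, 8])
```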

Ecto Transformations

Let's look at the Ecto library and how it solves data validation. Validators need to check whether changes to given data are valid and, if not, indicate why. Let's call our single data structure a "changeset". Our validators should be small functions that take a changeset as their first argument and return a changeset.

What do we need to store?

  • Original data (for example, what is currently saved in the database)
  • A set of modifications (params for short)
  • Computed changes, because params and original data can be the same
  • An indication if a changeset is valid
  • A list of errors if a changeset isn't valid

There are fields in our DS that we can treat as input like data and params, there are fields that are intermediate like changes, and there are fields that are clearly output: valid? and errors. We combine all this sauce into a single DS and now we can use it like this:

user
|> cast(params, [:name, :email, :age])
|> validate_required([:name, :email])
|> validate_format(:email, ~r/@/)

We take the original data - for example, a user in the database - pass the params we want to apply, and list the fields that are allowed to change. The cast function returns a changeset, and from that point on every function in the chain takes a changeset as its first argument and returns a changeset.
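
To make the shape of that data structure concrete, here is a deliberately simplified stand-in written in plain Elixir. It is not Ecto's implementation (the real cast also handles type casting, among other things); ToyChangeset and its fields are assumptions that mirror the list of things we said we need to store.

```elixir
defmodule ToyChangeset do
  # Input: data and params. Intermediate: changes. Output: errors and valid?.
  defstruct data: %{}, params: %{}, changes: %{}, errors: [], valid?: true

  # Builds a changeset, keeping only permitted params that differ from the data.
  def cast(data, params, permitted) do
    changes =
      params
      |> Map.take(permitted)
      |> Enum.reject(fn {key, value} -> Map.get(data, key) == value end)
      |> Map.new()

    %__MODULE__{data: data, params: params, changes: changes}
  end
end

user = %{name: "Ann", email: "ann@example.com", age: 30}
params = %{name: "Ann", age: 31, admin: true}

changeset = ToyChangeset.cast(user, params, [:name, :email, :age])
IO.inspect(changeset.changes)
```

In this sketch the unchanged name and the non-permitted admin key never reach changes, which is why computed changes are stored separately from the raw params.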

Writing your own Ecto validator

Ecto has a nice set of ready-to-use validators that you can easily compose, but what if you wanted to write your own? Something non-standard, like making sure that an event in the database doesn't finish before it starts. We just need to write a function that takes a changeset and returns a changeset, like this:

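def validate_interval(changeset, start_field, end_field) do
  # `end` is a reserved word in Elixir, so the arguments are named *_field.
  start_date = get_field(changeset, start_field)
  end_date = get_field(changeset, end_field)

  case Date.compare(start_date, end_date) do
    # The event starts after it ends: record an error on the start field.
    :gt -> add_error(changeset, start_field, "…")
    _otherwise -> changeset
  end
end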

We extract the starting date and ending date from the changeset, compare them, and if everything is OK we return an unmodified changeset. If there were errors from previous validations, we don't care: we just pass them on through the pipe chain. If start_date is greater than end_date, we set the valid? indicator to false and prepend the error to the list of errors.

Validators can also be composed in the same way as list transformations. Let's say we have a set of validators that are always used together. For example, we would like to create an address validator from street and zipcode validators:

def validate_address(cs, street, zip) do
  cs
  |> validate_street(street)
  |> validate_zipcode(zip)
end

A set of validators applied one by one is also a validator, so validators are easy to compose. They are also easy to test because, in the end, you pass in a DS and check the fields of the returned DS in your tests.
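
As a sketch of what such a test can look like, the example below uses a stripped-down changeset written in plain Elixir instead of Ecto.Changeset. The module ToyValidators, its validation rules, and its error messages are all invented for illustration.

```elixir
defmodule ToyValidators do
  # A stripped-down changeset: only the fields the validators touch.
  defstruct changes: %{}, errors: [], valid?: true

  def new(changes), do: %__MODULE__{changes: changes}

  # Records an error and marks the changeset invalid.
  def add_error(cs, field, message) do
    %{cs | errors: [{field, message} | cs.errors], valid?: false}
  end

  # Each validator takes a changeset and returns a changeset.
  def validate_street(cs, field) do
    case Map.get(cs.changes, field) do
      street when is_binary(street) and street != "" -> cs
      _missing -> add_error(cs, field, "can't be blank")
    end
  end

  def validate_zipcode(cs, field) do
    zip = Map.get(cs.changes, field)

    if is_binary(zip) and zip =~ ~r/^\d{5}$/ do
      cs
    else
      add_error(cs, field, "is not a 5-digit code")
    end
  end

  # Two validators applied one after another form a new validator.
  def validate_address(cs, street, zip) do
    cs |> validate_street(street) |> validate_zipcode(zip)
  end
end

# The "test" is just data in, data out.
result =
  %{street: "", zip: "12"}
  |> ToyValidators.new()
  |> ToyValidators.validate_address(:street, :zip)

false = result.valid?
[zip: "is not a 5-digit code", street: "can't be blank"] = result.errors
```

Note that the second validator runs even though the first one already failed, so the caller gets every error at once instead of only the first.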

Ecto.Changeset is also used when calling the database:

case Repo.insert(changeset) do
  {:error, changeset} -> ...
  {:ok, model} -> ...
end

Database constraints are converted to changeset errors. This follows the principle of separating pure and impure parts of your program.

Ecto.Multi

A third example of the same "single data structure" pattern is Ecto.Multi. Multi stores database queries that can be later fired in one transaction:

multi =
  Multi.new()
  |> Multi.update(:account, a_changeset)
  |> Multi.insert(:log, log_changeset)
  |> Multi.delete_all(:sessions, assoc(account, :sessions))

Repo.transaction(multi)

It is different from a changeset, because the main DS is opaque. The internals may change and you can't use them directly. It is similar to a changeset, because all operations take a Multi as their first argument and return a Multi, which makes them easy to extend and compose in the same way as validators. This also means that you can end up with pretty big Multis with branching logic and nested operations, which you will want to test to make sure everything works as expected.

Instead of making actual queries to the database, Multi gives you a to_list function which lists all the operations that ended up in Multi. This way, you can test your application logic using pure data structures instead of hitting the database. This makes tests easier and faster.
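
The idea of recording operations as data and inspecting them later can be sketched without Ecto at all. The ToyMulti module below is a hypothetical, much-reduced imitation; the tuples it returns are an assumption made for this example and do not match the exact format that Ecto.Multi.to_list returns.

```elixir
defmodule ToyMulti do
  # Operations are stored newest-first; callers should not rely on this layout.
  defstruct operations: []

  def new, do: %__MODULE__{}

  # Each function only records what should happen; nothing touches a database.
  def insert(multi, name, changeset), do: add(multi, name, {:insert, changeset})
  def update(multi, name, changeset), do: add(multi, name, {:update, changeset})

  # Returns the recorded operations in the order they were added.
  def to_list(multi), do: Enum.reverse(multi.operations)

  defp add(multi, name, operation) do
    %{multi | operations: [{name, operation} | multi.operations]}
  end
end

multi =
  ToyMulti.new()
  |> ToyMulti.update(:account, %{balance: 10})
  |> ToyMulti.insert(:log, %{message: "updated"})

# A test can assert on the recorded steps without running any of them.
[account: {:update, _}, log: {:insert, _}] = ToyMulti.to_list(multi)
```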

Phoenix

The fourth example is the king of them all. The essence of the Phoenix Framework. The almighty Plug. Let's apply the same principles shown above, this time to web servers. We need a single DS that holds all the information about a web request. It is going to contain everything that comes with a request:

  • host
  • method
  • req_headers
  • path_info

All things that we need to return at the end:

  • resp_body
  • status

And some intermediate things that might come in handy during request lifecycle like:

  • params

Params are intermediate because they arrive either in the GET URL or in the POST body and need to be normalized into an Elixir map first. Let's call this DS a Conn. When a request arrives at the web server, it is immediately translated into a Conn. This is similar to how cast works in Changeset. After that, there are many small functions called plugs that (you guessed it) take a Conn as their first argument and return a Conn.

A set of plugs chained together is called a pipeline. The entire Phoenix framework is a pipeline like this:

Conn |> Endpoint |> UserPipelines |> Router |> Controller

A pipeline is also a plug, so the entire Phoenix Framework is just a plug. The beauty comes with its extensibility. You can put your own custom plugs in almost any place in the request lifecycle. It allows library creators to add new functionality to Phoenix by simply writing a couple of functions, with instructions on where the developer needs to... you know... plug them.
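
The claim that a pipeline is itself a plug can be illustrated with a toy model in plain Elixir. This is only a sketch: the conn here is an ordinary map rather than a real Plug.Conn struct, plugs are one-argument functions, and every name in ToyPipeline is made up.

```elixir
defmodule ToyPipeline do
  # A toy conn: a plain map standing in for the real Plug.Conn struct.
  def new_conn(method, path_info) do
    %{method: method, path_info: path_info, assigns: %{}, status: nil, resp_body: nil}
  end

  # Two small plugs: conn in, conn out.
  def put_request_id(conn), do: put_in(conn, [:assigns, :request_id], "req-1")
  def respond_ok(conn), do: %{conn | status: 200, resp_body: "ok"}

  # Running a list of plugs in order is itself a conn -> conn function.
  def run(conn, plugs) do
    Enum.reduce(plugs, conn, fn plug, acc -> plug.(acc) end)
  end

  # Because a pipeline is a plug, it can be nested inside a larger pipeline.
  def api_pipeline(conn), do: run(conn, [&put_request_id/1])
  def app(conn), do: run(conn, [&api_pipeline/1, &respond_ok/1])
end

conn = ToyPipeline.app(ToyPipeline.new_conn("GET", ["users"]))
IO.inspect({conn.status, conn.assigns})
```

Ruby developers may recognize the idea from Rack middleware, with the difference that here each step returns a new conn instead of mutating a shared env.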

It is even better if you can keep your plugs pure. Let's say your plug needs to add something from the database to Conn.assigns. You can do it like this:

def my_plug(conn) do
  user_id = get_session(conn, :user_id)
  user = Repo.get(User, user_id)
  assign(conn, :current_user, user)
end

...and this would be hard to test, because it needs to call the database each time you call it. There is a simple workaround for that. Pass impure things as an argument!

def my_plug(conn, repo \\ Repo) do
  user_id = get_session(conn, :user_id)
  user = repo.get(User, user_id)
  assign(conn, :current_user, user)
end

We pass the module name as the second argument, with a default value of Repo. This ensures that when we call the plug with a single argument, it behaves exactly the same way as the one above. We can use the second argument in our tests like this:

defmodule FakeRepo do
  # Mirrors the Repo.get(User, id) call made inside the plug.
  def get(User, 1) do
    %User{name: "Tomasz", …}
  end
end

my_plug(conn, FakeRepo)

We made the plug testable by making all its "contracts" with the outside world explicit. This nicely separates the pure and impure parts of the plug.
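
For readers who want to run the idea end to end, here is a self-contained sketch that needs neither Phoenix nor a database. The conn is a plain map, and ToyAuth, InMemoryRepo, and the :user tag are hypothetical stand-ins for the plug, the fake repo, and the User schema.

```elixir
defmodule ToyAuth do
  # The repo module is an argument, so callers decide where users come from.
  # The conn here is a plain map, not a real Plug.Conn.
  def load_user(conn, repo) do
    user = repo.get(:user, conn.session[:user_id])
    put_in(conn, [:assigns, :current_user], user)
  end
end

defmodule InMemoryRepo do
  # Same shape as the call made in ToyAuth.load_user/2: get(schema, id).
  def get(:user, 1), do: %{name: "Tomasz"}
  def get(:user, _id), do: nil
end

conn = %{session: %{user_id: 1}, assigns: %{}}
IO.inspect(ToyAuth.load_user(conn, InMemoryRepo).assigns)
```

This is close to duck typing in Ruby: any module that responds to get/2 will do, so the test never has to stub a global.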

Summary

We can see the same data transformation pattern repeated many times across different libraries. It is convenient to use when a language offers easy chaining with a pipe operator or something similar. As Alan Perlis wrote in Epigrams on Programming:

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.

This is reflected in the Unix philosophy, where everything is a stream of lines and you build larger programs by composing programs with the Unix pipe |. It is also reflected in many Elixir libraries, which use a single DS and compose programs using the Elixir pipe operator |>.

@itsderek23

Good stuff - solid technical content. Some thoughts (note my Elixir terms may not be correct):

Summary

The specific Elixir design pattern (vs. design patterns, plural) referenced is method chaining. This supports code-reuse.

Intro

I think we should focus these from the perspective of a Ruby dev curious about Elixir rather than covering every possible starting point for a reader. This is our current market and a perspective we know well.

There's some proofreading required here.

Questions/Thoughts

  • What is "pure" vs. "impure"? Just haven't come across it much in Ruby land.
  • Method chaining reminds me a lot of jQuery operations
  • Needs some section headers: Simple example, Ecto Data Validations, Phoenix Plugs
  • Are impure things kind of like Duck-typing in Ruby? The FakeRepo reference.
  • I'd sprinkle some help for Ruby devs to better understand things...assume the reader is a Ruby dev but doesn't have Elixir experience.

The technical bits are solid, needs some work around the words.

@dlanderson

Agree with all Derek's points.

For me: This is a big technical post, and design patterns doesn't hook me. I tried reading this a few times but lost interest (needs more spice! :) I think the perspective of a ruby dev might go a long way

@tomekowal

Hey, I like the changes, especially the introduction with Sprite and splitting into sections.
Two things that need changing:

  • There is "r" missing in first "Sprite", so it is just "Spite"
  • in list example code fragments I used word evens as a shortcut for even integers and it got changed to events which doesn't make much sense, we could just use longer names like sorted_even_numbers and two_smallest_even_numbers

And two other suggestions:

  • good point about Ruby developers not knowing what a pure function is, maybe we could make the word "pure" into a link, for example to Wikipedia article https://en.wikipedia.org/wiki/Pure_function The point is that pure functions output depends only on its arguments (no external state from global variables or IO operations), so it is both testable (no mocking) and composable.
  • method chaining and JQuery operations are very similar, because they are both examples of monads https://importantshock.wordpress.com/2009/01/18/jquery-is-a-monad/ however I wanted to avoid this word, because it makes people scared :P We could add a sentence where we say that it is really similar, but in Elixir it is easier to test, because of purity mentioned above

@jsturg

jsturg commented Dec 7, 2016

@itsderek23

Forked with edits here: https://gist.github.com/jsturg/1d37c33bb3c51407f0a76a60fde52c43

Edit left to consider:

method chaining and JQuery operations are very similar, because they are both examples of monads https://importantshock.wordpress.com/2009/01/18/jquery-is-a-monad/ however I wanted to avoid this word, because it makes people scared :P We could add a sentence where we say that it is really similar, but in Elixir it is easier to test, because of purity mentioned above
