@laser
Last active July 26, 2018 22:40
Erlang/OTP Review - First Meeting

Housekeeping

General rules for making a distributed book club like this not-awful:

  • keep your mic muted unless you're speaking
  • raise your hand if you have something to say and wait for someone to call on you
  • if you're raising your hand and nobody calls on you, type a message into the chat

Chapter 1

Why Erlang?

Erlang/OTP is unique among programming languages and frameworks in the breadth, depth, and consistency of the features it provides for scalable, fault-tolerant systems with requirements for high availability.

What do these terms mean?

Scalable refers to how well a computing system can adapt to changes in load or available resources.

Distributed refers to how systems are clustered together and interact with each other.

Systems that are fault tolerant continue to operate predictably when things in their environment are failing.

By soft real-time, we mean the predictability of response and latency, handling a constant throughput, and guaranteeing a response within an acceptable time frame.

High availability minimizes or completely eliminates downtime as a result of bugs, outages, upgrades, or other operational activities.

What's OTP?

OTP is short for Open Telecom Platform.

OTP Components

  1. Erlang: Includes the semantics of the language and its underlying Virtual Machine
  2. Tools and libs: applications (erts, kernel, stdlib, dialyzer, debugger, etc.), FFI (ei, erl_interface, jinterface), snmp, mnesia (anybody use this?), eunit
  3. Design principles: behaviors (e.g. worker and supervisor processes), which are seen as a formalization of concurrent design patterns

Distribution, Infrastructure, Multicore

In Erlang, processes communicate via asynchronous message passing. This works even if a process is on a remote node because the Erlang virtual machine supports passing messages from one node to another.

When one node joins another, it also becomes aware of any nodes already known to the other. In this manner, all the nodes in a cluster form a mesh, enabling any process to send a message to another process on any other node in the cluster.

Each node in the cluster also automatically tracks liveness of other nodes in order to become aware of nonresponsive nodes.

Implication: Programming for a clustered environment should feel identical to programming for a non-clustered environment.
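
To make the implication above concrete, here is a minimal sketch of a location-transparent call. The module name `mesh_sketch` and the registered name `echo` are made up for illustration; the point is only that `call/2` looks the same whether `Node` is the local node or a remote one that has joined the cluster.

```erlang
-module(mesh_sketch).
-export([start/0, call/2]).

%% Register a process under the (hypothetical) name `echo` on this node.
start() ->
    register(echo, spawn(fun loop/0)).

%% Reply to every {From, Msg} request with the same payload.
loop() ->
    receive
        {From, Msg} ->
            From ! {echo_reply, Msg},
            loop()
    end.

%% Send to `echo` on Node and wait for the reply. Nothing in this
%% function depends on whether Node is node() or a remote node.
call(Node, Msg) ->
    {echo, Node} ! {self(), Msg},
    receive
        {echo_reply, Reply} -> {ok, Reply}
    after 1000 ->
        {error, timeout}
    end.
```

After `mesh_sketch:start()`, calling `mesh_sketch:call(node(), hello)` exercises the local case; passing another node's name would exercise the remote case, assuming `echo` has been started there.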

Chapter 2

Erlang

Language features:

  • anonymous functions
  • tail-call optimization
  • structural pattern matching
  • tuples, maps, lists
  • list comprehensions
  • macros

Processes and Message Passing

  • processes spawn other processes and send messages to them
  • messages are stored in a process's mailbox
  • processes do not share memory (messages are copied from stack of sending process to heap of receiver)
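
The three points above can be sketched in a few lines. The module and function names (`msg_sketch`, `double/1`) are hypothetical; the sketch spawns a process, sends it a message, and waits for the answer to arrive in the caller's own mailbox.

```erlang
-module(msg_sketch).
-export([double/1]).

%% Spawn a helper process, send it N, and wait for 2 * N to come back.
double(N) ->
    Parent = self(),
    Pid = spawn(fun() ->
                    %% The helper only accepts a request tagged with Parent.
                    receive
                        {Parent, X} -> Parent ! {self(), 2 * X}
                    end
                end),
    Pid ! {Parent, N},
    %% The reply is tagged with the helper's pid, so unrelated
    %% messages in our mailbox are left alone.
    receive
        {Pid, Result} -> Result
    end.
```

The two processes never touch each other's data: the only thing that crosses between them is the message itself.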

Selective Receive

This function builds a list of all messages in the mailbox, with those whose priority is above 10 coming first:

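-module(multiproc).
-export([important/0, normal/0]).

%% Collect high-priority messages (priority > 10) first.
important() ->
  receive
    {Priority, Message} when Priority > 10 ->
      [Message | important()]
  after 0 ->
    %% No high-priority messages left; collect the rest.
    normal()
  end.

%% Collect whatever remains in the mailbox, in arrival order.
normal() ->
  receive
    {_, Message} ->
      [Message | normal()]
  after 0 ->
    []
  end.

1> c(multiproc).
{ok,multiproc}
2> self() ! {15, high}, self() ! {7, low}, self() ! {1, low}, self() ! {17, high}.
{17,high}
3> multiproc:important().
[high,high,low,low]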

When messages are sent to a process, they're stored in the mailbox until the process reads them and they match a pattern there. The messages are stored in the order they were received.

When there is no way to match a given message, it is put in a save queue and the next message is tried. If the second message matches, the first message is put back on top of the mailbox to be retried later.

This lets you only care about the messages that are useful. Ignoring some messages to handle them later in the manner described above is the essence of selective receives. While they're useful, the problem with them is that if your process has a lot of messages you never care about, reading useful messages will actually take longer and longer (and the processes will grow in size too).
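
A common use of selective receive is waiting for one specific reply while leaving everything else queued. The sketch below is illustrative only (the module name `selective_sketch` and the message shapes are assumptions): it tags the wanted message with a fresh reference, receives just that one, and then drains the mailbox to show that the skipped messages are still there in arrival order.

```erlang
-module(selective_sketch).
-export([demo/0]).

%% Run the experiment in a fresh process so its mailbox starts empty.
demo() ->
    Parent = self(),
    Tag = make_ref(),
    spawn(fun() -> Parent ! {Tag, run()} end),
    receive {Tag, Result} -> Result end.

run() ->
    Ref = make_ref(),
    self() ! noise_1,
    self() ! {Ref, wanted},
    self() ! noise_2,
    %% Only the message tagged with Ref matches; noise_1 is skipped
    %% over and stays in the mailbox.
    Wanted = receive {Ref, Value} -> Value end,
    {Wanted, drain()}.

%% Return everything still in the mailbox, in arrival order.
drain() ->
    receive
        Msg -> [Msg | drain()]
    after 0 ->
        []
    end.
```

`selective_sketch:demo()` returns `{wanted, [noise_1, noise_2]}`. If the noise messages were never drained, each later selective receive would have to scan past them again, which is the slowdown described above.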

Scheduler

  • for every core, BEAM starts a thread which runs a scheduler
  • each scheduler is responsible for a group of processes
  • at any one time, a process from each scheduler executes in parallel on each core
  • schedulers try to balance CPU time across all processes ("What the BEAM virtual machine tries to do is avoid cases where processes in a run queue with 10 processes get twice as much CPU time as those in a run queue with 20 processes.")
  • processes are migrated between run queues (cores)
  • schedulers can preempt processes based on workload they've executed (reduction count - kinda works like Ethereum "gas")
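
Some of this is observable from inside the VM. The sketch below (hypothetical module `sched_sketch`) reads the scheduler counts with `erlang:system_info/1` and measures how many reductions the calling process is charged for running a fun, using `erlang:process_info/2`. The exact numbers are intentionally not something to rely on; they can differ between releases and machines.

```erlang
-module(sched_sketch).
-export([info/0, reductions_used/1]).

%% How many scheduler threads the VM has, and how many are online.
info() ->
    #{schedulers => erlang:system_info(schedulers),
      schedulers_online => erlang:system_info(schedulers_online)}.

%% Run Fun in the calling process and return how much its
%% reduction counter grew while doing so.
reductions_used(Fun) ->
    {reductions, Before} = erlang:process_info(self(), reductions),
    _ = Fun(),
    {reductions, After} = erlang:process_info(self(), reductions),
    After - Before.
```

For example, `sched_sketch:reductions_used(fun() -> lists:sum(lists:seq(1, 100000)) end)` returns a positive integer that grows with the amount of work the fun does.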

Supervision

  • links (bi-directional)
    • when process X exits abnormally, linked process Y also exits (unless Y traps exits, in which case it receives an 'EXIT' message instead)
  • monitors (one direction)
    • used for resource management, e.g. distributed lock
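
The difference between the two is easiest to see side by side. This is a small sketch with made-up names (`link_sketch`, `monitor_demo/0`, `link_demo/0`): the monitor version is notified with a 'DOWN' message, while the link version has to trap exits so that the exit signal arrives as an 'EXIT' message rather than terminating the caller.

```erlang
-module(link_sketch).
-export([monitor_demo/0, link_demo/0]).

%% One direction: we watch the child; the child is unaffected by us.
monitor_demo() ->
    {Pid, Ref} = spawn_monitor(fun() -> exit(boom) end),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> {down, Reason}
    after 1000 ->
        timeout
    end.

%% Bi-directional: without trap_exit, the child's abnormal exit
%% would take the calling process down as well.
link_demo() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(boom) end),
    Result = receive
                 {'EXIT', Pid, Reason} -> {exit, Reason}
             after 1000 ->
                 timeout
             end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, Old),
    Result.
```

Supervisors build on the second pattern: they are linked to their children, trap exits, and decide what to restart when an 'EXIT' message arrives.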

Deploying

  • can deploy new software w/out taking the node offline
  • can upgrade a module by compiling it in the shell or explicitly loading it
  • at any one time, two versions of a module may exist in the VM: old and current
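
A rough sketch of how a long-running process can move from the old version of its module to the current one is shown below. The module `counter_sketch` and its `upgrade` message are assumptions made for illustration, not something from the book. The key detail is the fully qualified call `?MODULE:loop(Count)`: a fully qualified call always runs the current version of the module, and because the state is passed as an argument, it survives the switch.

```erlang
-module(counter_sketch).
-export([start/0, increment/1, value/1, upgrade/1, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

increment(Pid) ->
    Pid ! increment,
    ok.

value(Pid) ->
    Pid ! {value, self()},
    receive
        {counter_value, Pid, N} -> N
    after 1000 ->
        timeout
    end.

upgrade(Pid) ->
    Pid ! upgrade,
    ok.

loop(Count) ->
    receive
        increment ->
            %% Local call: keeps running the version we are already in.
            loop(Count + 1);
        {value, From} ->
            From ! {counter_value, self(), Count},
            loop(Count);
        upgrade ->
            %% Fully qualified call: jumps to the current version of
            %% the module, carrying the state along.
            ?MODULE:loop(Count)
    end.
```

In use, one would recompile the module (for example with `c(counter_sketch).` in the shell) and then call `counter_sketch:upgrade(Pid)`. A process that is still running the old version when the module is loaded a second time is killed when the old code is purged, which is why the explicit upgrade step matters.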

Distributed Erlang

  • processes can transparently spawn processes on other nodes, by name of node
  • nodes in cluster can be distributed on same host or on different ones
  • the message passing model is the same whether or not the processes are on clustered nodes

-module(tut17).

-export([start_ping/1, start_pong/0,  ping/2, pong/0]).

ping(0, Pong_Node) ->
    {pong, Pong_Node} ! finished,
    io:format("ping finished~n", []);

ping(N, Pong_Node) ->
    {pong, Pong_Node} ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping(N - 1, Pong_Node).

pong() ->
    receive
        finished ->
            io:format("Pong finished~n", []);
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    end.

start_pong() ->
    register(pong, spawn(tut17, pong, [])).

start_ping(Pong_Node) ->
    spawn(tut17, ping, [3, Pong_Node]).

(pong@gollum)1> tut17:start_pong().
true
(ping@kosken)1> tut17:start_ping(pong@gollum).
<0.37.0>
Ping received pong
Ping received pong
Ping received pong
ping finished

Discussion

Topic A

At any one time, two versions of a module may exist in the virtual machine: the old and current versions.

  1. What are some of the considerations of running two different versions of a module in a cluster at the same time?

Topic B

Out of the box, Erlang distribution is not designed to support systems operating across potentially hostile environments such as the Internet or shared cloud instances.

Does anyone know why this is the case?

Topic C

For decades, the computing industry has explored how programming languages can support distribution. Designing general-purpose languages is difficult enough; designing them to support distribution significantly adds to that difficulty. Because of this, a common approach is to add distribution support to nondistributed programming languages through optional libraries. This approach has the benefit of allowing distribution support to evolve separately from the language itself, but it often suffers from an impedance mismatch with the language, feeling to developers as if it were “bolted on.”

  1. What are some examples of tools in other languages (addressing distributed systems) which feel "bolted on?"

Topic D

[...] consider that Erlang clusters do not require master or leader nodes, which means that using them for peer-to-peer systems of replicas works well

  1. What are some of the benefits of a leaderless cluster?
  2. What are some of the benefits of a hierarchical design?

Topic E

Each process is allowed to execute a predefined number of reductions before being preempted, allowing the process at the head of the run queue to execute. The number of reductions each process is allowed to execute before being suspended and the reduction count of each instruction are purposely not documented to discourage premature optimization, because the reduction count and the total number of reductions the scheduler allows a process to execute may change from one release and hardware architecture to another.

  1. Is this good?

Topic F

You need to think hard about your requirements and properties, making certain you pick the right libraries and design patterns that ensure the final system behaves the way you want it to and does what you originally intended. In your quest, you will have to make tradeoffs that are mutually dependent — tradeoffs on time, resources, and features and tradeoffs on availability, scalability, and reliability. No ready-made library can help you if you do not know what you want to get out of your system.

  1. What are some of the tradeoff decisions on availability, scalability, and reliability that you've had to make on client projects?
@eingenito

eingenito commented Jul 26, 2018

Also Ingar pointed out an interesting sentence in Chapter 1 which he thinks is revealing of both how Erlang/OTP approaches reliability and what problems it is appropriate for:

And because adding new nodes to a cluster is easy - all it takes is to have that node contact just one other node to join the mesh - horizontal scaling is also well within easy reach. This, in turn, allows you to focus on the real challenge when dealing with distributed systems: namely, distributing your data and state across hosts and networks that are unreliable.

@laser
Author

laser commented Jul 26, 2018

We Wanna Learn...

  1. How does hot code reloading work? Sounds like we can deploy a new version of a module and... variables' values persist.
  2. How does Docker intersect with cross-node message passing?
  3. Do Erlang web apps embrace in-memory state in a way that other stacks do not?
  4. How are we supposed to use the Erlang "special sauce" that we hear so much about?
  5. If you don't need real time, do you really need Erlang?
  6. Why would we switch from Ruby to Elixir/Erlang?
  7. What are the use cases for distributed Erlang (and could we conceivably use it in our work)?
  8. Is there any infrastructure available to support us in our efforts to deploy a distributed Erlang cluster?
