caioaao/blog.md Secret

Demystifying functional programming (in a real company)

Over the years, we've all heard some skepticism about using functional programming in a real-life project. In our view, most of this skepticism stems from the perception that functional programming is inaccessible, overly academic, or not terribly useful.
Some examples we've heard:

"It's too hard - I barely made it through the course when I studied it";
"It's a cool theory, but it has no practical applications and it's not suited for a commercial project";
"How am I supposed to avoid side effects in the real world? Everything we do needs side effects to some extent";
"We can't hire people with these skills at my company";
"WHAT? I'm supposed to use monads at my day job?"

Each statement contains a grain of truth, but Nubank uses functional programming in production at scale, and we believe that this paradigm is easier to learn and more useful than is commonly understood.
Many of these concerns arise from the fact that functional programming has a really broad definition. The key here is to use the parts that make your life easier. In this post, we attempt to lay out a few of the points that make our lives easier when building highly available and scalable distributed systems at Nubank.
Immutable data structures

This is one killer (and often misunderstood) feature of modern functional programming languages. But how are we supposed to update an attribute when all we have is an immutable "thing"? Do we have to make a new copy every time? Isn't that prohibitively expensive?
The first thing to remember is: with immutable data structures, copying is cheap. You never actually duplicate data because you don't have to. All you have to do is point to the same object and voilà. Let's see a comparison between a snippet for updating a map in Python and in Clojure:
https://gist.github.com/413aed32902a9ac1e1c58819b6a8e976
The snippet above will print {'foo': 'baz', 'something': {'sub': 123}}. Notice the initial value is lost. Now with Clojure:
https://gist.github.com/77b45b404bde8af954c85b29e4f07f88
The Clojure snippet prints {:foo bar, :something {:sub 123}} followed by {:foo baz, :something {:sub 123}}. We have both the new and the old values.
assoc, pronounced as in "associate", is a function that adds or updates the value of the specified key in a "collection" (in this case, a map/dictionary or vector/array). If the key already exists in a map, its value is replaced, and if it doesn't exist, it is added. But here's where the magic happens: you're not actually changing our-immutable-map; you're creating a new one that shares structure with our-immutable-map, but has a different value for :foo. Because it's so easy to share structure in this immutable world [1], everything is still quite efficient, memory-wise. You can have a huge data structure and lots of modifications on top of it, and it'll all just work, with minimal memory overhead. The garbage collector will also take care of the parts of this structure that are no longer used.

Note: this is due to the underlying data structures used in Clojure and many other functional languages. If you're interested, they're called persistent data structures.
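To make the structural sharing concrete, here is a minimal sketch along the lines of the map example above. The name our-immutable-map comes from the text; updated-map is an illustrative name, and the identical? check is just one way to observe that the nested value is shared rather than copied.

```clojure
;; The original map, as in the example above.
(def our-immutable-map {:foo "bar" :something {:sub 123}})

;; assoc returns a new map; our-immutable-map itself is untouched.
;; updated-map is a hypothetical name used only for this sketch.
(def updated-map (assoc our-immutable-map :foo "baz"))

(println our-immutable-map)
(println updated-map)

;; The nested map under :something is not copied: both maps point to the
;; very same object, which is why the "copy" is cheap.
(println (identical? (:something our-immutable-map)
                     (:something updated-map)))
```

The last line prints true: the two maps differ only in the value for :foo and share everything else.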

The big advantage: we all know that shared mutable state is the root of all evil, and we just killed the mutable part (or most of it). The idea that mutable state is a major contributor to complexity in software is also well discussed in the famous Out of the Tarpit paper, which we highly recommend. In this specific example, we can now share data structures between multiple threads with the confidence that they'll always be consistent. No need to worry about race conditions with shared data structures. Sounds pretty useful, right?
As a side note: sometimes we still have to deal with shared mutable state. In fact, we also use Object Oriented Programming at Nubank to solve problems where a stateful in-memory object is the right tool for the job (e.g., connection pools, consumer loops, etc.).  Some functional languages have facilities for dealing with mutable state safely and explicitly.  Clojure, for instance, has a nifty way of doing so called atoms. They use compare-and-swap, so the necessary synchronization is handled automagically for you.  Zooming out a bit more, we also use records and the component library for most of the stateful objects in our services.
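As a rough illustration of atoms, the sketch below has several threads increment one shared counter. The names counter, increment-many! and start-worker! are made up for this example, and plain JVM threads are used only to keep it self-contained; this is not how our services are wired.

```clojure
;; A shared counter held in an atom. The name is illustrative.
(def counter (atom 0))

(defn increment-many!
  "Impure: bumps the shared counter n times."
  [n]
  (dotimes [_ n]
    ;; swap! applies inc atomically using compare-and-swap,
    ;; retrying if another thread changed the value in between.
    (swap! counter inc)))

(defn start-worker!
  "Starts a plain JVM thread that increments the counter n times."
  [n]
  (let [^Runnable task (fn [] (increment-many! n))]
    (doto (Thread. task) (.start))))

;; Eight threads updating the same atom at once, then wait for all of them.
(let [workers (doall (repeatedly 8 #(start-worker! 1000)))]
  (run! #(.join ^Thread %) workers))

(println @counter)
```

No update is lost, so the counter ends at 8000 without any explicit locking in our code.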
Clear separation between pure and impure functions

A function is pure when it doesn't have side-effects and its output depends solely on its input. This is also referred to as referential transparency. Give it the same arguments a thousand times and it'll return the same values a thousand times. It also won't affect the rest of the world. Examples of side effects are: HTTP requests, reading or writing to a database (although there are exceptions here), producing a message to a broker, etc.
The great advantage of pure functions is that they're easy to test. If you keep all your business logic in pure functions, all that is left for impure functions is to compose business logic and I/O. Because impure functions are trickier to test and reason about, it's nice to minimize the amount of complexity we tackle in them. This naturally provides a useful convention for structuring projects. Impure functions only require integration tests (e.g., testing from a Kafka message to the database and back again). Pure functions are covered by unit tests (and are also exercised by integration tests). And that's it! No need to wonder whether a function you're calling will change that one bit that will make your program blow up. We've even adopted a convention within Nubank of ending the names of impure functions with a "bang" (!) to make it easy to see and remember when we are dealing with side effects. Sounds useful for real-life projects, right?
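A small sketch of this split, under some assumptions: the order domain, the function names, and the in-memory atom standing in for a database are all hypothetical and only serve to show the shape.

```clojure
;; Pure: the result depends only on the argument; nothing else is touched.
(defn order-total
  [order]
  (reduce + (map (fn [{:keys [price quantity]}] (* price quantity))
                 (:items order))))

;; Pure: returns a new order with the total attached.
(defn with-total
  [order]
  (assoc order :total (order-total order)))

;; Assumption: an in-memory atom standing in for a real database.
(def fake-db (atom []))

;; Impure (note the trailing bang): it only composes pure logic with I/O.
(defn save-order!
  [db order]
  (swap! db conj (with-total order))
  nil)

;; Unit-testing the pure part needs no setup at all.
(assert (= 25 (order-total {:items [{:price 10 :quantity 2}
                                    {:price 5 :quantity 1}]})))
```

Only save-order! would need an integration test; everything with business rules in it can be checked with plain values.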
Focus on transforming data

Compared to an imperative paradigm, in functional programming you usually declare data transformations instead of computation steps. Most of the impure logic in your program will just be steps that transform data and then feed it into an I/O function. Given this, it's better to have a bunch of small functions that can be easily composed. In fact, some style guides state that a limit of 5 lines of code per function is enough in most cases.
To illustrate this concept, let's look at an example. At Nubank we register users (like most companies!), so we need to have a (strongly) hashed version of the user's password and include the creation date. Let's insert this into our database:
https://gist.github.com/700cfb60defb9cb466fd673c891afd1d
This will transform the user data with pure functions and only then feed it into the db/save-user! function, which will insert it into the database. Bonus: we can make handle-user-registration! clearer with Clojure's thread-first macro. This is only syntactic sugar: it takes the output of the previous function and uses it as the first argument of the next function.
https://gist.github.com/928d449724d9a60336c10834c364f454
This makes the focus on a data transformation pipeline even clearer.
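For readers without the snippets at hand, the pipeline might look roughly like the sketch below. The names with-creation-date and handle-user-registration! come from the text; hash-password, with-hashed-password, the nested variant and the in-memory save-user! (standing in for db/save-user!) are assumptions made for illustration.

```clojure
;; Placeholder only: a real system would use a dedicated password-hashing
;; algorithm. hash-password is a hypothetical stand-in.
(defn hash-password
  [password]
  (str "hashed:" (hash password)))

;; Pure: swaps the plain-text password for its hashed version.
(defn with-hashed-password
  [user]
  (-> user
      (assoc :hashed-password (hash-password (:password user)))
      (dissoc :password)))

;; Pure: the timestamp is an argument, so the result is reproducible.
(defn with-creation-date
  [user as-of]
  (assoc user :created-at as-of))

;; Assumption: an in-memory stand-in for the database and db/save-user!.
(def fake-db (atom []))

(defn save-user!
  [user]
  (swap! fake-db conj user)
  user)

;; Nested calls: read from the inside out.
(defn handle-user-registration-nested!
  [user as-of]
  (save-user! (with-creation-date (with-hashed-password user) as-of)))

;; Thread-first: the same pipeline, read top to bottom.
(defn handle-user-registration!
  [user as-of]
  (-> user
      with-hashed-password
      (with-creation-date as-of)
      save-user!))
```

Both versions do the same thing; a call such as (handle-user-registration! {:name "Ada" :password "s3cret"} "2020-01-01") stores a user map that has :hashed-password and :created-at but no :password.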
Functions as first-class citizens

This has a big name, but all it means is that you can use functions the same way you use any other value, like integers or strings. You can pass them to other functions, give them names, etc. This is useful for composing small functions to do big, important things. Let's take a look at the previous example, but this time we'll make a function that receives a list of user data to be prepared before being inserted in the database.
https://gist.github.com/fbd5df9c6bcddf9718c84d200a12d80c
Don't get spooked by the syntax; we'll walk you through it:

#(with-creation-date % as-of) is syntactic sugar for creating an anonymous function (a function that has no name) that receives a single argument, denoted by %;
let is Clojure's way of assigning names to values in a defined scope;
comp takes any number of functions and composes them. The value returned from comp is also a function, which pipes the result of each function into the next one, from right to left. For more clarity, suppose we have 3 functions f, g and h. Then (comp h g f) is equivalent to: (fn [x] (h (g (f x))));
map takes a function and a sequence and applies the function to each element of the sequence, returning a new sequence.

This snippet illustrates an interesting concept: higher-order functions, which are simply functions that accept other functions as arguments or return them as results. Both comp and map are higher-order functions, and there are many more in Clojure's core library; the Clojure Koans are a good way to practice them.
This pretty much sums up what we at Nubank use from the broad array of functional programming definitions. Seriously, that's all you have to know.
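The pieces described above can be put together in a short sketch. The helper bodies and the name prepare-users are assumptions for illustration; only with-creation-date and the #(with-creation-date % as-of) form are taken from the text.

```clojure
;; comp applies its functions right to left:
;; ((comp h g f) x) is the same as (h (g (f x))).
(def double-then-inc (comp inc #(* 2 %)))
(println (double-then-inc 5)) ; same as (inc (* 2 5))

;; Hypothetical pure helpers, standing in for the ones in the example.
(defn with-hashed-password
  [user]
  (assoc user :hashed-password (str "hashed:" (hash (:password user)))))

(defn with-creation-date
  [user as-of]
  (assoc user :created-at as-of))

;; Builds one "prepare" function with comp, then maps it over every user.
(defn prepare-users
  [users as-of]
  (let [prepare (comp #(with-creation-date % as-of) with-hashed-password)]
    (map prepare users)))

(println (prepare-users [{:password "a"} {:password "b"}] "2020-01-01"))
```

Here prepare is an ordinary value built from two other functions, and map receives it like any other argument.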
Wait, but what about monads?

Well, we don't really use them (at least not at scale). Monads, like everything else, are just tools that can help. We do have some monads lying around, and although they're super useful in some situations, they're not really needed most of the time. You don't have to have your "a monad is like a burrito" enlightenment moment before working with functional programming. You can do just fine without them (that is, unless you're using a stricter language like Haskell - in this case you have no option).
Another misconception: the "big ball of mud"

How do we cope with complexity without the help of classes, design patterns, factories, etc.? If we don't have a data structure to provide encapsulation, and we are going to write a truckload of small functions (fewer than 5 lines each!), isn't that going to be a big mess for real-world use cases? Not to worry. In reality, small functions with well-defined scopes can be organized very well within namespaces (think "file paths"). In Clojure, the unit of encapsulation is the namespace, not a data structure (for now, let's not worry about the concept that code is also data), and any public function in a namespace can be called from other namespaces. When a project gets bigger, some simple architectural patterns can help. At Nubank, our preferred pattern is the hexagonal architecture. But that's a topic for a future post.
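As a minimal sketch of namespaces as the unit of encapsulation, the example below puts two namespaces in one file for brevity; in a real project each would normally live in its own file. The namespace and function names are hypothetical and say nothing about how our services are actually laid out.

```clojure
(ns example.user.logic
  (:require [clojure.string :as str]))

;; defn- makes the function private to this namespace.
(defn- normalize-email
  [email]
  (str/lower-case (str/trim email)))

;; Public: part of what this namespace offers to the rest of the code.
(defn with-normalized-email
  [user]
  (update user :email normalize-email))

(ns example.user.registration)

;; Other namespaces reach only the public functions.
(defn prepare-user
  [user]
  (example.user.logic/with-normalized-email user))

(println (prepare-user {:email "  Ada@Example.COM "}))
```

The private helper stays an implementation detail: it can change freely without affecting any code outside example.user.logic.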