nicl/scala-good-parts.md

## scala-good-parts.md

      
    How I write Scala

Note, this is a living document. If there are bits of Scala you think
deserve to be called 'good' or 'bad' let me know and I'll add them to
the discussion below.

Intro
Guiding principles
Good features

Singleton objects as namespaces for pure functions
Case classes as heterogeneous typed wrappers
Traits to describe 'is-a' relationships


Bad features

Stateful classes and objects
Call-by-name
Structural typing
Implicits
All the rest


Handling state

Intro

Scala is a permissive, multi-paradigm language. It supports writing
code in an object-oriented or mostly functional style. It is often
criticised for being complicated, and some companies have
[famously left the language][twitter-leave-scala]. Nevertheless, Scala
is increasingly used in enterprise software, in distributed systems
(typically '[big data][spark]'), and elsewhere as a 'better
Java'. Competition for mindshare on the JVM includes Clojure and
Kotlin. Beyond the platform, languages like Haskell, Erlang and
Go offer alternatives.
Language features include: pattern matching, implicits (conversions
and arguments), for expressions, singleton objects, traits, structural
typing, and higher-kinded types.
Some of these features are dangerous; others introduce more complexity
than they are worth for the majority of teams.
For my part, I like to subset the language. The 'good parts' of Scala
are relatively small. They include:

objects without state
case classes without behaviour
pattern matching
for comprehensions

These features are sufficient for most code and lend themselves to a
functional approach. Objects act as namespaces for functions; case
classes model heterogeneous data in a type-safe way.
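
To make the subset concrete, here is a minimal sketch that uses only these pieces. The pricing domain and every name in it (`LineItem`, `Discount`, `Pricing`) are invented for illustration and are not part of any library.

```scala
// Hypothetical domain, invented purely for illustration.
final case class LineItem(sku: String, quantity: Int, unitPricePence: Int)

sealed trait Discount
case object NoDiscount extends Discount
final case class PercentOff(percent: Int) extends Discount

// A stateless object acting as a namespace for pure functions.
object Pricing {
  def subtotal(items: List[LineItem]): Int =
    items.map(i => i.quantity * i.unitPricePence).sum

  // Pattern matching over the closed set of discounts.
  def applyDiscount(totalPence: Int, discount: Discount): Int = discount match {
    case NoDiscount          => totalPence
    case PercentOff(percent) => totalPence - (totalPence * percent / 100)
  }

  // For comprehension over Option: the result is None if either input is None.
  def quote(items: Option[List[LineItem]], discount: Option[Discount]): Option[Int] =
    for {
      is <- items
      d  <- discount
    } yield applyDiscount(subtotal(is), d)
}
```

The case classes carry data only; all behaviour lives in `Pricing`.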
There are lots of other niceties which are worth using but don't get a
full discussion here, for example named arguments.
Features to steer clear of include:

regular classes
objects with state
implicits (all types)
exceptions
infix notation

A large caveat is that I typically work on web services with teams of
mixed Scala and programming experience. Having said that, there are
good reasons for even [advanced teams][databicks-style-guide] to adopt
many of these recommendations.
As an aside, I am not a strong believer in the more advanced
functional programming techniques (free monads, lenses, etc.), or in
type-heavy libraries like Cats, Scalaz, Slick/Doobie, Shapeless and so
on. If you do want to use these things, make sure your
team is ready and that the benefit exceeds the conceptual overhead for
your domain. I haven't seen these criteria met in my career so far.
Guiding principles

Before diving into specific features, we need a framework to evaluate
them. My framework is this:

it is better to keep behaviour and data separate
it is important to preserve referential transparency
prioritise readability

The last is the most important concern.
I interpret readability as a combination of:
no magic, limited abstraction, pure functions, small and focused
components, one way of doing things.
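
As a rough illustration of what referential transparency buys us, compare the two objects below. Both are hypothetical examples written for this note. A call to `Totals.add` can always be replaced by its result; a call to `Counter.next` cannot, because it depends on hidden state.

```scala
object Totals {
  // Pure: the result depends only on the arguments.
  def add(a: Int, b: Int): Int = a + b
}

object Counter {
  private var calls = 0

  // Impure: the result depends on how many times we have been called before.
  def next(): Int = {
    calls += 1
    calls
  }
}
```

With `Totals`, `add(2, 3) + add(2, 3)` and `2 * add(2, 3)` are interchangeable. With `Counter`, two calls to `next()` never return the same value, so the same refactoring would change behaviour.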
Good features

Singleton objects as namespaces for pure functions

This is our bread and butter. It is well known that
[pure functions][pure-functions] are easier to reason about, test, and
compose. Strive to build the bulk of your program out of them.
Unfortunately, Scala is object-orientated and, before Scala 3, doesn't
allow functions to be defined outside of an object. But we can easily
use methods defined on a singleton object as pseudo-functions:
object MyPseudoNamespace {
    def hello(name: String): String = s"hello $name!"
}

Provided our methods don't access member state, they act as
functions.
We can convert our method to an actual function, which can be passed
around, using an underscore character like so:
val hello = MyPseudoNamespace.hello _

Yes this is a little weird. Underscores are
[famously overloaded][underscore] in Scala so get used to it.
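
Here is a small sketch of passing such methods around as values. The `Greetings` object is a hypothetical stand-in for the namespace above. Where the compiler already expects a function type, the method converts to a function without the underscore, which is the form I have assumed here.

```scala
object Greetings {
  def hello(name: String): String = s"hello $name!"
  def shout(text: String): String = text.toUpperCase
}

object GreetingsDemo {
  // The declared function type triggers the method-to-function conversion.
  val hello: String => String = Greetings.hello
  val shout: String => String = Greetings.shout

  // Functions compose; methods on their own do not.
  val loudHello: String => String = hello.andThen(shout)

  // A method can also be handed straight to a higher-order function.
  val greeted: List[String] = List("Ann", "Bob").map(Greetings.hello)
}
```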
Case classes as heterogeneous typed wrappers

In a dynamic language, we can get by with sequences and maps most of
the time. In a statically typed language, we keep the sequences but
tend to use structs or other typed collections in favour of maps. In
Scala, this is achieved via case classes:
case class Foo(a: String, b: Int)

Case classes are normal classes with some generated utility methods,
which assist with pattern matching (among other things), and with
constructor parameters defaulting to vals. A case class cannot extend
another case class.
Stateful classes are to be avoided (see below) but case classes are
useful as heterogeneous wrappers, provided we stick to the following
rules:

they should only contain vals pointing to immutable values
they should not contain methods

Some people like adding utility methods to case classes as
auto-completion is helpful and it can be nicer to write foo.bar rather
than bar(foo). E.g.
case class Person(firstName: String, lastName: String) {
    def name: String = firstName + " " + lastName
}

However, the disadvantages and dangers of allowing methods which rely
on object members are sufficiently great that I think it is best to
stick to the simple and dumb rule of banning all methods on case
classes. This avoids mistakes and is easy to follow.
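
A sketch of the alternative I have in mind: the case class stays a dumb record and the behaviour moves to a singleton object. The `People` object and its function names are my own choices for illustration.

```scala
// Data only: vals pointing to immutable values, no methods.
final case class Person(firstName: String, lastName: String)

// Behaviour lives in a namespace of pure functions.
object People {
  def fullName(p: Person): String = s"${p.firstName} ${p.lastName}"

  // "Updating" returns a new value; the original is untouched.
  def withLastName(p: Person, lastName: String): Person =
    p.copy(lastName = lastName)
}
```

We write `People.fullName(person)` rather than `person.name`, which is slightly longer but leaves no doubt about where the behaviour is defined.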
Traits to describe 'is-a' relationships

Traits in Scala can function as a variety of things: interfaces,
mixins, categorisation.
Interfaces are interesting because they enable polymorphism.
trait Writer {
    // Abstract: each implementation decides where the item goes
    def write[A](item: A): Unit
}

Good interfaces are small, typically describing only a single
method. This makes them easy to implement and test. Golang has many
examples of this kind of interface. They are particularly useful when
performing impure operations (for example reading from something, or
writing to something) and are also helpful when you are writing a
library or component which needs to expose an interface of some kind.
What is the role of interfaces though if we are interested in pure
functions? It is obvious that to use interfaces we need classes, and
we have already said those are to be avoided.
There is a tension here: polymorphism is useful, but using traits to
achieve it risks the gluing together of behaviour and data we are
trying to avoid. First-class functions, or working with pure data
(perhaps passing around case classes) can help us get by without
classes.
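
One way this can look in practice, as a sketch: a single-method interface is replaced by a plain function parameter. `Reports`, `render` and `publish` are hypothetical names invented for this example.

```scala
object Reports {
  // Pure: turns items into numbered lines.
  def render(items: List[String]): List[String] =
    items.zipWithIndex.map { case (item, i) => s"${i + 1}. $item" }

  // The "writer" is just a function; no trait or class is needed.
  def publish(items: List[String], write: String => Unit): Unit =
    render(items).foreach(write)
}
```

In production the caller might pass `println` or a function that writes to a file. In a test it can pass a function that appends to a buffer.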
Traits are useful, though, to describe 'is-a' relationships. E.g.
sealed trait Status
case object Approved extends Status
case object Rejected extends Status

This combines usefully with pattern matching in Scala:
def report(status: Status): Unit = status match {
    case Approved => println("Approved!")
    case Rejected => println("Sad times")
}

Because the trait is 'sealed' Scala will warn us if we've missed any
cases when we match against it.
Other uses are an abomination. Do not use traits to mix in behaviour;
prefer singleton objects wrapping pure functions. Do not use traits to
mix in state; prefer composition (or better yet, avoid state
altogether).
Bad features

(By which is meant, features to avoid.)
Stateful classes and objects

I've often heard people say 'never use var or mutable collections in
Scala.' But, to be honest, there's nothing wrong with var in
itself. For example:
def count[A](seq: Seq[A], p: A => Boolean): Int = {
    var n = 0
    for (i <- seq) {
        if (p(i)) n += 1
    }

    n
}

The code is easy to understand and also quick. The function is
pure.
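
To back up the claim that the local var is harmless, here is a sketch comparing the imperative version with a fold. The object and method names are mine. From the outside the two are indistinguishable, since the var never escapes the function.

```scala
object Counting {
  // Local mutation only: nothing outside the function can observe n changing.
  def countImperative[A](seq: Seq[A], p: A => Boolean): Int = {
    var n = 0
    for (a <- seq) {
      if (p(a)) n += 1
    }
    n
  }

  // The same result without any mutation.
  def countFold[A](seq: Seq[A], p: A => Boolean): Int =
    seq.foldLeft(0)((n, a) => if (p(a)) n + 1 else n)
}
```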
When they talk about var, people are really objecting to state. By
state is meant anything which can change over time. In contrast, a
value is by its very nature fixed over time.
In Scala, state is often stored in a var:
class Foo {
    var bar: List[String] = List("doh")
}

But it doesn't have to be:
class Foo {
    val bar: Array[String] = Array("doh")
}

(The Array in this case is a mutable collection.)
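
A short sketch of why the val does not help here. The wrapper object `HiddenState` and the `demo` method are invented for illustration. The reference held by `bar` is fixed, but the contents of the array are not.

```scala
object HiddenState {
  final class Foo {
    val bar: Array[String] = Array("doh")
  }

  // Returns the first element before and after an in-place update.
  def demo(): (String, String) = {
    val foo = new Foo
    val before = foo.bar(0)
    foo.bar(0) = "changed" // compiles fine: only the reference is a val
    (before, foo.bar(0))
  }
}
```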
Of course, if classes aren't allowed to contain state why use them at
all? Indeed, a singleton object is a better container for behaviour.
So avoid classes in your Scala code.
An exception is made for case classes, which are discussed elsewhere.
We've already discussed traits, but as a reminder: use traits to
express sum types but not for other purposes.
Call-by-name

Call by name is one way to achieve lazy evaluation in Scala. In the
following code, the laz parameter is an expression that returns a
string, and will be calculated each time laz is used in the body:
def callByName(laz: => String): String = ...

Laziness can also be achieved with function arguments. E.g.
def alsoLazy(laz: () => String): String = ...

The contract between caller and callee is slightly different, and the
caller may have a bit more work to do (to create the function), but
the effect is the same.
The advantage of the function argument is that the caller is made
aware that the argument is lazy. This is valuable as the code might
have side effects or be expensive to compute, so it is important to
know if and how often it will be run.
Therefore, avoid call-by-name; prefer function arguments.
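
The sketch below counts how often the argument is evaluated in each style. All names are hypothetical. Note that the by-name call site looks like an ordinary block, while the function versions show the `() =>` at the call site.

```scala
object Laziness {
  // By-name: laz is re-evaluated at every use in the body.
  def twiceByName(laz: => String): String = laz + laz

  // Function argument: evaluation is an explicit call.
  def twiceThunk(laz: () => String): String = laz() + laz()

  // The callee can choose to evaluate once and reuse the result.
  def onceThunk(laz: () => String): String = {
    val s = laz()
    s + s
  }
}

object LazinessDemo {
  // Returns the number of evaluations for each of the three styles.
  def evaluations(): (Int, Int, Int) = {
    var byName = 0
    var thunk = 0
    var once = 0
    Laziness.twiceByName { byName += 1; "x" }
    Laziness.twiceThunk(() => { thunk += 1; "x" })
    Laziness.onceThunk(() => { once += 1; "x" })
    (byName, thunk, once)
  }
}
```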
Structural typing

Structural typing is a beautiful thing when you are working with
existing classes. It is a form of duck-typing, and allows you to tease
out interfaces (of a sort) for existing or library code. Rather than
writing a function to accept a very specific library class, which
makes testing difficult, I can specify the subset of behaviour my
function requires.
Consider the case where I want to close a database connection. The
client library has a close method. A trait capturing this might be:
trait Closable { def close(): Unit }

The interface specified is pleasingly small. And I can write a simple,
easily testable, method using it:
def closeConn(client: Closable): Unit = client.close()

Of course, I could write this method to accept the client library
class itself. But that would make it much less generic and also more
difficult to test - as I would have to mock or implement the entire
interface.
Unfortunately, the JDBC class doesn't implement our Closable
trait. Structural typing can help us get round this:
def closeConn(client: { def close(): Unit }): Unit = client.close()

Instead of requiring a trait, we now simply state the interface as a
structural type. The code is simple and usable beyond the specific
JDBC class.
It remains true, however, that Scala programmers rarely use structural
typing. It is more common to see a wrapper class or type class (which
is equivalent), or code that simply handles the specific client type
directly.
This seems a shame, as the structural approach is more succinct and
results in better code.
Why are people so averse to structural typing?
One reason cited is speed; structural typing works via reflection,
which is slow. There's some research on this here (pdf). I'm not sure
this should be a concern for most people but obviously it depends on
your application.
Perhaps it's simply that these kinds of cases are rare and the
language feature is relatively unknown.
I'm not sure.
But it's hard to recommend a language feature few people are aware of
and even fewer use.
Implicits

Implicits are ubiquitous in Scalaland so it may surprise some to see
them listed as 'evil' here. They can reduce boilerplate, through
implicit parameters, or extend existing code, through implicit
conversions.
But software is about tradeoffs; and in
the case of implicits, I'm not sure the benefits are worth the cost.
Implicits suffer because they are somewhat 'magical', by which is meant
they make code harder to reason about locally. This is the
result of implicit resolution rules which are complicated and way too
broad, and also because magically converting from one type to another
is just non-obvious and confusing.
Implicits are also abused* by libraries like Shapeless to introduce
incredibly generic programming. The conceptual complexity and
crippling compile times that result are not worth it for almost all
teams.
I should be clear here: I admire the creativity and intelligence shown
by these libraries. But if you want generic, data-oriented
programming, I think you're better off writing Clojure.
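
To illustrate the local-reasoning cost with a deliberately tiny, made-up example (`Localised`, `FrenchDefaults` and the rest are hypothetical names): both functions below do the same thing, but only the explicit call site tells the reader which `Locale` is in play.

```scala
object Localised {
  final case class Locale(tag: String)

  private def greeting(locale: Locale): String =
    if (locale.tag == "fr") "bonjour" else "hello"

  // Implicit style: the Locale is resolved by the compiler from scope.
  def greetImplicit(name: String)(implicit locale: Locale): String =
    s"${greeting(locale)} $name"

  // Explicit style: one more argument, nothing to resolve in your head.
  def greetExplicit(name: String, locale: Locale): String =
    s"${greeting(locale)} $name"
}

object FrenchDefaults {
  implicit val locale: Localised.Locale = Localised.Locale("fr")
}

object LocalisedDemo {
  import FrenchDefaults._ // the only hint about where the Locale comes from

  val viaImplicit: String = Localised.greetImplicit("Ada")
  val viaExplicit: String = Localised.greetExplicit("Ada", Localised.Locale("fr"))
}
```

In a real codebase the import is often many lines, or files, away from the call.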
All the rest

Below is a list of features whose benefits I have not found strong
enough to justify the additional learning overhead in what is already
a difficult language to learn:

self types
currying
infix notation

In general, do not pursue terseness if it requires you to use
less commonly-used language features. There is a lot of value in
consistency across codebases and an aggressive language subset is
necessary to achieve this.
Handling state

The astute reader will reasonably be frothing at this point: where is
the state?!
And it's true. In complex applications, it is necessary to manage
state - typically, stateful dependencies. In a web app, these may be
clients of some kind - perhaps to a database or external service. The
clients are expensive, by which is meant they occupy threads or some
other limited resource, and we do not want to duplicate them
unnecessarily.
Or we might want to cache expensive calculations or network calls
locally for performance or availability reasons.
This state needs to be managed, which inevitably means using some of
the 'bad features' I attempted to dissuade you from above. The aim then
is to push state to the edge of our programs, or of each component if
your codebase is large, and keep to 'good features' for the rest.
The difference from regular OO is that 'components' are much
coarser-grained. In OO, components (that is, classes) are used even
for relatively small bits of functionality. Here I am advocating for
an idea of components closer to subsystems. A typical program should
not have many components at all. Small programs can be treated as a
single component, with all state managed at the edge.
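
A minimal sketch of what 'state at the edge' might look like, under my own assumptions: a hypothetical currency-conversion program in which `RateCache` is the single stateful component and everything else is pure. None of these names come from a real library, and the cache is not thread-safe.

```scala
import scala.collection.mutable

final case class Rate(currency: String, perGbp: BigDecimal)

// Pure core: no state, trivially testable.
object Conversion {
  def convert(amountGbp: BigDecimal, rate: Rate): BigDecimal =
    amountGbp * rate.perGbp
}

// The one stateful component: a cache in front of an expensive lookup.
// Illustration only; a real cache would need to be thread-safe.
final class RateCache(fetch: String => Rate) {
  private val cache = mutable.Map.empty[String, Rate]

  def get(currency: String): Rate =
    cache.getOrElseUpdate(currency, fetch(currency))
}

// The edge: state is consulted here, then plain values are handed to the core.
object ConversionApp {
  def run(cache: RateCache, amountGbp: BigDecimal, currency: String): BigDecimal =
    Conversion.convert(amountGbp, cache.get(currency))
}
```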
A small note on Dependency Injection Frameworks

To address the state problem, people use dependency injection; dependencies
are injected into stateful components using a library of some kind, or
perhaps manually. In most languages (including Scala), components take
the form of classes and dependencies are passed in as constructor
arguments. Closures can achieve the same effect but are certainly not
idiomatic.
The key advantage of DI is that dependencies are explicit, leading to
better understanding, testability, and flexibility.
This is all good stuff.
But how this is achieved also matters. There are two approaches
I want to discourage here:

DI libraries, which handle the creation and injection of dependencies,
reducing the need for lots of boilerplate code and enabling the easy
sharing of dependencies where sensible
using implicits to reduce boilerplate when passing dependencies (DI
libraries often use this under the hood)

Libraries can reduce boilerplate but they typically do so at the
expense of readability, which is a bad exchange.
The second, implicits, suffers from all the pitfalls of implicits
themselves; the lack of clarity works against some of the core
motivations for DI in the first place.
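
For comparison, manual wiring is rarely much code. The sketch below uses placeholder types (`Config`, `DbClient`, `UserService`) invented for illustration; the point is only that every dependency is created and passed in one visible place.

```scala
object Wiring {
  final case class Config(dbUrl: String, timeoutMs: Int)

  // Placeholders standing in for real, stateful clients.
  final class DbClient(val url: String)
  final class UserService(val db: DbClient, val timeoutMs: Int)

  // All construction happens once, here, in plain sight.
  def build(config: Config): UserService = {
    val db = new DbClient(config.dbUrl)
    new UserService(db, config.timeoutMs)
  }
}
```

A test can call `build` with a different `Config`, or construct `UserService` directly with a fake client, without any framework involved.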
Bio


https://github.com/databricks/scala-style-guide
https://www.reddit.com/r/scala/comments/2ze443/a_good_example_of_a_scala_style_guide_by_people/