@nicl
Last active September 17, 2020 08:58
Scala - the good parts

How I write Scala

Note, this is a living document. If there are bits of Scala you think deserve to be called 'good' or 'bad' let me know and I'll add them to the discussion below.

Intro

Scala is a multi-paradigm, permissive language. It supports writing code in an object-oriented or mostly functional style. It is often criticised for being complicated, and some companies have [famously left the language][twitter-leave-scala]. Nevertheless, Scala is increasingly used in enterprise software, distributed systems - typically '[big data][spark]' - and elsewhere as a 'better Java'. Competition for mindshare on the JVM includes Clojure and Kotlin. Beyond the platform, languages like Haskell, Erlang and Go offer alternatives.

Language features include: pattern matching, implicits (conversions and arguments), for expressions, singleton objects, traits, structural typing, and higher-kinded types.

Some of these features are dangerous; others introduce more complexity than they are worth for the majority of teams.

For my part, I like to subset the language. The 'good parts' of Scala are relatively small. They include:

  • objects without state
  • case classes without behaviour
  • pattern matching
  • for comprehensions

These features are sufficient for most code and lend themselves to a functional approach. Objects act as namespaces for functions; case classes model heterogeneous data in a type-safe way.
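As a rough sketch of how these four pieces fit together, consider the following. The names `Order`, `Orders`, `parsePositiveInt` and `build` are invented for illustration, and using `Either` in place of exceptions is my assumption about one reasonable way to follow the advice here, not the only one.

```scala
import scala.util.Try

// Case class without behaviour: plain, immutable data.
case class Order(item: String, quantity: Int, unitPricePence: Int)

// Object without state: a namespace for pure functions.
object Orders {
  // Pattern matching on the parse result instead of throwing an exception.
  def parsePositiveInt(raw: String): Either[String, Int] =
    Try(raw.trim.toInt).toOption match {
      case Some(n) if n > 0 => Right(n)
      case Some(_)          => Left(s"must be positive: $raw")
      case None             => Left(s"not a number: $raw")
    }

  def totalPence(order: Order): Int = order.quantity * order.unitPricePence

  // For comprehension: each step only runs if the previous one succeeded.
  def build(item: String, rawQuantity: String, rawPrice: String): Either[String, Order] =
    for {
      quantity <- parsePositiveInt(rawQuantity)
      price    <- parsePositiveInt(rawPrice)
    } yield Order(item, quantity, price)
}
```

Everything here is a value or a pure function, so each piece can be tested by calling it and comparing the result.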

There are lots of other niceties which are worth using but don't get a full discussion here, for example named arguments.

Features to steer clear of include:

  • regular classes
  • objects with state
  • implicits (all types)
  • exceptions
  • infix notation

A large caveat is that I typically work on web services with teams of mixed Scala and programming experience. Having said that, there are good reasons for even [advanced teams][databicks-style-guide] to adopt many of these recommendations.

As an aside, I am not a strong believer in the more advanced end of functional programming (free monads, lenses, etc.), nor in type-heavy libraries like Cats, Scalaz, Slick/Doobie, Shapeless and so on. If you do want to use these things, make sure your team is ready and that the benefit exceeds the conceptual overhead for your domain. I haven't seen these criteria met in my career so far.

Guiding principles

Before diving into specific features, we need a framework to evaluate them. My framework is this:

  • it is better to keep behaviour and data separate
  • it is important to preserve referential transparency
  • prioritise readability

The last is the most important concern.

I interpret readability as a combination of: no magic, limited abstraction, pure functions, small and focused components, one way of doing things.

Good features

Singleton objects as namespaces for pure functions

This is our bread and butter. It is well known that [pure functions][pure-functions] are easier to reason about, test, and compose. Strive to build the bulk of your program out of them.

Unfortunately, Scala is object-orientated and (at least before Scala 3) doesn't allow functions to be defined outside of an object. But we can easily use methods defined on a singleton object as pseudo-functions:

object MyPseudoNamespace {
    def hello(name: String): String = s"hello $name!"
}

Provided our methods don't access member state, they act as functions.

We can convert our method to an actual function, which can be passed around, using an underscore character like so:

val hello = MyPseudoNamespace.hello _

Yes, this is a little weird. Underscores are [famously overloaded][underscore] in Scala, so get used to it.
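To make the conversion a bit more concrete, here is a small sketch assuming Scala 2 syntax. The `Greetings` and `GreetingsExample` objects are made-up names for this example. It shows the explicit underscore form next to the automatic conversion the compiler performs when a function type is expected.

```scala
object Greetings {
  def hello(name: String): String = s"hello $name!"
}

object GreetingsExample {
  // Explicit conversion from a method to a function value.
  val helloFn: String => String = Greetings.hello _

  // Where a function type is expected (as with map), the compiler
  // converts the method for us, so no underscore is needed.
  val greetings: List[String] = List("Ada", "Grace").map(Greetings.hello)
}
```

In practice the second form covers most cases; the underscore is mainly needed when you want to name the function value itself.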

Case classes as heterogeneous typed wrappers

In a dynamic language, we can get by with sequences and maps most of the time. In a statically typed language, we keep the sequences but tend to use structs or other typed collections in preference to maps. In Scala, this is achieved via case classes:

case class Foo(a: String, b: Int)

Case classes are normal classes with some generated utility methods (which assist with pattern matching, among other things) and with constructor parameters defaulting to vals. A case class cannot be extended by another case class.

Stateful classes are to be avoided (see below) but case classes are useful as heterogeneous wrappers, provided we stick to the following rules:

  • they should only contain vals pointing to immutable values
  • they should not contain methods

Some people like adding utility methods to case classes as auto-completion is helpful and it can be nicer to write foo.bar rather than bar(foo). E.g.

case class Person(firstName: String, lastName: String) {
    def name: String = firstName + " " + lastName
}

However, the disadvantages and dangers of allowing methods which rely on object members are sufficiently great that I think it is best to stick to the simple and dumb rule to ban all methods on classes. This way avoids mistakes and is easy to follow.
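A minimal sketch of the alternative, with the same data but the behaviour moved into a stateless object. The `People` object and `fullName` function are names I've made up for this example.

```scala
// Data only: no methods on the case class.
case class Person(firstName: String, lastName: String)

// Behaviour lives in a stateless object and takes the data as an argument.
object People {
  def fullName(person: Person): String =
    person.firstName + " " + person.lastName
}
```

The call site becomes `People.fullName(person)` rather than `person.name`. It is slightly longer, but the rule stays simple to state and simple to check in review.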

Traits to describe 'is-a' relationships

Traits in Scala can function as a variety of things: interfaces, mixins, categorisation.

Interfaces are interesting because they enable polymorphism.

trait Writer {
    // Abstract: each implementation decides where the item is written.
    def write[A](item: A): Unit
}

Good interfaces are small, typically describing only a single method. This makes them easy to implement and test. Golang has many examples of this kind of interface. They are particularly useful when performing impure operations - for example reading from something, or writing to something - but are also helpful when you are writing a library or component which needs to expose an interface of some kind.

What is the role of interfaces though if we are interested in pure functions? It is obvious that to use interfaces we need classes, and we have already said those are to be avoided.

There is a tension here: polymorphism is useful, but using traits to achieve it risks the gluing together of behaviour and data we are trying to avoid. First-class functions, or working with pure data (perhaps passing around case classes) can help us get by without classes.
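As one possible illustration of getting by with first-class functions, the sketch below passes the impure 'write' step in as a plain function instead of implementing a trait. `Reports`, `render`, `publish` and `ReportsExample` are hypothetical names chosen for the example.

```scala
import scala.collection.mutable.ListBuffer

object Reports {
  // Pure: builds the text without performing any I/O.
  def render(items: Seq[String]): String = items.mkString("\n")

  // The impure step is supplied by the caller as a function value,
  // so no class has to implement an interface.
  def publish(items: Seq[String], write: String => Unit): Unit =
    write(render(items))
}

object ReportsExample {
  // In production `write` might be println; in a test we capture the output.
  def capture(items: Seq[String]): List[String] = {
    val buffer = ListBuffer.empty[String]
    Reports.publish(items, text => { buffer += text; () })
    buffer.toList
  }
}
```

The local mutable buffer never escapes `capture`, so from the outside it still behaves like a pure function.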

Traits are useful, though, to describe 'is-a' relationships. E.g.

sealed trait Status
case object Approved extends Status
case object Rejected extends Status

This combines usefully with pattern matching in Scala:

def report(status: Status): Unit = status match {
    case Approved => println("Approved!")
    case Rejected => println("Sad times")
}

Because the trait is 'sealed' Scala will warn us if we've missed any cases when we match against it.

Other uses are an abomination. Do not use traits to mix in behaviour; prefer singleton objects wrapping pure functions. Do not use traits to mix in state; prefer composition (or better yet, avoid state altogether).

Bad features

(By which is meant, features to avoid.)

Stateful classes and objects

I've often heard people say 'never use var or mutable collections in Scala.' But, to be honest, there's nothing wrong with var in itself. For example:

def count[A](seq: Seq[A], p: A => Boolean): Int = {
    // Local mutable counter; it never escapes the function.
    var n = 0
    for (i <- seq) {
        if (p(i)) n += 1
    }

    n
}

The code is easy to understand and also quick. The function is pure.

When they talk about var, people are really objecting to state. By state is meant anything which can change over time. In contrast, a value is by its very nature fixed over time.

In Scala, state is often stored in a var:

class Foo {
    var bar: List[String] = List("doh")
}

But it doesn't have to be:

class Foo {
    val bar: Array[String] = Array("doh")
}

(The Array in this case is a mutable collection.)
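To see why the val version still counts as state, here is a small sketch. `FooExample` and `mutate` are names invented for the demonstration.

```scala
class Foo {
  val bar: Array[String] = Array("doh")
}

object FooExample {
  // Returns the first element before and after an in-place update.
  def mutate(): (String, String) = {
    val foo = new Foo
    val before = foo.bar(0)
    // Compiles fine: the reference `bar` is fixed, but its contents are not.
    foo.bar(0) = "changed"
    (before, foo.bar(0))
  }
}
```

Anyone holding a reference to the same `Foo` would observe the change, which is exactly the property we are trying to avoid.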

Of course, if classes aren't allowed to contain state why use them at all? Indeed, a singleton object is a better container for behaviour.

So avoid classes in your Scala code.

An exception is made for case classes, which are discussed elsewhere.

We've already discussed traits, but as a reminder: use traits to express sum types but not for other purposes.

Call-by-name

Call by name is one way to achieve lazy evaluation in Scala. In the following code, the laz parameter is an expression that returns a string, and will be calculated each time laz is used in the body:

def callByName(laz: => String): String = ...

Laziness can also be achieved with function arguments. E.g.

def alsoLazy(laz: () => String): String = ...

The contract with the caller is slightly different, and may involve a bit more work (to create the function), but the effect is the same.

The advantage of the function argument is that the caller is made aware that the argument is lazy. This is valuable as the code might have side effects or be expensive to compute, so it is important to know if and how often it will be run.

Therefore, avoid call-by-name; prefer function arguments.
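A short sketch of the difference at the call site. The names `Laziness`, `twiceByName`, `twiceByFunction` and `LazinessExample` are hypothetical, and the counters exist only to show how often the argument is evaluated.

```scala
object Laziness {
  // Call-by-name: `laz` is re-evaluated at every use in the body.
  def twiceByName(laz: => String): String = laz + laz

  // Function argument: same behaviour, but the laziness shows in the type.
  def twiceByFunction(laz: () => String): String = laz() + laz()
}

object LazinessExample {
  // Counts how many times each side-effecting argument is evaluated.
  def evaluations(): (Int, Int) = {
    var byName = 0
    var byFunction = 0
    // Reads like an ordinary argument, yet the block runs twice.
    Laziness.twiceByName { byName += 1; "x" }
    // The `() =>` makes it obvious that evaluation is deferred.
    Laziness.twiceByFunction(() => { byFunction += 1; "x" })
    (byName, byFunction)
  }
}
```

Both arguments end up evaluated twice; the only difference is whether the reader of the call site can tell that this might happen.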

Structural typing

Structural typing is a beautiful thing when you are working with existing classes. It is a form of duck-typing, and allows you to tease out interfaces (of a sort) for existing or library code. Rather than writing a function to accept a very specific library class, which makes testing difficult, I can specify the subset of behaviour my function requires.

Consider the case where I want to close a database connection. The client library has a close method. A trait capturing this might be:

trait Closable { def close(): Unit }

The interface specified is pleasingly small. And I can write a simple, easily testable, method using it:

def closeConn(client: Closable): Unit = client.close()

Of course, I could write this method to accept the client library class itself. But that would make it much less generic and also more difficult to test - as I would have to mock or implement the entire interface.

Unfortunately, the JDBC class doesn't implement our Closable trait. Structural typing can help us get round this:

def closeConn(client: { def close(): Unit }): Unit = client.close()

Instead of requiring a trait, we now simply state the interface as a structural type. The code is simple and usable beyond the specific JDBC class.
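For what it's worth, here is a sketch of how the structural version might be exercised in a test. It assumes Scala 2, where the `scala.language.reflectiveCalls` import silences the feature warning (Scala 3 handles structural types differently). `Connections` and `FakeClient` are hypothetical names, not part of any library.

```scala
// Scala 2: allow calls through structural types without a feature warning.
import scala.language.reflectiveCalls

object Connections {
  // Accepts anything that has a close(): Unit method.
  def closeConn(client: { def close(): Unit }): Unit = client.close()
}

// Hypothetical test double; it extends no trait and no library class.
class FakeClient {
  var closed = false
  def close(): Unit = { closed = true }
}
```

A test can then pass `new FakeClient` to `Connections.closeConn` and check the `closed` flag afterwards, with no mocking library involved.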

It remains true, however, that people in Scala don't use structural typing. More common is to see a wrapper class or type class (which is equivalent), or simply handling the specific client type in their code.

This seems a shame, as the structural approach is more succinct and results in better code.

Why are people so averse to structural typing?

One reason cited is speed; structural typing works via reflection, which is slow. There's some research on this here (pdf). I'm not sure this should be a concern for most people but obviously it depends on your application.

Perhaps it's simply that these kinds of cases are rare and the language feature is relatively unknown.

I'm not sure.

But it's hard to recommend a language feature few people are aware of and even fewer use.

Implicits

Implicits are ubiquitous in Scalaland so it may surprise some to see them listed as 'evil' here. They can reduce boilerplate, through implicit parameters, or extend existing code, through implicit conversions.

But software is about tradeoffs; and in the case of implicits, I'm not sure the benefits are worth the cost.

Implicits suffer because they are somewhat 'magical', by which is meant they make code harder to reason about locally. This is partly the result of implicit resolution rules, which are complicated and way too broad, and partly because magically converting from one type to another is just non-obvious and confusing.
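A small sketch of the local-reasoning problem, using an implicit parameter. All the names (`Locale`, `ExplicitGreeter`, `ImplicitGreeter`, `GreeterExample`) are invented for the example.

```scala
case class Locale(greeting: String)

object ExplicitGreeter {
  def greet(name: String, locale: Locale): String =
    s"${locale.greeting} $name"
}

object ImplicitGreeter {
  def greet(name: String)(implicit locale: Locale): String =
    s"${locale.greeting} $name"
}

object GreeterExample {
  // Somewhere in scope, possibly far away in a real codebase.
  implicit val defaultLocale: Locale = Locale("bonjour")

  // Everything this call depends on is visible on this line.
  val explicit: String = ExplicitGreeter.greet("Ada", Locale("hello"))

  // Nothing on this line says which Locale is used.
  val viaImplicit: String = ImplicitGreeter.greet("Ada")
}
```

In a three-line example the implicit is easy to find. In a large codebase it may come from an import, a companion object or a package object, which is where the resolution rules start to hurt.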

Implicits are also abused* by libraries like Shapeless to introduce incredibly generic programming. The conceptual complexity and crippling compile times that result are not worth it for almost all teams.

I should be clear here: I admire the creativity and intelligence shown by these libraries. But if you want generic, data-oriented programming, I think you're better off writing Clojure.

All the rest

Below is a list of features whose benefits I have not found strong enough to justify the additional learning overhead in what is already a difficult language to learn:

  • self types
  • currying
  • infix notation

In general, do not pursue terseness if it requires you to use less commonly-used language features. There is a lot of value in consistency across codebases and an aggressive language subset is necessary to achieve this.

Handling state

The astute reader will reasonably be frothing at this point: where is the state?!

And it's true. In complex applications, it is necessary to manage state - typically, stateful dependencies. In a web app, these may be clients of some kind - perhaps to a database or external service. The clients are expensive, by which is meant they occupy threads or some other limited resource, and we do not want to duplicate them unnecessarily.

Or we might want to cache expensive calculations or network calls locally for performance or availability reasons.

This state needs to be managed, which inevitably means using some of the 'bad features' I attempt to dissuade you from above. The aim then is to push state to the edge of our program (or of each component, if your codebase is large) and keep to the 'good features' for the rest.

The difference from regular OO is the granularity of 'components', which is much coarser here. In OO, components (that is, classes) are used even for relatively small bits of functionality. Here I am advocating for an idea of components closer to subsystems. A typical program should not have many components at all. Small programs can be treated as a single component, with all state managed at the edge.
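One way this might look in miniature is a pure core with a single stateful class at the edge that owns a cache. `Slugs` and `SlugCache` are hypothetical names, slug generation is only a stand-in for an expensive calculation, and the cache is deliberately simplistic (it is not thread-safe).

```scala
// Pure core: no state, trivially testable.
object Slugs {
  def slugify(title: String): String =
    title.trim.toLowerCase.replaceAll("[^a-z0-9]+", "-")
}

// Stateful edge: one small component owns the mutable cache.
final class SlugCache {
  private var cache = Map.empty[String, String]

  def get(title: String): String = cache.get(title) match {
    case Some(slug) => slug
    case None =>
      val slug = Slugs.slugify(title)
      cache = cache + (title -> slug)
      slug
  }

  def size: Int = cache.size
}
```

The interesting logic lives in `Slugs` and can be tested without any setup; the class does nothing except remember results.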

A small note on Dependency Injection Frameworks

To address the state problem, people use dependency injection; dependencies are injected into stateful components using a library of some kind, or perhaps manually. In most languages (including Scala), components take the form of classes and dependencies are passed in as constructor arguments. Closures can achieve the same effect but are certainly not idiomatic.

The key advantage of DI is that dependencies are explicit, leading to better understanding, testability, and flexibility.

This is all good stuff.

But how this is achieved also matters. There are two approaches I want to discourage here:

  • DI libraries, which handle the creation and injection of dependencies, reducing the need for lots of boilerplate code and enabling the easy sharing of dependencies where sensible
  • using implicits to reduce boilerplate when passing dependencies (DI libraries often use this under the hood)

Libraries can reduce boilerplate but they typically do so at the expense of readability, which is a bad exchange.

The second, implicits, suffers from all the pitfalls of implicits themselves; the lack of clarity works against some of the core motivations for DI in the first place.
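For comparison, manual injection through constructor arguments needs no library and no implicits. The sketch below uses invented names throughout (`UserStore`, `InMemoryUserStore`, `GreetingService`, `Wiring`) and is only meant to show the shape of the wiring code.

```scala
// A small interface for the impure dependency.
trait UserStore {
  def find(id: Int): Option[String]
}

final class InMemoryUserStore(users: Map[Int, String]) extends UserStore {
  def find(id: Int): Option[String] = users.get(id)
}

// The dependency is an ordinary, explicit constructor argument.
final class GreetingService(store: UserStore) {
  def greet(id: Int): String = store.find(id) match {
    case Some(name) => s"hello $name!"
    case None       => "hello stranger!"
  }
}

// All wiring happens in one place, at the edge of the program.
object Wiring {
  val store: UserStore = new InMemoryUserStore(Map(1 -> "Ada"))
  val greetings: GreetingService = new GreetingService(store)
}
```

The boilerplate is the handful of lines in `Wiring`. In exchange, anyone can read top to bottom and see exactly which instance ends up where.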


@regiskuckaertz

But software is about tradeoffs; and in the case of implicits, I'm not sure the benefits are worth the cost.

That is pretty much the only reason to use Scala at all. Without that feature, there are literally dozens of languages that are much cleaner and pleasant to use than Scala; if people don't like implicits, they have no business doing Scala (I say that for their own sake). From a language design perspective, it was silly to give so much power in users' hands when there's a solution that has been working perfectly for almost 40 years now in Haskell.

currying
infix notation

If the goal of using Scala is to adopt an OO style, then it's a complete waste of time. There are again other languages that are much better designed for this. OTOH, Scala is part of a small niche of FP languages with an advanced type system (well ... it is still weak, let's say moderately advanced), this is where it could still shine, despite its terrible syntax. In that scenario, currying and infix operators are vital to the expressiveness of the language.

@regiskuckaertz

I would add a few things to the evil list:

  • all the meta-programming stuff, macros should be avoided at all costs 🤗
  • everything here

@nicl
Author

nicl commented Oct 4, 2018

@regiskuckaertz thanks. The truth is that, while I don't hate Scala or anything, I think the only reason the Guardian should use it now is because we have existing software to maintain. So I agree that Scala without implicits is a strange choice, just that we live in a world where people have already chosen Scala for us. I would not pick it for a greenfield project though.

@regiskuckaertz

I would not pick it for a greenfield project though

I agree but probably not for the same reasons. To me, the feature set of the language is not appealing when you subtract the cost of the syntactic rules into the equation (and indeed the mess that is implicit resolution). I understand that Java programmers marvel at the amount of boilerplate they can save, but in the grand scheme of things Scala is extremely verbose. And fucking slow at development time.
