Skip to content

Instantly share code, notes, and snippets.

@Engelberg
Last active August 29, 2015 14:18
Show Gist options
  • Save Engelberg/e54f719e96a56113d001 to your computer and use it in GitHub Desktop.
Save Engelberg/e54f719e96a56113d001 to your computer and use it in GitHub Desktop.
Empty sequences

As we've seen, seq is the mechanism in Clojure for converting a collection into something that responds to first and rest, so that we can traverse it as we would a linked list. Sequences automatically print at the REPL like a list.

=> (seq '(1 2 3 4))
(1 2 3 4)
=> (seq [1 2 3 4])
(1 2 3 4)
=> (seq #{1 2 3 4})
(1 4 3 2)
=> (seq {:a 1, :b 2})
([:b 2] [:a 1])

But one thing I've neglected to mention is what happens when you try to call seq on an empty collection?

=> (seq ())
nil
=> (seq [])
nil
=> (seq #{})
nil
=> (seq {})
nil

The reasoning behind this is that an empty collection cannot be converted into something with a first and rest, so seq should return nil.

This leads us to another Clojure idiom for testing whether a collection is empty -- we simply call seq on it. If the call to seq returns anything other than nil, the collection is not empty. Remember, all values other than nil and false are truthy. So sometimes in Clojure code, rather than:

(if (empty? l)
  <what to do if l is empty>
  <what to do if l is non-empty>)

you'll see

(if (seq l)
  <what to do if l is non-empty>
  <what to do if l is empty>)

In fact, the empty? predicate is internally defined in Clojure as:

(defn empty? [coll]
  (not (seq coll)))

So we gain a minuscule amount of performance by testing directly off of the call to seq rather than testing off of empty? which is implemented as a double negative.

In situations where programmers want to eke out a few more cycles of performance, there's another little trick we can do. Usually, inside the two cases of our if, we are doing things to the first and rest of the sequence. But first and rest implicitly call seq. This means we're actually calling seq multiple times, once to find out whether it is empty, and then it is getting called again by first and rest. For lists and things that are already sequences, seq is a no-op, so it's no big deal, but for some other collections (like vector), seq is allocating a new object to track where it is in the sequence. So a slighlty more performant idiom is:

(defn fn-for-collection [coll]
  (let [s (seq coll)]
    (if s
      <do stuff with (first s) and (rest s)>
      <what to do if the collection is empty>)))

This is a common enough pattern, there's a built-in macro if-let that turns into let followed by if. So we can rewrite this as:

(defn fn-for-collection [coll]
  (if-let [s (seq coll)]
    <do stuff with (first s) and (rest s)>
    <what to do if the collection is empty>))

This is not critical to remember, but I wanted to point out that this ordering of cases for structural recursion on sequences, along with giving a name to the result of calling seq on the collection, is marginally more performant, and this is why you'll see it crop up in Clojure code, especially in Clojure's internals, where every little bit of performance matters.

Sometimes, in loop-recur code, people will optimize a few extra cycles by repositioning the call to seq at the initialization of the loop variables and at the recur site, so that you don't need to create a local variable, just using the name already given as part of the loop-recur structure. A function (next s) is in Clojure, implemented as (seq (rest s)) to make this idiom more concise. It looks like this:

(defn fn-for-collection [coll]
  (loop [s (seq coll),
         accumulator initial-value]
    (if s
      (recur (next s) <update the accumulator>)
      accumulator)))

These tricks provide only the tiniest of gains, so don't spend a lot of time worrying about this in ordinary code. Just use whatever idiom feels most natural to you. And most of the time, you'll be using Clojure's higher-order functions which use all these tricks and more, so you don't have to think about these details.

I'll end this article with a few more quirky examples you might find interesting if you're the kind of person who likes to understand all the edge cases:

=> (first nil)
nil
=> (rest nil)
()
=> (first [])
nil
=> (rest [])
()
=> (cons 1 nil)
(1)
@bion
Copy link

bion commented Apr 12, 2015

this is a little unsetting, you only get the same container type back if there are two or more elements in it:

=> (type (rest []))
clojure.lang.PersistentList$EmptyList

=> (type (rest [1]))
clojure.lang.PersistentList$EmptyList

=> (type (rest [1 2]))
clojure.lang.PersistentVector$ChunkedSeq

@bion
Copy link

bion commented Apr 12, 2015

also not sure if (cons 1 nil) is something that looks handy or scary.

@Engelberg
Copy link
Author

I didn't get an email notification about your comments, so I only just noticed them now.

The (cons 1 nil) thing dates back to the early days of Clojure. At that time, Clojure followed the Common Lisp tradition that there is no such thing as an empty list, and () is just an alias for nil. The idea was that a list is something with a first and rest; the empty list has neither, therefore it doesn't exist.

It turned out that this had subtle implications with respect to laziness. You couldn't even determine whether something was a list without trying to force the first element. This behavior made Clojure sequences a little too eager, because Clojure was always trying to realize one element ahead of what you really cared about.

So in a big overhaul, Clojure revised its notion of a sequence to be something that might be empty, and only after you call seq on it do you find out whether it turns into nil or some other underlying implementation of the interface that supports first/rest. Clojure takes its backwards-compatibility story pretty seriously though, so there are still a lot of little ways that you can use nil in place of the empty list, and cons is one of those instances.

As for your other comment, I'd say that Clojure tries to discourage people from worrying too much about the concrete types that are used behind the scenes to represent your data. You are not meant to care (or rely on in your code) the exact types that are produced when you rest various things. Instead, all you need to know is that rest produces some further implementation of first/rest, i.e., it behaves like a sequence, and that's all that matters.

For example, if you create a list with (cons 1 (cons 2 [3 4])), and start traversing it with first/rest, you'll see that at some point the type will switch over from being a clojure.lang.Cons to being a PersistentVector$ChunkedSeq. But so what, as long as I can treat it seamlessly as a sequence?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment