As we've seen, seq is the mechanism in Clojure for converting a collection into something that responds to first and rest, so that we can traverse it as we would a linked list. Sequences automatically print at the REPL like a list.
=> (seq '(1 2 3 4))
(1 2 3 4)
=> (seq [1 2 3 4])
(1 2 3 4)
=> (seq #{1 2 3 4})
(1 4 3 2)
=> (seq {:a 1, :b 2})
([:b 2] [:a 1])
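Here is a small sketch of that first/rest interface in action, run against a vector, a map, and a set. It only uses core functions (seq, first, rest, key, count); the var names v-seq and m-seq are made up for illustration. Because the order of a set's elements is unspecified, the set example checks only the count.

```clojure
;; seq gives different collection types the same first/rest interface.
;; v-seq and m-seq are illustrative names, not part of Clojure.
(def v-seq (seq [1 2 3 4]))
(println (first v-seq))        ; prints 1
(println (rest v-seq))         ; prints (2 3 4)

;; A map's seq is made of map entries, which print like two-element vectors.
(def m-seq (seq {:a 1}))
(println (first m-seq))        ; prints [:a 1]
(println (key (first m-seq)))  ; prints :a

;; Sets are unordered, so which element comes first is unspecified,
;; but the seq still contains every element exactly once.
(println (count (seq #{1 2 3 4})))  ; prints 4
```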
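(The order in which a set's or a map's elements appear is not guaranteed, so your REPL may print them in a different order.) One thing I haven't mentioned yet is what happens when you call seq on an empty collection: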
=> (seq ())
nil
=> (seq [])
nil
=> (seq #{})
nil
=> (seq {})
nil
The reasoning behind this is that an empty collection cannot be converted into something with a first and rest, so seq returns nil.
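A quick sketch that checks the same rule across all four collection types at once, plus two related cases: seq of nil is also nil, and a vector whose only element is nil is still non-empty, so its seq is not nil.

```clojure
;; seq returns nil for each kind of empty collection...
(println (map seq [() [] #{} {}]))   ; prints (nil nil nil nil)

;; ...and for nil itself.
(println (seq nil))                  ; prints nil

;; A collection holding a nil element is not empty, so seq is non-nil.
(println (some? (seq [nil])))        ; prints true
```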
This leads us to another Clojure idiom for testing whether a collection is empty: we simply call seq on it. If the call to seq returns anything other than nil, the collection is not empty. Remember, all values other than nil and false are truthy. So sometimes in Clojure code, rather than:
(if (empty? l)
  <what to do if l is empty>
  <what to do if l is non-empty>)
you'll see
(if (seq l)
  <what to do if l is non-empty>
  <what to do if l is empty>)
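To make the template concrete, here is a sketch of a function written with the seq idiom. The name describe-coll and the strings it returns are invented for this example; note that the non-empty branch comes first, because seq is truthy only when there is at least one element.

```clojure
;; describe-coll is a hypothetical helper using the (if (seq l) ...) idiom.
(defn describe-coll [l]
  (if (seq l)
    (str "starts with " (first l))  ; non-empty case first
    "empty"))                       ; seq returned nil

(println (describe-coll [10 20]))  ; prints starts with 10
(println (describe-coll []))       ; prints empty
(println (describe-coll nil))      ; prints empty
```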
In fact, the empty? predicate is internally defined in Clojure as:
(defn empty? [coll]
  (not (seq coll)))
So we gain a minuscule amount of performance by testing the result of seq directly rather than calling empty?, which adds an extra call to not on top of the same call to seq.
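As a sanity check, here is a sketch that restates the definition above under the hypothetical name my-empty? (so it doesn't shadow the real one) and compares it with clojure.core/empty? on a handful of sample inputs.

```clojure
;; my-empty? is a hypothetical restatement of the definition quoted above.
(defn my-empty? [coll]
  (not (seq coll)))

;; Sample inputs: empty and non-empty lists, vectors, sets, maps, and nil.
(def samples [() '(1) [] [1 2] #{} #{:x} {} {:a 1} nil])

;; Both predicates give the same answer for every sample.
(println (= (map my-empty? samples) (map empty? samples)))  ; prints true
```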
In situations where programmers want to eke out a few more cycles of performance, there's another little trick we can use. Usually, inside the two branches of our if, we are doing things to the first and rest of the sequence. But first and rest implicitly call seq. This means we're actually calling seq multiple times: once to find out whether the collection is empty, and then again inside first and rest. For lists and other things that are already sequences, seq is essentially a no-op, so it's no big deal, but for some other collections (like vectors), seq allocates a new object to track the current position in the sequence. So a slightly more performant idiom is:
(defn fn-for-collection [coll]
  (let [s (seq coll)]
    (if s
      <do stuff with (first s) and (rest s)>
      <what to do if the collection is empty>)))
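A sketch of the let-then-if idiom filled in, using a hypothetical sum-coll that adds up the numbers in a collection. It uses plain (non-tail) recursion, so it is only meant for small inputs. The last few lines illustrate the allocation point made above: seq on a non-empty list hands back the list itself, while each call to seq on a vector produces a fresh seq object.

```clojure
;; sum-coll is a hypothetical example: seq is called once per step,
;; and first/rest are applied to the bound seq s.
(defn sum-coll [coll]
  (let [s (seq coll)]
    (if s
      (+ (first s) (sum-coll (rest s)))  ; non-tail recursion
      0)))

(println (sum-coll [1 2 3 4]))  ; prints 10
(println (sum-coll []))         ; prints 0

;; seq on a non-empty list returns the very same object...
(def l '(1 2 3))
(println (identical? l (seq l)))        ; prints true

;; ...but on a vector, every call to seq allocates a new seq object.
(def v [1 2 3])
(println (identical? (seq v) (seq v)))  ; prints false
```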
This pattern is common enough that Clojure provides a built-in macro, if-let, which expands into a let followed by an if. So we can rewrite this as:
(defn fn-for-collection [coll]
  (if-let [s (seq coll)]
    <do stuff with (first s) and (rest s)>
    <what to do if the collection is empty>))
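Here is the same idea with if-let, as a sketch using a hypothetical count-coll that counts elements (again with simple non-tail recursion). The final line prints the macro expansion so you can see the let and if that if-let produces; the expansion contains a generated temporary name, so the exact output varies and is not shown.

```clojure
;; count-coll is a hypothetical example of the if-let idiom.
(defn count-coll [coll]
  (if-let [s (seq coll)]
    (inc (count-coll (rest s)))  ; s is bound only in the non-empty branch
    0))

(println (count-coll [:a :b :c]))  ; prints 3
(println (count-coll {}))          ; prints 0

;; Inspect the expansion; it includes an auto-generated temp symbol.
(println (macroexpand-1 '(if-let [s (seq coll)] :then :else)))
```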
This is not critical to remember, but I wanted to point out that this ordering of cases for structural recursion on sequences (non-empty case first), along with giving a name to the result of calling seq on the collection, is marginally more performant. That is why you'll see it crop up in Clojure code, especially in Clojure's internals, where every little bit of performance matters.
Sometimes, in loop-recur code, people save a few extra cycles by moving the call to seq into the initialization of the loop variables and into the recur site. That way you don't need to create a separate local variable; you just use the name already bound by the loop. To make this idiom more concise, Clojure provides the function next: (next s) is equivalent to (seq (rest s)). The idiom looks like this:
(defn fn-for-collection [coll]
  (loop [s (seq coll),
         accumulator initial-value]
    (if s
      (recur (next s) <update the accumulator>)
      accumulator)))
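Filling in the loop template, here is a sketch with a hypothetical sum-loop that uses 0 as the initial accumulator and + as the update step. The last two lines show why next, rather than rest, is used at the recur site: next returns nil when nothing is left, which is exactly what the (if s ...) test needs, while rest returns an empty sequence, which is truthy.

```clojure
;; sum-loop is a hypothetical instance of the loop/recur template.
(defn sum-loop [coll]
  (loop [s   (seq coll)  ; seq called once, at initialization
         acc 0]          ; initial accumulator value
    (if s
      (recur (next s) (+ acc (first s)))  ; next yields nil at the end
      acc)))

(println (sum-loop [1 2 3 4]))  ; prints 10
(println (sum-loop #{}))        ; prints 0

;; next returns nil when nothing is left; rest returns an empty seq.
(println (next [1]))  ; prints nil
(println (rest [1]))  ; prints ()
```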
These tricks provide only the tiniest of gains, so don't spend a lot of time worrying about this in ordinary code. Just use whatever idiom feels most natural to you. And most of the time, you'll be using Clojure's higher-order functions, which use all these tricks and more, so you don't have to think about these details.
I'll end this article with a few more quirky examples you might find interesting if you're the kind of person who likes to understand all the edge cases:
=> (first nil)
nil
=> (rest nil)
()
=> (first [])
nil
=> (rest [])
()
=> (cons 1 nil)
(1)
This one is a little unsettling. You only get the same container type back if there are two or more elements in it: