Skip to content

Instantly share code, notes, and snippets.

@Engelberg
Last active March 22, 2018 22:06
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Engelberg/ba30fb436bbe50db6d64 to your computer and use it in GitHub Desktop.
Save Engelberg/ba30fb436bbe50db6d64 to your computer and use it in GitHub Desktop.
An analysis of different methods for implementing frequencies in Clojure

First, here is the cleverly concise version of freqs that I showed in class, which builds a bunch of maps that map each element to the number 1, and then merges them together using +:

(defn freqs [l]
  (apply merge-with + (for [item l] {item 1})))

Here is one way of coding it using loop-recur:

(defn freqs [l]
  (loop [l l,
         frequency-map {}]
    (if (empty? l) frequency-map,
      (recur (rest l)
             (assoc frequency-map
                    (first l)
                    (inc (get frequency-map (first l) 0)))))))

The idea here is that each time we recur, we drop the first item from the list, but add 1 to the tally for that item in the frequency-map (by first looking up the current value in the map, using a default value of 0 if it is not present, and then incrementing it and assoc’ing it back into the frequency-map).

The most common use of loop-recur, by far, is traversing a list once, accumulating some result. As we saw, reduce is simply a higher-order function that implements this common pattern. So, the equivalent way to do this with reduce is:

(defn freqs [l]
  (reduce
    (fn [frequency-map item]
      (assoc frequency-map item
             (inc (get frequency-map item 0))))
    {} l))

Is there a way to make this more compact? Of course there is -- this is Clojure!

Read on if you’re curious. But first, I have to take a little detour to show you a few new functions.

Detour - Nested Maps

Data structures in Clojure can be arbitrarily nested. For example, consider the following map:

(def m {:father {:person "Tom", :age 23},
        :daughter {:person "Marsha", :age 2}})

Let’s say we want to extract the age of the father. One way:

(get (get m :father) :age)

Ugh. That’s a little clunky. So instead, we use a function designed for nested-maps, which takes a vector of the keys:

(get-in m [:father :age])

Similarly, we can assoc-in:

=> (assoc-in m [:father :age] 25)
{:daughter {:age 2, :person "Marsha"}, :father {:age 25, :person "Tom"}}

But a common pattern is to extract a deeply nested value, update it via some function, and then re-assoc the new value back in. There’s a function for this, update-in. So to increment the father’s age, we could just do:

=> (update-in m [:father :age] inc)
{:daughter {:age 2, :person "Marsha"}, :father {:age 24, :person "Tom"}}

So there’s get, get-in, assoc, assoc-in, and update-in. Surely there is a function “update”, for non-nested maps, right?

Well, you’d be right to think there would be such a function, and in fact, there will be in Clojure 1.7 (coming soon), but for whatever reason, update was neglected in the core API until now. Fortunately, there’s a simple enough trick to use update-in on a flat map -- simply use a vector of length 1. For example, to increment the value of :b in the map {:a 1, :b 2}, we simply use update-in with [:b] as follows:

=> (update-in {:a 1, :b 2} [:b] inc)
{:b 3, :a 1}

Back to freqs

So with this in mind, we might try something like this:

(defn freqs [l]
  (reduce
    (fn [frequency-map item]
      (update-in frequency-map [item] inc))
    {} l))

But this doesn’t quite work. If we run this, we get a null ptr exception:

=> (freqs [:a :b :a :b :c])
NullPointerException   clojure.lang.Numbers.ops (Numbers.java:961)

What’s going wrong? Well, the problem is that update-in looks up the first item in the empty map (which is nil), and then tries to increment nil. (inc nil) is a null pointer exception.

So, we need to replace inc with something that handles the notion of “incrementing nil” and returning 1. Remember, all numbers are "truthy" values and nil is one of the two "falsey" values. So, we can just test the input n in our if statement. If it's a number, it will test as true, and if it is nil, it will test as false.

(defn inc-handling-nil [n]
  (if n (inc n) (inc 0)))

=> (inc-handling-nil 3)
4
=> (inc-handling-nil nil)
1

Now, we can write a somewhat simpler version:

(defn freqs [l]
  (reduce
    (fn [frequency-map item]
      (update-in frequency-map [item] inc-handling-nil))
    {} l))

As you might imagine, this idea of “handling the nil case” comes up a lot when using update-in. So there’s a higher-order function called fnil, which implements this pattern of “nil patching” a function. It takes the original function and the number it should use as an input instead of nil. So, the inc-handling-nil function we just wrote can be expressed as (fnil inc 0)

(defn freqs [l]
  (reduce
    (fn [frequency-map item]
      (update-in frequency-map [item] (fnil inc 0))
    {} l)))

If we really want to play “code golf” and make the code even more compact at the expense of clarity, we could do something like this:

(defn freqs [l]
  (reduce #(update-in %1 [%2] (fnil inc 0)) {} l))

which is now almost as short as the merge-with strategy I showed at the beginning, and in Clojure 1.7, we will be able to write it as:

(defn freqs [l]
  (reduce #(update %1 %2 (fnil inc 0)) {} l))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment