Skip to content

Instantly share code, notes, and snippets.

@ejackson
Created April 20, 2011 21:15
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ejackson/932875 to your computer and use it in GitHub Desktop.
Save ejackson/932875 to your computer and use it in GitHub Desktop.
nothing.html
<h1>Introduction</h1>
This is a brief post about how laziness can be used to construct a step-wise quicksort, or somebody gets excited about laziness.
<P>
Laziness is not something I'd come across before Clojure, and I've been repeatedly bitten by forgetting about it. So it was a real pleasure the other day to see the laziness revealed in some code, and for once not have this as the result of protracted debugging. I was so taken aback that I'm sharing the experience.
<h1>Quicksort</h1>
<h2>The Algo</h2>
We all know the <a href="http://en.wikipedia.org/wiki/Quicksort">algorithm</a>, and <a href="http://bilfp.wikidot.com/why-functional-programming-matters">here's</a> a pretty Clojure implementation:
<pre class="brush:Clojure">
(defn qsort [[pivot & tail]]
(when pivot
(lazy-cat (qsort (filter #(< % pivot) tail))
[pivot]
(qsort (remove #(< % pivot) tail)))))
</pre>
The logic of quicksort is straight forward enough: you build a (lazy) sequence by taking the first element of a given sequence, prepending the sorted sequence of all those elements that are less than it, and appending the sorted sequence of all those elements that are greater. Its really almost declarative and thus the intrinsic laziness may not apparent.
<h2>Laziness</h2>
The kicker here is that the prepended and appended sequences, those generated by <tt>(filter ...)</tt> and <tt>(remove ...)</tt> forms are <b>lazy</b>, and the whole thing lives in a <tt>lazy-cat</tt>. Thus the whole output sequence is lazy. This means that we can do crazy-eyed things like return the 3rd element of the sorted set without having sorted the rest of the sequence! This borders on the dark arts.
<h2>The Experiment</h2>
To figure out how this works lets modify the code to have a side effect so that we can watch the evolution of the lazy sequences.
<pre class="brush:Clojure">
(defn qsort [[pivot & tail]]
(when pivot
(lazy-cat (qsort (filter (fn [x] (prn (str "F:" x "<" pivot)) (< x pivot)) tail))
[pivot]
(qsort (remove (fn [x] (prn (str "R:" x "<" pivot)) (< x pivot)) tail)))))
</pre>
To get the output to match mine you'll want to run it under Clojure 1.0 to avoid the complicating effect of chunked sequences. Using this definition of <tt>qsort</tt> lets see the output of <tt>(first (qsort [1 3 2 6 4 5]))</tt>:
<P>
<tt>
"F:3<1"<BR>
"F:2<1"<BR>
"F:6<1"<BR>
"F:4<1"<BR>
"F:5<1"<BR>
1
</tt>
<P>
That's all it did before giving us the first element of the sorted seq - just 5 comparisons! It never even got into the <tt>remove</tt> branch.
<P>
In more detail <tt>qsort</tt> returns a lazy sequence and Clojure will only execute the necessary code to realise the sequence up to the point requested. We've asked for the <tt>first</tt> element. So Clojure will execute only as much code is necessary to realise that first element. At this point you might be still think that you need to sort the whole sequence to know which is the smallest. Not so.
<P>
The function <tt>qsort</tt> was passed the vector <tt>[1 3 2 6 4 5]</tt> and, via destructuring, took the first element, 1, as the <tt>pivot</tt> and the seq (3 2 6 4 5) as <tt>tail</tt>. I'm going to call those <tt>pivot_1</tt> and <tt>tail_1</tt> where the subscript indicates the level in the recursion. In that vein lets call the lazy sequence generated by the <tt>(filter...)</tt> <tt>filter_1</tt> and similarly for <tt>remove_1</tt>.
<P>
It then <tt>filter</tt>ed <tt>tail</tt>, selecting all elements < 1, and at the point of each comparison spat out some output. As no elements were < 1, <tt>filter_1</tt> was empty. The outer <tt>cat</tt> thus saw <tt>(concat (qsort filter_1) [pivot_1] (qsort remove_1))</tt> which is realised as <tt>(concat (qsort '()) [1] (qsort remove_1))</tt> which is <tt>(concat [1] remove_1)</tt>, ie (1 (qsort remove_1)). So the first element of the outer sequence became known at this point and execution stopped!
<P>
I intentionally placed the smallest element first in the input sequence to make the first pass through easy to understand. But don't be fooled into thinking that I'm cheating and that the laziness only works on such edge cases. Observe the full power of the Lazy side of the Force:
<P>
<tt>(second (qsort [1 3 2 6 4 5))</tt>
<P>
<tt>
;; ROUND 1<BR>
"F:3<1"<BR>
"F:2<1"<BR>
"F:6<1"<BR>
"F:4<1"<BR>
"F:5<1" ;; For first element of output, as above<BR>
;; ROUND 2
"R:3<1" ;; Get the pivot for round 2, ie realise the first element of remove_1<BR>
"R:2<1" ;; Second element of the remove_1<BR>
"F:2<3" ;; First element of filter_2<BR>
"R:6<1" ;; and so on...<BR>
"F:6<3"<BR>
"R:4<1"<BR>
"F:4<3"<BR>
"R:5<1"<BR>
"F:5<3"<BR>
</tt>
<BR>The thing to notice is how even when we're in round 2, considering the second pivot, 3, <tt>remove_1</tt>is still only realised element-wise as needed.
<P>
In detail, the consideration of the first pivot is as above. On the output we have (1 qsort(remove_1)). We know that, if realised, <tt>remove_1</tt> contains <tt>(3 2 6 4 5)</tt>, but at the point where <tt>qsort</tt> is called on it nothing has been realised. We can tell because the "R:#<1" outputs that occur when the comparisons that allow the seq to be realised occur have not occured.
<P>
So <tt>qsort</tt> starts processing <tt>remove_1</tt>. The first output we see is "R:3<1", which is the code destructuring to get <tt>pivot_2</tt>. It needs the first element of <tt>remove_1</tt>, but this is not yet realised, so it needs to run the comparison from round 1 to realise it, and thus we see "R:3<1" as it does so.
<P>
Now it needs to start filling <tt>filter_2</tt>. First it needs to read <tt>tail_2</tt>, that is,<tt>(rest remove_1)</tt> to get the next element, being 2, and we see "R:2<1" as it realises that element of <tt>remove_1</tt>. Then we see "F:2<3" as it realises the first element of </tt>filter_2</tt>. The same pattern occurs for the rest of <tt>remove_1</tt> as </tt>filter_2</tt> is realised. The realised seq is only <tt>(2)</tt>, so the ultimate output, the sequence we are realising with <tt>second</tt>, is </tt>(1 2 3 (qsort remove_2)</tt>. So again, execution stops once <tt>filter_2</tt> has been realised because that element of the lazy sequence that we requested has been realised. Notice that the following elements was realised too, as it was integral in the calculation. Thus no further output would be generated by <tt>(nth [1 3 2 6 4 5] 2)</tt>.
<P>
The rest of the sequence is generated similarly, but I'm not going to belabour the point any further.
<h1>Summary</h1>
Its easy to think about laziness when working with something that looks like a stream, or is somehow inherently sequential. However, the benefits of laziness are not all limited to that class of problem, as virtually all calculation is sequential. By capturing the calculation in a lazy sequence, and unhinging our brains slightly, we can reap great benefits. In this case although sorting the whole sequence is naturally still O(nlogn), we can get far lower complexity for partial sorts, and critically without having to change the underlying algorithm.
@Github3742
Copy link

amazing. now this is combined with a token logger. hope you love your creation

@Github3742
Copy link

this is 10 years old I had to make It into malware. you can't blame me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment