Optimization techniques

Five basic methods of optimization

These are just some thoughts I had today about optimization, and I decided to write them down before I forgot what I was thinking or changed my mind.

Suppose you have a bottleneck function, f, as part of a larger system.

  1. Do exactly f, but faster.

    This is the most straightforward approach. Basically, compute the exact same thing f does, but do it faster. This can include speeding up subproblems, improving the algorithm used in f itself, switching to faster hardware, performing precomputation in places that aren't time-critical (so there is no tradeoff from this perspective), and so on.

    Easy example: your system is the Fibonacci sequence. You can define it like this:

    ;; Naive doubly recursive definition: exponential time for large n.
    ;; (Note: = is used rather than eq, since eq on numbers is not
    ;; guaranteed to work in Common Lisp.)
    (defun fib (n)
      (cond ((< n 1) 0)
            ((= n 1) 1)
            ((= n 2) 1)
            (t (+ (fib (- n 1)) (fib (- n 2))))))

    For large values of n this implementation is bottlenecked by the exponential number of redundant recursive calls: computing (fib (- n 1)) and (fib (- n 2)) separately recomputes the same subproblems over and over. We can solve it by switching to a tail-recursive solution.

    ;; Accumulator-based (tail-recursive) version: a linear number of steps.
    (defun fib-iter (n)
      (labels
        ((f (m a b)
            (cond ((< m 1) 0)
                  ((= m 1) a)
                  ((= m 2) b)
                  (t (f (- m 1) b (+ a b))))))
        (f n 1 1)))

    We are still doing exactly what we did before (each value is still the sum of the two before it), but doing it in a smarter way: each value is computed once instead of exponentially many times.

  2. Find a clever way to avoid doing f.

    If (as is generally the case) f is only needed in the context of some other problem, perhaps you don't really need to compute f. Maybe you can rewrite the outer problem to avoid it entirely. There are numerous examples of this. The Fibonacci sequence provides such an example, actually. Using math (!), we can determine that it actually isn't necessary to compute the value of (fib (- n 1)) and (fib (- n 2)) to find (fib n). We can, in fact, use the closed form known as Binet's formula:

    ;; Binet's closed-form formula. Note that (sqrt 5) returns a float,
    ;; so the result is a float approximation, not an exact integer.
    (defun fib-expt (n)
      (let
        ((phi (/ (+ 1 (sqrt 5)) 2))
         (psi (/ (- 1 (sqrt 5)) 2)))
        (cond ((= n 0) 0)
              (t (/ (- (expt phi n) (expt psi n)) (- phi psi))))))

    Of course, this is not without its pitfalls. Those of you who actually try running this will note that it doesn't quite return the same thing that (fib n) does, since expt returns a floating point number. Beyond requiring a type conversion to satisfy the original contract, this starts to matter a lot as the numbers get larger: fib-iter retains its precision, while fib-expt devolves into approximation. And even beyond that, is it really faster than fib-iter? The answer depends a lot on implementation-specific details like how expt is calculated, whether the operations can be hardware-accelerated, whether the compiler is able to precompute phi and psi, and so on. In general, this is a common theme when it comes to optimization: it's nice when you can just plain make the original problem faster (1), but the rest of the time (2 through 5), it's all about tradeoffs, benchmarking, and understanding the needs of your application.

    (The above caveats aside, if it works for you this solution is clearly faster than fib for large n.)
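
    As a minimal sketch of the type-conversion caveat (fib-expt-int is my own name for the wrapper, not part of the original), rounding restores the integral contract, though not the lost precision:

    ;; Hedged sketch: round the float back to an integer to satisfy
    ;; fib's original contract. This only papers over the precision
    ;; problem; once the float type can no longer represent
    ;; (expt phi n) exactly, the rounded value diverges from the true
    ;; Fibonacci number.
    (defun fib-expt-int (n)
      (round (fib-expt n)))

    ;; One way to find where the approximation first breaks down:
    ;; (loop for i from 1 to 50
    ;;       unless (= (fib-expt-int i) (fib-iter i))
    ;;       collect i)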

  3. Give up on doing f by relaxing the guarantees of the system.

    Sometimes, we don't need to know f quickly for all legal inputs. Sometimes we can get away with making it fast only for some inputs that we think are more common than others, or even let it be flat-out wrong sometimes (this is not nearly as crazy as it seems--most practical primality testing algorithms fall into this category, for example, and there is an enormous body of work on the computational complexity of approximation algorithms). Given the issues with the example I used for (2), perhaps it should fall into this category as well, but there are many clear examples of (2) (switching from a simple O(n^2) sorting algorithm to an O(n log n) one arguably falls into that category).

    (3) tends to be a deliberate decision on the part of the implementor. For example, let's assume (hypothetically) that we have determined that people almost never ask for (fib n) with n > 30. Yes, I know this is somewhat implausible, but we've already decided (fib n) is a bottleneck for some reason, and that our language of choice for talking about optimization is LISP, so we've already thrown common sense out the window. Let's see just how fast we can make this go by changing the contract so that we require n to be between 0 and 30:

    ;; Precompute fib(0) through fib(30) once, outside the time-critical
    ;; path; fib30 itself is then a plain array lookup. make-array takes
    ;; a type specifier directly, so no type-expand is needed (it isn't
    ;; standard Common Lisp anyway).
    (let
      ((fib-array (make-array 31 :initial-element 0
                              :element-type `(integer 0 ,(1- (expt 2 31))))))
      (loop
        for i from 0 below 31
        do (setf (aref fib-array i) (fib i)))
      (defun fib30 (n) (aref fib-array n)))

    f has been replaced by an array lookup. Again, whether this is really faster or works for you depends on many things. Sometimes compilers will perform optimizations like this themselves, e.g. through loop unrolling, so it might not actually be faster. It might be slower than you think, too (though it's unlikely to be slower than fib-iter); sometimes array access is slower than branching (e.g. if the vast majority of calls to fib were for a single value, branch prediction might let that case be computed speculatively), in which case it would be necessary to rewrite this (or possibly drop down to a lower-level language like Rust with a true switch statement); a sketch of such a rewrite follows. And of course, since this function does no explicit bounds checking (though it is LISP), it's up to the caller to ensure that only valid values of n get here, and to decide how to handle the others.
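
    Purely as an illustration of that branch-based rewrite (fib30-case is a hypothetical name, and the "hot" inputs are invented for the example):

    ;; Hedged sketch: handle a few hypothetically-hot inputs with
    ;; branches before falling back to the table. Whether this beats a
    ;; plain aref depends entirely on the implementation and on the
    ;; actual distribution of calls, so benchmark before believing it.
    (defun fib30-case (n)
      (case n
        (10 55)        ; fib(10)
        (20 6765)      ; fib(20)
        (t (fib30 n))))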

    Finally, the eagle-eyed will note that we are no longer calculating (fib n) as (+ (fib (- n 1)) (fib (- n 2))) at call time, so perhaps it is also an example of (1), and that because we are making use of precomputation it could even be considered an example of (2). I would argue, however, that these methods are all relative to the initial function f. If f had been the same as our original definition of fib but with a contract that restricted n to be between 1 and 30, then this would certainly be an example of (2). And if we were considering fib30 in combination with another function that allowed it to emulate the full functionality of fib, then it would be an example of (1) (the basic algorithm stays the same, but with changes to make it faster). (3) only comes into play when the optimization of f actually changes the contract, to the point where it is in fact solving a different problem from the original one. To find examples of (3) that do not involve some alteration of the algorithm, one would be restricted to uninteresting examples (e.g. just limiting the original fib implementation to n <= 30 without doing anything different, purely to prevent the bottleneck from taking forever; see the sketch below).

    It should be noted, however, that the "boring" solution is also very often the one actually taken, since it has the notable advantage of being extremely quick to implement. When a server is choking up and a quick fix is needed, such solutions are a developer's best friend.
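
    For concreteness, here is a minimal sketch of that "boring" solution (fib-capped is my name for it): keep the slow algorithm and simply shrink the contract.

    ;; Hedged sketch of the quick fix: the same naive algorithm, but the
    ;; contract now rejects inputs large enough to hit the bottleneck.
    (defun fib-capped (n)
      (assert (<= 0 n 30) (n) "fib-capped only accepts 0 <= n <= 30")
      (fib n))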

  4. Modify f to help speed up the rest of the system.

    The previous three methods focused on situations where, when our bottleneck was f, we did the obvious thing: we tried to optimize f. But sometimes that's not possible, or leads to worse problems down the road. The reason for this is that it is rare for your bottleneck to be the actual item of interest that your system provides, or for f to comprise the entire system (although it can happen). Much more often, f is part of a larger problem that lies on the critical path. Nearly as often, f will depend upon subproblems whose solutions are also required by the rest of the system, or there will be multiple ways f can satisfy its contract that have varying impacts on the performance of the rest of the system. fib is not really the best example of this (nor is LISP the best language with which to demonstrate it, for that matter), but consider the following approach:

    ;; Memoized ("dynamic programming") version: results are cached in a
    ;; hash table that persists across calls, so repeated calls are cheap.
    (let
      ((fib-hash-table (make-hash-table :size 1024)))
      (setf (gethash 1 fib-hash-table) 1)
      (setf (gethash 2 fib-hash-table) 1)
      (labels
        ((f (n)
          (if (< n 1)
              0
              (multiple-value-bind
                (cached-result found) (gethash n fib-hash-table)
                (if found
                    cached-result
                    (let
                      ((result (+ (f (- n 1)) (f (- n 2)))))
                      ;; setf of gethash returns the stored value, so
                      ;; this both caches and returns result.
                      (setf (gethash n fib-hash-table) result)))))))
        (defun fib-dynamic (n)
          (f n))))

    This algorithm illustrates what I consider to be the primary ethos of (4): f is tuned to make the system as a whole faster, rather than its particular call (though here that call is faster too, which is why fib is not the best example of this--making the subproblem as slow as the original fib on its first call would actually have been more work in this case). Either way, the tradeoff here is intended to be between the performance of f itself (compared to fib-iter I suspect it is slower the first time through) and the performance of repeated calls to fib. If your system is a bit larger than f and fib happens to be used in a way that involves repeated calculations of Fibonacci numbers, this algorithm will be very good. Of course, a smarter implementation would do all sorts of things better, including being less vulnerable to stack overflows (see the sketch below). But this is inherently a contrived example. Real examples are often much more subtle. Algorithms with amortized complexity guarantees (like splay trees) provide good examples of this, and so do some dynamic programming problems.
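
    As a minimal sketch of that stack-safety point (fib-dynamic-iter is a hypothetical name), the same cache can be filled bottom-up instead of by recursion:

    ;; Hedged sketch: the same caching idea, made iterative so that deep
    ;; inputs cannot overflow the stack. The cache is an adjustable
    ;; vector that grows as needed and persists across calls.
    (let
      ((cache (make-array 2 :adjustable t :fill-pointer 2
                          :initial-contents '(0 1))))
      (defun fib-dynamic-iter (n)
        (if (< n 0)
            0
            (progn
              ;; extend the cache up to index n, one entry at a time
              (loop while (<= (fill-pointer cache) n)
                    do (vector-push-extend
                        (+ (aref cache (- (fill-pointer cache) 1))
                           (aref cache (- (fill-pointer cache) 2)))
                        cache))
              (aref cache n)))))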

  5. Modify the system to move f off the critical path.

    Sometimes, it turns out that your bottleneck can be avoided--either by substantially reducing the number of times f is called, or by calling it in a way that does not interfere with the functionality of your system. fib provides a pretty bad example of this, but consider a system in which a user can enter numbers on the command line, and for each number (fib n) is calculated in order. On the surface, if the user interface is intended to be responsive, then (fib n) must be quite fast. But if the processing of (fib n) is moved to the background, or the function is rewritten to interleave with checks for more input (as in the sketch below), it is not nearly as important that f be fast. The example from (4) also arguably illustrates (5), since fib caches itself, but the two concepts do not always coincide.
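
    As an illustrative sketch of the interleaving option (fib-responsive and handle-input are hypothetical names; listen is standard Common Lisp), f periodically yields so pending input can be serviced:

    ;; Hedged sketch: compute fib bottom-up, but check for pending input
    ;; every so often and hand control to a caller-supplied callback.
    ;; The chunk size of 1000 is arbitrary.
    (defun fib-responsive (n handle-input)
      (loop with a = 0 and b = 1
            for i from 0 below n
            do (psetf a b
                      b (+ a b))
               (when (and (zerop (mod i 1000)) (listen *standard-input*))
                 (funcall handle-input))
            finally (return a)))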

Hopefully this aside on optimization techniques is interesting or helpful to someone other than myself :)
