richhicky.clj
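(defn words [text] (re-seq #"[a-z]+" (.toLowerCase text)))
;; Map each word to its occurrence count + 1 (the default of 1 acts as smoothing).
(defn train [features]
  (reduce (fn [model f] (assoc model f (inc (get model f 1)))) {} features))
(def *nwords* (train (words (slurp "big.txt"))))
;; Every string one edit away from word: deletions, transpositions, replacements, insertions.
(defn edits1 [word]
  (let [alphabet "abcdefghijklmnopqrstuvwxyz", n (count word)]
    (distinct (concat
      (for [i (range n)] (str (subs word 0 i) (subs word (inc i))))
      (for [i (range (dec n))]
        (str (subs word 0 i) (nth word (inc i)) (nth word i) (subs word (+ 2 i))))
      (for [i (range n) c alphabet] (str (subs word 0 i) c (subs word (inc i))))
      (for [i (range (inc n)) c alphabet] (str (subs word 0 i) c (subs word i)))))))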
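For comparison, here is a minimal Python sketch of the same pipeline as the Clojure `words`, `train` and `edits1`. It assumes a tiny inline corpus instead of `big.txt`, and `correct` is a hypothetical helper that only considers candidates one edit away (Norvig's full corrector also tries two-edit candidates), so treat it as an illustration rather than a port of the original gist.

```python
import re

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def words(text):
    """Lower-case the text and return its runs of letters a-z."""
    return re.findall(r"[a-z]+", text.lower())

def train(features):
    """Map each word to occurrences + 1, like (get model f 1) in the Clojure version."""
    model = {}
    for f in features:
        model[f] = model.get(f, 1) + 1
    return model

def edits1(word):
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def correct(word, model):
    """Hypothetical helper: keep known words, else pick the most frequent known one-edit neighbour."""
    if word in model:
        return word
    candidates = [w for w in edits1(word) if w in model]
    return max(candidates, key=model.get) if candidates else word
```

With `m = train(words("The quick brown fox jumps over the lazy dog. The end."))`, `correct("teh", m)` returns `"the"` via a single transposition.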
git-blast
#!/usr/bin/env python
Shows git branches sorted by last commit date, noting when branch has been
$ git blast
* master                  33 minutes ago
  david                   4 days ago [M]
  unholy-david-payments   4 days ago
  handsontable-2          5 days ago
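The listing above can be approximated with stock git plumbing. This is a minimal sketch, not the original script: it assumes `[M]` marks branches already merged into `master` (the description is cut off), and the helper names `git`, `parse_refs`, `render` and `main` are hypothetical. The git commands themselves (`git for-each-ref --sort=-committerdate`, `git rev-parse --abbrev-ref HEAD`, `git branch --merged`) are standard.

```python
import subprocess

def git(*args):
    """Run a git command and return its stdout as text (raises if git fails)."""
    return subprocess.run(["git", *args], check=True, capture_output=True, text=True).stdout

def parse_refs(output):
    """Parse 'name<TAB>relative date' lines into (name, age) tuples."""
    rows = []
    for line in output.splitlines():
        if line.strip():
            name, age = line.split("\t", 1)
            rows.append((name, age))
    return rows

def render(rows, current, merged):
    """Format rows: '*' marks the current branch, [M] marks branches assumed merged."""
    width = max((len(name) for name, _ in rows), default=0)
    lines = []
    for name, age in rows:
        marker = "*" if name == current else " "
        suffix = " [M]" if name in merged else ""
        lines.append(f"{marker} {name.ljust(width)}  {age}{suffix}")
    return lines

def main(base="master"):
    """Print local branches newest-first; base is the branch merges are checked against."""
    # for-each-ref sorts by commit date itself; %09 is for-each-ref's escape for a TAB.
    refs = git("for-each-ref", "--sort=-committerdate",
               "--format=%(refname:short)%09%(committerdate:relative)", "refs/heads/")
    current = git("rev-parse", "--abbrev-ref", "HEAD").strip()
    # `git branch` prefixes every line with two marker columns ("* " or "  ").
    merged = {line[2:].strip() for line in git("branch", "--merged", base).splitlines()}
    merged.discard(base)
    print("\n".join(render(parse_refs(refs), current, merged)))
```

Calling `main()` inside a repository prints the branches newest-first; pass `base="main"` if the default branch has that name.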
parse-arg.clj
;; Based on
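(defn parse-arg [s]
  ;; Try each regex in order; (comp next re-matches) drops the full match,
  ;; so :>> hands only the capture groups to the handler fn.
  (condp (comp next re-matches) s
    #"([+-])(\w+)" :>> (fn [[f k]] [(keyword k) (= f "+")])
    #"(\w+)=(\w+)" :>> (fn [[k v]] [(keyword k) v])
    ;; No pattern matched.
    (throw (IllegalArgumentException. s))))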
(parse-arg "+foo") ;=> [:foo true]
(parse-arg "-foo") ;=> [:foo false]
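The same dispatch-on-regex idea translates directly to Python. This sketch is an illustrative equivalent, not part of the gist: `parse_arg` returns a `(key, value)` tuple with a plain string key, since Python has no keywords, and it uses `re.fullmatch` with `re.ASCII` to mirror Clojure's whole-string `re-matches` and Java's ASCII-only `\w`.

```python
import re

def parse_arg(s):
    """'+foo' -> ('foo', True), '-foo' -> ('foo', False), 'k=v' -> ('k', 'v')."""
    # Flag form: a leading sign switches the option on or off.
    m = re.fullmatch(r"([+-])(\w+)", s, flags=re.ASCII)
    if m:
        sign, key = m.groups()
        return (key, sign == "+")
    # Key/value form: both sides must be word characters.
    m = re.fullmatch(r"(\w+)=(\w+)", s, flags=re.ASCII)
    if m:
        key, value = m.groups()
        return (key, value)
    # Same fallthrough as the Clojure version: reject anything else.
    raise ValueError(s)
```

Because each result is a pair, a whole argument list folds into a dict: `dict(parse_arg(a) for a in ["+verbose", "level=3"])` gives `{"verbose": True, "level": "3"}`.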
gist:3bf900821cc70f57cdb4
  1. General Background and Overview

Business Models

Models         Examples
Display ads    Yahoo!
Search ads     Google
latency.txt
Latency Comparison Numbers
L1 cache reference                      0.5 ns
Branch mispredict                         5 ns
L2 cache reference                        7 ns           14x L1 cache
Mutex lock/unlock                        25 ns
Main memory reference                   100 ns           14x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy          3,000 ns
Send 1K bytes over 1 Gbps network    10,000 ns  0.01 ms
Read 4K randomly from SSD*          150,000 ns  0.15 ms
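The ratio column is plain division over the ns column. A small Python sketch (the helper names `ratio`, `to_ms` and `transfer_ns` are hypothetical) makes the arithmetic explicit: 7 / 0.5 = 14 and 100 / 0.5 = 200, while main memory versus L2 works out to about 14x (100 / 7). It also shows that pushing 1,024 bytes through a 1 Gbps link takes about 8,192 ns of pure serialization; reading the table's 10,000 ns as that figure rounded up is an assumption.

```python
# Latencies in nanoseconds, transcribed from the table above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}

def ratio(slow, fast):
    """How many times slower operation `slow` is than operation `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]

def to_ms(ns):
    """Convert nanoseconds to milliseconds."""
    return ns / 1_000_000

def transfer_ns(nbytes, bits_per_second):
    """Serialization time in ns for nbytes on a link; ignores protocol overhead and propagation."""
    return nbytes * 8 * 1e9 / bits_per_second
```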
gist:8259970
  1. General Background and Overview
tmux_cheatsheet.markdown

tmux cheatsheet

As configured in my dotfiles.

start new:

    tmux

start new with session name:

    tmux new -s myname

A friend asked me for a few pointers to interesting, mostly recent papers on data warehousing and "big data" database systems, and I figured I'd share the list. It is biased and rather incomplete, but maybe of interest to someone. While many are obvious choices, I think there are a few underappreciated gems.

Dataflow/Stream Processing Engines:

Dryad: general-purpose distributed parallel dataflow engine

Google Dremel: columnar storage for fast queries (cf. Impala)