Shannon's measure of entropy is essentially a measure of the unpredictability of a signal that has been converted into a stream of samples. His work was initially concerned with the efficient representation of a signal over a network of switching relays, but it has turned out to be of great value in many areas of computer science. Applications include error-correcting codes for noisy channels (how many extra bits does it take to tolerate a given level of noise?), data compression (what is the least number of bits that can represent a signal?), machine learning (which of the available sub-samples is the most informative?), and so on. One should also remember that a signal sent from one place to another in space has the same properties as one stored, for example on a disk, which is in effect a signal sent through time.
The name "entropy" for this measure was suggested to Shannon by Von Neumann because the calculation is the same as the one used to measure disorder in quantum statistical mechanics: http://en.wikipedia.org/wiki/Von_Neumann_entropy
(defn log2
  "log2(x), log2(0) = 0"
  [x]
  (if (= x 0) 0 (/ (Math/log x) (Math/log 2))))
(defn entropy
  "Entropy(S) = -p+ log2(p+) - p- log2(p-)"
  [p-pos p-neg]
  (+ (* (- p-pos) (log2 p-pos))
     (* (- p-neg) (log2 p-neg))))
The entropy of a signal containing only positive samples, or only negative samples, is zero. That is, there is no uncertainty, and the entire message can be compressed down to a single bit.
(entropy 1 0) ; all positive => 0.0
(entropy 0 1) ; all negative => 0.0
At the other end of the spectrum is a signal containing exactly half 1s and half 0s, distributed at random (think of it as a sequence of fair coin flips).
(entropy 1/2 1/2) ; 1/2 of each => 1.0
In a machine learning context, one can think of the entropy of a signal as a measure of how hard it is to summarize. A very low entropy signal, say one that is all zeroes, is very easy to summarize: "it's all zeroes." A complex message with a nearly even mix of 1s and 0s is hard to summarize.
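To put rough numbers on that, compare a lopsided split with a near-even one using the entropy function defined above (results are approximate):
(entropy 0.99 0.01) ; almost all positive => roughly 0.08
(entropy 0.6 0.4) ; close to an even split => roughly 0.97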
It is worth considering the obvious metaphor here for human learning and memory formation as a species of data compression, in which a series of experiences becomes a general understanding.
Use the following routine to play around with lists of 1s and 0s until you have an intuitive understanding of the entropy of different inputs.
(defn seq-entropy
  "Calculate the entropy of sequence 'sq', assuming that all positive
  numbers represent positive samples."
  [sq]
  (let [len (count sq)
        pos (count (filter pos? sq))
        neg (- len pos)]
    (entropy (/ pos len) (/ neg len))))
(seq-entropy '(1 0 1 0 1 1 1 1 1))
=> 0.7642045065086203
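For example, you might also try the extremes and something in between (the last result is approximate):
(seq-entropy '(1 1 1 1 1 1)) ; all positive => 0.0
(seq-entropy '(1 0 1 0 1 0)) ; even split => 1.0
(seq-entropy '(1 1 1 1 1 0)) ; one negative in six => roughly 0.65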
Write a function, (defn random-bit-sequence [len odds] ...), that returns a random bit sequence of length len in which each bit has probability odds of being a 1. Use that function to explore how the entropy of the random sequences converges to the entropy expected from odds as len increases (i.e. the Law of Large Numbers). Show your work.
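Here is a minimal sketch of one possible answer, built from rand and repeatedly; treat it as a starting point rather than the definitive solution. The convergence check below assumes odds of 0.25, for which (entropy 0.25 0.75) is about 0.81.
(defn random-bit-sequence
  "Returns a sequence of 'len' random bits, each of which is 1 with
  probability 'odds'."
  [len odds]
  (repeatedly len #(if (< (rand) odds) 1 0)))
;; As len grows, the measured entropy should approach (entropy 0.25 0.75),
;; which is about 0.81.
(doseq [len [10 100 1000 10000]]
  (println len (seq-entropy (random-bit-sequence len 0.25))))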
(defn string-to-bits
  "Returns a list containing 1s and 0s for the bits that make up the
  bytes of string 's'."
  [s]
  (mapcat #(for [i (range 8)]
             (if (not= 0 (bit-and (int (Math/pow 2 i)) %)) 1 0))
          (.getBytes s)))
(seq-entropy (string-to-bits " "))
=> 0.5435644431995964
(seq-entropy (string-to-bits "abcd"))
=> 0.9744894033980523
Have a play with the function above and tell me why these strings have these entropy values. This might be a trick question.
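If you want a quick way to do that playing, one option (assuming the definitions above are loaded) is to print the entropy for a handful of different strings and compare them:
(doseq [s ["aaaa" "abcd" "Hello, world" "zzzzzzzz"]]
  (println (pr-str s) "=>" (seq-entropy (string-to-bits s))))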