
@geoffgarside
Created December 19, 2010 10:14
Ruby function that calculates the Shannon entropy of a given word
def word_entropy(word)
  len = word.chars.count.to_f
  log2 = Math.log(2)
  # Count occurrences of each character.
  counts = word.chars.inject({}) do |h, c|
    h[c] = (h[c] || 0) + 1
    h
  end
  # Sum frequency * log2(frequency) for each distinct character; the
  # accumulator must be added to (not assigned), otherwise only the last
  # character's term survives. The sum is negative, so .abs flips the sign.
  counts.inject(0) do |entropy, (_char, count)|
    frequency = count / len
    entropy + (frequency * (Math.log(frequency) / log2))
  end.abs
end
if __FILE__ == $0
  $KCODE = "UTF8"
  require 'rubygems'
  require 'benchmark'
  require 'active_support'

  puts "Examples:"
  words = ["aba17a8c1041b4b4", "gR~_'UJ78KOl:yNp",
           "aaaaaaaaaaaaaaaa", "password", "blërg"]
  words.each do |w|
    puts 'word_entropy("%s") #=> %f' % [w, word_entropy(w)]
  end

  puts "\nBenchmarks:"
  n = 10000
  Benchmark.bm do |x|
    x.report('word_entropy("aba17a8c1041b4b4"): ') { n.times { word_entropy("aba17a8c1041b4b4") }}
    x.report('word_entropy("gR~_\'UJ78KOl:yNp"): ') { n.times { word_entropy("gR~_'UJ78KOl:yNp") }}
    x.report('word_entropy("aaaaaaaaaaaaaaaa"): ') { n.times { word_entropy("aaaaaaaaaaaaaaaa") }}
  end
end
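The calculation above is the standard Shannon entropy, H = -Σ p(c) · log2(p(c)), summed over each distinct character c with relative frequency p(c). As a quick cross-check of the formula (not part of the original gist): a string of n equally frequent characters should score log2(n) bits, and a single repeated character should score 0. A minimal standalone sketch of the same sum, assuming a modern Ruby (2.7+ for `Enumerable#tally`; `Math.log2` replaces the manual log-base conversion):

```ruby
# Shannon entropy H = -sum(p(c) * log2(p(c))) over each distinct
# character c, where p(c) is that character's relative frequency.
def shannon_entropy(word)
  len = word.length.to_f
  word.chars.tally.values.sum do |count|
    p = count / len
    -p * Math.log2(p)
  end
end

puts shannon_entropy("abcd")              # 4 equally frequent chars => 2.0 bits
puts shannon_entropy("aaaaaaaaaaaaaaaa")  # single repeated char     => 0.0 bits
puts shannon_entropy("gR~_'UJ78KOl:yNp")  # 16 distinct chars        => 4.0 bits
```

This version accumulates the negative terms directly rather than summing positive terms and taking `.abs` afterwards; both yield the same magnitude.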