@klochner
Created July 30, 2009 00:33
# Relative frequency (in percent) of each letter in English text.
def freq(letter)
  freq_dict = {
    "a" => 8.167,
    "b" => 1.492,
    "c" => 2.782,
    "d" => 4.253,
    "e" => 12.702,
    "f" => 2.228,
    "g" => 2.015,
    "h" => 6.094,
    "i" => 6.966,
    "j" => 0.153,
    "k" => 0.772,
    "l" => 4.025,
    "m" => 2.406,
    "n" => 6.749,
    "o" => 7.507,
    "p" => 1.929,
    "q" => 0.095,
    "r" => 5.987,
    "s" => 6.327,
    "t" => 9.056,
    "u" => 2.758,
    "v" => 0.978,
    "w" => 2.360,
    "x" => 0.150,
    "y" => 1.974,
    "z" => 0.074
  }
  # Hack: anything missing from the table, such as a double-letter run like
  # "ss", scores 10, which is higher than every single letter except "e".
  freq_dict[letter] || 10
end
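# A minimal sketch of how these scores order the pruning, assuming a
# hypothetical three-letter excerpt of the table (SAMPLE_FREQ and sample_freq
# are illustrative names, not part of the gist). Tokens are sorted ascending,
# so the highest-scoring ones end up last and are the first to be dropped.

```ruby
# Hypothetical excerpt of the frequency table; values copied from the full one.
SAMPLE_FREQ = { "e" => 12.702, "t" => 9.056, "q" => 0.095 }.freeze

# Same fallback as freq: anything missing from the table scores 10.
def sample_freq(token)
  SAMPLE_FREQ.fetch(token, 10)
end

tokens = ["q", "ss", "e", "t"]
by_score = tokens.sort_by { |token| sample_freq(token) }
p by_score          # ["q", "t", "ss", "e"]
p by_score.last(2)  # ["ss", "e"], the first two candidates for removal
```

# Note that "e" outranks the double-letter fallback, so an "e" is removed
# before a run such as "ss".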
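# Shorten one piece of the URL path to about `size` characters.
def shorten(word, size)
  # Leave the piece intact if it is a number.
  return word if word =~ /\A\d+\z/
  # Split into runs of a repeated character (anything outside \w is dropped),
  # e.g., mississippi ==> ["m","i","ss","i","ss","i","pp","i"]
  letter_arr = word.scan(/((\w)\2*)/).collect { |a| a.first }
  # With fewer than two runs there is nothing to prune. Without this guard a
  # one-run word came back doubled ("a" ==> "aa") and "" raised an error.
  return word if letter_arr.length < 2
  # Pair each middle run with its index so we can reassemble in order.
  with_indices = []
  letter_arr[1..-2].each_with_index { |v, i| with_indices << [v, i] }
  # Sort by score, ascending. Double-letter runs score 10, so they land after
  # every single letter except "e".
  with_indices.sort! { |a, b| freq(a[0]) <=> freq(b[0]) }
  # Prune the highest-scoring runs; the first and last run are always kept.
  1.upto(2 + with_indices.length - size) { with_indices.pop }
  # Restore the original order and reassemble, collapsing each surviving
  # double-letter run to a single letter.
  with_indices.sort! { |a, b| a[1] <=> b[1] }
  letter_arr[0] +
    with_indices.collect { |v, _index| v.length > 1 ? v[0..0] : v }.join +
    letter_arr.last
end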
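# A minimal sketch of the run-splitting step on its own, using the same scan
# pattern as shorten. split_runs and collapse are hypothetical helper names
# introduced only for this illustration.

```ruby
# Hypothetical helper: split a word into runs of one repeated character.
# The outer group captures the whole run, the inner group its first character.
def split_runs(word)
  word.scan(/((\w)\2*)/).map(&:first)
end

# Hypothetical helper: collapse a run to its first character.
def collapse(run)
  run[0, 1]
end

runs = split_runs("mississippi")
p runs                               # ["m", "i", "ss", "i", "ss", "i", "pp", "i"]
p runs.map { |r| collapse(r) }.join  # "misisipi"
p split_runs("foo-bar")              # ["f", "oo", "b", "a", "r"]
```

# The last line shows that characters outside \w, such as the hyphen, never
# reach the run list, so they are missing from the shortened result.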
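# The main function shortens every part of the path after the host.
def shorten_url(url, host, size)
  # Split at the first occurrence of the host only; to_s covers a URL that
  # has nothing after the host.
  to_shorten = url.split(host, 2)[1].to_s
  host + to_shorten.split("/").collect { |elem| shorten(elem, size) }.join("/")
end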
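# A minimal sketch of which pieces reach shorten for a given host string.
# path_pieces is a hypothetical helper, and example.com is a placeholder URL.
# As an assumption beyond the split used above, it limits the split to two
# pieces and calls to_s so that a URL with nothing after the host yields no
# pieces.

```ruby
# Hypothetical helper mirroring the split step: the pieces after the host.
def path_pieces(url, host)
  url.split(host, 2)[1].to_s.split("/")
end

url = "http://example.com/articles/2009/shortening"
p path_pieces(url, "http://example.com/")  # ["articles", "2009", "shortening"]
p path_pieces(url, "http://example.com")   # ["", "articles", "2009", "shortening"]
```

# Passing the host without its trailing slash produces an empty leading
# piece, so each piece handler has to cope with "".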