@KL-7
Created December 17, 2011 12:13
AI class NLP problem 1 solution
#!/usr/bin/env ruby
# Data is taken from Wikipedia: http://j.mp/english-letters-frequencies
LETTERS_FREQUENCIES = {
  'a' => 8.167,
  'b' => 1.492,
  'c' => 2.782,
  'd' => 4.253,
  'e' => 12.702,
  'f' => 2.228,
  'g' => 2.015,
  'h' => 6.094,
  'i' => 6.966,
  'j' => 0.153,
  'k' => 0.772,
  'l' => 4.025,
  'm' => 2.406,
  'n' => 6.749,
  'o' => 7.507,
  'p' => 1.929,
  'q' => 0.095,
  'r' => 5.987,
  's' => 6.327,
  't' => 9.056,
  'u' => 2.758,
  'v' => 0.978,
  'w' => 2.360,
  'x' => 0.150,
  'y' => 1.974,
  'z' => 0.074
}
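# Ciphertext to decode (a rotation cipher over the Latin alphabet).
TEXT = "Esp qtcde nzyqpcpynp zy esp ezatn zq Lcetqtntlw Tyepwwtrpynp hld spwo le Olcexzfes Nzwwprp ty estd jplc."
# Number of letters in the text, ignoring spaces and punctuation.
LETTERS_COUNT = TEXT.gsub(/[^a-zA-Z]/, '').size

# Builds a String#tr argument: the given lowercase alphabet followed by its uppercase form.
alphs = lambda { |alph| alph.join + alph.join.upcase }
ALPH = ('a'..'z').to_a
ALPHS = alphs.call ALPH

rot = ALPH.dup
results = []
# Try all 26 rotations; the last one is the identity and leaves TEXT unchanged.
begin
  rot.push rot.shift
  rots = alphs.call rot
  original = TEXT.tr rots, ALPHS
  frequencies = ALPH.inject({}) { |m, letter| m[letter] = original.downcase.count(letter); m }
  # Sum of absolute differences between expected and observed letter percentages.
  diff = ALPH.inject(0) do |total, letter|
    total + (LETTERS_FREQUENCIES[letter] - 100.0 * frequencies[letter] / LETTERS_COUNT).abs
  end
  results << { original: original, diff: diff }
end until rot == ALPH

# Print candidates from best (smallest difference) to worst.
results.sort_by { |r| r[:diff] }.each do |r|
  puts "# %06.2f - '#{r[:original]}'" % r[:diff]
end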
### RESULTS ###
# 033.39 - 'The first conference on the topic of Artificial Intelligence was held at Dartmouth College in this year.'
# 083.83 - 'Esp qtcde nzyqpcpynp zy esp ezatn zq Lcetqtntlw Tyepwwtrpynp hld spwo le Olcexzfes Nzwwprp ty estd jplc.'
# 086.74 - 'Iwt uxghi rdcutgtcrt dc iwt idexr du Pgixuxrxpa Xcitaaxvtcrt lph wtas pi Spgibdjiw Rdaatvt xc iwxh ntpg.'
# 092.40 - 'Pda benop ykjbanajya kj pda pkley kb Wnpebeyewh Ejpahhecajya swo dahz wp Zwnpikqpd Ykhhaca ej pdeo uawn.'
# 094.34 - 'Znk loxyz iutlkxktik ut znk zuvoi ul Gxzoloiogr Otzkrromktik cgy nkrj gz Jgxzsuazn Iurrkmk ot znoy ekgx.'
# 101.10 - 'Aol mpyza jvumlylujl vu aol avwpj vm Hyapmpjphs Pualsspnlujl dhz olsk ha Khyatvbao Jvsslnl pu aopz flhy.'
# 101.16 - 'Gur svefg pbasrerapr ba gur gbcvp bs Negvsvpvny Vagryyvtrapr jnf uryq ng Qnegzbhgu Pbyyrtr va guvf lrne.'
# 101.70 - 'Ftq rudef oazrqdqzoq az ftq fabuo ar Mdfuruoumx Uzfqxxusqzoq ime tqxp mf Pmdfyagft Oaxxqsq uz ftue kqmd.'
# 104.09 - 'Xli jmvwx gsrjivirgi sr xli xstmg sj Evxmjmgmep Mrxippmkirgi aew liph ex Hevxqsyxl Gsppiki mr xlmw ciev.'
# 105.15 - 'Sgd ehqrs bnmedqdmbd nm sgd snohb ne Zqshehbhzk Hmsdkkhfdmbd vzr gdkc zs Czqslntsg Bnkkdfd hm sghr xdzq.'
# 106.02 - 'Max ybklm vhgyxkxgvx hg max mhibv hy Tkmbybvbte Bgmxeebzxgvx ptl axew tm Wtkmfhnma Vheexzx bg mabl rxtk.'
# 106.23 - 'Wkh iluvw frqihuhqfh rq wkh wrslf ri Duwlilfldo Lqwhooljhqfh zdv khog dw Gduwprxwk Froohjh lq wklv bhdu.'
# 106.60 - 'Cqn orabc lxwonanwln xw cqn cxyrl xo Jacrorlrju Rwcnuurpnwln fjb qnum jc Mjacvxdcq Lxuunpn rw cqrb hnja.'
# 107.76 - 'Uif gjstu dpogfsfodf po uif upqjd pg Bsujgjdjbm Joufmmjhfodf xbt ifme bu Ebsunpvui Dpmmfhf jo uijt zfbs.'
# 108.03 - 'Nby zclmn wihzylyhwy ih nby nijcw iz Ulnczcwcuf Chnyffcayhwy qum byfx un Xulngionb Wiffyay ch nbcm syul.'
# 108.19 - 'Hvs twfgh qcbtsfsbqs cb hvs hcdwq ct Ofhwtwqwoz Wbhszzwusbqs kog vszr oh Rofhacihv Qczzsus wb hvwg msof.'
# 108.51 - 'Vjg hktuv eqphgtgpeg qp vjg vqrke qh Ctvkhkekcn Kpvgnnkigpeg ycu jgnf cv Fctvoqwvj Eqnngig kp vjku agct.'
# 109.05 - 'Ymj knwxy htskjwjshj ts ymj ytunh tk Fwynknhnfq Nsyjqqnljshj bfx mjqi fy Ifwyrtzym Htqqjlj ns ymnx djfw.'
# 109.91 - 'Kyv wzijk tfewvivetv fe kyv kfgzt fw Rikzwztzrc Zekvcczxvetv nrj yvcu rk Urikdflky Tfccvxv ze kyzj pvri.'
# 111.52 - 'Jxu vyhij sedvuhudsu ed jxu jefys ev Qhjyvysyqb Ydjubbywudsu mqi xubt qj Tqhjcekjx Sebbuwu yd jxyi ouqh.'
# 111.81 - 'Ocz admno xjiazmzixz ji ocz ojkdx ja Vmodadxdvg Diozggdbzixz rvn czgy vo Yvmohjpoc Xjggzbz di ocdn tzvm.'
# 112.19 - 'Rfc dgpqr amldcpclac ml rfc rmnga md Yprgdgagyj Glrcjjgeclac uyq fcjb yr Byprkmsrf Amjjcec gl rfgq wcyp.'
# 113.23 - 'Dro psbcd myxpoboxmo yx dro dyzsm yp Kbdspsmskv Sxdovvsqoxmo gkc rovn kd Nkbdwyedr Myvvoqo sx drsc iokb.'
# 117.42 - 'Bpm nqzab kwvnmzmvkm wv bpm bwxqk wn Izbqnqkqit Qvbmttqomvkm eia pmtl ib Lizbuwcbp Kwttmom qv bpqa gmiz.'
# 117.77 - 'Qeb cfopq zlkcbobkzb lk qeb qlmfz lc Xoqfcfzfxi Fkqbiifdbkzb txp ebia xq Axoqjlrqe Zliibdb fk qefp vbxo.'
# 118.15 - 'Lzw xajkl ugfxwjwfuw gf lzw lghau gx Sjlaxauasd Aflwddaywfuw osk zwdv sl Vsjlegmlz Ugddwyw af lzak qwsj.'
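
The results above rank every rotation by how far its letter distribution is from typical English. As a minimal sketch of the same idea in reusable form, the block below wraps the scoring in methods and returns only the best candidate. The names `ENGLISH_FREQ`, `caesar_shift`, `score`, and `crack` are hypothetical and not part of the original script; the frequency values are copied from the table above.

```ruby
# Hypothetical helper names; the scoring rule mirrors the script above.
ENGLISH_FREQ = {
  'a' => 8.167, 'b' => 1.492, 'c' => 2.782, 'd' => 4.253, 'e' => 12.702,
  'f' => 2.228, 'g' => 2.015, 'h' => 6.094, 'i' => 6.966, 'j' => 0.153,
  'k' => 0.772, 'l' => 4.025, 'm' => 2.406, 'n' => 6.749, 'o' => 7.507,
  'p' => 1.929, 'q' => 0.095, 'r' => 5.987, 's' => 6.327, 't' => 9.056,
  'u' => 2.758, 'v' => 0.978, 'w' => 2.360, 'x' => 0.150, 'y' => 1.974,
  'z' => 0.074
}.freeze

# Shift every letter back by `shift` positions, preserving case.
def caesar_shift(text, shift)
  lower = ('a'..'z').to_a
  rotated = lower.rotate(shift)
  text.tr(rotated.join + rotated.join.upcase, lower.join + lower.join.upcase)
end

# Sum of absolute differences between expected and observed letter percentages.
def score(text)
  letters = text.downcase.scan(/[a-z]/)
  return Float::INFINITY if letters.empty?

  counts = Hash.new(0)
  letters.each { |c| counts[c] += 1 }
  ENGLISH_FREQ.sum do |letter, expected|
    (expected - 100.0 * counts[letter] / letters.size).abs
  end
end

# Return [shift, plaintext] for the rotation whose letters look most like English.
def crack(ciphertext)
  (0...26).map { |shift| [shift, caesar_shift(ciphertext, shift)] }
          .min_by { |_, candidate| score(candidate) }
end

ciphertext = "Esp qtcde nzyqpcpynp zy esp ezatn zq Lcetqtntlw Tyepwwtrpynp " \
             "hld spwo le Olcexzfes Nzwwprp ty estd jplc."
shift, plaintext = crack(ciphertext)
puts "shift=#{shift}: #{plaintext}"
```

Short texts can fool a frequency score, so keeping the full ranked list (as the original script does) is a reasonable safeguard when the top candidate does not read as English.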