@dreikanter
Last active December 5, 2015 13:07
N-grams counter
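require 'pp'
# Build a random word sequence to count n-grams in.
WORDS = %w(correct horse battery staple)
LENGTH = 100
# NOTE: the range 0..LENGTH is inclusive, so SOURCE holds LENGTH + 1 words.
SOURCE = (0..LENGTH).map { WORDS.sample }
puts "Source: #{SOURCE.join(', ')}"
###
NGRAM_LENGTH = 2 # words per n-gram
TOP_LENGTH = 10  # how many of the most frequent n-grams to print
###
# Join the NGRAM_LENGTH words starting at index i into a single string.
ngram = ->(i) { SOURCE[i, NGRAM_LENGTH].join(' ') }
# The last valid start index is derived from SOURCE.size, so every window is full.
ngrams = (0..(SOURCE.size - NGRAM_LENGTH)).map(&ngram)
# Count occurrences; Hash.new(0) makes missing keys start at zero.
counters = ngrams.each_with_object(Hash.new(0)) { |item, hash| hash[item] += 1 }
# Print the TOP_LENGTH most frequent n-grams, highest count first.
pp Hash[counters.sort_by { |_key, value| value }.last(TOP_LENGTH).reverse]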
Source: battery, correct, correct, correct, staple, battery, correct, staple, staple, correct, correct, correct, correct, horse, staple, staple, correct, correct, staple, horse, staple, correct, horse, staple, staple, staple, battery, horse, correct, battery, horse, battery, correct, horse, correct, correct, horse, horse, correct, battery, battery, correct, correct, battery, horse, battery, correct, staple, battery, battery, battery, correct, battery, correct, battery, staple, horse, staple, correct, correct, correct, horse, battery, correct, horse, correct, correct, battery, battery, battery, correct, staple, correct, staple, battery, staple, staple, battery, battery, battery, staple, correct, horse, battery, horse, correct, battery, horse, battery, battery, horse, correct, horse, battery, horse, horse, horse, staple, correct, correct, staple
{"correct correct"=>12,
 "battery correct"=>9,
 "battery battery"=>8,
 "correct horse"=>8,
 "battery horse"=>7,
 "correct staple"=>7,
 "correct battery"=>7,
 "staple correct"=>7,
 "horse correct"=>6,
 "horse battery"=>6}
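The same counting can probably be written without any index arithmetic. The sketch below is not part of the original gist: `top_ngrams` is a hypothetical helper name, and it assumes Ruby 2.7 or newer for `Enumerable#tally`. It uses `each_cons` to produce the sliding windows, so the range bounds never need to be computed by hand.

```ruby
require 'pp'

# Hypothetical helper (not in the original gist): returns the `top` most
# frequent n-grams of length `n` as a Hash, highest count first.
# Assumes Ruby 2.7+ because of Enumerable#tally.
def top_ngrams(words, n: 2, top: 10)
  words
    .each_cons(n)                       # every window of n consecutive words
    .map { |window| window.join(' ') }  # turn each window into one string
    .tally                              # count identical n-grams
    .sort_by { |_ngram, count| -count } # most frequent first
    .first(top)
    .to_h
end

# Small fixed input so the counts are easy to check by hand.
sample = %w(correct horse correct horse correct horse battery)
result = top_ngrams(sample, n: 2, top: 2)
pp result
```

Passing the randomly generated `SOURCE` array with `n: NGRAM_LENGTH, top: TOP_LENGTH` should give the same counts as the script above, although n-grams with equal counts may appear in a different order because `sort_by` is not a stable sort.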