@humbroll
Created March 2, 2015 02:05
top_twenty_frequent_words
# Download this file - The Adventures of Sherlock Holmes
# http://www.gutenberg.org/cache/epub/1661/pg1661.txt
#
# Write a program to print the 20 most frequent words in the document,
# in descending order of frequency, with counts. The output format looks like:
#
# 9213 the
# 3223 i
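def top_twenty_frequent_words(text)
  # Lowercase once, then extract runs of word characters. Unlike
  # split(/\W+/), scan never yields an empty string when the text
  # begins with a non-word character (such as a byte order mark).
  words = text.downcase.scan(/\w+/)
  # Hash.new(0) makes every unseen word start at a count of zero.
  words_count = words.each_with_object(Hash.new(0)) do |word, count|
    count[word] += 1
  end
  # Sort by descending count and keep the first 20 entries.
  top_twenty = words_count.sort_by { |_word, count| -count }.first(20)
  top_twenty.each do |word, count|
    puts "#{count}\t#{word}"
  end
end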
sherlock_holmes = File.read("./pg1661.txt")
top_twenty_frequent_words(sherlock_holmes)
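# A possible variant, sketched under assumptions that are not in the original
# gist: the helper name top_frequent_words, its limit parameter, the treatment
# of inner apostrophes as part of a word, and the alphabetical tie-break are
# all hypothetical choices. It returns [word, count] pairs instead of printing
# them, so the counting logic can be checked without reading captured output.

```ruby
# Hypothetical helper (not part of the original gist): returns the `limit`
# most frequent words as [word, count] pairs, most frequent first.
def top_frequent_words(text, limit = 20)
  counts = Hash.new(0)
  # Assumption: a word is a run of ASCII letters that may contain inner
  # apostrophes, so "don't" is counted as one word rather than "don" and "t".
  text.downcase.scan(/[a-z]+(?:'[a-z]+)*/) { |word| counts[word] += 1 }
  # Highest count first; ties are broken alphabetically so output is stable.
  counts.sort_by { |word, count| [-count, word] }.first(limit)
end

# Usage on a small sample string, printed in the same "count<TAB>word" format.
top_frequent_words("The cat and the hat. Don't stop the cat!", 3).each do |word, count|
  puts "#{count}\t#{word}"
end
```

# Returning the pairs keeps printing separate from counting; the explicit
# tie-break matters because words with equal counts otherwise have no
# guaranteed order.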