Created December 3, 2010 13:58
Ruby script to count words in a file and order them by occurrences
# Script to index words inside a text file.
# Words are separated by whitespace.
# Usage: ruby indexer.rb /path/to/file.txt
file_path = ARGV[0] || "votacion.txt"
# Default every new word's count to 0 so it can be incremented directly.
WORDS_COUNT = Hash.new(0)
puts "Indexing #{file_path}"
# File.foreach streams the file line by line and closes it when done.
File.foreach(file_path) do |line|
  line.split.each do |word|
    # Strip commas, parentheses, and quotes from the token.
    word = word.gsub(/[,()'"]/, '')
    next if word.empty? # skip tokens that were only punctuation
    WORDS_COUNT[word] += 1
  end
end
puts "Indexed #{file_path}"
puts "Words count: "
# Sort by number of occurrences, least frequent first.
WORDS_COUNT.sort_by { |_word, count| count }.each do |key, value|
  puts "#{key} => #{value}"
end
puts "The end. "
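
The script above prints words from least to most frequent. If you would rather see the most frequent words first, a variation along these lines might help. This is only a sketch: `count_words` and `top_words` are hypothetical helper names that are not part of the gist, and the sample text is made up for illustration. Taking an IO-like object instead of a path also makes the counting easy to try without a file on disk.

```ruby
require "stringio"

# Hypothetical helper (not part of the original gist): counts words from any
# object that responds to each_line, such as a File or a StringIO.
def count_words(io)
  counts = Hash.new(0)
  io.each_line do |line|
    line.split.each do |word|
      # Same punctuation stripping as the script above.
      word = word.gsub(/[,()'"]/, "")
      next if word.empty?
      counts[word] += 1
    end
  end
  counts
end

# Hypothetical helper: the n most frequent words, most frequent first.
# Ties are broken alphabetically so the order is predictable.
def top_words(counts, n)
  counts.sort_by { |word, count| [-count, word] }.first(n)
end

# Made-up sample input, only for illustration.
sample = StringIO.new("si no si\nsi (no) abstencion\n")
top_words(count_words(sample), 2).each do |word, count|
  puts "#{word} => #{count}"
end
```

With a real file, something like `File.open(path) { |f| count_words(f) }` should work the same way.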
This really came in handy while I was making my girlfriend's Xmas present. I tweaked it to seed a Rails database, and not having to spend the extra time writing it from scratch was nice! http://jailee.us
Thanks!
Don't you think it will blow the heap if the text file is very large (say, 1 TB)?
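
A note on the memory question: the script reads one line at a time, so memory use should depend mostly on the longest line and on the number of distinct words, not on the total file size. A single enormous line (or a file with no newlines at all) would still be loaded whole, though. One possible workaround is to read fixed-size chunks and carry any partial word over to the next chunk. The following is a minimal sketch under assumptions: `count_words_chunked` and `tally` are hypothetical names, the 64 KiB default chunk size is arbitrary, and the input is assumed to be UTF-8.

```ruby
require "stringio"

# Hypothetical helper: adds every word found in text to counts.
def tally(text, counts)
  text.split.each do |word|
    word = word.gsub(/[,()'"]/, "")
    next if word.empty?
    # io.read(n) returns binary strings; assume the input is UTF-8.
    counts[word.force_encoding(Encoding::UTF_8)] += 1
  end
end

# Hypothetical helper (not in the original gist): counts words by reading
# fixed-size chunks instead of whole lines.
def count_words_chunked(io, chunk_size = 64 * 1024)
  counts = Hash.new(0)
  carry = "" # text after the last whitespace seen so far (maybe a partial word)
  while (chunk = io.read(chunk_size))
    buffer = carry + chunk
    cut = buffer.rindex(/\s/)
    if cut
      # Everything up to the last whitespace is made of complete words.
      tally(buffer[0..cut], counts)
      carry = buffer[(cut + 1)..-1]
    else
      # No whitespace yet: the word may continue in the next chunk.
      carry = buffer
    end
  end
  tally(carry, counts) # the last word has no whitespace after it
  counts
end

# Tiny chunk size to show that words split across chunks are rejoined.
p count_words_chunked(StringIO.new("si no si\nsi (no) abstencion"), 4)
```

The hash of counts still grows with the number of distinct words, and one very long run without whitespace would still accumulate in `carry`, so this bounds memory only for ordinary text.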