@tom-galvin
Created May 13, 2014 22:18
DailyProgrammer Challenge #162i Solution (Novel Compression, pt. 2: Compressing the Data)
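
# Turn one whitespace-separated word into its compressed chunk(s), adding
# the word (lowercased, punctuation stripped) to dict if it is new.
def tokenize(word, dict)
  # Hyphens and the 'R*' end-of-line marker added by compress become the
  # single-character chunks '-' and 'R'.
  return word[0] if ['-', 'R*'].include? word
  # The numeric chunk is the word's index in the dictionary.
  key = word.downcase.gsub(/[.,?!;:]/, '')
  index = dict.find_index(key)
  if index.nil?
    dict << key
    index = dict.length - 1
  end
  token = index.to_s
  # A suffix records capitalisation: none for lowercase, '^' for an initial
  # capital, '!' for all caps. Digits, mixed case and inner punctuation are
  # rejected.
  case word
  when /^[a-z]*[.,?!;:]?$/ # lowercase: no suffix
  when /^[A-Z][a-z]*[.,?!;:]?$/ then token += '^'
  when /^[A-Z]*[.,?!;:]?$/ then token += '!'
  else abort "Error! Invalid case or punctuation in word #{word}."
  end
  # Trailing punctuation is emitted as a separate chunk after the word.
  word.match(/[.,?!;:]$/) { |m| token += ' ' + m[0] }
  token
end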
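
# Compress the whole text: returns the dictionary size, the dictionary
# (one word per line) and the chunk stream (24 chunks per line) ending in 'E'.
def compress(data)
  dict = []
  word_list = data.lines.flat_map do |line|
    # Mark each line break with 'R*' and pad hyphens with spaces so that
    # they split off as separate words.
    (line.chomp + ' R*').gsub('-', ' - ').split(' ')
  end.map { |w| tokenize(w, dict) }
  word_data = word_list.each_slice(24).map { |ws| ws.join(' ') }.join("\n")
  "#{dict.length}\n#{dict.join("\n")}\n#{word_data} E"
end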
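
# Read all input (files named on the command line, or stdin) and print it
# compressed. gets(nil) returns nil on empty input, hence the fallback.
data = gets(nil) || ''
puts compress(data)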
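To check the compressor's output, a minimal decompressor sketch follows. It is not part of the original gist and is not the author's pt. 1 solution; it assumes only the chunk rules the compressor above emits: a word count, the dictionary one word per line, then chunks where a number is a dictionary index (suffix `^` capitalises, `!` upper-cases), `-` is a hyphen, `.,?!;:` are punctuation, `R` is a line break and `E` ends the data. The name `decompress` is hypothetical.

```ruby
# Minimal decompressor sketch (hypothetical; not part of the original gist).
# Assumes the layout compress produces: a word count, that many dictionary
# words one per line, then whitespace-separated chunks ending with 'E'.
def decompress(text)
  lines = text.lines.map(&:chomp)
  count = lines.first.to_i
  dict = lines[1, count]
  chunks = lines.drop(count + 1).join(' ').split(' ')
  out = +''
  chunks.each do |chunk|
    case chunk
    when 'E' then break                   # end of data
    when 'R' then out << "\n"             # line break
    when '-' then out << '-'              # hyphen joins words with no spaces
    when /\A[.,?!;:]\z/ then out << chunk # punctuation attaches to the previous word
    else
      m = /\A(\d+)([\^!]?)\z/.match(chunk)
      raise ArgumentError, "unrecognised chunk #{chunk.inspect}" unless m
      word = dict.fetch(m[1].to_i)
      word = case m[2]
             when '^' then word.capitalize # initial capital
             when '!' then word.upcase     # all caps
             else word                     # lowercase
             end
      # Separate from the previous word unless at the start of a line or
      # straight after a hyphen.
      out << ' ' unless out.empty? || out.end_with?("\n", '-')
      out << word
    end
  end
  out
end
```

For example, `decompress("1\nhi\n0^ ! R E")` returns `"Hi!\n"`, which is also what the compressor above prints for the input `Hi!` followed by a newline (traced by hand). Round-trips like this should hold only for simple text that ends in a newline: the format does not record runs of spaces, and a spaced dash such as `a - b` comes back as `a-b`.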