@dkln
Created March 2, 2009 11:43
Extract text from a Microsoft Word file (meant for search engine indexing)
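def parse_word(file)
  buffer = String.new # binary (ASCII-8BIT) buffer for the raw bytes
  # Keep only lines that contain a NUL byte; the block form closes the file afterwards.
  File.open(file, 'rb') do |f|
    f.each_line { |x| buffer << x << " " if x.include?(0.chr) }
  end
  # Non-bang gsub never returns nil; strip unwanted bytes, then turn all punctuation into spaces.
  buffer.gsub(/[^a-zA-Z0-9\s,.\-@\/_]/, '').gsub(/[,.\-\/@_]/, ' ').split(' ')
end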
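
To see what the filtering does without a real Word file at hand, the sketch below applies the same two regular expressions to a string held in memory. The helper name `extract_words` and the sample bytes are hypothetical and made up for illustration. The sample assumes the text is stored as UTF-16LE, so each ASCII letter is followed by a NUL byte; a real .doc file carries far more binary noise than this.

```ruby
# Hypothetical helper: the same filtering as parse_word, but applied to a
# string that is already in memory instead of a file on disk.
def extract_words(bytes)
  bytes
    .gsub(/[^a-zA-Z0-9\s,.\-@\/_]/, '') # drop NULs and other binary bytes
    .gsub(/[,.\-\/@_]/, ' ')            # treat punctuation as word breaks
    .split(' ')
end

# Assumption: text stored as UTF-16LE, so every ASCII letter is followed
# by a NUL byte. Simulate that for a short sentence.
sample = "Hello, world".encode("UTF-16LE").force_encoding("BINARY")

p extract_words(sample) # prints ["Hello", "world"]
```

Because punctuation becomes a word break, something like an e-mail address ends up as several separate tokens. That suits a crude search index, but not much else.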