@ttscoff
Created February 5, 2012 17:53
rough draft of a Markdown converter for .doc files
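#!/usr/bin/ruby
# doc2mmd.rb
# Rough draft of a Markdown converter for .doc files.
# Requires Python html2text; edit the path in the last shell call.

# Escape characters that stay special inside a double-quoted shell string.
def e_sh(str)
  str.to_s.gsub(/(?=["\\$`])/, '\\')
end

ARGV.each { |f|
  # Convert the document to HTML with textutil.
  input = %x{/usr/bin/textutil -convert html -stdout "#{e_sh f}"}
  # Drop empty paragraphs.
  input.gsub!(/<p.*?><br><\/p>/, '')
  # Collect [class, font size] pairs from the generated CSS.
  sizes = input.scan(/p\.(p\d+) \{.*?font: ([\d\.]+)px.*?\}/)
  # Largest font first; compare sizes numerically, not as strings.
  sorted = sizes.sort_by { |a| a[1].to_f }.reverse
  # Treat the smallest styles as body text; the rest become headings.
  count = sorted.length > 3 ? sorted.length - 3 : sorted.length - 1
  sorted = sorted[0, [count, 0].max]
  conversions = {}
  sorted.each_with_index { |e, i|
    conversions[e[0]] = "h#{i+1}"
  }
  conversions.each { |k, v|
    input.gsub!(/<p class="#{k}">(.*?)<\/p>/, "<#{v}>\\1</#{v}>")
  }
  puts %x{echo "#{e_sh input}"|/Users/ttscoff/scripts/html2text}
}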
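
The heading detection in the script can be tried on its own without a .doc file or textutil. The following is a minimal sketch under stated assumptions: `heading_map` and its `body_styles` parameter are hypothetical names that are not in the script, and the sample CSS only imitates the `p.pN {... font: NN.Npx ...}` rules that the script's regex expects.

```ruby
# Hypothetical helper, not part of the original script: maps paragraph
# classes to heading tags based on the font sizes found in the CSS.
def heading_map(html, body_styles = 3)
  # Collect [class, size] pairs from rules such as "p.p1 {... font: 24.0px Helvetica}".
  sizes = html.scan(/p\.(p\d+) \{.*?font: ([\d.]+)px.*?\}/)
  # Largest font first; compare numerically, since "9.0" sorts after "24.0" as a string.
  sorted = sizes.sort_by { |_, px| -px.to_f }
  # Treat the smallest styles as body text, mirroring the script's cutoff.
  keep = sorted.length > body_styles ? sorted.length - body_styles : sorted.length - 1
  keep = [keep, 0].max
  sorted.first(keep).each_with_index.map { |(cls, _), i| [cls, "h#{i + 1}"] }.to_h
end

# Made-up sample rules for illustration only.
css = <<~CSS
  p.p1 {margin: 0.0px; font: 12.0px Times}
  p.p2 {margin: 0.0px; font: 24.0px Helvetica}
  p.p3 {margin: 0.0px; font: 9.0px Times}
  p.p4 {margin: 0.0px; font: 18.0px Helvetica}
  p.p5 {margin: 0.0px; font: 14.0px Helvetica}
CSS

p heading_map(css)
```

With these five sample styles, the two largest (`p2` at 24px and `p4` at 18px) map to `h1` and `h2`, and the remaining three are left as body paragraphs. Input with no matching rules yields an empty map.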