Convert RTF/DOC files to Markdown via Textutil
#!/usr/bin/ruby
=begin
Usage: rtftomarkdown.rb FILENAME.rtf
Uses textutil, available on Mac only (installed by default)
Outputs to STDOUT
Notes:
Links are replaced with Markdown references (duplicate links combined).
This works fine on RTF files, but the markup that the Word/textutil
combination produces seems impossible to work with. Most links disappear
when converting from a DOC or DOCX file, and even Word's RTF export is
unworkable.
The list type that textutil produces cannot vary at a given depth.
If the first second-level nested list is an ordered list, the next
second-level list will be ordered too. It's a textutil/RTF thing.
=end
write_file = false # set to true (e.g. for Services) to write out files with .md extension
if ARGV.empty?
  puts "#{__FILE__} expects an input file (RTF or DOC) as an argument"
  exit
end
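# Recursively strip empty elements such as <b></b> or <p class="p1"> </p>
def remove_empty(input)
  empty_tag = /<(\w+)( class=".*?")?>\s*<\/\1>/
  input = input.gsub(empty_tag, '')
  # removing an inner element can leave its parent empty, so check again
  input = remove_empty(input) if input =~ empty_tag
  input.strip
end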
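ARGV.each do |infile|
  file = infile.sub(/\/$/,'')
  if File.exist?(File.expand_path(file))
    # run textutil without a shell so quotes or spaces in the filename are safe
    input = IO.popen(['/usr/bin/textutil', '-convert', 'html', '-stdout', File.expand_path(file)], &:read).strip
    # keep only the contents of <body>
    input.gsub!(/.*?<body>(.*?)<\/body>.*/m,"\\1")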
    # remove span/br tags, unnecessary
    input.gsub!(/<br>/,'')
    input.gsub!(/<\/?span( class=".*?")?>/,'')
    # substitute headers: p1 becomes level one, bold p2/p3 level two, bold p4/p5 level three
    input.gsub!(/<p class="p1">(?:<b>)?(.+?)(?:<\/b>)?<\/p>/,'# \\1')
    input.gsub!(/<p class="p[23]"><b>(.+?)<\/b><\/p>/,'## \\1')
    input.gsub!(/<p class="p[45]"><b>(.+?)<\/b><\/p>/,'### \\1')
    input = input.split("\n").map { |line| remove_empty(line) }.join("\n")
    # remove paragraph tags
    input.gsub!(/<p class="p\d">(.*?)<\/p>/,'\\1')
    # emphasis
    input.gsub!(/<\/?b>/,'**')
    input.gsub!(/<\/?i>/,'*')
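    # links: replace inline anchors with numbered Markdown references,
    # reusing the same number when a URL appears more than once
    links = {}
    footer = ''
    input.gsub!(/<a href="(.*?)">(.*?)<\/a>/) do
      url, text = $1, $2
      unless links.key?(url)
        links[url] = links.length + 1
        footer += "\n[#{links[url]}]: #{url}"
      end
      "[#{text}][#{links[url]}]"
    end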
    input = input.split("\n").map { |line| line.strip }.join("\n")
    # handle lists
    list_level = 0
    list_type = []
    input = input.split("\n").map { |line|
      if line =~ /<([uo])l.*?>/
        # opening tag: go one level deeper and remember the marker for that level
        list_level += 1
        list_type[list_level] = $1 == 'u' ? '*' : '1.'
        nil # drop the tag line itself
      elsif line =~ /<\/[uo]l>/
        list_level -= 1
        nil
      else
        # one indent step per nesting level below the first; four spaces are
        # assumed here so that Markdown parsers treat the item as nested
        indent = '    ' * [list_level - 1, 0].max
        line.gsub(/<li.*?>(.*?)<\/li>/,"#{indent}#{list_type[list_level]} \\1")
      end
    }.compact.join("\n")
    if write_file
      # File.open rather than Kernel#open, which treats names starting with "|" as commands
      File.open(file+".md", 'w') { |f|
        f.puts input
        f.puts footer
      }
    else
      puts input
      puts footer
    end
  else
    puts "File not found: #{file}"
  end
end
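
The link handling described in the notes (inline links replaced with Markdown references, duplicate links combined) can be tried on its own, without textutil. The following is a minimal sketch rather than the script's own code: the helper name `to_reference_links` and the example.com URLs are hypothetical, and it assumes anchors of the plain `<a href="...">text</a>` form that the script matches.

```ruby
# Hypothetical helper, not part of the original script.
# Returns the converted text and the reference footer as two strings.
def to_reference_links(html)
  links = {} # URL => reference number, in order of first appearance
  body = html.gsub(/<a href="(.*?)">(.*?)<\/a>/) do
    url, text = $1, $2
    # a URL seen before keeps its existing number
    links[url] ||= links.length + 1
    "[#{text}][#{links[url]}]"
  end
  footer = links.map { |url, number| "[#{number}]: #{url}" }.join("\n")
  [body, footer]
end

body, footer = to_reference_links(
  'See <a href="http://example.com/a">one</a>, ' \
  '<a href="http://example.com/b">two</a> and ' \
  '<a href="http://example.com/a">one again</a>.'
)
puts body
puts
puts footer
```

Because the hash is keyed by URL, the first and third links in the sample share reference `[1]`, so the footer lists two definitions for three links.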

andypiper Jul 12, 2013

This is brilliant, and just acted as a nice Google Docs -> RTF export -> Markdown converter for me. Thanks!

toadkicker Nov 12, 2013

+1 would execute again

janpaul123 Mar 25, 2014

Nice, thanks! For more complicated documents (e.g. with math) I've made this repo, hope this could be useful for some: https://github.com/Versal/word2markdown

arronwall May 12, 2014

Hi,
Thanks for sharing. But using code to handle Word document conversion is too complicated for me. I prefer to do it with the help of a third-party Word converter, which makes the whole conversion process much simpler and faster. You can also search for one to help with your own Word document conversion work. It is better if it is fully manual and can be customized by users according to their own preferences. Remember to check its free trial package first if possible. I hope you succeed. Good luck.

Best regards,
Arron

jlconlin Oct 16, 2015

Okay, this is probably the best thing that I've found on the internet in a very long time. So very useful. Thank you!
