
@dtao
Last active December 20, 2015 06:59
A very basic script to translate a simple HTML file (consisting only of paragraphs with bold & italic text) to Markdown.
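require 'nokogiri'

file = ARGV[0]
abort "Usage: #{$PROGRAM_NAME} FILE.html" if file.nil?
abort "File #{file} does not exist." unless File.exist?(file)

html = File.read(file)

puts 'Parsing HTML...'
document = Nokogiri::HTML.parse(html)

# Markdown emphasis marker for each supported inline tag.
MARKERS = { 'i' => '*', 'b' => '**' }.freeze

paragraphs = []

puts 'Reading content...'
document.css('p').each do |paragraph|
  # Skip blank paragraphs. [[:space:]] also matches the non-breaking space
  # that &nbsp; decodes to; \s only matches ASCII whitespace.
  next if paragraph.content.match?(/\A[[:space:]]*\z/)

  # Replace <i> and <b> with Markdown markers, deepest elements first, so
  # nested emphasis such as <i><b>text</b></i> keeps both sets of markers.
  paragraph.css('i, b').sort_by { |node| -node.ancestors.size }.each do |node|
    marker = MARKERS[node.name]
    # Insert a text node rather than a markup string, so "<" or "&" in the
    # text is not parsed as HTML a second time.
    node.replace(Nokogiri::XML::Text.new("#{marker}#{node.content}#{marker}", document))
  end

  # Join lines the exporter wrapped inside the paragraph with single spaces.
  paragraphs << paragraph.content.gsub(/\s*\n\s*/, ' ').strip
end

# report.html (or report.htm) -> report.md in the same directory.
markdown_filename = File.join(File.dirname(file), File.basename(file, '.*') + '.md')

puts "Writing Markdown to #{markdown_filename}..."
# Markdown needs a blank line between paragraphs.
File.write(markdown_filename, paragraphs.join("\n\n") + "\n")

puts 'Done!'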
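The last text-handling steps don't depend on Nokogiri and can be checked on plain strings: joining the lines an exporter may wrap inside a paragraph's HTML, and separating paragraphs for output. Markdown merges lines separated by a single newline into one paragraph, so paragraphs need a blank line between them. A minimal sketch; `unwrap` and `join_paragraphs` are hypothetical helper names, not part of the gist:

```ruby
# Hypothetical helpers (not part of the gist) for the plain-string steps.

# Join the wrapped lines of one paragraph: every newline, together with any
# whitespace around it (including "\r" from CRLF files), becomes one space.
def unwrap(text)
  text.gsub(/\s*\n\s*/, ' ').strip
end

# Separate paragraphs with a blank line; with a single "\n" between them,
# Markdown would render them as one paragraph.
def join_paragraphs(paragraphs)
  paragraphs.join("\n\n") + "\n"
end

raw = ["First paragraph,\r\n  wrapped by the exporter.", "Second **one**."]
print join_paragraphs(raw.map { |p| unwrap(p) })
# prints:
# First paragraph, wrapped by the exporter.
#
# Second **one**.
```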

dtao commented Jul 26, 2013

This is useful if you have, e.g., a basic Word document with no frills—just some bold and/or italic text. Use Word, OpenOffice, etc. to export the file to HTML, then run this script to get Markdown.
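Word's HTML export typically represents an empty paragraph with a lone `&nbsp;`, which Nokogiri's `content` decodes to a non-breaking space (U+00A0). A blank-paragraph test built on Ruby's `\s` or `String#strip` treats that as real text, because both cover only ASCII whitespace; the POSIX class `[[:space:]]` also matches Unicode whitespace. A minimal sketch, with `blank_paragraph?` as a hypothetical helper name:

```ruby
# Hypothetical helper: true when a paragraph's text contains only whitespace,
# including the non-breaking space (U+00A0) that &nbsp; decodes to.
def blank_paragraph?(text)
  text.match?(/\A[[:space:]]*\z/)
end

nbsp_paragraph = "\u00A0" # the decoded text of <p>&nbsp;</p>

# Ruby's \s is ASCII-only, so this check does not see the paragraph as blank.
puts nbsp_paragraph.match?(/\A\s*\z/) # prints false
puts blank_paragraph?(nbsp_paragraph) # prints true
puts blank_paragraph?("  \n\t")       # prints true
puts blank_paragraph?("Some text")    # prints false
```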
