@johnbellone
Created September 6, 2012 11:20
Script to import a WordPress XML export into Markdown files for Jekyll
# Gemfile
source 'https://rubygems.org'

gemspec
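
# Gemspec (RubyGems also requires s.name, s.version, and s.summary, which this gist does not set)
Gem::Specification.new do |s|
  s.authors       = ['John Bellone']
  s.email         = 'jb@thunkbrightly.com'
  s.homepage      = 'http://github.com/johnbellone/wp-to-md'
  s.require_paths = %w[lib]
  # The import script requires nokogiri at runtime.
  s.add_dependency 'nokogiri'
end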
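
# Import script. Usage: ruby <script> <wordpress-export.xml>
require 'nokogiri'
require 'date'

# Parse the WordPress XML export; the block form closes the file afterwards.
export = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }

export.xpath('//item').each do |node|
  pub_date = DateTime.parse(node.xpath('.//pubDate').inner_text)
  # Strip colons so the unquoted title does not break the YAML front matter.
  title = node.xpath('.//title').inner_text.gsub(':', '')
  description = node.xpath('.//description').inner_text
  # Capitalize each word of the comma-joined category names.
  tags = node.xpath('.//category/text()').to_a.join(',').split(/(\W)/).map(&:capitalize).join
  content = node.xpath('.//content:encoded/text()').inner_text
  # Skip items that have no body.
  next if content.empty?

  # Jekyll expects post files named YYYY-MM-DD-title.md.
  slug = title.downcase.strip.gsub(/\s/, '-').gsub(/[^\w\-]/, '')
  filename = "#{pub_date.strftime('%Y-%m-%d')}-#{slug}.md"

  # Leave the description line out entirely when the item has none.
  front_matter = ['layout: post', "title: #{title}"]
  front_matter << "description: #{description}" unless description.empty?
  front_matter << "tags: #{tags}"

  # Assemble the post: YAML front matter followed by the raw content.
  post = <<-TEXT
---
#{front_matter.join("\n")}
---
#{content}
  TEXT
  File.open(filename, 'w') { |f| f.write(post) }
end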
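
The file name and tag handling in the script above are plain string operations, so they can be pulled out and checked without a WordPress export at hand. The following is a minimal sketch of that idea; the helper names `post_filename` and `capitalize_tags` are hypothetical and not part of the gist, and the sample date and title are made up for illustration.

```ruby
require 'date'

# Hypothetical helper: builds the Jekyll post file name (YYYY-MM-DD-slug.md)
# from a publication date and a title, using the same substitutions as the script.
def post_filename(pub_date, title)
  slug = title.downcase.strip.gsub(/\s/, '-').gsub(/[^\w\-]/, '')
  "#{pub_date.strftime('%Y-%m-%d')}-#{slug}.md"
end

# Hypothetical helper: capitalizes every word in a list of category names.
# Splitting on a captured /(\W)/ keeps the separators so join restores them.
def capitalize_tags(categories)
  categories.join(',').split(/(\W)/).map(&:capitalize).join
end

date = DateTime.parse('Thu, 06 Sep 2012 11:20:00 +0000')
puts post_filename(date, 'Hello, World!')   # 2012-09-06-hello-world.md
puts capitalize_tags(['ruby', 'web dev'])   # Ruby,Web Dev
```

Note that `capitalize` also lowercases the rest of each word, so a category such as `JavaScript` comes out as `Javascript`.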
@johnbellone (Author)

Obviously very crude right now. There are some issues converting from the WordPress XML to Markdown: the content blocks contain HTML (some of it malformed), and the script writes them out unchanged. I haven't had time to go through this and figure out all the problems. But luckily it's easy to test!
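
One possible starting point for the HTML problem is a small conversion pass over each content block before it is written out. The sketch below is an assumption about how that could look, not something the gist does: `html_to_markdown` is a hypothetical helper that rewrites a handful of simple, well-formed tags with regular expressions and leaves everything else alone (Markdown permits inline HTML). It will not cope with nested or malformed markup; that would need a real HTML parser.

```ruby
# Hypothetical helper: converts a few simple HTML tags to Markdown.
# Unrecognized tags are left in place, which Markdown allows.
def html_to_markdown(html)
  html
    .gsub(%r{<(strong|b)>(.*?)</\1>}im) { "**#{$2}**" }                   # bold
    .gsub(%r{<(em|i)>(.*?)</\1>}im) { "*#{$2}*" }                         # italic
    .gsub(%r{<a\s+href="([^"]*)"[^>]*>(.*?)</a>}im) { "[#{$2}](#{$1})" }  # links
    .gsub(%r{<h([1-6])>(.*?)</h\1>}im) { "#{'#' * $1.to_i} #{$2}\n" }     # headings
    .gsub(%r{</?p>}i, "\n")                                               # paragraph breaks
    .gsub(/\n{3,}/, "\n\n")                                               # collapse extra blank lines
    .strip
end

sample = '<p>Hello <strong>world</strong>, see <a href="http://example.com">this</a>.</p>'
puts html_to_markdown(sample)
# Hello **world**, see [this](http://example.com).
```

Keeping the conversion in one function means it can be exercised on small strings like the sample above, separately from the XML parsing.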
