Created September 6, 2012 11:20
Script to convert a WordPress XML export to Markdown for Jekyll
*.xml
*.md
source 'https://rubygems.org'
gemspec
Gem::Specification.new do |s|
  s.authors = ["John Bellone"]
  s.email = 'jb@thunkbrightly.com'
  s.homepage = 'http://github.com/johnbellone/wp-to-md'
  s.require_paths = %w[lib]
end
require 'nokogiri'
require 'date'

# Usage: pass the path to the WordPress XML export as the first argument.
doc = Nokogiri::XML(File.read(ARGV[0]))

doc.xpath('//item').each do |node|
  pub_date = DateTime.parse(node.xpath('.//pubDate').inner_text)
  # Strip colons so the title stays a plain YAML scalar in the front matter.
  title = node.xpath('.//title').inner_text.delete(':')
  description = node.xpath('.//description').inner_text
  description = "description: #{description}" unless description.empty?
  # Capitalize each word of the comma-separated category list.
  tags = node.xpath('.//category/text()').to_a.join(',').split(/(\W)/).map(&:capitalize).join
  content = node.xpath('.//content:encoded/text()').inner_text

  # Skip items that have no body.
  next if content.empty?

  slug = title.downcase.strip.gsub(/\s/, '-').gsub(/[^\w-]/, '')
  filename = "#{pub_date.strftime('%Y-%m-%d')}-#{slug}.md"

  # Named `post` so the parsed `doc` above is not overwritten inside the loop.
  post = <<~POST
    ---
    layout: post
    title: #{title}
    #{description}
    tags: #{tags}
    ---
    #{content}
  POST

  File.write(filename, post)
end
This is obviously very crude right now. There are some issues converting the WordPress XML to Markdown: the content blocks contain HTML (sometimes malformed HTML), and the script writes it out as-is. I haven't had time to go through this and figure out all the problems. But luckily it's easy to test!
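As a rough illustration of how the pure string-handling parts could be checked without a full export file, here is a minimal sketch. The helper names `post_filename` and `format_tags` are hypothetical (they are not in the script); they only wrap the filename and tag expressions from the script so they can be called on sample values. The sample title, date, and categories are made up for the example.

```ruby
require 'date'

# Hypothetical helper: wraps the script's filename expression.
def post_filename(pub_date, title)
  slug = title.downcase.strip.gsub(/\s/, '-').gsub(/[^\w-]/, '')
  pub_date.strftime('%Y-%m-%d-') + slug + '.md'
end

# Hypothetical helper: wraps the script's tag expression, which
# capitalizes each word in the comma-joined category list.
def format_tags(categories)
  categories.join(',').split(/(\W)/).map(&:capitalize).join
end

# Made-up sample values in the shape a WordPress <item> would provide.
date = DateTime.parse('Thu, 06 Sep 2012 11:20:00 +0000')

puts post_filename(date, 'Hello, World!')  # 2012-09-06-hello-world.md
puts format_tags(%w[ruby jekyll])          # Ruby,Jekyll
```

This does not touch the HTML-in-content problem; it only pins down the naming and tag behavior, so changes to the content handling can be made without breaking the generated filenames.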