Script to export posts from a WordPress XML file to Jekyll (Textile) files. It also collects comments in a YAML file.
#!/usr/bin/env ruby
# Input: WordPress XML export file.
# Outputs: a series of Textile files ready to be included in a Jekyll site,
# and comments.yml which contains all approved comments with metadata which
# can be used for a Disqus import.
require 'rubygems'
require 'hpricot'
require 'clothred'
require 'time'
require 'yaml'
WORDPRESS_XML_FILE_PATH = "/home/marko/Documents/wordpress.2010-01-01.xml"
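OUTPUT_PATH = "/tmp/export"
# Placeholder value: Post#initialize uses ORIGINAL_DOMAIN, but its definition is missing here. Set it to your blog's URL.
ORIGINAL_DOMAIN = "http://example.com"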
class Post
  attr_accessor :title, :post_date, :created_at, :slug, :url, :content, :textile_content
  attr_accessor :hpricot_element

  def initialize(item)
    @hpricot_element = item
    @title = item.search("title").first.inner_text
    @post_date = item.search("wp:post_date").first.inner_text
    @created_at = Date.parse(post_date)
    @slug = item.search("wp:post_name").first.inner_text
    # Original permalink: domain/YYYY/MM/DD/slug
    @url = ORIGINAL_DOMAIN + "/" + item.search("wp:post_date_gmt").first.inner_text[0, 10].gsub(/-/, "/") + "/" + @slug
    @content = item.search("content:encoded").first.inner_text
    # Convert the post's HTML to Textile with ClothRed.
    text = ClothRed.new(@content)
    @textile_content = text.to_textile
  end

  # Returns the post as a Jekyll file: YAML front matter followed by the Textile body.
  def to_jekyll
    buf = ""
    buf << "---\n"
    buf << "layout: post\n"
    buf << "title: #{title}\n"
    buf << "---\n\n"
    buf << textile_content
  end

  def save(root_path)
    File.open("#{root_path}/#{created_at}-#{slug}.textile", "w") { |file| file.write self.to_jekyll }
  end

  # Appends the post's approved comments to comments.yml.
  def save_comments(path)
    comment_elements = @hpricot_element.search("wp:comment").reject do |c|
      c.search("wp:comment_approved").inner_text != "1"
    end
    File.open("#{path}/comments.yml", "a") do |yaml_file|
      comment_elements.collect { |el| Comment.new(self, el) }.each { |comment| comment.write_to yaml_file }
    end
  end

  class << self
    def parse(element, path)
      return nil unless element.is_a?(Hpricot::Elem)
      post = Post.new(element)
      post.save(path)
      post.save_comments(path)
      post
    end
  end
end
class Comment
  attr_accessor :author_name, :author_email, :author_url, :content, :post

  def initialize(post, element)
    @post_url = post.url + "/"
    @author_name = element.search("wp:comment_author").first.inner_text
    @author_email = element.search("wp:comment_author_email").first.inner_text
    @author_url = element.search("wp:comment_author_url").first.inner_text
    @content = element.search("wp:comment_content").first.inner_text || ""
    comment_date = element.search("wp:comment_date_gmt").first.inner_text
    @created_at = Time.parse("#{comment_date} GMT")
  end

  # Writes the comment as one YAML document; empty comments are skipped.
  def write_to(file)
    file.write self.to_yaml + "\n" unless @content.size == 0
  end
end
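# main
doc = Hpricot(File.open(WORDPRESS_XML_FILE_PATH))
Dir.mkdir(OUTPUT_PATH) unless File.directory?(OUTPUT_PATH)
# Start with an empty comments.yml; Post#save_comments appends to it.
File.open("#{OUTPUT_PATH}/comments.yml", "w") { |f| }
(doc / "item").each do |item|
  post = Post.parse(item, OUTPUT_PATH)
end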
#!/usr/bin/env ruby
# Takes comments.yml generated by wordpressxml2jekyll.rb and posts them to your Disqus forum.
# sudo gem install disqus
require 'time'
require 'yaml'
require 'disqus'
require 'disqus/api'
COMMENTS_YAML_FILE = '/tmp/export/comments.yml'
Disqus::defaults[:api_key] = "N4wWciM45UAfBJe6QbylR0mfQ340WH7kdEKlBi7q5Tb0QeAKHOxP7wC6W5WyJWWz"
forum_id = Disqus::Api.get_forum_list["message"].first["id"]
fak = Disqus::Api.get_forum_api_key(:forum_id => forum_id)["message"]

File.open(COMMENTS_YAML_FILE) do |yf|
  # comments.yml holds one YAML document per comment.
  YAML.each_document( yf ) do |c|
    thread = Disqus::Api.get_thread_by_url(:forum_api_key => fak, :url => c.ivars["post_url"])
    Disqus::Api.create_post(:forum_api_key => fak,
                            :thread_id => thread["message"]["id"],
                            :message => c.ivars["content"],
                            :author_name => c.ivars["author_name"],
                            :author_email => c.ivars["author_email"],
                            :author_url => c.ivars["author_url"],
                            :created_at => Time.parse(c.ivars["created_at"].to_s).strftime("%Y-%m-%dT%H:%M"))
  end
end
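
`YAML.each_document` and `c.ivars` come from Syck, the YAML engine that shipped with Ruby 1.8, so the import script is unlikely to run unchanged on a current Ruby. Below is a minimal sketch of one alternative. It assumes the export script writes each comment as a plain hash instead of a serialized `Comment` object; the hash layout and the sample values are hypothetical.

```ruby
require 'yaml'
require 'time'

# Hypothetical stand-in for one exported comment, stored as a plain hash.
comment = {
  "post_url"     => "http://example.com/2010/01/01/hello/",
  "author_name"  => "Example Author",
  "author_email" => "author@example.com",
  "author_url"   => "",
  "content"      => "Example comment text.",
  "created_at"   => Time.utc(2010, 4, 28, 12, 0).iso8601
}

# Two YAML documents back to back, the same layout comments.yml uses.
stream = comment.to_yaml + "\n" + comment.to_yaml + "\n"

# YAML.load_stream yields each document in turn; plain hashes need no custom class.
stamps = []
YAML.load_stream(stream) do |c|
  stamps << Time.parse(c["created_at"].to_s).utc.strftime("%Y-%m-%dT%H:%M")
end
puts stamps.inspect
```

With hashes, the Disqus calls would read `c["content"]` where the script above reads `c.ivars["content"]`.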
kez commented Apr 28, 2010

Amazingly useful - thank you!

neozhang commented May 5, 2010

This looks awesome!
Would this support non-Latin characters?

markoa commented May 5, 2010

@neozhang I don't know, my blog posts don't have them. But I think that it would. If you try it, please let me know.

neozhang commented May 8, 2010

@markoa Just tried it. It transformed the file contents perfectly, but not the names. Still so much better than the built-in converter, which breaks everything.

markoa commented May 8, 2010

@neozhang Since slug and content are read in pretty much the same way, I suspect something special needs to be done in the call, but I haven't really worked with such issues before. Feel free to fork the gist and send more comments.
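
One possible cause, offered as an assumption rather than a diagnosis: WordPress typically stores non-Latin slugs percent-encoded in `wp:post_name`, and the script uses the slug verbatim in the file name. A minimal sketch of decoding the slug first; `readable_slug` is a hypothetical helper, not part of the gist.

```ruby
require 'uri'

# Hypothetical helper: turn a percent-encoded slug into readable UTF-8,
# falling back to the raw slug when it cannot be decoded cleanly.
def readable_slug(slug)
  decoded = URI.decode_www_form_component(slug)
  decoded.valid_encoding? ? decoded : slug
rescue ArgumentError
  # Raised for malformed input such as a stray "%".
  slug
end

puts readable_slug("%e4%bd%a0%e5%a5%bd")
puts readable_slug("hello-world")
```

In `Post#save`, the file name could then be built from `readable_slug(slug)` instead of `slug`, provided the target file system accepts UTF-8 names.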

ecerulm commented Jul 30, 2010

The title in the YAML Front Matter is not properly escaped. If the title of the post contains a colon, as in "command not found: clear", the title is written to the YAML Front Matter unquoted, for example:
layout: post
title: WordPress migration: ” (quotes) and ‘ (apostrophe) being replaced with “ and ’


but it should be

layout: post
title: "Wordpress migration:  \" (quotes) and ' (apostrophe) being replaced with \xC3\xA2\xE2\x82\xAC\xC5\x93 and \xC3\xA2\xE2\x82\xAC\xE2\x84\xA2"


I fixed that in my gist
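
A minimal sketch of one way to avoid the escaping problem, assuming the front matter is built as a hash and serialized by Ruby's `yaml` library instead of by string interpolation. `front_matter` is a hypothetical helper, not the code from either gist.

```ruby
require 'yaml'

# Hypothetical helper: let the YAML library decide when the title needs quoting.
def front_matter(title, layout = "post")
  # Hash#to_yaml emits the opening "---" line; the closing one is appended here.
  { "layout" => layout, "title" => title }.to_yaml + "---\n\n"
end

puts front_matter("command not found: clear")
```

In `Post#to_jekyll`, the four `buf <<` lines that write the front matter could then be replaced with `buf << front_matter(title)`.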

