Skip to content

Instantly share code, notes, and snippets.

@markoa
Created January 4, 2010 09:48
Show Gist options
  • Star 8 You must be signed in to star a gist
  • Fork 6 You must be signed in to fork a gist
  • Save markoa/268428 to your computer and use it in GitHub Desktop.
Save markoa/268428 to your computer and use it in GitHub Desktop.
Script to export posts from a Wordpress XML file to Jekyll (Textile) files. Collects comments in a YAML file too.
#!/usr/bin/env ruby
# Input: WordPress XML export file.
# Outputs: a series of Textile files ready to be included in a Jekyll site,
# and comments.yml which contains all approved comments with metadata which
# can be used for a Disqus import.
require 'rubygems'
require 'hpricot'
require 'clothred'
require 'time'
require 'yaml'
WORDPRESS_XML_FILE_PATH = "/home/marko/Documents/wordpress.2010-01-01.xml"
OUTPUT_PATH = "/tmp/export"
ORIGINAL_DOMAIN = "http://example.com"
class Post
attr_accessor :title, :post_date, :created_at, :slug, :url, :content, :textile_content
attr_accessor :hpricot_element
def initialize(item)
@hpricot_element = item
@title = item.search("title").first.inner_text
@post_date = item.search("wp:post_date").first.inner_text
@created_at = Date.parse(post_date)
@slug = item.search("wp:post_name").first.inner_text
@url = ORIGINAL_DOMAIN + "/" + item.search("wp:post_date_gmt").first.inner_text[0, 10].gsub(/-/, "/") + "/" + @slug
@content = item.search("content:encoded").first.inner_text
text = ClothRed.new(content)
@textile_content = text.to_textile
end
def to_jekyll
buf = ""
buf << "---\n"
buf << "layout: post\n"
buf << "title: #{title}\n"
buf << "---\n\n"
buf << textile_content
end
def save(root_path)
File.open("#{root_path}/#{created_at}-#{slug}.textile", "w") { |file| file.write self.to_jekyll }
self
end
def save_comments(path)
comment_elements = @hpricot_element.search("wp:comment").reject do |c|
c.search("wp:comment_approved").inner_text != "1"
end
File.open("#{path}/comments.yml", "a") do |yaml_file|
comment_elements.collect { |el| Comment.new(self, el) }.each { |comment| comment.write_to yaml_file }
end
end
class << self
def parse(element, path)
return nil unless element.is_a?(Hpricot::Elem)
post = Post.new(element)
post.save(path)
end
end
end
class Comment
attr_accessor :author_name, :author_email, :author_url, :content, :post
def initialize(post, element)
@post_url = post.url + "/"
@author_name = element.search("wp:comment_author").first.inner_text
@author_email = element.search("wp:comment_author_email").first.inner_text
@author_url = element.search("wp:comment_author_url").first.inner_text
@content = element.search("wp:comment_content").first.inner_text || ""
comment_date = element.search("wp:comment_date_gmt").first.inner_text
@created_at = Time.parse("#{comment_date} GMT")
end
def write_to(file)
file.write self.to_yaml + "\n" unless @content.size == 0
end
end
# main
doc = Hpricot(File.open(WORDPRESS_XML_FILE_PATH))
File.open("#{OUTPUT_PATH}/comments.yml", "w") { |f| }
(doc / "item").each do |item|
post = Post.parse(item, OUTPUT_PATH)
post.save_comments(OUTPUT_PATH)
end
#!/usr/bin/env ruby
# Takes comments.yml generated by wordpressxml2jekyll.rb and posts them to your Disqus forum.
# sudo gem install disqus
require 'disqus'
require 'disqus/api'
COMMENTS_YAML_FILE = '/tmp/export/comments.yml'
Disqus::defaults[:api_key] = "N4wWciM45UAfBJe6QbylR0mfQ340WH7kdEKlBi7q5Tb0QeAKHOxP7wC6W5WyJWWz"
forum_id = Disqus::Api.get_forum_list["message"].first["id"]
fak = Disqus::Api.get_forum_api_key(:forum_id => forum_id)["message"]
File.open(COMMENTS_YAML_FILE) do |yf|
YAML.each_document( yf ) do |c|
thread = Disqus::Api.get_thread_by_url(:forum_api_key => fak, :url => c.ivars["post_url"])
Disqus::Api.create_post(:forum_api_key => fak,
:thread_id => thread["message"]["id"],
:message => c.ivars["content"],
:author_name => c.ivars["author_name"],
:author_email => c.ivars["author_email"],
:author_url => c.ivars["author_url"],
:created_at => Time.parse(c.ivars["created_at"].to_s).strftime("%Y-%m-%dT%H:%M"))
end
end
@ecerulm
Copy link

ecerulm commented Jul 30, 2010

The title in the YAML Front Matter is not properly escaped. If the title of the post is for example "command not found: clear" the YAML Front Matter becomes
---
layout: post
title: WordPress migration: ” (quotes) and ‘ (apostrophe) being replaced with “ and ’


---

but and it should be


---
layout: post
title: "Wordpress migration:  \" (quotes) and ' (apostrophe) being replaced with \xC3\xA2\xE2\x82\xAC\xC5\x93 and \xC3\xA2\xE2\x82\xAC\xE2\x84\xA2"


---

I fixed that in my gist http://gist.github.com/500506

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment