
Import a Blogger archive to Jekyll

import.rb
require 'rubygems'
require 'nokogiri'
require 'fileutils'
require 'date'

# usage: ruby import.rb my-blog.xml
# my-blog.xml is the file from Settings -> Basic -> Export in Blogger.
# WARNING: any existing _posts directory is deleted and rewritten.

data = File.read ARGV[0]
doc = Nokogiri::XML(data)

# post id => Post
@posts = {}

# Dispatch one Atom <entry> on the kind at the end of its category term
# (...#post, ...#comment, ...).
def add(node)
  id = node.search('id').first.content
  type = node.search('category').first.attr('term').split('#').last
  case type
  when 'post'
    @posts[id] = Post.new(node)
  when 'comment'
    # thr:in-reply-to holds the id of the entry the comment belongs to
    reply_to = node.children.find { |c| c.name == 'in-reply-to' }
    post_id = reply_to && reply_to.attr('ref')
    if @posts.key?(post_id)
      @posts[post_id].add_comment(Comment.new(node))
    else
      # e.g. a comment on a page, which is not imported
      warn "skipping comment #{id}: no imported post #{post_id.inspect}"
    end
  when 'template', 'settings', 'page'
    # not imported
  else
    raise 'dunno ' + type
  end
end

def write(post)
  puts "Post [#{post.title}] has #{post.comments.count} comments"

  puts "writing #{post.file_name}"
  File.open(File.join('_posts', post.file_name), 'w') do |file|
    file.write post.header
    file.write "\n\n"
    file.write "<h1>{{ page.title }}</h1>\n"
    file.write "<div class='post'>\n"
    file.write post.content
    file.write "</div>\n"
    file.write "<h2>Comments</h2>\n"
    file.write "<div class='comments'>\n"
    post.comments.each do |comment|
      file.write "<div class='comment'>\n"
      file.write "<div class='author'>"
      file.write comment.author
      file.write "</div>\n"
      file.write "<div class='content'>\n"
      file.write comment.content
      file.write "</div>\n"
      file.write "</div>\n"
    end
    file.write "</div>\n"
  end
end

class Post
  attr_reader :comments

  def initialize(node)
    @node = node
    @comments = []
  end

  def add_comment(comment)
    @comments.unshift comment
  end

  def title
    @node.search('title').first.content
  end

  def content
    @node.search('content').first.content
  end

  def creation_date
    creation_datetime.strftime("%Y-%m-%d")
  end

  # only the day is kept: Date.parse drops the time of day
  def creation_datetime
    Date.parse(@node.search('published').first.content)
  end

  # Jekyll post names must look like YYYY-MM-DD-title.ext
  def file_name
    param_name = title.split(/[^a-zA-Z0-9]+/).join('-').downcase
    %{#{creation_date}-#{param_name}.html}
  end

  def header
    [
      '---',
      %{layout: post},
      # single-quote the title so colons stay literal in YAML ('' escapes ')
      %{title: '#{title.gsub("'", "''")}'},
      %{date: #{creation_datetime}},
      %{comments: false},
      '---'
    ].join("\n")
  end
end

class Comment
  def initialize(node)
    @node = node
  end

  def author
    @node.search('author name').first.content
  end

  def content
    @node.search('content').first.content
  end
end

doc.search('entry').each do |entry|
  add entry
end

FileUtils.rm_rf('_posts')
Dir.mkdir("_posts") unless File.directory?("_posts")

@posts.each do |id, post|
  write post
end
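Jekyll only treats files in _posts as posts when they are named YEAR-MONTH-DAY-title.MARKUP, which is the shape Post#file_name builds. One quirk of the split-based slug is that a title starting with punctuation, such as 'Quoted' title, yields a leading dash (-quoted-title). Below is a standalone sketch of a slug that trims such dashes; post_file_name and the "untitled" fallback are hypothetical names chosen for illustration, not part of import.rb.

```ruby
require 'date'

# Hypothetical helper (not part of import.rb): builds a Jekyll post file
# name of the form YYYY-MM-DD-slug.html.
def post_file_name(title, date)
  # collapse every run of non-alphanumerics into one dash, then trim the ends
  slug = title.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
  # hypothetical fallback for titles with no ASCII letters or digits
  slug = 'untitled' if slug.empty?
  "#{date.strftime('%Y-%m-%d')}-#{slug}.html"
end

puts post_file_name("'Quoted' title", Date.new(2013, 3, 8))
# → 2013-03-08-quoted-title.html
```

For ordinary titles this produces the same name as the script (for example, "Hello, World: Part 2!" still becomes hello-world-part-2); it only differs at the edges of the title.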

This script is great and saved me a lot of time. One minor nitpick: it doesn't escape colons in the YAML front matter (for example, by converting them to an HTML entity).
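Colons in titles can also be handled by letting Ruby's bundled YAML library (Psych) serialize the front matter, instead of assembling the lines by hand. A minimal sketch, assuming the same keys as the script's header; front_matter is a hypothetical helper name.

```ruby
require 'yaml'
require 'date'

# Hypothetical helper: serialize the front matter with Psych so that titles
# containing colons, quotes or '#' are quoted as needed.
def front_matter(title, date)
  data = {
    'layout'   => 'post',
    'title'    => title,
    'date'     => date, # a Date is emitted as a plain YYYY-MM-DD value
    'comments' => false
  }
  # Hash#to_yaml already starts with "---\n"; append the closing marker.
  data.to_yaml + "---\n"
end

puts front_matter('Blogger to Jekyll: notes', Date.new(2013, 3, 8))
```

Delegating the quoting to the YAML emitter avoids having to anticipate every character that is special in YAML, at the cost of less control over the exact quoting style.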

This script helped me a lot!
But I had to add a fix for when the node type is 'page'. Other than that, it worked perfectly.
Kudos!

When I run your script, I get the following error:

[orschiro@thinkpad Blogger to Github]$ ruby import.rb blog-08-03-2013.xml 
WARNING: Nokogiri was built against LibXML version 2.8.0, but has dynamically loaded 2.9.1
import.rb:27:in `add': dunno page (RuntimeError)
    from import.rb:119:in `block in <main>'
    from /home/orschiro/.gem/ruby/2.0.0/gems/nokogiri-1.6.0/lib/nokogiri/xml/node_set.rb:237:in `block in each'
    from /home/orschiro/.gem/ruby/2.0.0/gems/nokogiri-1.6.0/lib/nokogiri/xml/node_set.rb:236:in `upto'
    from /home/orschiro/.gem/ruby/2.0.0/gems/nokogiri-1.6.0/lib/nokogiri/xml/node_set.rb:236:in `each'
    from import.rb:118:in `<main>'

Can you please help me with this?

I was able to fix the error @orschiro is referring to by changing line 25 (when 'template', 'settings') to:

when 'template', 'settings', 'page'
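An alternative to listing every kind that should be ignored is to import only the kinds the script understands and skip the rest with a warning, so a kind the script has never seen does not abort the whole run. The stdlib-only sketch below extracts the kind from a category term the same way the script does (split('#').last); entry_kind, importable? and IMPORTED_KINDS are hypothetical names, and the term URLs are assumed to follow the shape of Blogger's export.

```ruby
# Kinds this importer turns into Jekyll content (hypothetical constant).
IMPORTED_KINDS = %w[post comment].freeze

# Extract the kind from a category term such as
# "http://schemas.google.com/blogger/2008/kind#post" (assumed format).
def entry_kind(term)
  term.split('#').last
end

# Returns true if the entry should be imported; warns and returns false otherwise.
def importable?(term)
  kind = entry_kind(term)
  return true if IMPORTED_KINDS.include?(kind)

  warn "skipping entry of kind #{kind.inspect}"
  false
end

terms = %w[
  http://schemas.google.com/blogger/2008/kind#post
  http://schemas.google.com/blogger/2008/kind#page
  http://schemas.google.com/blogger/2008/kind#settings
]
terms.each { |t| puts "#{entry_kind(t)}: #{importable?(t)}" }
```

Whether an unknown kind should stop the import (as the original raise does) or only produce a warning is a design choice: raising surfaces surprises immediately, while warning keeps the run going and leaves a trace on stderr.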
