Migrate your Blogger blog posts to Jekyll.

blogspot_to_jekyll.rb
Ruby
#!/usr/bin/env ruby
#
# Convert blogger (blogspot) posts to jekyll posts
#
# Basic Usage
# -----------
#
# ./blogger_to_jekyll.rb feed_url
#
# where `feed_url` can have the following format:
#
# http://{your_blog_name}.blogspot.com/feeds/posts/default
#
# Requirements
# ------------
#
# * feedzirra: https://github.com/pauldix/feedzirra
#
# Notes
# -----
#
# * Make sure Blogger is set to include the full text of each article in its feed.
# * Commenting on migrated articles will be set to false by default.
 
require 'rbconfig'

# On OS X, rubygems has to be loaded explicitly before feedzirra.
require 'rubygems' if RbConfig::CONFIG['host_os'].start_with?("darwin")
require 'feedzirra'
require 'date'
require 'optparse'
 
# Build one hash per feed entry with the fields needed for a Jekyll post.
def parse_post_entries(feed, verbose)
  posts = []
  feed.entries.each do |post|
    obj = Hash.new
    # Use the publication time as the post date; `updated` is stored separately.
    created_datetime = post.published
    creation_date = Date.parse(created_datetime.to_s)
    title = post.title
    # File name: YYYY-MM-DD-words-of-the-title.html, with slashes removed.
    file_name = creation_date.to_s + "-" + title.split(/ +/).join("-").delete('\/') + ".html"
    content = post.content
    obj["file_name"] = file_name
    obj["title"] = title
    obj["creation_datetime"] = created_datetime
    obj["updated_datetime"] = post.updated
    obj["content"] = content
    obj["categories"] = post.categories.join(" ")
    posts.push(obj)
  end
  return posts
end
 
# Write each post to _posts/ with a YAML front matter header.
def write_posts(posts, verbose)
  Dir.mkdir("_posts") unless File.directory?("_posts")

  total = posts.length
  i = 1
  posts.each do |post|
    file_name = File.join("_posts", post["file_name"])
    # The header lines must start in column 0, so they are not indented.
    header = %{---
layout: post
title: #{post["title"]}
date: #{post["creation_datetime"]}
updated: #{post["updated_datetime"]}
comments: false
categories: #{post["categories"]}
---

}
    # The block form of File.open closes the file automatically.
    File.open(file_name, "w") do |f|
      f.write(header)
      f.write(post["content"])
    end
    if verbose
      puts " [#{i}/#{total}] Written post #{file_name}"
      i += 1
    end
  end
end
 
def main
  options = {}
  opt_parser = OptionParser.new do |opt|
    opt.banner = "Usage: ./blogger_to_jekyll.rb FEED_URL [OPTIONS]"
    opt.separator ""
    opt.separator "Options"
    opt.on("-v", "--verbose", "Print out all.") do
      options[:verbose] = true
    end
  end

  opt_parser.parse!
  if ARGV[0]
    feed_url = ARGV.first
  else
    puts opt_parser
    exit()
  end

  puts "Fetching feed #{feed_url}..."
  feed = Feedzirra::Feed.fetch_and_parse(feed_url)
  puts "Parsing feed..."
  posts = parse_post_entries(feed, options[:verbose])
  puts "Writing posts to _posts/..."
  write_posts(posts, options[:verbose])

  puts "Done!"
end
 
main()

Thanks, this worked for me.

Thanks. It's awesome that I could help you with my script.

Worked for me - thanks!

I had to add require 'rubygems' before feedzirra to make the script work on OS X Lion.

I also had to change the way dates are parsed like this:

created_datetime = post.updated
creation_date = Date.parse(created_datetime.to_s)

I guess the Blogger RSS format must have changed at some point or something.
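For anyone wondering what the string round trip in that snippet does, here is a minimal sketch. The `creation_date_for` helper and the sample timestamp are made up for illustration, and it assumes the feed entry's timestamp behaves like a Ruby `Time`, which may not hold for every feedzirra version.

```ruby
require 'date'

# Hypothetical helper: turn a Time-like timestamp into a plain Date
# by going through its string form, as the snippet above does.
def creation_date_for(timestamp)
  Date.parse(timestamp.to_s)
end

# Made-up sample value standing in for `post.updated`.
sample = Time.utc(2012, 3, 4, 5, 6, 7)
puts creation_date_for(sample).to_s
```

The resulting date string is what ends up as the prefix of the post's file name.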

That's quite probable. I am glad to receive patches.

Gotcha. I threw in the changes at my fork. Feel free to merge. Might want to test first to make sure I didn't bork things up (I'm a Pythonista :) ).

I just merged your changes. Sorry that I couldn't figure out a way to keep the credits, but your work is much appreciated! Thank you, sir.

No probs. Thanks. :)

Hi, one more thing... I noticed it is better to use "created_datetime = post.published" instead of "created_datetime = post.updated". Might want to change that in your gist. You might want to store the updated time in YAML front matter btw. Probably doesn't hurt.

You might want to store the updated time in YAML front matter btw. Probably doesn't hurt.

You lost me. YAML?

See this. It's just that little snippet in front of a page.
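As a rough illustration of what such a snippet could look like with both timestamps in it, here is a small sketch. The `front_matter` helper, the `updated` key, and the sample values are assumptions for the example, not something Jekyll requires. The squiggly heredoc needs Ruby 2.3 or later.

```ruby
require 'time'

# Hypothetical helper: build a front matter block that carries both the
# publication time (`date`) and the last-modified time (`updated`).
def front_matter(title, published, updated)
  <<~HEADER
    ---
    layout: post
    title: #{title}
    date: #{published.iso8601}
    updated: #{updated.iso8601}
    ---
  HEADER
end

# Made-up sample values.
puts front_matter("Hello world",
                  Time.utc(2012, 1, 15, 10, 30, 0),
                  Time.utc(2012, 2, 1, 8, 0, 0))
```

Jekyll reads everything between the two `---` lines as YAML and makes it available to the page's layout.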

I updated my Gist to contain the change mentioned above. Feel free to merge.

Thanks for this gist. Works

After changing Config to RbConfig it worked. Thanks!
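In case it helps others, the change amounts to something like the sketch below. The `darwin?` helper is only for illustration and is not part of the script. `RbConfig` comes from the standard `rbconfig` library.

```ruby
require 'rbconfig'

# Hypothetical helper: true when the host OS string starts with "darwin",
# which is what RbConfig reports on OS X.
def darwin?(host_os = RbConfig::CONFIG['host_os'])
  host_os.start_with?("darwin")
end

puts "host_os: #{RbConfig::CONFIG['host_os']}"
puts "darwin?  #{darwin?}"
```

The script's own check would then read `RbConfig::CONFIG['host_os']` instead of the bare `CONFIG['host_os']` that came from `include Config`.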

FYI, if you'd also like to migrate the comments: http://blog.coolaj86.com/articles/migrate-from-blogger-to-ruhoh-with-proper-redirects.html

This requires changing your template and exporting a backup of your blog (both are explained in the walkthrough).

Thanks for this script! Everything worked well, except the handling of colon characters in the title. They make Jekyll fall over and die, for some reason. Relevant: https://github.com/mojombo/jekyll/issues/549
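One possible workaround, sketched below, is to let Ruby's YAML library write the header so that titles with special characters get quoted. This is not what the script does today. The `safe_front_matter` helper and the sample title are hypothetical.

```ruby
require 'yaml'

# Hypothetical helper: serialize the front matter fields through YAML
# instead of string interpolation, then add the closing "---" line.
def safe_front_matter(fields)
  fields.to_yaml + "---\n\n"
end

# Made-up sample post with a colon in the title.
header = safe_front_matter(
  "layout"   => "post",
  "title"    => "Ruby: a love story",
  "comments" => false
)
puts header
```

In `write_posts`, the interpolated `header` string could be replaced with a call like this, passing the post's title, dates and categories as hash entries.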
