@rupeshtiwari
Forked from lolobosse/blogspot_to_jekyll.rb
Created February 26, 2021 22:16
Migrate your Blogger blog posts to Jekyll.
#!/usr/bin/env ruby
#
# Convert blogger (blogspot) posts to jekyll posts
#
# Basic Usage
# -----------
#
# ./blogger_to_jekyll.rb feed_url
#
# where `feed_url` can have the following format:
#
# http://{your_blog_name}.blogspot.com/feeds/posts/default
#
# Requirements
# ------------
#
# * feedjira (formerly feedzirra): gem install feedjira
# * httparty: gem install httparty
#
# Notes
# -----
#
# * Make sure Blogger shows full output of article in feeds.
# * Commenting on migrated articles will be set to false by default.
include RbConfig
require 'rubygems' if CONFIG['host_os'].start_with? "darwin"
require 'feedjira' # gem install feedjira
require 'date'
require 'optparse'
require 'httparty' # gem install httparty
def parse_post_entries(feed, verbose)
  posts = []
  feed.entries.each do |post|
    obj = Hash.new
    created_datetime = post.updated
    creation_date = Date.parse(created_datetime.to_s)
    title = post.title
    # Split on runs of one or more spaces and join the words with hyphens.
    file_name = creation_date.to_s + "-" + title.split(/  */).join("-").delete('\/') + ".html"
    content = post.content
    obj["file_name"] = file_name
    obj["title"] = title
    obj["creation_datetime"] = created_datetime
    obj["updated_datetime"] = post.updated
    obj["content"] = content
    obj["categories"] = post.categories.join(" ")
    posts.push(obj)
  end
  return posts
end
def write_posts(posts, verbose)
  Dir.mkdir("_posts") unless File.directory?("_posts")
  total = posts.length
  i = 1
  posts.each do |post|
    file_name = "_posts/".concat(post["file_name"])
    # YAML front matter; kept flush left so the written file has no leading spaces.
    header = %{---
layout: post
title: #{post["title"]}
date: #{post["creation_datetime"]}
updated: #{post["updated_datetime"]}
comments: false
categories: #{post["categories"]}
---
}
    # The block form of File.open closes the file automatically.
    File.open(file_name, "w+") { |f|
      f.write(header)
      f.write(post["content"])
    }
    if verbose
      puts " [#{i}/#{total}] Written post #{file_name}"
      i += 1
    end
  end
end
def main
  options = {}
  opt_parser = OptionParser.new do |opt|
    opt.banner = "Usage: ./blogger_to_jekyll.rb FEED_URL [OPTIONS]"
    opt.separator ""
    opt.separator "Options"
    opt.on("-v", "--verbose", "Print out all.") do
      options[:verbose] = true
    end
  end
  opt_parser.parse!
  if ARGV[0]
    feed_url = ARGV.first
  else
    puts opt_parser
    exit()
  end
  puts "Fetching feed #{feed_url}..."
  # NOTE: the URL below is hardcoded, so the FEED_URL argument is not used for the fetch.
  xml = HTTParty.get("http://roopkt.blogspot.com/feeds/posts/default").body
  feed = Feedjira.parse(xml)
  puts "Parsing feed..."
  posts = parse_post_entries(feed, options[:verbose])
  puts "Writing posts to _posts/..."
  write_posts(posts, options[:verbose])
  puts "Done!"
end
main()
@talonx

talonx commented Mar 7, 2023

xml = HTTParty.get("http://roopkt.blogspot.com/feeds/posts/default").body
This should take the feed_url and not be hardcoded?

@nikhilsilveira

xml = HTTParty.get("http://roopkt.blogspot.com/feeds/posts/default").body
This should take the feed_url and not be hardcoded?

Indeed it should. Users please note:

  • line 104: xml = HTTParty.get("http://roopkt.blogspot.com/feeds/posts/default").body
    change this to xml = HTTParty.get(feed_url).body

  • lines 39, 40, i.e., replace the block:

    title = post.title
    file_name = creation_date.to_s + "-" + title.split(/  */).join("-").delete('\/') + ".html"

    with:

    title = post.title
    safe_title = title.gsub(/[^0-9A-Za-z.\- ]/, '').strip.gsub(/\s+/, '-')
    file_name = "#{creation_date}-#{safe_title}.html"
    

    My import was failing due to special characters in the blog titles, like '?'. This edit is for sanitizing file names.
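
    To see what the sanitizing edit above does to a title, here is a minimal standalone sketch. The helper name `safe_post_file_name` is hypothetical and not part of the gist; it only wraps the two replacement lines so they can be tried outside the script.

```ruby
require 'date'

# Hypothetical helper (not in the gist): wraps the sanitizing lines from the
# comment above so they can be run on their own.
def safe_post_file_name(creation_date, title)
  # Drop everything except ASCII letters, digits, dots, hyphens, and spaces,
  # trim the ends, then collapse each run of whitespace into one hyphen.
  safe_title = title.gsub(/[^0-9A-Za-z.\- ]/, '').strip.gsub(/\s+/, '-')
  "#{creation_date}-#{safe_title}.html"
end

puts safe_post_file_name(Date.new(2021, 2, 26), "Why? A/B tests: part 1")
# prints 2021-02-26-Why-AB-tests-part-1.html
```

    One caveat with this approach: non-ASCII characters are removed as well, so a title made up entirely of such characters would end up with an empty slug after the date.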

Thank you Rupesh, Talonx, and ChatGPT.