#!/usr/bin/env ruby
#
# Convert blogger (blogspot) posts to jekyll posts
#
# Basic Usage
# -----------
#
#   ./blogger_to_jekyll.rb feed_url
#
# where `feed_url` can have the following format:
#
#   http://{your_blog_name}.blogspot.com/feeds/posts/default
#
# Requirements
# ------------
#
#  * feedzirra: https://github.com/pauldix/feedzirra
#
# Notes
# -----
#
#  * Make sure Blogger shows full output of article in feeds.
#  * Commenting on migrated articles will be set to false by default.

# RbConfig replaces the top-level Config constant, which newer Rubies no longer define.
require 'rbconfig'
require 'rubygems' if RbConfig::CONFIG['host_os'].start_with? "darwin"
require 'feedzirra'
require 'date'
require 'optparse'

# Build one hash per feed entry with the fields needed to write a Jekyll post.
def parse_post_entries(feed, verbose)
  posts = []
  feed.entries.each do |post|
    obj = Hash.new
    # Use the publication time for the creation date; `updated` is stored separately.
    created_datetime = post.published
    creation_date = Date.parse(created_datetime.to_s)
    title = post.title
    # Join the words of the title with hyphens and drop slashes for the file name.
    file_name = creation_date.to_s + "-" + title.split(/ +/).join("-").delete('\/') + ".html"
    content = post.content

    obj["file_name"] = file_name
    obj["title"] = title
    obj["creation_datetime"] = created_datetime
    obj["updated_datetime"] = post.updated
    obj["content"] = content
    obj["categories"] = post.categories.join(" ")
    posts.push(obj)
  end
  return posts
end

# Write each post to _posts/ as Jekyll front matter followed by the post content.
def write_posts(posts, verbose)
  Dir.mkdir("_posts") unless File.directory?("_posts")

  total = posts.length
  i = 1
  posts.each do |post|
    file_name = "_posts/".concat(post["file_name"])
    header = %{---
layout: post
title: #{post["title"]}
date: #{post["creation_datetime"]}
updated: #{post["updated_datetime"]}
comments: false
categories: #{post["categories"]}
---
}
    # File.open with a block closes the file automatically.
    File.open(file_name, "w+") {|f|
      f.write(header)
      f.write(post["content"])
    }

    if verbose
      puts "  [#{i}/#{total}] Written post #{file_name}"
      i += 1
    end
  end
end

def main
  options = {}
  opt_parser = OptionParser.new do |opt|
    opt.banner = "Usage: ./blogger_to_jekyll.rb FEED_URL [OPTIONS]"
    opt.separator ""
    opt.separator "Options"

    opt.on("-v", "--verbose", "Print out all.") do
      options[:verbose] = true
    end
  end

  opt_parser.parse!

  if ARGV[0]
    feed_url = ARGV.first
  else
    puts opt_parser
    exit()
  end

  puts "Fetching feed #{feed_url}..."
  feed = Feedzirra::Feed.fetch_and_parse(feed_url)

  puts "Parsing feed..."
  posts = parse_post_entries(feed, options[:verbose])

  puts "Writing posts to _posts/..."
  write_posts(posts, options[:verbose])

  puts "Done!"
end

main()
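
One thing to watch in `write_posts`: the title is interpolated into the front matter as plain text, so a title containing `: ` or a leading quote can produce invalid YAML. A minimal sketch of one way around that, assuming it is acceptable to let Ruby's standard `yaml` library serialize the metadata. `front_matter` is a hypothetical helper, not part of the script, and the date fields are left out for brevity.

```ruby
require 'yaml'

# Hypothetical helper: serialize post metadata with the YAML library so that
# titles containing ":" or quotes are escaped correctly.
def front_matter(post)
  meta = {
    "layout"     => "post",
    "title"      => post["title"],
    "comments"   => false,
    "categories" => post["categories"]
  }
  # Hash#to_yaml already begins with "---\n"; append the closing delimiter.
  meta.to_yaml + "---\n"
end

puts front_matter("title" => "Ruby: tips", "categories" => "ruby blog")
```

The result can be written to the file in place of the hand-built `header` string.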
+1 for feedjira
Thanks for the script!
When I renamed feedzirra to feedjira, it worked.
However, feeds/posts/default returned only some of my posts.
So I changed feeds/posts/default to feeds/posts/default?max-results=100, and it parsed all of my posts.
Here is a link about parsing all posts:
http://too-clever-by-half.blogspot.kr/2011/12/blog-feed-500-post-limit-for-more-than.html
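
The `max-results` change described above can also be applied in code rather than by editing the URL by hand. A minimal sketch using only Ruby's standard `uri` library; `with_max_results` is a hypothetical helper, and the default of 100 simply mirrors the value used in this comment.

```ruby
require 'uri'

# Hypothetical helper: return feed_url with max-results set to `limit`,
# keeping any other query parameters that are already present.
def with_max_results(feed_url, limit = 100)
  uri = URI.parse(feed_url)
  params = URI.decode_www_form(uri.query || "")
  params.reject! { |key, _| key == "max-results" }
  params << ["max-results", limit.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts with_max_results("http://example.blogspot.com/feeds/posts/default")
```

The returned string can be passed to the script as `feed_url`.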
I'm getting this error after renaming it to feedjira:
blogspot_to_jekyll.rb:25:in
blogspot_to_jekyll.rb:27:in `<main>': uninitialized constant CONFIG (NameError)
Why?
@danielgomezrico I had the same problem and solved it in my fork. @kennym, if you want, you can update yours from my code 😄 👍
Hi kennym,
Great code. Worked for me on Yosemite with some minor changes.
I removed the deprecated CONFIG call. I think rubygems is now required for El Capitan anyway.
Feedzirra is now called feedjira, so I made the appropriate changes in the code.
After these two minor changes, the code worked perfectly on 10.11.3.
Feel free to do a pull and merge. In my commit message, I inadvertently stated that I had updated it for Yosemite. This is my first fork, edit, and push of code on Git, and my first time working in Ruby.
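
For anyone hitting the `uninitialized constant CONFIG` error: the script relies on `include Config`, and newer Ruby versions no longer define the top-level `Config` constant. If you would rather keep the OS check than delete it, here is a minimal sketch using `RbConfig` from the standard library instead. Whether the `rubygems` require is still needed at all depends on your Ruby version; on modern Rubies it is harmless but redundant.

```ruby
require 'rbconfig'

# RbConfig::CONFIG is a Hash of build-time settings; 'host_os' is a String
# such as "darwin20" or "linux-gnu".
host_os = RbConfig::CONFIG['host_os']

# Same guard as the original script, without `include Config`.
require 'rubygems' if host_os.start_with?("darwin")

puts "host_os: #{host_os}"
```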
I also need help migrating my blog 'https://shindesavita87.blogspot.co.uk' from Blogspot to a GitHub blog. Can you tell me whether this is doable and, if so, how to do it?
`main': undefined method `fetch_and_parse' for Feedjira::Feed:Class (NoMethodError)
@yuceltoluyag - I'm pretty sure this script might need some updates after 9 years :-D
https://stackoverflow.com/questions/37371947/importing-my-blogger-blog-into-jekyll solved my problem =) ty for answer ;)
I am getting this error:
<internal:C:/Ruby30-x64/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require': cannot load such file -- feedzirra
If anyone is looking for newer changes, this is my fork: https://gist.github.com/RobbiNespu/372571e4d271122ece3ee3a1830b4d26
Thanks for the update. I have updated the code to fix a few more errors that I encountered.
My Fork: https://gist.github.com/prabathbr/0bb416b2dee7ed18d2a6fd3d8dd4b021
Updates:
- added `require 'httparty'`
- created "setup.sh", which sets up a Ruby environment to run the script on Ubuntu 20.04 LTS
- fixed the script error `Invalid argument @ rb_sysopen --- post name ---- (Errno::EINVAL)` that occurred for posts whose names contain invalid characters such as ":" and "*"
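
For readers who want to apply the last fix to their own copy, here is a minimal sketch of sanitizing the file name before opening it. This is not the code from the fork above. `safe_file_name` is a hypothetical helper, and the set of stripped characters is an assumption based on what common file systems reject.

```ruby
# Hypothetical helper: build a post file name from a date string and a title,
# dropping characters that some file systems reject (such as ":" and "*")
# and turning runs of whitespace into single hyphens.
def safe_file_name(date, title)
  slug = title.strip
              .gsub(%r{[\\/:*?"<>|]}, "")  # remove characters invalid in file names
              .gsub(/\s+/, "-")            # collapse whitespace into hyphens
  "#{date}-#{slug}.html"
end

puts safe_file_name("2011-07-30", "Ruby: tips * tricks")
```

In the script, this would stand in for the expression that builds `file_name` in `parse_post_entries`.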
Hi Kennym,
Worked for me. Works like a charm. But I had some trouble because the feedzirra module has been renamed to feedjira.
You need to update the script to reflect that. I made that change and could then run the import.
I have one question though: I lost the comments I had on Blogger in the process. How do I migrate the comments?