@mathematicalcoffee
Last active August 29, 2015 14:04
Convert posts from an httrack mirror of a Blogger/Blogspot blog to Jekyll format. Run from the top folder of the mirror.
#!/usr/bin/ruby
# convert posts from httrack of blogger/blogspot to jekyll format
# run from top folder.
require 'nokogiri'
require 'reverse_markdown'
require 'fileutils'
require 'date'

OUT_DIR = '_posts'
AUTHOR = nil # set default author-key or nil to use from post
# ---------------------------------
# Blogger stores posts as <year>/<month>/<slug>.html
posts = Dir['20??/*/*.html']
FileUtils.mkdir_p(OUT_DIR)

posts.each do |filename|
  # Block form closes the file handle once the page is parsed.
  doc = File.open(filename) { |f| Nokogiri::HTML(f) }

  date = doc.css('h2.date-header')[0].text.strip # Sunday, 18 August 2013
  date = DateTime.strptime(date, '%A, %d %B %Y').strftime('%Y-%m-%d')
  body = doc.css('div.post')
  outfile = "#{OUT_DIR}/#{date}-#{File.basename(filename, '.html')}.md"
  # Full timestamp: post date, post time, and the UTC offset of the machine
  # running this script (the offset is not read from the post itself).
  date = "#{date} #{doc.css('a.timestamp-link')[0].text.strip} #{DateTime.now.strftime('%z')}"
  title = doc.css('.post-title')[0].text.strip

  # Turn "Labels: a,b" into the YAML list "[a, b]", or YAML null if absent.
  labels = doc.css('span.post-labels')[0]
  if labels
    labels = '[' + labels.text.gsub(/\n?Labels:/, '').gsub(/\n/, '').gsub(',', ', ') + ']'
  else
    labels = '~'
  end

  author = AUTHOR || doc.css('div.post-footer span.fn').text.strip
  # Escape backslashes and double quotes so the quoted YAML title stays valid.
  yaml_title = title.gsub(/["\\]/) { |c| "\\#{c}" }

  puts "writing to #{outfile}"
  File.open(outfile, 'w') do |page|
    # frontmatter
    page.puts '---'
    page.puts 'layout: post'
    page.puts "title: \"#{yaml_title}\""
    page.puts "date: #{date}"
    page.puts 'comments: true'
    page.puts 'categories: ~'
    page.puts "tags: #{labels}"
    page.puts "authors: [#{author}]"
    page.puts '---'
    # post body, converted from HTML to Markdown
    page.puts ReverseMarkdown.convert(body.to_s)
  end
end
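
The label handling in the script can be tried on its own, without an httrack mirror. The sketch below is not part of the original gist: `labels_to_yaml` is a hypothetical helper name and the sample string is an assumed shape for the text of a `span.post-labels` element. It splits on commas and strips each name instead of chaining `gsub` calls, so it also tolerates stray whitespace around the label names.

```ruby
# Hypothetical helper (not in the original script): convert the raw text of a
# Blogger "span.post-labels" element into a YAML flow sequence for the
# "tags:" line of the front matter.
def labels_to_yaml(raw)
  # No labels element on the page: emit YAML null, as the script does.
  return '~' if raw.nil?

  # Drop the leading "Labels:" caption, then split on commas and trim each name.
  names = raw.sub(/\A\s*Labels:/, '').split(',').map(&:strip).reject(&:empty?)
  "[#{names.join(', ')}]"
end

# Assumed sample input; real pages may differ in whitespace.
puts labels_to_yaml("\nLabels:\nruby,\njekyll\n") # prints [ruby, jekyll]
puts labels_to_yaml(nil)                          # prints ~
```

Label names that contain YAML-special characters such as `:` or `]` would still need quoting; that case is left out of this sketch.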