Recover Octopress posts to Markdown
#!/usr/bin/env ruby
# This script (recovery_octopress_p.rb) converts generated Octopress posts back
# into Octopress Markdown files, more specifically the post body. The generated
# pages must be available on the local computer.
# It is licensed under LGPL-2.1; for more details, see this FSF link:
#   - https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html
#
# It was developed by Fernando Ike de Oliveira <fike@midstorm.org>.
#
# Known bugs:
#   - A "." in the title is not converted to the word "dot"; Octopress handled this much better.
#   - More "-" characters are produced than needed. For example: foo---bar instead of foo-bar.
#   - You need to create a _post directory for this script to work.
#
require 'nokogiri'
require 'pandoc-ruby'
require 'time' # provides Time.parse

# Generated Octopress pages live under year directories (2010 to 2019).
files = Dir.glob("**/201[0-9]/**/*.html")
files.each { |file|
  doc = Nokogiri::HTML(File.open(file))
  # Post title, publication date, and body, taken from the generated article markup.
  title = doc.xpath('/html/body/div/div/div/article/header/h1').children
  date = doc.xpath('/html/body/div/div/div/article/header/p/time').attr('datetime')
  t_html = doc.xpath('/html/body/div/div/div/article/div')
  # Pass the HTML to pandoc as an explicit string.
  t_markdown = PandocRuby.convert(t_html.to_s, :from => :html, :to => :markdown)
  g_categories = doc.xpath('/html/body/div/div/div/article/footer/p/span[2]/a').children
  # Parse the timestamp once and derive each date component from it.
  time = Time.parse(date)
  year = time.strftime("%Y")
  day = time.strftime("%d")
  month = time.strftime("%m")
  hour = time.strftime("%H")
  minute = time.strftime("%M")
  # Build the file name slug: dashes for whitespace, ASCII letters for accented ones.
  titlename = title.to_s.gsub(/\s+/, "-").downcase.tr("ÁÀÃáàãÉÈẼéèẽÍÌíìÓÒÕóòõÚÙúùÇç","AAAaaaEEEeeeIIiiOOOoooUUuucc")
  titlename = titlename.gsub(/[^0-9A-Za-z\-]/, '')
  filename = "_post/#{year}-#{month}-#{day}-#{titlename}.markdown"
  # Collect the category names as plain strings.
  categories = g_categories.map(&:to_s)
  text = <<EOF
---
layout: post
title: "#{title}"
date: #{year}-#{month}-#{day} #{hour}:#{minute}
comments: true
published: true
categories: #{categories}
tags: #{categories}
---
#{t_markdown}
EOF
  File.open(filename, "w+") { |f| f.write(text) }
}
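
The known bugs listed in the header comment could perhaps be addressed along the following lines. This is only a minimal sketch, not part of the original script: `slugify` and `post_path` are hypothetical helper names, and turning a "." between two characters into the word "dot" is an assumption about the intended result.

```ruby
require 'fileutils'

# Hypothetical helper: builds a file name slug with single dashes.
def slugify(title)
  title.downcase
       .tr('áàãéèẽíìóòõúùç', 'aaaeeeiiooouuc') # accented letters to ASCII
       .gsub(/(?<=\S)\.(?=\S)/, ' dot ')       # assumption: "2.0" becomes "2 dot 0"
       .gsub(/[^0-9a-z]+/, '-')                # any run of other characters becomes one dash
       .gsub(/\A-+|-+\z/, '')                  # trim leading and trailing dashes
end

# Hypothetical helper: creates the output directory if it is missing
# and returns the path of the post file inside it.
def post_path(dir, date, title)
  FileUtils.mkdir_p(dir)
  File.join(dir, "#{date}-#{slugify(title)}.markdown")
end

puts slugify('Foo - Bar') # prints "foo-bar"
```

Collapsing every run of non-alphanumeric characters in a single pass avoids the foo---bar result, and `FileUtils.mkdir_p` removes the need to create the output directory by hand.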