Gist saidie/450fdcd7658acfdfcbc5, last active August 16, 2016 00:14
Convert a Pocket export file to org-mode
Gemfile

source "https://rubygems.org"
gem "nokogiri"
gem "open_uri_redirections"
Gemfile.lock

GEM
  remote: https://rubygems.org/
  specs:
    mini_portile2 (2.0.0)
    nokogiri (1.6.7.2)
      mini_portile2 (~> 2.0.0.rc2)
    open_uri_redirections (0.2.1)

PLATFORMS
  ruby

DEPENDENCIES
  nokogiri
  open_uri_redirections

BUNDLED WITH
   1.11.2
org-ril-import.rb

require 'open-uri'
require 'open_uri_redirections'
require 'nokogiri'

# Extract title, URL, timestamp and tags from one <a> element of the export.
def parse_entry(entry)
  title = entry.text
  link = entry['href']
  # time_added is a Unix timestamp; format it for an inactive org timestamp
  added = Time.at(entry['time_added'].to_i).strftime('%Y-%m-%d %a %H:%M')
  # The tags attribute may be empty or missing; both yield no tags
  tags = entry['tags'].to_s.split(',')
  [title, link, added, tags]
end

# Print one entry as a level-2 org heading tagged @READING.
def output_entry(entry)
  title, link, added, tags = parse_entry(entry)
  tags << '@READING'
  # Pocket uses the URL as the link text when it has no title: fetch the page
  if title == link
    warn "Load title... #{link}"
    begin
      # URI.open needs Ruby 2.5+; Kernel#open no longer opens URLs on Ruby 3.0+
      URI.open(link, allow_redirections: :all) do |html|
        doc = Nokogiri::HTML(html)
        fetched = doc.xpath('/html/head/title').text.strip
        title = fetched unless fetched.empty?
        link = html.base_uri.to_s
      end
    rescue StandardError => e
      # DNS failures, HTTP errors, timeouts, SSL errors: keep the URL as title
      warn "  failed: #{e.message}"
    end
  end
  puts <<~ENTRY
    ** #{title} :#{tags.join(':')}:
    #{link}
    [#{added}]
  ENTRY
end

html = STDIN.read
doc = Nokogiri::HTML.parse(html, nil, 'utf-8')
# The export holds two lists: unread items first, then the read archive
unread, read = doc.xpath('//ul').to_a

puts '* Unread'
unread&.xpath('li/a')&.each { |entry| output_entry(entry) }
puts '* Read'
read&.xpath('li/a')&.each { |entry| output_entry(entry) }
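The formatting done by parse_entry and output_entry can be tried without Nokogiri or network access. Below is a minimal sketch using only the standard library; org_entry is a hypothetical helper, not part of the original script, that turns the title, href, time_added and tags values of one export entry into the same three-line org entry. Its utc: option is an assumption added only so the timestamp is reproducible; the script itself formats times in the local time zone.

```ruby
# Hypothetical helper (not in the original script): formats one Pocket entry
# like output_entry does, but takes plain values instead of a Nokogiri node.
def org_entry(title:, link:, time_added:, tags: nil, utc: false)
  time = Time.at(time_added.to_i)
  time = time.utc if utc # assumption for reproducibility; the script uses local time
  added = time.strftime('%Y-%m-%d %a %H:%M')
  # A missing or empty tags attribute yields only the @READING tag
  tag_list = tags.to_s.split(',') + ['@READING']
  <<~ENTRY
    ** #{title} :#{tag_list.join(':')}:
    #{link}
    [#{added}]
  ENTRY
end

puts org_entry(title: 'Example', link: 'https://example.com/',
               time_added: '1471306440', tags: 'ruby,emacs', utc: true)
# ** Example :ruby:emacs:@READING:
# https://example.com/
# [2016-08-16 Tue 00:14]
```

The square brackets make the timestamp inactive, so the imported entries record when they were saved without showing up in the org agenda.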
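Usage

% ruby org-ril-import.rb < ril-export.html

The script reads the Pocket export HTML from standard input and prints org-mode text to standard output, so redirect the output to a file to keep it. Entries whose title is just their URL trigger a page fetch, reported on standard error as "Load title...". Install nokogiri and open_uri_redirections first, for example with bundle install.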
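One thing to watch on a real export: Org is strict about tag names (the Org manual describes tags as words made of letters, numbers, _ and @), while Pocket tags may contain spaces or hyphens, which the script copies verbatim into the :tag:tag: list. A minimal sketch, assuming you want every run of other characters mapped to a single underscore; org_tag and org_tags are hypothetical helpers that could stand in for the split in parse_entry.

```ruby
# Hypothetical helpers (not in the original script). Assumption: keep letters,
# digits, '_' and '@'; replace any run of other characters with one '_'.
def org_tag(pocket_tag)
  pocket_tag.strip
            .gsub(/[^\p{Alnum}_@]+/, '_') # collapse invalid runs to '_'
            .gsub(/\A_+|_+\z/, '')        # trim leading/trailing '_'
end

# Split the comma-separated tags attribute and drop tags that end up empty.
def org_tags(tags_attr)
  tags_attr.to_s.split(',').map { |t| org_tag(t) }.reject(&:empty?)
end

p org_tags('read later,emacs-lisp,  ruby ')
# ["read_later", "emacs_lisp", "ruby"]
```

Mapping to underscores keeps distinct Pocket tags readable, at the cost that two tags differing only in punctuation (e.g. "emacs-lisp" and "emacs lisp") merge into one Org tag.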