@chtzvt
Last active June 4, 2022 05:02
Simple script to save your entire Pocket list (unread + archived) in PDF form.
=begin
Pocket Export.rb
My first 'real' Ruby script (hello world)!
More info here: http://blog.ctis.me/2015/12/archiving-your-pocket-articles-with-ruby.html
DEPENDENCIES:
  pocket_export requires the following gems:
    curb
    nokogiri
  pocket_export requires the following packages (install them with your system's package manager):
    wkhtmltopdf - wkhtmltopdf.org
USAGE:
  **make sure that dependencies are installed first**
  Go to https://getpocket.com/export/ and download the HTML file with your Pocket data. Then run this script with
  the full path to the HTML file supplied as an argument (e.g. ~/Downloads/ril_export.html). The script begins downloading
  items immediately and saves the generated PDFs in ./pocket_export_data. Any errors are logged in
  ./pocket_export_data/pocket_export_errors.log
NOTE:
  This process can be CPU-intensive, since every page is downloaded and rendered as a PDF. If you have many items
  in your list, expect it to take a while.
=end
require 'curb'
require 'open-uri'
require 'nokogiri'
if ARGV.length < 1
  abort("Usage: pocket_export.rb /path/to/ril_export.html")
else
  pocket_data = ARGV[0]
  # File.exists? was removed in Ruby 3.2, so use Dir.exist? to check for the output directory.
  Dir.mkdir("./pocket_export_data/") unless Dir.exist?("./pocket_export_data/")
end
# Read the local export file directly; Kernel#open no longer handles URLs as of Ruby 3.0.
Nokogiri::HTML(File.read(pocket_data)).css('a').each { |link|
  begin
    # Set link to value of href attribute of <a> tag.
    link = link['href']
    # Follow any redirects until final destination is found (url shorteners etc).
    curl = Curl::Easy.perform(link.gsub("\n", '')) do |c|
      c.head = true
      c.follow_location = true
    end
    # Fetch the webpage title for use in the filename. Quotes are dropped and
    # slashes replaced so the title cannot add extra path components.
    title = Nokogiri::HTML(URI.open(curl.last_effective_url)).at('title').text.gsub("'", "").gsub('"', '').gsub('/', '-')
    puts "\n\n\n***Downloading #{title} (#{link})..."
    # Run wkhtmltopdf. Arguments are passed separately so the shell never parses the URL or title.
    system("wkhtmltopdf", link, "./pocket_export_data/#{title}.pdf")
  rescue => e
    # Catch and log any exceptions.
    puts "\n\n\n!!!Downloading #{link} FAILED!!\n\n\n"
    File.open('./pocket_export_data/pocket_export_errors.log', 'a') { |errorlog|
      errorlog.write("Error: #{link}: #{e.message}\n")
    }
  end
}
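
The script builds each PDF filename straight from the page's `<title>`, so a title containing path separators, control characters, or nothing at all can produce a bad path. Below is a minimal sketch of one way to normalise titles first. `safe_pdf_name` and its `max_length` parameter are hypothetical names introduced here for illustration and are not part of the original script; the set of replaced characters is an assumption about what is unsafe on common filesystems.

```ruby
# Hypothetical helper (not in the original script): turn a page title into a
# conservative PDF filename.
def safe_pdf_name(title, max_length = 100)
  # Collapse runs of whitespace (including newlines) into single spaces.
  name = title.to_s.gsub(/\s+/, " ").strip
  # Replace characters that are commonly unsafe in filenames with underscores.
  name = name.gsub(%r{[/\\:*?"<>|\x00-\x1f]}, "_")
  # Keep the name reasonably short; the limit is an arbitrary assumption.
  name = name[0, max_length].strip
  # Fall back to a placeholder for empty or dots-only names such as "..".
  name = "untitled" if name.empty? || name.match?(/\A\.+\z/)
  "#{name}.pdf"
end
```

With a helper like this, the wkhtmltopdf call could take `File.join("pocket_export_data", safe_pdf_name(title))` as its output path. Distinct pages with identical titles would still overwrite each other, which this sketch does not try to solve.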