Created
June 4, 2022 05:27
export pocket
=begin
Pocket Export.rb
My first 'real' Ruby script (hello world)!
More info here: http://blog.ctis.me/2015/12/archiving-your-pocket-articles-with-ruby.html

DEPENDENCIES:
  pocket_export requires the following gems:
    curb
    nokogiri
  pocket_export requires the following packages (install them with your system's package manager):
    wkhtmltopdf - wkhtmltopdf.org

USAGE:
  **make sure that dependencies are installed first**
  Go to https://getpocket.com/export/ and download the HTML file with your Pocket data. Then run this script with
  the full path to the HTML file supplied as an argument (e.g. ~/Downloads/ril_export.html). The script begins downloading
  items immediately and saves the rendered PDFs in ./pocket_export_data. Any errors are logged in
  ./pocket_export_data/pocket_export_errors.log

NOTE:
  This process can be fairly CPU-intensive, as every page is downloaded and rendered as a PDF. If you have many items
  in your list, it is going to take a while.
=end
require 'curb'
require 'open-uri'
require 'nokogiri'

if ARGV.length < 1
  abort("Usage: pocket_export.rb /path/to/ril_export.html")
else
  pocket_data = ARGV[0]
  # File.exists? was removed in Ruby 3.2; Dir.exist? is the supported form.
  Dir.mkdir("./pocket_export_data/") unless Dir.exist?("./pocket_export_data/")
end

# The export is a local file, so read it directly rather than through open-uri.
Nokogiri::HTML(File.read(pocket_data)).css('a').each { |anchor|
  begin
    # Set link to value of href attribute of <a> tag.
    link = anchor['href'].to_s.gsub("\n", '')
    # Follow any redirects until final destination is found (url shorteners etc).
    curl = Curl::Easy.perform(link) do |easy|
      easy.head = true
      easy.follow_location = true
    end
    # Fetch the webpage title for use in the filename.
    # Kernel#open no longer opens URLs on Ruby 3+, so call URI.open explicitly.
    title = Nokogiri::HTML(URI.open(curl.last_effective_url)).at('title').text.gsub("'", "").gsub('"', '')
    puts "\n\n\n***Downloading #{title} (#{link})..."
    # Run wkhtmltopdf. Passing arguments separately bypasses the shell,
    # so characters in the URL or title cannot be interpreted as shell syntax.
    system("wkhtmltopdf", link, "./pocket_export_data/#{title}.pdf")
  rescue => e
    # Catch and log any exceptions.
    puts "\n\n\n!!!Downloading #{link} FAILED!!\n\n\n"
    File.open('./pocket_export_data/pocket_export_errors.log', 'a') { |errorlog|
      errorlog.write("Error: #{link}: #{e.message}\n")
    }
  end
}
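
The script derives each PDF filename from the page title, so a title containing a path separator or other awkward characters can produce a path that wkhtmltopdf cannot write to. A minimal sketch of one way to tidy titles first follows. The helper name `safe_filename`, the set of replaced characters, and the 100-character limit are assumptions for illustration and are not part of the original script.

```ruby
# Hypothetical helper (not in the original script): turn a page title into
# a string that is safer to use as a filename.
def safe_filename(title, max_length: 100)
  cleaned = title.to_s
                 .gsub(/['"]/, '')                     # drop quotes, as the script already does
                 .gsub(%r{[/\\:*?<>|\x00-\x1f]}, ' ')  # replace path separators, reserved and control characters
                 .gsub(/\s+/, ' ')                     # collapse runs of whitespace
                 .strip
  cleaned = 'untitled' if cleaned.empty?               # fall back when nothing usable remains
  cleaned[0, max_length]                               # keep the name to a bounded length
end
```

Under these assumptions, the result could be used in place of the raw title when building the output path, e.g. `"./pocket_export_data/#{safe_filename(title)}.pdf"`.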