@tdtds
Last active December 31, 2015 10:08
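A script that prepends a cover page, taken from Amazon's cover image, to PDFs delivered by BOOKSCAN and changes the Creator field of the PDF metadata to ScanSnap. Put the source PDFs in the input directory (org/ in the code below); the results are written to dst/.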
#!/usr/bin/env ruby
require 'open-uri'
require 'rexml/document'
require 'fileutils' # FileUtils.rm is used to delete the temporary files
# Look up title, author and cover image URL for an ISBN from Amazon
# via the rpaproxy endpoint.
def metainfo(isbn)
  uri = 'http://rpaproxy.tdiary.org/rpaproxy/jp/'
  uri << "?Service=AWSECommerceService"
  uri << "&SubscriptionId=1CVA98NEF1G753PFESR2"
  uri << "&Operation=ItemLookup"
  uri << "&ItemId=#{isbn}"
  uri << "&IdType=ASIN"
  uri << "&ResponseGroup=Medium"
  uri << "&Version=2011-08-01"
  xml = URI.open(uri, &:read) # Kernel#open no longer opens URLs on Ruby 3.0+
  meta = {}
  doc = REXML::Document.new(xml).root
  item = doc.elements.to_a('*/Item')[0]
  meta[:title] = item.elements.to_a('*/Title').first.text
  meta[:author] = [].tap { |a| item.elements.each('*/Author') { |author| a << author.text } }.sort.uniq.join(',')
  # prefer the top-level LargeImage, fall back to the one inside ImageSets
  meta[:cover] = (item.elements.to_a('LargeImage').first || item.elements.to_a('ImageSets/ImageSet/LargeImage').first).elements['URL'].text
  # drop a parenthesized part from the title
  if meta[:title] =~ /\(.[^(]+\)/
    meta[:title] = meta[:title].sub(/\(.[^(]+\)/, '').strip
  end
  meta
end
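# Build the output path. '<' and '>' become '(' and ')'; '\' and '/' become
# '_' so that a title cannot introduce an extra directory level under dst/.
def dstfile(meta)
  'dst/' + "#{meta[:title]} - #{meta[:author]}.pdf".tr('<>\\\\/', '()__')
end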
Dir.glob('org/*.pdf').each do |org|
  # get metainfo from Amazon
  base = File.basename(org)
  print "#{base}: "
  # the ISBN is the part of the file name after the last underscore
  isbn = base.scan(/.*_(.*)\.pdf/).first.first
  begin
    meta = metainfo(isbn)
  rescue OpenURI::HTTPError
    puts 'Amazon error, skip'
    next
  end
  if File.exist?(dstfile(meta))
    puts 'Dest file exists, skip'
    next
  end
  print '.'

  # replace the creation tool with ScanSnap
  # (commands are passed as argument lists so quotes in file names are safe)
  pdfmeta = ''
  system('pdftk', org, 'dump_data', 'output', 'meta.txt')
  File.read('meta.txt').split(/\n/).each_slice(2) do |pair|
    if pair[0] =~ /InfoKey:\s*Creator$/
      pair[1] = "InfoValue: PFU ScanSnap Manager 5.1.10 #S1300"
    end
    pdfmeta << pair.join("\n") << "\n"
  end
  File.write('meta2.txt', pdfmeta)
  print '.'
  system('pdftk', org, 'update_info', 'meta2.txt', 'output', 'tmp.pdf')
  print '.'

  # insert cover image
  File.binwrite('cover.jpg', URI.open(meta[:cover], 'rb', &:read))
  print '.'
  system('sam2p', '-j:quiet', 'cover.jpg', 'cover.pdf')
  print '.'
  system('pdftk', 'cover.pdf', 'tmp.pdf', 'cat', 'output', dstfile(meta))
  puts 'done'

  # delete tmp files
  %w(tmp.pdf meta.txt meta2.txt cover.jpg cover.pdf).each { |file| FileUtils.rm file }
end
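
The Creator rewrite in the script reads the pdftk dump two lines at a time, which relies on every InfoKey line sitting at an even offset. Some pdftk versions print an InfoBegin line before each key/value pair, and in that case the pairs may no longer line up. A minimal sketch of a line-by-line variant follows; the helper name replace_creator and the sample dump are hypothetical and not part of the original script.

```ruby
# Hypothetical helper (not in the original script): replace the value of the
# Creator entry in the text produced by `pdftk ... dump_data`.
def replace_creator(dump, creator)
  lines = dump.split(/\n/)
  lines.each_with_index do |line, i|
    # find the key line wherever it is, regardless of its offset
    next unless line =~ /\AInfoKey:\s*Creator\s*\z/
    # the value is expected on the line right after the key
    if lines[i + 1] && lines[i + 1].start_with?('InfoValue:')
      lines[i + 1] = "InfoValue: #{creator}"
    end
  end
  lines.join("\n") + "\n"
end

# Made-up sample in the style of a dump that uses InfoBegin separators.
sample = <<~EOS
  InfoBegin
  InfoKey: Creator
  InfoValue: Some Other Tool
  InfoBegin
  InfoKey: Producer
  InfoValue: Example Producer
  NumberOfPages: 200
EOS

puts replace_creator(sample, 'PFU ScanSnap Manager 5.1.10 #S1300')
```

Only the InfoValue line that directly follows the Creator key is touched; every other line passes through unchanged, so the result can still be written to meta2.txt and fed to pdftk update_info as in the script.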