@diegommarino
Last active August 10, 2018 10:47
Extract URLs and Notes from PDFs
# Credits to https://gist.github.com/danlucraft/5277732
require 'pdf-reader'
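class Annotations
  # reader: a PDF::Reader instance; page: 1-based page number.
  def initialize(reader, page)
    @objects = reader.objects
    @page = reader.page(page)
    @all_annots = annots_on_page
  end

  # Text and FreeText annotation dictionaries found on the page.
  def notes
    @all_annots.select { |a| is_note?(a) }
  end

  # URIs of the link annotations on the page.
  # Links without a URI action (e.g. internal destinations) are skipped.
  def links
    # /A may be an indirect reference or an inline dictionary; deref handles both.
    actions = @all_annots.select { |a| is_link?(a) }.map { |link| @objects.deref(link[:A]) }
    actions.select { |action| action.is_a?(Hash) && action[:URI] }.map { |action| action[:URI] }
  end

  private

  def is_link?(object)
    object[:Type] == :Annot && object[:Subtype] == :Link
  end

  def is_note?(object)
    object[:Type] == :Annot && [:Text, :FreeText].include?(object[:Subtype])
  end

  # All annotation dictionaries referenced from the page's /Annots entry.
  def annots_on_page
    references = @page.attributes[:Annots] || []
    lookup_all(references).flatten
  end

  def lookup_all(refs)
    refs = *refs
    refs.map { |ref| lookup(ref) }
  end

  # Resolve a reference; arrays of references are resolved recursively.
  def lookup(ref)
    object = @objects.deref(ref)
    return object unless object.is_a?(Array)

    lookup_all(object)
  end
end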
# Usage example
puts 'Running...'
file = "file_path" # placeholder: replace with the path to a PDF
puts "File: #{file}"
doc = PDF::Reader.new(file)
annotations = Annotations.new(doc, 1) # annotations on page 1
puts annotations.links.inspect
puts annotations.notes.inspect
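
The recursive lookup in `lookup` / `lookup_all` can be tried without a PDF. The sketch below is only an illustration: it uses a plain Hash with made-up integer ids as a stand-in for `reader.objects`, and the names `OBJECTS`, `resolve` and `resolve_all` are hypothetical, not part of pdf-reader. It shows how nested arrays of references collapse into a flat list of annotation dictionaries.

```ruby
# Hypothetical stand-in for reader.objects: integer ids map to objects.
# (A real PDF::Reader::ObjectHash is keyed by PDF::Reader::Reference.)
OBJECTS = {
  1 => [2, 3],                                              # array of further references
  2 => { Type: :Annot, Subtype: :Link, A: 4 },              # link annotation
  3 => [5],                                                 # nested array
  4 => { S: :URI, URI: "https://example.com" },             # URI action
  5 => { Type: :Annot, Subtype: :Text, Contents: "A note" } # text note
}.freeze

# Resolve one id; arrays of ids are resolved recursively.
def resolve(ref)
  object = OBJECTS.fetch(ref)
  object.is_a?(Array) ? resolve_all(object) : object
end

# Resolve a single id or a list of ids.
def resolve_all(refs)
  Array(refs).map { |ref| resolve(ref) }
end

# Flatten the nested result into one list of annotation dictionaries.
annots = resolve_all([1]).flatten
links = annots.select { |a| a[:Subtype] == :Link }.map { |a| OBJECTS.fetch(a[:A])[:URI] }
notes = annots.select { |a| a[:Subtype] == :Text }.map { |a| a[:Contents] }

puts links.inspect
puts notes.inspect
```

Hashes are not affected by `flatten`, so each annotation dictionary survives intact while the surrounding arrays are unwrapped.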