@jmnsf
Last active March 19, 2017 03:43
Extract annotations from Marvin 2.0
require 'nokogiri'

# Expect exactly one argument: the path to the exported Marvin library file.
if ARGV.length != 1
  puts "Usage:\n$ ruby marvin-annotations.rb <path-to-library.mrvi>"
  exit
end
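
# Collects the text of every highlight in a book, along with the earliest and
# latest highlight timestamps (the dateTime attribute, parsed as a float).
def extract_highlights(book)
  min_ts = 3000000000 # sentinel larger than any expected timestamp
  max_ts = 0
  highlights = book.xpath('highlights/highlight').map do |highlight|
    ts = highlight.xpath('@dateTime').to_s.to_f
    min_ts = [min_ts, ts].min
    max_ts = [max_ts, ts].max
    highlight.xpath('@text').to_s
  end
  { first_ts: min_ts, last_ts: max_ts, highlights: highlights }
end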

# Renders one book's highlights as a Markdown section, or returns an empty
# string when the book has no highlights.
def extract_book_annotations(book)
  title = book.xpath('@title').to_s
  author = book.xpath('@authorSort').to_s
  subjects = book.xpath('subjects/subject').map do |subject|
    subject.xpath('text()').to_s
  end.join(', ')
  first_ts, last_ts, highlights = extract_highlights(book)
    .values_at(:first_ts, :last_ts, :highlights)
  return '' if highlights.empty?
  text_highlights = highlights.join("\n\n")
  # The template lines stay flush left so no indentation leaks into the Markdown.
  %Q{
#{title}
#{'-' * title.length}
_By: #{author}_
**About:** #{subjects}
**Timespan:** From #{Time.at(first_ts).strftime('%b %d, %Y')} to #{Time.at(last_ts).strftime('%b %d, %Y')}
### Annotations
#{text_highlights}
}
end

# Renders every book, returning one string per book.
def extract_annotations(books)
  books.map { |book| extract_book_annotations(book) }
end

# Parse the library file, then print the sections of the books that have highlights.
xml = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
annotations = extract_annotations(xml.xpath('/marvin/book'))
puts %Q{
Marvin Annotations
==================
#{annotations.reject(&:empty?).join("\n\n")}
}
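
The timestamp handling in `extract_highlights` can be tried out without a Marvin export. What follows is a minimal sketch and not part of the original gist: `highlight_timespan` and the sample values are hypothetical, and it assumes that `dateTime` holds Unix epoch seconds, which is what the script's use of `Time.at` suggests. It swaps the running `min_ts`/`max_ts` variables for `Enumerable#minmax`, and it formats in UTC so the result does not depend on the local time zone (the script itself formats in local time).

```ruby
# Hypothetical helper, not part of the gist: returns the formatted first and
# last dates for a list of epoch-second timestamps, or nil for an empty list.
def highlight_timespan(timestamps)
  return nil if timestamps.empty?

  first_ts, last_ts = timestamps.minmax
  # UTC keeps the output independent of the machine's time zone.
  [first_ts, last_ts].map { |ts| Time.at(ts).utc.strftime('%b %d, %Y') }
end

# Made-up sample values, assumed to be Unix epoch seconds.
sample = [1_489_000_000.0, 1_483_228_800.0, 1_489_894_980.0]
first_day, last_day = highlight_timespan(sample)
puts "**Timespan:** From #{first_day} to #{last_day}"
```

Returning `nil` for an empty list would make the "no highlights" case explicit, so the 3000000000 sentinel would no longer be needed if the script were adapted along these lines.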