Last active
March 19, 2017 03:43
Extract annotations from Marvin 2.0
require 'nokogiri'

# Expect exactly one argument: the path to the exported Marvin library file.
if ARGV.length != 1
  warn "Usage:\n$ ruby marvin-annotations.rb <path-to-library.mrvi>"
  exit 1
end

# Collects the text of every highlight in a book, plus the earliest and latest
# highlight timestamps. @dateTime is treated as epoch seconds because the
# values are later passed to Time.at.
def extract_highlights(book)
  nodes = book.xpath('highlights/highlight')
  timestamps = nodes.map { |highlight| highlight['dateTime'].to_f }

  {
    first_ts: timestamps.min, # nil when the book has no highlights
    last_ts: timestamps.max,
    highlights: nodes.map { |highlight| highlight['text'].to_s }
  }
end

# Renders one book's highlights as a Markdown section.
# Returns an empty string when the book has no highlights.
def extract_book_annotations(book)
  title = book['title'].to_s
  author = book['authorSort'].to_s
  subjects = book.xpath('subjects/subject').map(&:text).join(', ')
  first_ts, last_ts, highlights =
    extract_highlights(book).values_at(:first_ts, :last_ts, :highlights)
  return '' if highlights.empty?

  date_format = '%b %d, %Y'
  <<~MARKDOWN
    #{title}
    #{'-' * title.length}

    _By: #{author}_

    **About:** #{subjects}

    **Timespan:** From #{Time.at(first_ts).strftime(date_format)} to #{Time.at(last_ts).strftime(date_format)}

    ### Annotations

    #{highlights.join("\n\n")}
  MARKDOWN
end

def extract_annotations(books)
  books.map { |book| extract_book_annotations(book) }
end

xml = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
annotations = extract_annotations(xml.xpath('/marvin/book'))

# Books without highlights render as empty strings, so drop them before joining.
puts <<~MARKDOWN
  Marvin Annotations
  ==================

  #{annotations.reject(&:empty?).join("\n\n")}
MARKDOWN
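
The timespan line relies on `Time.at`, which treats its argument as seconds since the Unix epoch. The sketch below isolates that step. The timestamps are made up and the `format_day` helper is hypothetical; neither comes from a real Marvin export. It formats in UTC so the result does not depend on the local time zone, whereas the script itself formats in local time.

```ruby
# Hypothetical epoch-second timestamps standing in for highlight @dateTime values.
timestamps = [1_489_000_000.0, 1_488_000_000.0, 1_489_500_000.0]

# Array#minmax returns the smallest and largest elements as a pair.
first_ts, last_ts = timestamps.minmax

# Hypothetical helper: format an epoch timestamp as e.g. "Feb 25, 2017" in UTC.
def format_day(epoch_seconds)
  Time.at(epoch_seconds).utc.strftime('%b %d, %Y')
end

puts "From #{format_day(first_ts)} to #{format_day(last_ts)}"
# prints "From Feb 25, 2017 to Mar 14, 2017"
```

If the exported `dateTime` values turn out to be in another unit, such as milliseconds, the dates would come out far in the future and the values would need scaling before the call to `Time.at`.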