Last active
December 20, 2015 04:29
Using marc4j from jruby
# I just nabbed the source of marc4j and built it with "ant jar"
require 'marc4j-2.5.1-beta.jar'

# Conveniently add Enumerable to the reader interface so I can get #each, #each_with_index, etc.
# This would be automatic if MarcReader were specified as an iterable, as per a recent github issue
# on the marc4j repo (https://github.com/marc4j/marc4j/issues/11)
module org.marc4j::MarcReader
  include Enumerable

  def each
    if block_given?
      while self.hasNext
        yield self.next
      end
    else
      self.to_enum(:each)
    end
  end
end

# Pull in the 'batch.dat' file from the ruby-marc test suite
istream = java.io.FileInputStream.new('batch.dat')

# Make a reader out of it. I'm specifying utf-8, but you could leave it blank
# and get the "best guess" as well
reader = org.marc4j.MarcStreamReader.new(istream, 'UTF-8')
reader.each do |r|
  # whatever
end

# Do the same thing with a permissive reader
istream = java.io.FileInputStream.new('batch.dat')
reader = org.marc4j.MarcPermissiveStreamReader.new(istream, true, true)
iter = reader.each # get the iterator
puts iter.next
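
The same hasNext/next-to-Enumerable pattern can be tried without JRuby or the marc4j jar. The following is a minimal sketch in plain Ruby. `FakeReader` and its sample records are hypothetical stand-ins, not part of marc4j, and they only mimic the `hasNext`/`next` shape of `MarcReader`. It also shows one consequence worth keeping in mind: a reader wrapped this way is single-pass, so records consumed by one Enumerable call are not seen again by the next.

```ruby
# Hypothetical stand-in for a Java-style reader: it exposes only hasNext/next,
# the same two methods the gist relies on from org.marc4j.MarcReader.
class FakeReader
  include Enumerable

  def initialize(records)
    @records = records
    @pos = 0
  end

  # True while there are unread records.
  def hasNext
    @pos < @records.size
  end

  # Return the next record and advance the cursor.
  def next
    rec = @records[@pos]
    @pos += 1
    rec
  end

  # Same shape as the #each added to MarcReader above:
  # yield records to the block, or hand back an Enumerator when no block is given.
  def each
    return to_enum(:each) unless block_given?
    yield self.next while hasNext
    self
  end
end

reader = FakeReader.new(%w[rec1 rec2 rec3])

# Enumerable methods now work on the reader...
first_two = reader.first(2)

# ...but the underlying cursor has moved, so a later call only sees what is left.
rest = reader.to_a

puts first_two.inspect
puts rest.inspect
```

If a second pass over the records is needed, one option under these assumptions is to open a fresh input stream and reader, as the gist does before building the permissive reader.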