@billdueber
Created February 9, 2015 18:43
ruby-marc is about half as fast as marc4j when reading binary MARC21 and MARC-XML under Traject
jruby 1.7.18 (1.9.3p551) 2014-12-22 625381c on Java HotSpot(TM) 64-Bit Server VM 1.8.0-b132 +jit [darwin-x86_64]
Rehearsal ------------------------------------------------
marc4j w/bin  19.640000   1.000000  20.640000 (  8.313000)
ruby w/bin    22.470000   0.600000  23.070000 ( 14.930000)
marc4j w/xml  12.950000   1.130000  14.080000 (  6.189000)
ruby w/xml    19.970000   0.390000  20.360000 ( 12.946000)
-------------------------------------- total: 78.150000sec

                   user     system      total        real
marc4j w/bin   8.430000   0.970000   9.400000 (  6.782000)
ruby w/bin    15.590000   0.500000  16.090000 ( 13.988000)
marc4j w/xml   7.800000   1.090000   8.890000 (  5.717000)
ruby w/xml    11.720000   0.220000  11.940000 ( 10.790000)
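To make the "about half as fast" claim concrete, here is a minimal sketch that divides the wall-clock ("real") times of the second, post-rehearsal pass. The constant `REAL_SECONDS` and the helper `slowdown` are names made up for this illustration; only the four timings come from the output above.

```ruby
# Wall-clock ("real") seconds copied from the post-rehearsal pass above.
# REAL_SECONDS and slowdown are hypothetical names used only in this sketch.
REAL_SECONDS = {
  'bin' => { :marc4j => 6.782, :ruby => 13.988 },
  'xml' => { :marc4j => 5.717, :ruby => 10.790 }
}.freeze

# How many times longer ruby-marc took than marc4j for one input format.
def slowdown(times)
  times[:ruby] / times[:marc4j]
end

REAL_SECONDS.each do |kind, times|
  puts "#{kind}: ruby-marc took #{slowdown(times).round(2)}x as long as marc4j"
end
```

On these numbers the ratio comes out at roughly 2.06x for binary MARC21 and 1.89x for MARC-XML, which is where "about half as fast" comes from. The comparison is only as good as this single run on this one JRuby/JVM combination.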
# Run against the local checkout of traject rather than an installed gem.
$:.unshift "#{File.dirname(__FILE__)}/../lib"

require 'traject'
require 'traject/marc_reader'
require 'traject/marc4j_reader'
require 'traject/null_writer'
require 'benchmark'

# Same records in both serializations (MARC-XML and binary MARC21).
xmlfilename = '10k.xml'
mrcfilename = '10k.mrc'

indexer = Traject::Indexer.new(
  "log.level" => "error"
)

# Discard all output so the timings are dominated by reading, not writing.
indexer.writer_class = Traject::NullWriter

# A trivial mapping step so each record still passes through the pipeline.
indexer.to_field('constant') do |rec, acc|
  acc << 'Constant'
end

puts "\n\n#{RUBY_DESCRIPTION}"

# bmbm runs every report twice (rehearsal, then the measured pass).
Benchmark.bmbm do |x|
  x.report('marc4j w/bin') do
    indexer.settings['marc_source.type'] = 'binary'
    indexer.settings['marc4j_reader.permissive'] = true
    indexer.reader_class = Traject::Marc4JReader
    indexer.process(File.open(mrcfilename))
  end

  x.report('ruby w/bin ') do
    indexer.settings['marc_source.type'] = 'binary'
    indexer.reader_class = Traject::MarcReader
    indexer.process(File.open(mrcfilename))
  end

  puts "\n"

  x.report('marc4j w/xml') do
    indexer.settings['marc_source.type'] = 'xml'
    indexer.reader_class = Traject::Marc4JReader
    indexer.process(File.open(xmlfilename))
  end

  x.report('ruby w/xml ') do
    indexer.settings['marc_source.type'] = 'xml'
    indexer.reader_class = Traject::MarcReader
    indexer.process(File.open(xmlfilename))
  end

  puts "\n\n"
end