@billdueber
Created July 9, 2013 19:27
Compare nokogiri XML parsing vs. marc4j4r XML parsing, with and without a round trip through marchash
require 'marc'
require 'marc4j4r'
require 'benchmark'

iterations = 1
xmlsourcefile = 'topics.xml' # 18k records as a MARC-XML collection

puts RUBY_DESCRIPTION
puts "#{iterations} iteration(s)\n"

Benchmark.bm do |x|
  # ruby-marc's XMLReader backed by the nokogiri parser
  x.report("nokogiri ") do
    title_length = 0
    iterations.times do
      reader = MARC::XMLReader.new(xmlsourcefile, :parser=>'nokogiri')
      reader.each do |r|
        title_length += r['245'].value.size
      end
    end
  end

  # Parse with marc4j4r, then convert each record to a ruby-marc
  # MARC::Record by way of marchash
  x.report("marc4j4r with marchash") do
    title_length = 0
    iterations.times do
      reader = MARC4J4R::Reader.new(xmlsourcefile, :marcxml)
      reader.each do |m4jr|
        r = MARC::Record.new_from_marchash(m4jr.to_marchash)
        title_length += r['245'].value.size
      end
    end
  end

  # Use the marc4j4r records directly, with no conversion
  x.report("marc4j4r raw") do
    title_length = 0
    iterations.times do
      reader = MARC4J4R::Reader.new(xmlsourcefile, :marcxml)
      reader.each do |r|
        title_length += r['245'].value.size
      end
    end
  end
end

/tmp> jruby --server -J-Djruby.compile.invokedynamic=false t.rb
jruby 1.7.4 (1.9.3p392) 2013-05-16 2390d3b on Java HotSpot(TM) 64-Bit Server VM 1.7.0_40-ea-b27 [darwin-x86_64]
1 iteration(s)
                             user     system      total        real
nokogiri                56.680000   0.420000  57.100000 ( 54.955000)
marc4j4r with marchash  44.800000   0.590000  45.390000 ( 35.089000)
marc4j4r raw            19.170000   0.370000  19.540000 ( 14.428000)
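
To make the gap between the three approaches easier to read, here is a small sketch that expresses each wall-clock ("real") time from the run above as a multiple of the fastest one. The constant `REAL_SECONDS` and the helper `relative_to_fastest` are names made up for this illustration; the numbers are copied from the output above, and nothing here is part of the original benchmark script.

```ruby
# Wall-clock ("real") seconds copied from the benchmark output above.
REAL_SECONDS = {
  'nokogiri'               => 54.955,
  'marc4j4r with marchash' => 35.089,
  'marc4j4r raw'           => 14.428
}

# Hypothetical helper: express each timing as a multiple of the fastest one.
def relative_to_fastest(timings)
  fastest = timings.values.min
  timings.each_with_object({}) do |(label, secs), out|
    out[label] = (secs / fastest).round(2)
  end
end

relative_to_fastest(REAL_SECONDS).each do |label, ratio|
  puts format('%-24s %.2fx', label, ratio)
end
```

On this single run, nokogiri takes roughly 3.8 times as long as raw marc4j4r, and the marchash round trip roughly 2.4 times as long. With only one iteration on JRuby, these ratios are best read as rough indications rather than stable measurements.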