
Example of parsing large XML files in Ruby with Ox: define a SAX handler and look up a particular target element.

ox_parsing.rb
Ruby
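require "awesome_print"

module XmlParsing
  require "ox"

  # SAX handler that rebuilds each element as a nested Hash and passes every
  # completed target element to target_handler.next_element.
  class Reader < ::Ox::Sax
    def initialize(file_path, target, target_handler)
      @target_handler = target_handler
      @target = target
      @file_path = file_path
      @elements = [] # stack of elements that are currently open
    end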
 
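    # Rough count of target elements, taken with grep before parsing.
    def count
      return @count if @count

      require "shellwords" # escape the pattern and path before they reach the shell
      file = Shellwords.escape(@file_path)
      # Try "<Target>" first, then "<Target" for elements that carry attributes.
      @count = `grep -o #{Shellwords.escape("<#{@target}>")} #{file} | wc -l`.strip.to_i
      @count = `grep -o #{Shellwords.escape("<#{@target}")} #{file} | wc -l`.strip.to_i if @count.zero?
      @count
    end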
 
    def parse
      # Stream the file through Ox's SAX parser; the block form closes the file afterwards.
      File.open(@file_path) { |xmlio| Ox.sax_parse(self, xmlio) }
    end
 
    def start_element(name)
      name = name.to_s.strip
      # The name key marks which element this stack frame belongs to.
      @elements.push({ name => {} })
    end
 
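    def end_element(name)
      name = name.to_s.strip
      return unless @elements.last && @elements.last[name]

      element = @elements.pop
      element.delete(name) # drop the marker added in start_element

      # An element that holds nothing but text collapses to that string.
      if element.keys.size == 1 && element[:text]
        inject_into_last(name, element[:text])
      else
        inject_into_last(name, element)
      end

      # The element also stays attached to its parent, so memory still grows
      # with the number of siblings until the parent closes.
      @target_handler.next_element(element) if @target == name
    end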
 
    # Attach value to the enclosing element; repeated names become an Array.
    def inject_into_last(name, value)
      return unless @elements.last

      if @elements.last[name]
        @elements.last[name] = [@elements.last[name]] unless @elements.last[name].is_a?(Array)
        @elements.last[name].push(value)
      else
        @elements.last[name] = value
      end
    end
 
    def attr(name, value)
      return unless @elements.last

      name = name.to_s.strip
      value = value.to_s.strip

      @elements.last[:attrs] ||= {}
      @elements.last[:attrs][name] = value
    end

    def text(value)
      return unless @elements.last

      value = value.to_s.strip
      @elements.last[:text] = value
    end
  end
end
 
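module XMLPropertiesHandler
  class Premthus
    # Called by the reader once for every completed target element.
    def next_element(property)
      ap property
      exit # stop after printing the first match
    end
  end
end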
 
XmlParsing::Reader.new("mits.xml", "Property", XMLPropertiesHandler::Premthus.new).parse
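
To see what the handler receives without installing Ox or having mits.xml at hand, the Reader's stack logic can be replayed in plain Ruby. This is only a sketch: StackReplay is a hypothetical class that is not part of the gist, the events are fed by hand instead of coming from a parser, and the element names and values are made up for illustration.

```ruby
# Plain-Ruby replay of the Reader's stack logic; no XML parser is involved.
class StackReplay
  attr_reader :found

  def initialize(target)
    @target = target
    @elements = [] # stack of elements that are still open
    @found = []    # completed target elements, in document order
  end

  def start_element(name)
    @elements.push({ name => {} }) # the name key marks the frame's owner
  end

  def attr(name, value)
    (@elements.last[:attrs] ||= {})[name] = value
  end

  def text(value)
    @elements.last[:text] = value.strip
  end

  def end_element(name)
    element = @elements.pop
    element.delete(name)
    # An element that holds nothing but text collapses to that string.
    value = element.keys == [:text] ? element[:text] : element
    inject_into_last(name, value)
    @found << element if name == @target
  end

  private

  # Attach value to the enclosing element; repeated names become an Array.
  def inject_into_last(name, value)
    parent = @elements.last
    return unless parent

    if parent.key?(name)
      parent[name] = [parent[name]] unless parent[name].is_a?(Array)
      parent[name] << value
    else
      parent[name] = value
    end
  end
end

# Hand-written events for a made-up document with one Property element.
replay = StackReplay.new("Property")
replay.start_element("Properties")
replay.start_element("Property")
replay.attr("id", "1")
replay.start_element("Name")
replay.text("Elm Court")
replay.end_element("Name")
%w[1A 1B].each do |unit|
  replay.start_element("Unit")
  replay.text(unit)
  replay.end_element("Unit")
end
replay.end_element("Property")
replay.end_element("Properties")

p replay.found.first
```

The first Property comes out as a Hash with its attributes under the :attrs key, "Name" as a plain string, and "Unit" as an Array of "1A" and "1B", because repeated child names are gathered into an Array.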
 
 
 
 
# For a Nokogiri example that tries the easiest way to deal with Nokogiri SAX parsing, see:
# http://amolnpujari.wordpress.com/2012/03/31/reading_huge_xml-rb/
