Created February 15, 2011 12:53
A small nokogiri xml reader DSL.
# A small DSL to help parse documents using Nokogiri::XML::Reader. The
# XML Reader is a good way to move a cursor through a (large) XML document fast,
# but is not as cumbersome as writing a full SAX document handler. Read about
# it in the Nokogiri::XML::Reader documentation.
#
# Just pass the reader to this parser and specify the nodes that you are
# interested in inside a block. You can parse every node or only look inside
# certain nodes.
#
# A small example (`reader` is a Nokogiri::XML::Reader):
#
#   Xml::Parser.new(reader) do
#     inside_element 'User' do
#       for_element 'Name' do puts "Username: #{inner_xml}" end
#       for_element 'Email' do puts "Email: #{inner_xml}" end
#       for_element 'Address' do
#         puts 'Start of address:'
#         inside_element do
#           for_element 'Street' do puts "Street: #{inner_xml}" end
#           for_element 'Zipcode' do puts "Zipcode: #{inner_xml}" end
#           for_element 'City' do puts "City: #{inner_xml}" end
#         end
#         puts 'End of address'
#       end
#     end
#   end
#
# It does NOT fail on missing tags, and does not guarantee order of execution. It
# parses every tag regardless of nesting. The only way to guarantee scope is by
# using the `inside_element` method, which limits parsing to the current or the
# named tag. If a tag is encountered multiple times, its block is called multiple
# times.
require 'nokogiri'

module Xml
  class Parser
    def initialize(node, &block)
      @node = node
      @node.each do
        self.instance_eval &block
      end
    end

    def name
      @node.name
    end

    def inner_xml
      @node.inner_xml
    end

    def is_start?
      @node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    end

    def is_end?
      @node.node_type == Nokogiri::XML::Reader::TYPE_END_ELEMENT
    end

    def attribute(attribute)
      @node.attribute(attribute)
    end

    def for_element(name, &block)
      return unless self.name == name and is_start?
      self.instance_eval &block
    end

    def inside_element(name=nil, &block)
      return if @node.self_closing?
      return unless name.nil? or (self.name == name and is_start?)
      name = @node.name
      depth = @node.depth
      @node.each do
        return if self.name == name and is_end? and @node.depth == depth
        self.instance_eval &block
      end
    end
  end
end

Very nice code! Thanks a lot!

lesterz commented Nov 15, 2013

Is there any more documentation or examples on how to use this anywhere? I'm having a hard time instantiating classes inside for_element. Keep getting NoMethodErrors...
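
A possible explanation, assuming the NoMethodError is raised for a method defined outside the block: the parser runs every block with `instance_eval`, so `self` inside `for_element` is the parser, not the object that created it. Local variables are still visible inside the block, so capturing the caller in a local is one workaround. `MiniParser` and `Importer` below are hypothetical stand-ins, not part of the gist.

```ruby
# Hypothetical stand-in for Xml::Parser: like the gist, it runs the block
# with instance_eval, which rebinds `self` to the parser instance.
class MiniParser
  def initialize(&block)
    instance_eval(&block)
  end
end

# Hypothetical caller that wants to use its own helper inside the block.
class Importer
  attr_reader :log

  def initialize
    @log = []
  end

  def record(message)
    @log << message
  end

  # Fails: `record` is looked up on the MiniParser, which does not define it.
  def run_broken
    MiniParser.new { record('x') }
  end

  # Works: the local variable `importer` is captured by the block.
  def run_fixed
    importer = self
    MiniParser.new { importer.record('x') }
    log
  end
end
```

The same applies to instance variables: an `@ivar` written inside the block belongs to the parser, so results are easier to collect through a local variable.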

joonty commented Feb 7, 2014

This really is fantastic - excellent work!

I just woke up in the middle of the night envisioning something like this.

And it already exists.

Good work.

nicka commented Jan 13, 2015

OMG! This is awesome!!!! +1

`inner_xml` doesn't seem to unescape `&amp;` -- what's the recommended way to do this?
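
One option, assuming the value is plain text that contains escaped entities such as `&amp;`: `inner_xml` returns the markup as it appears in the document, and Ruby's standard `cgi` library can decode the common entities afterwards. The sample string is made up.

```ruby
require 'cgi'

raw = 'Tom &amp; Jerry &lt;3'     # e.g. what inner_xml might return
text = CGI.unescapeHTML(raw)      # decodes &amp; &lt; &gt; &quot; and numeric entities
```

This only makes sense when the element holds text; if it contains child elements, decoding the whole string would mix markup and content.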

saroar commented Apr 7, 2016

Can anybody help me? I have a 1 GB XML file, and I need to find a certain category and import 100 products from it.
Here is my code.

In the controller:

def import
  if params[:xml_file]
    file = params[:xml_file]
    doc = Nokogiri::XML::Document.parse(file)
    total_product = doc.xpath('//shop/offers/offer').take(2).length

    Product.import(doc, params[:category_id])
    redirect_to products_path, notice: "#{total_product} Product added."
  end
end

And in the Product model:
def self.import(doc, category)
parsed_products = doc.xpath('//shop/offers/offer').take(2)

if !
  self.transaction do
    parsed_products.each do |product|
      if product.at_xpath('categoryId').text == category
          price: product.at_xpath('price').text,
          category_id: product.at_xpath('categoryId').text,
          remote_image_url: product.at_xpath('picture').text.strip,
          brand_id: product.at_xpath('vendor').text,
          title: product.at_xpath('name').text,
          description: product.at_xpath('description').text,

          gender: product.at_xpath('fashion/gender').present? ? product.at_xpath('fashion/gender').text.gsub("m","Male").gsub("f","Female") : nil,

          product_type: product.at_xpath('fashion/type').present? ? product.at_xpath('fashion/type').text : '',




h2.text-center Import Products

= form_tag import_products_path, multipart: true do |f|
  = file_field_tag :xml_file

  = submit_tag "Import"

Any advice will be appreciated. Thanks in advance.

Had a 60+GB xml on my hands - and until @kmile showed me the path I was utterly lost in XML up above my ears :)

Thank you - from the bottom of my ❤️

This is beautiful and saved me so much time and pain. Thank you @kmile.

Thanks a lot for this wonderful piece of code. Did anyone get it to work with JRuby?

cmalpeli commented Mar 7, 2017

@kmile this is awesome! Is there a way to prevent the text coming back with CDATA wrappers?

<![CDATA[My Text]]>
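
A simple string-level workaround, assuming the element contains only CDATA sections and plain text: strip the wrapper with a regular expression after calling `inner_xml`. `strip_cdata` is a hypothetical helper name, not something the gist or Nokogiri provides.

```ruby
# Removes <![CDATA[ ... ]]> wrappers and keeps the text inside them.
# The /m flag lets the wrapped text span multiple lines.
def strip_cdata(xml_text)
  xml_text.gsub(/<!\[CDATA\[(.*?)\]\]>/m, '\1')
end
```

Inside a parser block this could be used as `strip_cdata(inner_xml)`.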

aurels commented Oct 24, 2018

Still rocking the house in 2018 !

hrieke commented Dec 5, 2018


It's 4 years after the last comment. And still this is useful. Thank you.
