
My submission for RFCN#8
This program reads stdin, or all files named as arguments, and writes
the transformed XML to stdout.
Alternatively, the submission can be require'd and the Transformer
class used in other applications.
Transformation rules are (currently) at the end of this file, in YAML.
Rules can define name mappings, nesting, and whether each element is optional
in the input document. Each rule also indicates whether to copy
the contents of the input element to the output element.
I have added an extra "optional" element, <age/>, to test the
extensibility of the rules.
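require 'yaml'
require 'nokogiri'

class Transformer
  # DATA holds the YAML rules after this script's __END__ marker. It only
  # exists when this file is the program being run, so callers that require
  # this file should pass their own rules. Symbol must be permitted because
  # the rules use symbols and recent Psych versions refuse them by default.
  def initialize rules=YAML.safe_load(DATA.read, permitted_classes: [Symbol])
    @rules = rules
  end

  # Returns a new Nokogiri::XML::Document built from indoc according to the rules.
  def transform indoc
    outdoc = Nokogiri::XML::Document.new
    transform_recursively( @rules, indoc, outdoc )
    outdoc
  end

  def transform_recursively( current_rule_node, current_input_node, current_output_node )
    current_rule_node.each do |symbol,tree|
      instructions, paths, child_rules = tree
      # every listed XPath is tried, so several input names map to one output tag
      found_nodes = current_input_node.xpath( *paths )
      node_not_found = found_nodes.empty?
      # an optional element missing from the input is still created in the output
      found_nodes = [ current_input_node ] if node_not_found and instructions.include? :optional
      raise "Missing element <#{symbol}>" if node_not_found and not instructions.include? :optional
      raise "Too many <#{symbol}> elements" if found_nodes.count > 1 and instructions.include? :singular
      found_nodes.each do |found|
        new_node = current_output_node.add_child( Nokogiri::XML::Node.new( symbol.to_s, current_output_node.document ) )
        new_node.content = found.content if instructions.include? :copy_content and not node_not_found
        transform_recursively( child_rules, found, new_node ) if child_rules
      end
    end
  end
end

# simple test run, skipped when this file is require'd
if __FILE__ == $0
  transformer = Transformer.new
  # transform each file named as an argument, or stdin when there are none
  inputs = ARGV.empty? ? [ $stdin ] : ARGV
  inputs.each do |input|
    begin
      indoc = input == $stdin ? Nokogiri::XML(input) : File.open(input) { |f| Nokogiri::XML(f) }
      puts transformer.transform(indoc)
    rescue StandardError => e
      $stderr.puts e
    end
  end
end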
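The same recursive walk can be sketched without Nokogiri. The version below is a minimal, hypothetical sketch (the method name `rexml_transform` and the sample rules are illustrative, not part of the submission) that swaps in Ruby's REXML library and takes the rules as a plain Ruby hash. One assumed simplification: it copies `text` (the element's first text node) rather than Nokogiri's `content`, which joins all descendant text.

```ruby
require 'rexml/document'

# Hypothetical REXML port of Transformer#transform_recursively.
# Rules map an output tag to [instructions, xpaths, child rules].
def rexml_transform(rules, input_node, output_parent)
  rules.each do |tag, (instructions, paths, child_rules)|
    # try every alternative XPath, so several input names map to one tag
    found = paths.flat_map { |path| REXML::XPath.match(input_node, path) }
    missing = found.empty?
    raise "Missing element <#{tag}>" if missing && !instructions.include?(:optional)
    raise "Too many <#{tag}> elements" if found.size > 1 && instructions.include?(:singular)
    # a missing optional element is still created, nested under the current context
    found = [input_node] if missing
    found.each do |node|
      out = output_parent.add_element(tag.to_s)
      # assumption: copies only the first text node (REXML's Element#text)
      out.text = node.text if instructions.include?(:copy_content) && !missing
      rexml_transform(child_rules, node, out) if child_rules
    end
  end
  output_parent
end

# illustrative rules, loosely following the submission's YAML
rules = {
  people: [[], ['//people', '//address_book'], {
    person: [[], ['person', 'contact'], {
      first: [[:copy_content], ['first_name', 'first']],
      age:   [[:optional, :copy_content], ['age']]
    }]
  }]
}

input = REXML::Document.new('<address_book><contact><first_name>Ann</first_name></contact></address_book>')
puts rexml_transform(rules, input, REXML::Document.new)
```

With these sample rules, `<address_book>` and `<contact>` are renamed to `<people>` and `<person>`, and because `age` is marked optional an empty `<age/>` still appears in the output even though the input has none.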
Define the rules used for the transformation. Nested dictionaries
indicate the output structure, and each dictionary key/symbol
is used as the output XML tag name.
These could be read from a file or placed elsewhere, but are kept here
for convenience, currently in YAML format.
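As a minimal sketch, assuming Ruby's bundled Psych YAML library, this shows how rules written in this YAML style load into nested Ruby hashes and arrays. Keys and values written as `:name` become Ruby symbols only when `Symbol` is explicitly permitted. The tag names (`people`, `person`, `first`) are illustrative assumptions.

```ruby
require 'yaml'

# Illustrative rules document; the tag names are assumptions.
RULES_YAML = <<~RULES
  :people:
    - []
    - - //people
      - //address_book
    - :person:
        - []
        - - person
          - contact
        - :first:
            - - :copy_content
            - - first_name
              - first
RULES

# ":people"-style scalars only become Symbols when Symbol is permitted.
rules = YAML.safe_load(RULES_YAML, permitted_classes: [Symbol])

# Each node unpacks into [instructions, xpath locations, child nodes].
instructions, paths, children = rules[:people]
p instructions   # prints []
p paths          # prints ["//people", "//address_book"]
p children.keys  # prints [:person]
```

A leaf rule such as `:first:` simply has no third element, so destructuring it leaves the child-rules variable `nil`, which is how the transformer knows to stop recursing.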
The format is a recursive structure:
node = { symbol: [ [instructions], [xpath locations], {child nodes} ] }
where the following instructions are currently understood:
:copy_content -> copy the node's content from the input node to the output node
:optional -> the node does not need to be found in the input doc
(note: it is still created in the output doc)
:singular -> this element may appear only once
(note: the outer element can only appear once by definition)
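The three instructions amount to a small decision made once per rule. The helper below is hypothetical (not part of the submission) and separates that decision from the XML handling: given the nodes an XPath lookup found and the current context node, it returns the nodes to emit output elements for and whether content should be copied, raising the same errors as `transform_recursively`.

```ruby
# Hypothetical helper: applies :optional, :singular and :copy_content for one
# rule. Returns [nodes to emit output elements for, copy content?].
def apply_instructions(tag, instructions, found_nodes, context_node)
  if found_nodes.empty?
    raise "Missing element <#{tag}>" unless instructions.include?(:optional)
    # :optional -> still emit the element, under the current context,
    # but there is no input content to copy
    return [[context_node], false]
  end
  if found_nodes.size > 1 && instructions.include?(:singular)
    raise "Too many <#{tag}> elements"
  end
  [found_nodes, instructions.include?(:copy_content)]
end

p apply_instructions(:age, [:optional, :copy_content], [], :person_node)
# prints [[:person_node], false]
p apply_instructions(:first, [:copy_content], [:first_name_node], :name_node)
# prints [[:first_name_node], true]
```

Keeping this logic free of any XML library makes the instruction semantics easy to test on their own, using plain symbols as stand-ins for nodes.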
- []
- - //people
- //address_book
- //employees
- :person:
- []
- - person
- contact
- employee
- :name:
- - :optional
- - name
- :first:
- - :copy_content
- - first_name
- first
- - :copy_content
- - last_name
- last
- surname
- - :optional
- :copy_content
- - age