@yorickpeterse
Created November 29, 2014 21:40
class Oga::XML::Parser
  extend LL::Parser

  tokens :T_TEXT, :T_STRING_SQUOTE, :T_STRING_DQUOTE, :T_STRING_BODY,
    :T_DOCTYPE_START, :T_DOCTYPE_END, :T_DOCTYPE_TYPE, :T_DOCTYPE_NAME,
    :T_DOCTYPE_INLINE, :T_CDATA, :T_COMMENT,
    :T_ELEM_START, :T_ELEM_NAME, :T_ELEM_NS, :T_ELEM_END, :T_ATTR,
    :T_ATTR_NS, :T_XML_DECL_START, :T_XML_DECL_END,
    :T_PROC_INS_START, :T_PROC_INS_NAME, :T_PROC_INS_END

  start :document

  rule :document,
    :expressions => -> (exp) { on_document(exp[0]) }

  rule :expressions,
    :expressions_ => -> (exp) { exp[0] },
    nil           => -> { [] }

  # Assuming proper repetition operators are available (see below), the rule
  # above can be merged with the rule below. With LALR(1) this merge is not
  # possible in Oga's XML parser, because it produces conflicts.
  rule :expressions_,
    [:expressions_, :expression] => -> (left, right) { left << right },
    :expression                  => -> (exp) { exp }

  # Or without left recursion, which an LL parser cannot handle:
  rule :expressions_,
    one_or_many(:expression) # translates to `expression+`

  # or:
  rule :expressions_,
    none_or_many(:expression) # translates to `expression*`

  rule :expression,
    :doctype,
    :cdata,
    :comment,
    :element,
    :text,
    :xmldecl,
    :proc_ins

  rule :doctype,
    [:T_DOCTYPE_START, :T_DOCTYPE_NAME, :T_DOCTYPE_END] =>
      -> (_, name) { on_doctype(:name => name) },
    [:T_DOCTYPE_START, :T_DOCTYPE_NAME, :T_DOCTYPE_TYPE, :T_DOCTYPE_END] =>
      -> (_, name, type) { on_doctype(:name => name, :type => type) },
    [:T_DOCTYPE_START, :T_DOCTYPE_NAME, :T_DOCTYPE_TYPE, :string, :T_DOCTYPE_END] =>
      # The opening brace must sit on the same line as the lambda arguments.
      -> (_, name, type, id, _) {
        on_doctype(:name => name, :type => type, :public_id => id)
      }

  # ...
end
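
# A minimal sketch, not part of LL::Parser (whose real API is not shown in this
# file), of how the `expression+` and `expression*` forms mentioned in the
# comments above could be desugared into plain right-recursive productions.
# Right recursion is used because an LL parser cannot handle left-recursive
# rules. The module name `RepetitionSketch`, the method names `expand_plus` and
# `expand_star`, and the generated `_tail` rule name are all hypothetical and
# exist only for illustration.
module RepetitionSketch
  # `symbol+`: one `symbol` followed by an optional tail.
  #
  #   name      = symbol name_tail
  #   name_tail = symbol name_tail | <empty>
  def self.expand_plus(name, symbol)
    tail = :"#{name}_tail"

    {
      name => [[symbol, tail]],
      tail => [[symbol, tail], []]
    }
  end

  # `symbol*`: a `symbol` followed by more of the same, or nothing at all.
  #
  #   name = symbol name | <empty>
  def self.expand_star(name, symbol)
    { name => [[symbol, name], []] }
  end
end

# Usage: every key is a rule name and every value a list of alternatives. An
# empty alternative plays the role of the `nil` (epsilon) branch used in the
# :expressions rule above.
plus_rules = RepetitionSketch.expand_plus(:expressions_, :expression)
star_rules = RepetitionSketch.expand_star(:expressions, :expression)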