Last active Feb 6, 2020
Gecko Rust XML Parser Requirements

Goal: Replace Gecko's XML parser, libexpat, with a Rust-based XML parser.


Why:

  • Various integer overflow CVEs
  • Buffer overflows
  • Simplification: we don't need character conversion (which has led to several CVEs)


Requirements:

  • push/SAX-based interface (lower memory, streaming)
  • supports DTDs and entities
  • hook to load external entities
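
To make the push/streaming requirement concrete, here is a minimal sketch of a tokenizer that is fed chunks as they arrive and only buffers an unfinished tag across chunk boundaries. All names (`Event`, `PushTokenizer`, `tokenize_chunks`) are hypothetical, and this is not a real XML parser: it assumes chunks are valid UTF-8 strings and ignores comments, CDATA, DTDs, entities, and `>` inside attribute values.

```rust
/// Events produced by the toy tokenizer (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum Event {
    /// Raw contents between `<` and `>`, e.g. "a href='x'" or "/a".
    Markup(String),
    /// Character data between tags; may arrive in several pieces.
    Text(String),
}

/// Push-style tokenizer: the caller feeds chunks as they arrive.
#[derive(Default)]
pub struct PushTokenizer {
    /// Holds an unfinished tag carried over from earlier chunks.
    pending: String,
    /// True while we are between a `<` and its matching `>`.
    in_markup: bool,
}

impl PushTokenizer {
    /// Consumes one chunk and reports events to `sink` immediately.
    pub fn feed(&mut self, chunk: &str, sink: &mut dyn FnMut(Event)) {
        let mut rest = chunk;
        while !rest.is_empty() {
            if self.in_markup {
                match rest.find('>') {
                    Some(i) => {
                        // Tag complete: emit it and release the buffer.
                        self.pending.push_str(&rest[..i]);
                        sink(Event::Markup(std::mem::take(&mut self.pending)));
                        self.in_markup = false;
                        rest = &rest[i + 1..];
                    }
                    None => {
                        // Tag continues in the next chunk: keep what we have.
                        self.pending.push_str(rest);
                        rest = "";
                    }
                }
            } else {
                // Text is forwarded right away and never buffered.
                let end = rest.find('<').unwrap_or(rest.len());
                if end > 0 {
                    sink(Event::Text(rest[..end].to_string()));
                }
                if end < rest.len() {
                    self.in_markup = true;
                    rest = &rest[end + 1..];
                } else {
                    rest = "";
                }
            }
        }
    }

    /// Signals end of input; fails if a tag was left open.
    pub fn finish(self) -> Result<(), String> {
        if self.in_markup {
            Err(format!("unterminated markup: <{}", self.pending))
        } else {
            Ok(())
        }
    }
}

/// Convenience helper: feeds every chunk in order and collects the events.
pub fn tokenize_chunks(chunks: &[&str]) -> Result<Vec<Event>, String> {
    let mut events = Vec::new();
    let mut tok = PushTokenizer::default();
    for chunk in chunks {
        tok.feed(chunk, &mut |e| events.push(e));
    }
    tok.finish()?;
    Ok(events)
}
```

Calling `tokenize_chunks(&["<a hr", "ef='x'>hi", "</a>"])` yields the same events regardless of where the input is split inside a tag, which is the property a push interface needs when data arrives from the network in arbitrary pieces.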

Current nsExpatDriver implementation:

  int HandleExternalEntityRef(const char16_t *aOpenEntityNames,
                              const char16_t *aBase,
                              const char16_t *aSystemId,
                              const char16_t *aPublicId);
  nsresult HandleStartElement(const char16_t *aName, const char16_t **aAtts);
  nsresult HandleEndElement(const char16_t *aName);
  nsresult HandleCharacterData(const char16_t *aCData, const uint32_t aLength);
  nsresult HandleComment(const char16_t *aName);
  nsresult HandleProcessingInstruction(const char16_t *aTarget,
                                       const char16_t *aData);
  nsresult HandleXMLDeclaration(const char16_t *aVersion,
                                const char16_t *aEncoding,
                                int32_t aStandalone);
  nsresult HandleDefault(const char16_t *aData, const uint32_t aLength);
  nsresult HandleStartCdataSection();
  nsresult HandleEndCdataSection();
  nsresult HandleStartDoctypeDecl(const char16_t* aDoctypeName,
                                  const char16_t* aSysid,
                                  const char16_t* aPubid,
                                  bool aHasInternalSubset);
  nsresult HandleEndDoctypeDecl();
  nsresult HandleStartNamespaceDecl(const char16_t* aPrefix,
                                    const char16_t* aUri);
  nsresult HandleEndNamespaceDecl(const char16_t* aPrefix);
  nsresult HandleNotationDecl(const char16_t* aNotationName,
                              const char16_t* aBase,
                              const char16_t* aSysid,
                              const char16_t* aPubid);
  nsresult HandleUnparsedEntityDecl(const char16_t* aEntityName,
                                    const char16_t* aBase,
                                    const char16_t* aSysid,
                                    const char16_t* aPubid,
                                    const char16_t* aNotationName);

We'll want a similar interface in our Rust library: data is streamed in, and these callbacks are invoked as the parser encounters the corresponding constructs.
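
As a rough illustration of what such an interface might look like, the sketch below mirrors a subset of the callbacks above as a Rust trait with no-op default methods. The trait name, method names, and signatures are assumptions made for illustration, not an existing API, and `drive_example` merely stands in for a parser by firing the callbacks by hand.

```rust
/// Hypothetical Rust counterpart of some nsExpatDriver callbacks.
/// Every method has a no-op default, so implementors override only what they need.
pub trait XmlHandler {
    fn start_element(&mut self, _name: &str, _attrs: &[(&str, &str)]) {}
    fn end_element(&mut self, _name: &str) {}
    fn character_data(&mut self, _data: &str) {}
    fn comment(&mut self, _text: &str) {}
    fn processing_instruction(&mut self, _target: &str, _data: &str) {}
    fn start_cdata_section(&mut self) {}
    fn end_cdata_section(&mut self) {}

    /// Hook for loading external entities. Returns the replacement text,
    /// or `None` to decline loading (the default in this sketch).
    fn external_entity_ref(
        &mut self,
        _base: Option<&str>,
        _system_id: &str,
        _public_id: Option<&str>,
    ) -> Option<String> {
        None
    }
}

/// Example handler that records the callbacks it receives.
#[derive(Default)]
pub struct Recorder {
    pub log: Vec<String>,
}

impl XmlHandler for Recorder {
    fn start_element(&mut self, name: &str, attrs: &[(&str, &str)]) {
        self.log.push(format!("start {} ({} attrs)", name, attrs.len()));
    }

    fn end_element(&mut self, name: &str) {
        self.log.push(format!("end {}", name));
    }

    fn character_data(&mut self, data: &str) {
        self.log.push(format!("text {:?}", data));
    }
}

/// Stand-in for a parser: fires the callbacks a parser would be
/// expected to produce for `<p id="1">hi</p>`.
pub fn drive_example(handler: &mut dyn XmlHandler) {
    handler.start_element("p", &[("id", "1")]);
    handler.character_data("hi");
    handler.end_element("p");
}
```

Default method bodies keep simple consumers short, and taking `&mut dyn XmlHandler` lets a parser call into any handler without being generic over it. Whether strings should be UTF-8 `&str` or UTF-16 slices matching Gecko's `char16_t` is left open here.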

Existing libraries:

  • xml-rs
    • pull-only, not streaming
    • doesn't support DTDs or entities; UTF-8 only
    • build is currently failing, but seems semi-active
  • RustyXML
    • SAX-like
    • doesn't support DTDs or entities; maybe UTF-8 only?
    • doesn't seem to be actively developed
  • xml5ever
    • used in Servo
    • only aims to support XML5, so probably a no-go
    • permissive about malformed XML; no DTD support, etc.

kaiakz commented Feb 6, 2020

Hi, I'm interested in this. I'm a new Rustacean.
These requirements were posted three years ago, so I'm afraid some of the information may be outdated now. Can you give me more details?
