@EricRahm EricRahm/rust-xml.md
Last active May 23, 2017

Gecko Rust XML Parser Requirements

Goal: Replace Gecko's XML parser, libexpat, with a Rust-based XML parser.

Reasoning:

  • Various integer overflow CVEs
  • Buffer overflows
  • Simplify: we don't need character conversion (which has led to several CVEs)

Requirements:

  • push/sax based interface (lower memory, streaming)
  • supports DTD, entities
  • hook to load external entities
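
One possible shape for the external-entity hook is a trait that the embedder implements and the parser calls when it meets an external entity reference. This is only a sketch: `EntityResolver`, `AllowlistResolver`, and the parameter names are hypothetical and not taken from any existing crate or from Gecko.

```rust
use std::collections::HashMap;

/// Hypothetical hook: the parser would call this for each external
/// entity reference and parse whatever bytes come back.
pub trait EntityResolver {
    /// Returns the entity's content, or `None` to refuse to load it.
    fn resolve(
        &mut self,
        base: &str,
        system_id: &str,
        public_id: Option<&str>,
    ) -> Option<Vec<u8>>;
}

/// Example resolver that serves entities only from an in-memory
/// allowlist keyed by system id (an assumption made for illustration).
pub struct AllowlistResolver {
    known: HashMap<String, Vec<u8>>,
}

impl AllowlistResolver {
    pub fn new() -> Self {
        AllowlistResolver { known: HashMap::new() }
    }

    /// Registers content to hand back for a given system id.
    pub fn allow(&mut self, system_id: &str, content: &[u8]) {
        self.known.insert(system_id.to_string(), content.to_vec());
    }
}

impl EntityResolver for AllowlistResolver {
    fn resolve(
        &mut self,
        _base: &str,
        system_id: &str,
        _public_id: Option<&str>,
    ) -> Option<Vec<u8>> {
        // Anything not explicitly registered is refused.
        self.known.get(system_id).cloned()
    }
}
```

Keeping resolution behind a trait would let the embedder decide what may be loaded, so the parser itself never touches the network or the filesystem.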

Current nsExpatDriver implementation:

  int HandleExternalEntityRef(const char16_t *aOpenEntityNames,
                              const char16_t *aBase,
                              const char16_t *aSystemId,
                              const char16_t *aPublicId);
  nsresult HandleStartElement(const char16_t *aName, const char16_t **aAtts);
  nsresult HandleEndElement(const char16_t *aName);
  nsresult HandleCharacterData(const char16_t *aCData, const uint32_t aLength);
  nsresult HandleComment(const char16_t *aName);
  nsresult HandleProcessingInstruction(const char16_t *aTarget,
                                       const char16_t *aData);
  nsresult HandleXMLDeclaration(const char16_t *aVersion,
                                const char16_t *aEncoding,
                                int32_t aStandalone);
  nsresult HandleDefault(const char16_t *aData, const uint32_t aLength);
  nsresult HandleStartCdataSection();
  nsresult HandleEndCdataSection();
  nsresult HandleStartDoctypeDecl(const char16_t* aDoctypeName,
                                  const char16_t* aSysid,
                                  const char16_t* aPubid,
                                  bool aHasInternalSubset);
  nsresult HandleEndDoctypeDecl();
  nsresult HandleStartNamespaceDecl(const char16_t* aPrefix,
                                    const char16_t* aUri);
  nsresult HandleEndNamespaceDecl(const char16_t* aPrefix);
  nsresult HandleNotationDecl(const char16_t* aNotationName,
                              const char16_t* aBase,
                              const char16_t* aSysid,
                              const char16_t* aPubid);
  nsresult HandleUnparsedEntityDecl(const char16_t* aEntityName,
                                    const char16_t* aBase,
                                    const char16_t* aSysid,
                                    const char16_t* aPubid,
                                    const char16_t* aNotationName);

We'll want a similar interface in our Rust library: data is streamed in, and the parser invokes these callbacks as it recognizes each construct.
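
As a rough illustration of that push model, the sketch below pairs a handler trait with a toy parser that accepts input in arbitrary chunks. All names here (`XmlHandler`, `PushParser`, `Recorder`, `feed`, `finish`) are hypothetical, only three of the callbacks above are mirrored, and the parser understands nothing beyond `<name>`, `</name>`, and text. It is meant to show the calling convention, not to be a real XML parser.

```rust
/// Hypothetical callbacks mirroring a few nsExpatDriver handlers.
pub trait XmlHandler {
    fn start_element(&mut self, name: &str, atts: &[(String, String)]);
    fn end_element(&mut self, name: &str);
    fn character_data(&mut self, data: &str);
}

/// Toy push parser: no attributes, DTD, entities, comments, or errors.
pub struct PushParser<H: XmlHandler> {
    handler: H,
    pending: String, // input received but not yet dispatched
}

impl<H: XmlHandler> PushParser<H> {
    pub fn new(handler: H) -> Self {
        PushParser { handler, pending: String::new() }
    }

    /// Feeds the next chunk; callbacks fire for every complete token.
    pub fn feed(&mut self, chunk: &str) {
        self.pending.push_str(chunk);
        loop {
            if self.pending.starts_with('<') {
                // A tag is complete only once its closing '>' has arrived.
                let end = match self.pending.find('>') {
                    Some(i) => i,
                    None => break, // wait for more input
                };
                let tag: String = self.pending.drain(..=end).collect();
                let inner = &tag[1..tag.len() - 1];
                if let Some(name) = inner.strip_prefix('/') {
                    self.handler.end_element(name);
                } else {
                    self.handler.start_element(inner, &[]);
                }
            } else {
                // Text runs to the next '<' or to the end of the buffer.
                let end = self.pending.find('<').unwrap_or(self.pending.len());
                if end == 0 {
                    break; // buffer is empty
                }
                let text: String = self.pending.drain(..end).collect();
                self.handler.character_data(&text);
            }
        }
    }

    /// Consumes the parser and returns the handler.
    pub fn finish(self) -> H {
        self.handler
    }
}

/// Example handler that records each callback as a string.
#[derive(Default)]
pub struct Recorder {
    pub events: Vec<String>,
}

impl XmlHandler for Recorder {
    fn start_element(&mut self, name: &str, _atts: &[(String, String)]) {
        self.events.push(format!("start:{}", name));
    }
    fn end_element(&mut self, name: &str) {
        self.events.push(format!("end:{}", name));
    }
    fn character_data(&mut self, data: &str) {
        self.events.push(format!("text:{}", data));
    }
}
```

A caller would construct `PushParser::new(Recorder::default())`, call `feed` once per network chunk, and call `finish` at end of stream. A tag split across two chunks is held in `pending` until its `>` arrives, while a text run split across chunks is reported as several `character_data` calls.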

Existing libraries:

  • xml-rs
    • pull-only, not streaming
    • doesn't support DTD, entities, utf-8 only
    • build is currently failing, but seems semi-active
  • RustyXML
    • sax-like
    • doesn't support DTD, entities, maybe only utf-8?
    • doesn't seem to be actively developed
  • xml5ever
    • used in servo
    • only aims to support XML5, so probably a no-go
    • permissive about malformed XML, no DTD etc