@EricRahm
Last active February 6, 2020 16:54
Gecko Rust XML Parser Requirements

Goal: Replace Gecko's XML parser, libexpat, with a Rust-based XML parser.

Reasoning:

  • Various integer overflow CVEs
  • Buffer overflows
  • Simplification: we don't need character conversion (which has led to several CVEs)

Requirements:

  • push/SAX-based interface (lower memory use, streaming)
  • support for DTDs and entities
  • hook to load external entities
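
To make the entity and external-entity-hook requirements concrete, here is a minimal sketch in C++ (the language of the nsExpatDriver excerpt below) of a general-entity table filled from DTD declarations. It expands &name; references recursively, rejects self-referencing entities, and hands external entities to an embedder-supplied hook. EntityTable, ExternalEntityLoader and their methods are hypothetical names, not an existing Gecko or library API; predefined entities, character references and parameter entities are deliberately left out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical hook: given an external entity's system ID, return its
// replacement text, or std::nullopt to refuse the load.
using ExternalEntityLoader =
    std::function<std::optional<std::string>(const std::string& systemId)>;

// Toy general-entity table. Omits &lt;-style predefined entities,
// &#...; character references and parameter entities.
class EntityTable {
 public:
  explicit EntityTable(ExternalEntityLoader loader) : loader_(std::move(loader)) {
    assert(loader_ && "an external-entity loader hook is required");
  }

  // <!ENTITY name "value">
  void DeclareInternal(const std::string& name, const std::string& value) {
    entities_.emplace(name, Entity{false, value});  // first declaration wins
  }

  // <!ENTITY name SYSTEM "systemId">
  void DeclareExternal(const std::string& name, const std::string& systemId) {
    entities_.emplace(name, Entity{true, systemId});
  }

  // Replaces every &name; in text, expanding nested references too.
  std::string Expand(const std::string& text) {
    std::set<std::string> open;  // entities currently being expanded
    return ExpandImpl(text, open);
  }

 private:
  struct Entity {
    bool external;
    std::string valueOrSystemId;
  };

  std::string ExpandImpl(const std::string& text, std::set<std::string>& open) {
    std::string out;
    size_t i = 0;
    while (i < text.size()) {
      if (text[i] != '&') {
        out += text[i++];
        continue;
      }
      size_t semi = text.find(';', i);
      if (semi == std::string::npos) throw std::runtime_error("unterminated reference");
      std::string name = text.substr(i + 1, semi - i - 1);
      i = semi + 1;
      // An entity that references itself, directly or indirectly, is an error.
      if (!open.insert(name).second) throw std::runtime_error("recursive entity " + name);
      out += ExpandImpl(ReplacementText(name), open);
      open.erase(name);
    }
    return out;
  }

  std::string ReplacementText(const std::string& name) {
    auto it = entities_.find(name);
    if (it == entities_.end()) throw std::runtime_error("undeclared entity " + name);
    if (!it->second.external) return it->second.valueOrSystemId;
    // External entity: defer to the embedder's loader hook.
    std::optional<std::string> text = loader_(it->second.valueOrSystemId);
    if (!text) throw std::runtime_error("external entity refused: " + name);
    return *text;
  }

  std::map<std::string, Entity> entities_;
  ExternalEntityLoader loader_;
};
```

Cycle detection alone does not stop exponential expansion (the "billion laughs" attack), so a real implementation would also need to cap the total expanded size.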

Current nsExpatDriver callback interface:

  int HandleExternalEntityRef(const char16_t *aOpenEntityNames,
                              const char16_t *aBase,
                              const char16_t *aSystemId,
                              const char16_t *aPublicId);
  nsresult HandleStartElement(const char16_t *aName, const char16_t **aAtts);
  nsresult HandleEndElement(const char16_t *aName);
  nsresult HandleCharacterData(const char16_t *aCData, const uint32_t aLength);
  nsresult HandleComment(const char16_t *aName);
  nsresult HandleProcessingInstruction(const char16_t *aTarget,
                                       const char16_t *aData);
  nsresult HandleXMLDeclaration(const char16_t *aVersion,
                                const char16_t *aEncoding,
                                int32_t aStandalone);
  nsresult HandleDefault(const char16_t *aData, const uint32_t aLength);
  nsresult HandleStartCdataSection();
  nsresult HandleEndCdataSection();
  nsresult HandleStartDoctypeDecl(const char16_t* aDoctypeName,
                                  const char16_t* aSysid,
                                  const char16_t* aPubid,
                                  bool aHasInternalSubset);
  nsresult HandleEndDoctypeDecl();
  nsresult HandleStartNamespaceDecl(const char16_t* aPrefix,
                                    const char16_t* aUri);
  nsresult HandleEndNamespaceDecl(const char16_t* aPrefix);
  nsresult HandleNotationDecl(const char16_t* aNotationName,
                              const char16_t* aBase,
                              const char16_t* aSysid,
                              const char16_t* aPubid);
  nsresult HandleUnparsedEntityDecl(const char16_t* aEntityName,
                                    const char16_t* aBase,
                                    const char16_t* aSysid,
                                    const char16_t* aPubid,
                                    const char16_t* aNotationName);

We'll want a similar interface in our Rust library: data is streamed in, and those callbacks fire as it is parsed.
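
A minimal sketch of that push-style flow, again in C++ and assuming a drastically simplified grammar: the caller hands over chunks as they arrive, and the parser fires callbacks for every complete token while buffering any tag that is split across chunks. XmlSink and PushParser are hypothetical names; the three callbacks borrow nsExpatDriver's handler names but take UTF-8 std::string_view instead of char16_t pointers. Attributes, comments, PIs, DTDs, entities and well-formedness checks are not handled.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical callback interface, loosely modeled on a subset of the
// nsExpatDriver handlers (UTF-8 string views instead of char16_t*).
class XmlSink {
 public:
  virtual ~XmlSink() = default;
  virtual void HandleStartElement(std::string_view name) = 0;
  virtual void HandleEndElement(std::string_view name) = 0;
  virtual void HandleCharacterData(std::string_view data) = 0;
};

// Toy push parser: understands only <name>, </name>, <name/> and text.
class PushParser {
 public:
  explicit PushParser(XmlSink& sink) : sink_(sink) {}

  // Appends a chunk and fires callbacks for every complete token.
  // Views passed to callbacks are only valid during the callback.
  void Feed(std::string_view chunk, bool isFinal) {
    assert(!finished_ && "Feed() called after the final chunk");
    buffer_.append(chunk);
    std::string_view buf(buffer_);
    size_t pos = 0;
    while (pos < buf.size()) {
      if (buf[pos] != '<') {
        // Text runs to the next '<' or to the end of what we have so far;
        // emit it now, so one text node may arrive in several callbacks.
        size_t end = buf.find('<', pos);
        if (end == std::string_view::npos) end = buf.size();
        sink_.HandleCharacterData(buf.substr(pos, end - pos));
        pos = end;
        continue;
      }
      size_t close = buf.find('>', pos);
      if (close == std::string_view::npos) break;  // tag split across chunks
      EmitTag(buf.substr(pos + 1, close - pos - 1));
      pos = close + 1;
    }
    buffer_.erase(0, pos);  // keep only the unfinished tail
    if (isFinal) {
      finished_ = true;
      if (!buffer_.empty()) throw std::runtime_error("truncated tag at end of input");
    }
  }

 private:
  void EmitTag(std::string_view tag) {
    if (!tag.empty() && tag.front() == '/') {
      sink_.HandleEndElement(tag.substr(1));
    } else if (!tag.empty() && tag.back() == '/') {
      tag.remove_suffix(1);  // empty-element tag fires start and end
      sink_.HandleStartElement(tag);
      sink_.HandleEndElement(tag);
    } else {
      sink_.HandleStartElement(tag);
    }
  }

  XmlSink& sink_;
  std::string buffer_;
  bool finished_ = false;
};
```

Character data is emitted eagerly: feeding "<doc>he" and then "llo</doc>" delivers "he" and "llo" in two separate HandleCharacterData calls. expat's character data handler can likewise be called several times for one contiguous run of text, so consumers already have to concatenate.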

Existing libraries:

  • xml-rs
    • pull-only, not streaming
    • no DTD or entity support; UTF-8 only
    • build is currently failing, but seems semi-active
  • RustyXML
    • SAX-like
    • no DTD or entity support; possibly UTF-8 only?
    • doesn't seem to be actively developed
  • xml5ever
    • used in Servo
    • only aims to support XML5, so probably a no-go
    • permissive about malformed XML; no DTD support, etc.
@kaiakz

kaiakz commented Feb 6, 2020

Hi, I'm interested in this. I'm a new Rustacean.
The requirements were posted three years ago, so I'm afraid some of the information is outdated now. Can you give me more details?
