@ForbesLindesay
Created November 14, 2022 14:58

Saxes Parser State Machine

stateDiagram-v2

  # Consume optional 0xFEFF char
  BEGIN

  # Only space characters are allowed outside root element
  TEXT


  [*] --> BEGIN
  BEGIN --> [*]
  note left of BEGIN
    Consume optional 0xFEFF char
  end note
  BEGIN --> BEGIN_WHITESPACE

  BEGIN_WHITESPACE --> OpenWaka
  BEGIN_WHITESPACE --> TEXT

  OpenWaka --> TEXT

  note left of OpenWaka
    Main loop to process "<"
  end note

  ENTITY --> TEXT
  TEXT --> [*]
  TEXT --> OpenWaka
  TEXT --> ENTITY
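
The two notes at the top of this diagram (the optional 0xFEFF char in BEGIN, and space-only TEXT outside the root element) could be sketched as below. This is a minimal illustration, not saxes code: `skipByteOrderMark` and `isOnlyXmlWhitespace` are hypothetical helper names, and the whitespace set assumed is the four XML whitespace characters (space, tab, CR, LF).

```typescript
// Hypothetical helpers for illustration only; not part of the saxes API.
const BOM = 0xfeff;

// BEGIN: drop a single leading U+FEFF if present, otherwise return the input unchanged.
function skipByteOrderMark(input: string): string {
  return input.charCodeAt(0) === BOM ? input.slice(1) : input;
}

// TEXT outside the root element: only XML whitespace is acceptable.
function isOnlyXmlWhitespace(text: string): boolean {
  return /^[ \t\r\n]*$/.test(text);
}
```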

Open Waka

The parser enters the "OPEN_WAKA" state when it encounters a < character in "TEXT".

stateDiagram-v2
  [*] --> OPEN_WAKA

  OPEN_WAKA --> OpenTag
  OPEN_WAKA --> CloseTag
  OPEN_WAKA --> OPEN_WAKA_BANG
  OPEN_WAKA --> ProcessingInstruction

  OPEN_WAKA_BANG --> CData
  OPEN_WAKA_BANG --> Comment
  OPEN_WAKA_BANG --> Doctype

  Comment --> TEXT

  CData --> TEXT

  ProcessingInstruction --> TEXT

  OpenTag --> TEXT
  Doctype --> TEXT

  TEXT --> [*]
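
The diagram does not label which character selects each branch. Assuming standard XML syntax (`<!`, `<?`, `</`, and a name start character), the dispatch might look like the sketch below. The function names are hypothetical, and the name-start check is deliberately simplified to ASCII letters, `_` and `:`.

```typescript
type AfterOpenWaka =
  | "OPEN_WAKA_BANG"
  | "ProcessingInstruction"
  | "CloseTag"
  | "OpenTag"
  | "error";

// Hypothetical dispatch on the single character that follows "<".
function dispatchOpenWaka(next: string): AfterOpenWaka {
  switch (next) {
    case "!":
      return "OPEN_WAKA_BANG";
    case "?":
      return "ProcessingInstruction";
    case "/":
      return "CloseTag";
    default:
      // Simplified: real XML allows many more name start characters.
      return /^[A-Za-z_:]$/.test(next) ? "OpenTag" : "error";
  }
}

type AfterOpenWakaBang = "Comment" | "CData" | "Doctype" | "error";

// Hypothetical dispatch on the text that follows "<!".
function dispatchOpenWakaBang(rest: string): AfterOpenWakaBang {
  if (rest.slice(0, 2) === "--") return "Comment";
  if (rest.slice(0, 7) === "[CDATA[") return "CData";
  if (rest.slice(0, 7) === "DOCTYPE") return "Doctype";
  return "error";
}
```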

Comment

Comments in XML cannot contain two consecutive "-" characters.

stateDiagram-v2
  [*] --> COMMENT
  COMMENT --> COMMENT_ENDING: encountered one "-"

  COMMENT_ENDING --> COMMENT_ENDED: encountered a second "-"
  COMMENT_ENDING --> COMMENT

  COMMENT_ENDED --> [*]: consumed ">"
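
As a rough sketch of the three states above, the scanner below walks the text that follows `<!--`. `scanComment` is a hypothetical name and the error messages are made up for illustration; it assumes the whole comment is available in one string, whereas a streaming parser would have to resume across chunks.

```typescript
type CommentState = "COMMENT" | "COMMENT_ENDING" | "COMMENT_ENDED";

// Scans the text after "<!--". Returns the comment body and the index just past "-->".
function scanComment(input: string): { body: string; end: number } {
  let state = "COMMENT" as CommentState;
  let body = "";
  for (let i = 0; i < input.length; i++) {
    const c = input.charAt(i);
    switch (state) {
      case "COMMENT":
        if (c === "-") state = "COMMENT_ENDING";
        else body += c;
        break;
      case "COMMENT_ENDING":
        if (c === "-") {
          state = "COMMENT_ENDED";
        } else {
          // A lone "-" is ordinary comment text.
          body += "-" + c;
          state = "COMMENT";
        }
        break;
      case "COMMENT_ENDED":
        // After "--" the only legal character is ">".
        if (c === ">") return { body, end: i + 1 };
        throw new Error('"--" is not allowed inside a comment');
    }
  }
  throw new Error("unterminated comment");
}
```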

CData

Fails if it appears outside the root node. CData can contain "]]" but not "]]>", which ends the section.

stateDiagram-v2
  [*] --> CDATA
  CDATA --> CDATA_ENDING: encountered one "]"

  CDATA_ENDING --> CDATA_ENDING_2: encountered a second "]"
  CDATA_ENDING --> CDATA

  CDATA_ENDING_2 --> [*]: encountered ">"
  CDATA_ENDING_2 --> CDATA
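
The same idea applies to CDATA. The sketch below scans the text that follows `<![CDATA[`; `scanCData` is a hypothetical name and it assumes the whole section is available in one string. One detail is an assumption beyond the diagram: a third "]" keeps the scanner in CDATA_ENDING_2 (a self-loop the diagram does not draw), which is needed for input such as `]]]>`.

```typescript
type CDataState = "CDATA" | "CDATA_ENDING" | "CDATA_ENDING_2";

// Scans the text after "<![CDATA[". Returns the body and the index just past "]]>".
function scanCData(input: string): { body: string; end: number } {
  let state = "CDATA" as CDataState;
  let body = "";
  for (let i = 0; i < input.length; i++) {
    const c = input.charAt(i);
    switch (state) {
      case "CDATA":
        if (c === "]") state = "CDATA_ENDING";
        else body += c;
        break;
      case "CDATA_ENDING":
        if (c === "]") {
          state = "CDATA_ENDING_2";
        } else {
          // A lone "]" is ordinary text.
          body += "]" + c;
          state = "CDATA";
        }
        break;
      case "CDATA_ENDING_2":
        if (c === ">") return { body, end: i + 1 };
        if (c === "]") {
          // Assumption: stay here and emit the oldest "]" (handles "]]]>").
          body += "]";
        } else {
          // "]]" not followed by ">" is ordinary text.
          body += "]]" + c;
          state = "CDATA";
        }
        break;
    }
  }
  throw new Error("unterminated CDATA section");
}
```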

Processing Instruction

stateDiagram-v2
  direction LR
  [*] --> PI_FIRST_CHAR
  PI_FIRST_CHAR --> PI_REST
  PI_FIRST_CHAR --> PI_ENDING
  PI_FIRST_CHAR --> PI_BODY

  PI_REST --> XmlDeclaration
  PI_REST --> PI_ENDING
  PI_REST --> PI_BODY

  PI_BODY --> PI_ENDING

  PI_ENDING --> PI_BODY
  PI_ENDING --> [*]

  XmlDeclaration --> [*]

XML Declaration

Fails if it is not the first thing in the document (tracked by this.xmlDeclPossible).

stateDiagram-v2
  direction LR
  [*] --> XML_DECL_ENDING
  [*] --> XML_DECL_NAME_START
  XML_DECL_NAME_START --> XML_DECL_ENDING
  XML_DECL_NAME_START --> XML_DECL_NAME

  XML_DECL_NAME --> XML_DECL_ENDING
  XML_DECL_NAME --> XML_DECL_VALUE_START
  XML_DECL_NAME --> XML_DECL_EQ

  XML_DECL_EQ --> XML_DECL_ENDING
  XML_DECL_EQ --> XML_DECL_VALUE_START

  XML_DECL_VALUE_START --> XML_DECL_ENDING
  XML_DECL_VALUE_START --> XML_DECL_VALUE

  XML_DECL_VALUE --> XML_DECL_ENDING
  XML_DECL_VALUE --> XML_DECL_SEPARATOR

  XML_DECL_SEPARATOR --> XML_DECL_ENDING
  XML_DECL_SEPARATOR --> XML_DECL_NAME_START

  XML_DECL_ENDING --> [*]

Doctype

Fails if the document has multiple doctype nodes or if the root node is opened before the doctype.

stateDiagram-v2
  direction LR
    [*] --> DOCTYPE
    DOCTYPE --> [*]
    DOCTYPE --> DTD
    DOCTYPE --> DOCTYPE_QUOTE

    DOCTYPE_QUOTE --> DOCTYPE

    DTD --> DOCTYPE
    DTD --> DTD_OPEN_WAKA
    DTD --> DTD_QUOTED

    DTD_QUOTED --> DTD

    DTD_OPEN_WAKA --> DTD_OPEN_WAKA_BANG
    DTD_OPEN_WAKA --> DTD_PI
    DTD_OPEN_WAKA --> DTD

    DTD_OPEN_WAKA_BANG --> DTD_COMMENT
    DTD_OPEN_WAKA_BANG --> DTD

    DTD_COMMENT --> DTD_COMMENT_ENDING

    DTD_COMMENT_ENDING --> DTD_COMMENT_ENDED
    DTD_COMMENT_ENDING --> DTD_COMMENT

    DTD_COMMENT_ENDED --> DTD

    DTD_PI --> DTD_PI_ENDING
    DTD_PI_ENDING --> DTD

Open Tag

Fails at "OPEN_TAG" if the root tag has already been closed.

stateDiagram-v2
  [*] --> OPEN_TAG
  OPEN_TAG --> [*]
  OPEN_TAG --> OPEN_TAG_SLASH
  OPEN_TAG --> ATTRIB

  OPEN_TAG_SLASH --> ATTRIB

  ATTRIB --> ATTRIB_NAME
  ATTRIB --> OPEN_TAG_SLASH

  ATTRIB_NAME --> ATTRIB_VALUE
  ATTRIB_NAME --> ATTRIB_NAME_SAW_WHITE

  ATTRIB_NAME_SAW_WHITE --> ATTRIB_VALUE
  ATTRIB_NAME_SAW_WHITE --> ATTRIB_NAME

  ATTRIB_VALUE --> ATTRIB_VALUE_QUOTED
  ATTRIB_VALUE --> ATTRIB_VALUE_UNQUOTED

  ATTRIB_ENTITY_QUOTED: ENTITY
  ATTRIB_ENTITY_QUOTED --> ATTRIB_VALUE_QUOTED

  ATTRIB_VALUE_QUOTED --> ATTRIB_VALUE_CLOSED
  ATTRIB_VALUE_QUOTED --> ATTRIB_ENTITY_QUOTED

  ATTRIB_VALUE_CLOSED --> ATTRIB
  ATTRIB_VALUE_CLOSED --> OPEN_TAG_SLASH
  ATTRIB_VALUE_CLOSED --> ATTRIB_NAME

  ATTRIB_ENTITY_UNQUOTED: ENTITY
  ATTRIB_ENTITY_UNQUOTED --> ATTRIB_VALUE_UNQUOTED
  ATTRIB_VALUE_UNQUOTED --> ATTRIB_ENTITY_UNQUOTED
  ATTRIB_VALUE_UNQUOTED --> ATTRIB

Close Tag

Fails if the close tag does not match a corresponding open tag.

stateDiagram-v2
  direction LR
  [*] --> CLOSE_TAG
  CLOSE_TAG --> CLOSE_TAG_SAW_WHITE
  CLOSE_TAG --> [*]
  CLOSE_TAG_SAW_WHITE --> [*]: fail if !skipSpaces
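
A minimal sketch of the two close-tag states plus the matching check, under some assumptions: `scanCloseTag` and `matchCloseTag` are hypothetical names, the name is not validated against XML name rules, and a mismatch simply throws, which may differ from how saxes reports or recovers from the error.

```typescript
// Scans the text after "</". Returns the tag name and the index just past ">".
function scanCloseTag(input: string): { name: string; end: number } {
  let state = "CLOSE_TAG" as "CLOSE_TAG" | "CLOSE_TAG_SAW_WHITE";
  let name = "";
  for (let i = 0; i < input.length; i++) {
    const c = input.charAt(i);
    const isSpace = c === " " || c === "\t" || c === "\r" || c === "\n";
    if (c === ">") return { name, end: i + 1 };
    if (state === "CLOSE_TAG") {
      if (isSpace) state = "CLOSE_TAG_SAW_WHITE";
      else name += c;
    } else if (!isSpace) {
      // After whitespace, only more whitespace or ">" is acceptable.
      throw new Error("disallowed character in closing tag");
    }
  }
  throw new Error("unterminated closing tag");
}

// Pops the innermost open tag and checks that it matches the closing name.
function matchCloseTag(openTags: string[], name: string): void {
  const expected = openTags.pop();
  if (expected === undefined) {
    throw new Error("unmatched closing tag: " + name);
  }
  if (expected !== name) {
    throw new Error("unexpected closing tag: " + name + ", expected " + expected);
  }
}
```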