I will probably never get to this, so I am documenting my thoughts publicly.
- Stack parse
- Consult the stack methodology that json2.js uses (should be on Google Code)
- Resig (HTML2XML) + jQuery (XMLInterpretter)
- Deep-equal the parsed HTML against the DOM tree (attributes and nodeType)
- Get as close to DOM specification as possible within unit tests
- Another unit test: make a div, parse it via our interpreter, and compare the result to its innerHTML
- Throw away comments and end tags
- Throw away text nodes (optimization for size + attribute inspection only)
- Emit events (e.g. 'setAttr') during parsing
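
A rough sketch of how the stack parse, the throw-away rules, and the event emission might fit together. Everything here is an assumption for illustration: `parseTags`, `VOID_TAGS`, the `'openTag'` event, and the node shape are hypothetical names, not taken from json2.js, jQuery, or Resig's parser. It only handles simple markup (no script/style content, no malformed tags).

```javascript
// Hypothetical sketch: these names are illustrative, not an existing API.
// Elements that never take an end tag, so they are never pushed on the stack.
var VOID_TAGS = { br: true, hr: true, img: true, input: true, link: true, meta: true };

function parseTags(html, onEvent) {
  onEvent = onEvent || function () {};
  // nodeType 11 (DocumentFragment) is an assumed choice for the container node.
  var root = { nodeType: 11, tagName: null, attributes: {}, childNodes: [] };
  var stack = [root];
  // One token per match: a comment, an end tag (group 1),
  // or a start tag (group 2) with its raw attribute text (group 3).
  var tokenRe = /<!--[\s\S]*?-->|<\/([a-zA-Z][\w:-]*)\s*>|<([a-zA-Z][\w:-]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;
  // name, optionally followed by = and a double-quoted, single-quoted, or bare value.
  var attrRe = /([^\s=\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;
  var match, attr, i;

  while ((match = tokenRe.exec(html)) !== null) {
    if (match[1]) {
      // End tag: pop back to the nearest matching open element.
      // The end tag itself is not recorded anywhere.
      var closing = match[1].toLowerCase();
      for (i = stack.length - 1; i > 0; i--) {
        if (stack[i].tagName === closing) {
          stack.length = i;
          break;
        }
      }
    } else if (match[2]) {
      // Start tag: attach a new element to whatever is on top of the stack.
      var node = { nodeType: 1, tagName: match[2].toLowerCase(), attributes: {}, childNodes: [] };
      stack[stack.length - 1].childNodes.push(node);
      onEvent('openTag', node);

      var attrText = match[3];
      attrRe.lastIndex = 0;
      while ((attr = attrRe.exec(attrText)) !== null) {
        var value = attr[2] !== undefined ? attr[2]
                  : attr[3] !== undefined ? attr[3]
                  : attr[4] !== undefined ? attr[4]
                  : '';
        node.attributes[attr[1]] = value;
        onEvent('setAttr', node, attr[1], value);
      }

      var selfClosing = /\/\s*$/.test(attrText);
      if (!selfClosing && !VOID_TAGS[node.tagName]) {
        stack.push(node);
      }
    }
    // Comments match the first alternative and fall through unrecorded.
    // Text between tags never matches a token, so it is dropped as well.
  }
  return root;
}
```

A listener passed as the second argument receives `'setAttr'` with the node, attribute name, and value as each attribute is read, so attribute inspection can happen during the parse without walking the tree afterwards.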