@atk
Forked from 140bytes/LICENSE.txt
Created August 18, 2011 15:01
checkMarkup - (X)HTML Markup check

checkMarkup

Checks (X)HTML markup to verify that every opened tag is closed. Self-contained tags need to be closed XHTML-style, e.g. <img .../>. The function receives an HTML string and returns either true or false.

It does not currently handle HTML-style single tags such as <img> - maybe some golfing could resolve this issue.

function(
  a, // XHTML input string
  b, // result placeholder (starts as undefined, turns true on error)
  c  // node name stack
){
  // initialize the node name stack as an array
  c=[];
  // find all tags within the string and interpret them in a callback
  a.replace(/<(\/?)(\w+).*?(\/?)>/g, function(
    d, // full matched string (unused)
    e, // closing tag? "/" vs. ""
    f, // node name
    g  // self-closing tag? "/" vs. ""
  ){
    // is it a closing tag?
    e ?
      // b becomes true (and stays true) if the last node name on the stack differs from the current one
      b = b || c.pop() != f :
      // if it is not a self-closing tag, push the current node name onto the stack
      g || c.push(f)
  });
  // valid only if there was no node name mismatch and nothing is left on the stack
  return !b && !c[0]
}
function(a,b,c){c=[];a.replace(/<(\/?)(\w+).*?(\/?)>/g,function(d,e,f,g){e?b=b||c.pop()!=f:g||c.push(f)});return!b&&!c[0]}
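For anyone who wants to try the same stack-based idea without the 140-byte limit, here is a non-golfed sketch. The name checkMarkupVerbose, the allowVoidTags parameter and the VOID_TAGS list are illustrative assumptions and not part of the gist. Treating HTML void elements as self-closing is only one possible way to deal with the single-tag limitation mentioned in the description.

```javascript
// Hypothetical, non-golfed variant of checkMarkup (not part of the original gist).
// Assumption: these HTML void elements may appear without a trailing "/".
const VOID_TAGS = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr"]);

function checkMarkupVerbose(markup, allowVoidTags) {
  const stack = [];      // names of the currently open elements
  let mismatch = false;  // set once a closing tag does not match the top of the stack

  markup.replace(/<(\/?)(\w+).*?(\/?)>/g, (match, closing, name, selfClosing) => {
    if (closing) {
      if (stack.pop() !== name) mismatch = true;
    } else if (!selfClosing && !(allowVoidTags && VOID_TAGS.has(name))) {
      stack.push(name);
    }
    return match; // replace() is only used to iterate over the tags
  });

  return !mismatch && stack.length === 0;
}

// Strict XHTML-style check versus the lenient mode for HTML-style single tags.
const strictResult = checkMarkupVerbose('<p><img></p>', false);
const lenientResult = checkMarkupVerbose('<p><img></p>', true);
```

With allowVoidTags set to false the sketch should behave like the golfed function for these inputs; with true, a bare <img> is no longer pushed onto the stack.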
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2011 Alex Kloss <alexthkloss@web.de>
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
{
"name": "checkMarkup",
"description": "checks if all opened nodes within a string are closed correctly for XHTML-Style markup",
"keywords": [
"XHTML",
"markup",
"validation",
"string"
]
}
<!DOCTYPE html>
<title>checkMarkup</title>
<div>Expected value: <b>true,false,false</b></div>
<div>Actual value: <b id="ret"></b></div>
<script>
var myFunction = function(a,b,c){c=[];a.replace(/<(\/?)(\w+).*?(\/?)>/g,function(d,e,f,g){e?b=b||c.pop()!=f:g||c.push(f)});return!b&&!c[0]}
document.getElementById( "ret" ).innerHTML = [
myFunction('<div id="nav"><ul><li class="first"><a href="#1"><img/></a></li><li class="last"><a href="#2"><img/></a></li></ul></div>'),
myFunction('<p>this is invalid XHTML<p>because the closing tags are missing'),
myFunction('<p>Does not work on HTML-style single-tags (due to size limit): <img></p>')
];
</script>
@nikola

nikola commented Aug 18, 2011

One byte less: /<(\/?)(\w+)\b(\/?)>/g

Also, in most regex engines \b is faster than .*?

@atk
Author

atk commented Aug 19, 2011

That is true, but it would fail on attributes within the tags. This example has a practical purpose: to check 3rd-party code before inserting into our pages...
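The difference between the two patterns can be seen with a small sketch. The helper name checkWith and the sample string are assumptions made for illustration; the stack logic mirrors the gist's function but takes the tag regex as a parameter.

```javascript
// Illustrative helper (not from the gist): the same stack check, with the tag regex passed in.
function checkWith(tagRegex, markup) {
  const stack = [];
  let mismatch = false;
  markup.replace(tagRegex, (match, closing, name, selfClosing) => {
    if (closing) mismatch = mismatch || stack.pop() !== name;
    else if (!selfClosing) stack.push(name);
    return match;
  });
  return !mismatch && stack.length === 0;
}

const lazyDot = /<(\/?)(\w+).*?(\/?)>/g;      // regex used in the gist
const wordBoundary = /<(\/?)(\w+)\b(\/?)>/g;  // the shorter suggestion

const sample = '<div id="nav"><img src="x.png"/></div>';
const withLazyDot = checkWith(lazyDot, sample);           // true
const withWordBoundary = checkWith(wordBoundary, sample); // false: only </div> is matched
```

With \b, nothing may stand between the tag name and the closing bracket except an optional slash, so opening tags that carry attributes are skipped and the closing tag finds an empty stack.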

@nikola

nikola commented Aug 19, 2011

Ah ok, I didn't see any attributes in the examples so I thought this would be constrained to attribute-less markup.

@atk
Author

atk commented Aug 19, 2011

Added some attributes to the example in order to make this more clear.

@atk
Author

atk commented Aug 19, 2011

Shaved another 2 bytes off by replacing !g&& with g|| and removing an unnecessary ;... we're now down to 123 bytes

@tsaniel

tsaniel commented Aug 19, 2011

\w also matches digits and underscores (_), and I don't think non-alphabetic characters are valid in tag names...
Maybe [a-z] instead?

@atk
Author

atk commented Aug 19, 2011

[a-zA-Z:] would be more like it, so it could feature XML namespaces, too

@tsaniel

tsaniel commented Aug 19, 2011

Good idea.
Also, it seems the colon should follow some rules when using XML namespaces.
By the way, should something like <p>some texts</P> pass the checking?

@atk
Author

atk commented Aug 19, 2011

Nope, because that is not valid XHTML - the closing tag should have the same case as the opening tag

This should take care of the tag names (138 bytes):

function(a,b,c){return b=c=[],a.replace(/<(\/?)([a-zA-Z][a-zA-Z:]+).*?(\/?)>/g,function(d,e,f,g){e?b+=c.pop()!=f:g||c.push(f)}),!b&&!c[0]}

...not enough room for .toLowerCase(), anyway.
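A quick check of the 138-byte variant, run exactly as posted, suggests it returns false for every input: b and c start out as the same array, an array is always truthy, and b+= turns it into a non-empty string. The pattern [a-zA-Z][a-zA-Z:]+ also needs at least two characters, so one-letter names such as <p> are skipped. The sketch below binds the posted code to a name and compares it with a hypothetical adjustment. The names posted and adjusted, and the adjustment itself, are assumptions and not the author's code.

```javascript
// The 138-byte variant exactly as posted in the comment, bound to a name for testing.
const posted = function(a,b,c){return b=c=[],a.replace(/<(\/?)([a-zA-Z][a-zA-Z:]+).*?(\/?)>/g,function(d,e,f,g){e?b+=c.pop()!=f:g||c.push(f)}),!b&&!c[0]};

// Hypothetical adjustment (an assumption, not the author's code): leave b undefined as the
// error flag, and use * so that one-letter names such as <p> and <a> are matched as well.
const adjusted = function(a,b,c){c=[];a.replace(/<(\/?)([a-zA-Z][a-zA-Z:]*).*?(\/?)>/g,function(d,e,f,g){e?b=b||c.pop()!=f:g||c.push(f)});return!b&&!c[0]};

const valid = '<div><ul><li><img/></li></ul></div>';
const invalid = '<div><ul></div>';
const results = [posted(valid), posted(invalid), adjusted(valid), adjusted(invalid)];
// results: [false, false, true, false]
```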

@nikola

nikola commented Aug 19, 2011

/<(\/?)([a-z][a-z:]+).*?(\/?)>/gi

@atk
Author

atk commented Aug 19, 2011

...shaving off another 5 bytes. Nice one :)

@jed

jed commented Aug 20, 2011

switch up the declaration for 1 more byte:

function(a,b,c){c=[];a.replace(/<(\/?)(\w+).*?(\/?)>/g,function(d,e,f,g){e?b=b||c.pop()!=f:g||c.push(f)});return!b&&!c[0]}

@subzey

subzey commented Aug 22, 2011

Even for well-formed XML, we should first filter the input string with something like

s.replace(/<!(\[CDATA\[[\s\S]*?\]\]|--[\s\S]*?--|[\s\S]*?)>|<\?[\s\S]*?\?>/g,0)

in order to strip CDATA sections, processing instructions, doctypes and comments.

And... Parsing (X)HTML with regex isn't a good idea, you know.
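To illustrate why the pre-filter matters, here is a small sketch that runs the gist's function with and without it. The helper name stripNonElements and the sample input are assumptions made for this example; the two regexes are copied from the gist and from the comment above.

```javascript
// checkMarkup as published in the gist, bound to a name for testing.
const checkMarkup = function(a,b,c){c=[];a.replace(/<(\/?)(\w+).*?(\/?)>/g,function(d,e,f,g){e?b=b||c.pop()!=f:g||c.push(f)});return!b&&!c[0]};

// The filter from the comment above, wrapped in a hypothetical helper name.
// Each stripped construct is replaced by the string "0", as in the original snippet.
function stripNonElements(s) {
  return s.replace(/<!(\[CDATA\[[\s\S]*?\]\]|--[\s\S]*?--|[\s\S]*?)>|<\?[\s\S]*?\?>/g, 0);
}

const input = '<?xml version="1.0"?><!-- <p> is deprecated here --><div><![CDATA[ a <b> c ]]></div>';
const unfiltered = checkMarkup(input);                 // false: <p> and <b> are pushed but never closed
const filtered = checkMarkup(stripNonElements(input)); // true: only <div>...</div> remains
```

Without the filter, tags that appear inside comments or CDATA sections are counted as real elements, so otherwise well-formed input is rejected.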

@atk
Author

atk commented Aug 22, 2011

I lol'd. The next answer seems quite sensible and quite matching the current case, though.
