DOMParser HTML extension - Now a polyfill since HTML parsing was added to the DOMParser specification

html-domparser.js
/*
* DOMParser HTML extension
* 2012-09-04
*
* By Eli Grey, http://eligrey.com
* Public domain.
* NO WARRANTY EXPRESSED OR IMPLIED. USE AT YOUR OWN RISK.
*/
 
/*! @source https://gist.github.com/1129031 */
/*global document, DOMParser*/
 
(function(DOMParser) {
    "use strict";

    var
          DOMParser_proto = DOMParser.prototype
        , real_parseFromString = DOMParser_proto.parseFromString
    ;

    // Firefox/Opera/IE throw errors on unsupported types
    try {
        // WebKit returns null on unsupported types
        if ((new DOMParser).parseFromString("", "text/html")) {
            // text/html parsing is natively supported
            return;
        }
    } catch (ex) {}

    DOMParser_proto.parseFromString = function(markup, type) {
        if (/^\s*text\/html\s*(?:;|$)/i.test(type)) {
            var
                doc = document.implementation.createHTMLDocument("")
            ;
            if (markup.toLowerCase().indexOf('<!doctype') > -1) {
                doc.documentElement.innerHTML = markup;
            }
            else {
                doc.body.innerHTML = markup;
            }
            return doc;
        } else {
            return real_parseFromString.apply(this, arguments);
        }
    };
}(DOMParser));
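
The polyfill only takes over when the `type` argument passes its regular expression; every other type is forwarded to the original parseFromString. As a rough illustration of which strings that pattern accepts, here is a small sketch. The helper name `isHtmlType` and the sample strings are assumptions made up for this example and are not part of the gist.

```javascript
// The same pattern the polyfill applies to the `type` argument.
var HTML_TYPE = /^\s*text\/html\s*(?:;|$)/i;

// isHtmlType is a hypothetical helper name used only for this illustration.
function isHtmlType(type) {
    return HTML_TYPE.test(type);
}

// Sample type strings (made up for the example).
var samples = [
    "text/html",
    "  TEXT/HTML  ",
    "text/html; charset=utf-8",
    "text/html5",
    "application/xml"
];

samples.forEach(function (type) {
    console.log(JSON.stringify(type) + " -> " + isHtmlType(type));
});
```

The pattern is case-insensitive, tolerates surrounding whitespace, and allows parameters after a semicolon, but it rejects anything that merely starts with "text/html".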

This doesn't work correctly if the markup contains external scripts.

I cannot reproduce your problem. I used (new DOMParser).parseFromString("<script src='http://foo/bar.js'></script>", "text/html").querySelector("script").src === "http://foo/bar.js". Do you mean that the script isn't executed? DOMParser is for parsing HTML, not executing it. Create an iframe and manipulate its content document after appending into the current document if you wish to create an active document.

@eligrey, how would this stack up to the html parser by @jeresig?

It will always be faster than @jeresig's parser as it uses the browser's native HTML5 parser.

And in terms of browser support?

Every browser that supports document.implementation.createHTMLDocument should work. I think IE <8 might not support that. A workaround for IE <8 could be to use an iframe, but that creates an active document context, which is dangerous and should only be used for parsing trusted HTML.
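
Rather than relying on browser version numbers, the requirement above can be checked directly. The following is only a sketch: `canCreateHtmlDocument` is a hypothetical helper name, and it takes the document as a parameter so the check can also be exercised outside a browser.

```javascript
// canCreateHtmlDocument is a hypothetical helper, not part of the gist.
// Pass it the page's `document` (or any document-like object).
function canCreateHtmlDocument(doc) {
    return !!(doc &&
        doc.implementation &&
        typeof doc.implementation.createHTMLDocument === "function");
}

// Outside a browser there is no global `document`, so guard the lookup.
var hasSupport = canCreateHtmlDocument(
    typeof document !== "undefined" ? document : null
);
console.log("createHTMLDocument available: " + hasSupport);
```

If the check returns false, the polyfill cannot build its inert document, and a fallback such as the iframe approach mentioned above would be needed, with the caveat about untrusted HTML.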

In short, all current browsers support it.

Thanks for heads up @RobertXGreen; fixed.

What's the reason you pass DOMParser as an argument to the anonymous function instead of just accessing it where you need it? Micro-optimization?

Doesn't work in IE9 because innerHTML is a read-only property (line 36 fails).

Need to clarify my previous comment. You may be able to set doc.body.innerHTML, but that doesn't work if passed-in markup is an entire document ('....'). You might hope to fix this by setting innerHTML on doc.documentElement instead of doc.body, but IE (at least 9) doesn't let you do that.

This won't work correctly on document strings that contain a full document with a doctype, a head tag, a title tag, etc. Here's a gist based loosely on the suggestion from @karger to try out doc.documentElement instead: https://gist.github.com/kethinov/4760460

I also made another gist that takes a more targeted approach to just making sure the title tag's content makes it into the new document irrespective of whatever else may be in the head. It's pretty hacky and I don't see how it would be all that useful to anyone, but here it is just in case anyone wants to look it over: https://gist.github.com/kethinov/4760431

@eligrey, can you merge my changes from my first gist (https://gist.github.com/kethinov/4760460) into your version?

Or if you don't think my changes are a good idea, let me know. Comments/feedback are totally welcome. Anywho, thanks for this polyfill. I wish more browsers had full support for DOMParser.

@kethinov I merged in your changes.

This is nice, but attributes on the documentElement will not be available in the DOMParser since this uses 'doc.documentElement.innerHTML' and the documentElement is read only in a DOM Implementation. So the attributes would need to be added manually. It's an edge case, but just adding a note for anyone that might run into that.
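
One way to add those attributes manually is to read them from the opening html tag in the markup and set them on doc.documentElement after the polyfill has run. The sketch below is an assumption-heavy illustration, not part of the gist: `extractRootAttributes` and `applyRootAttributes` are hypothetical names, the regular expressions do not decode entities, and they will misread attribute values that contain a ">" character.

```javascript
// Hypothetical helper: pull name/value pairs out of the opening <html ...> tag.
function extractRootAttributes(markup) {
    var tag = /<html\b([^>]*)>/i.exec(markup);
    var attrs = [];
    if (!tag) {
        return attrs; // no root tag in the markup
    }
    // name, optionally followed by ="double", ='single' or an unquoted value
    var attrPattern = /([^\s=\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"']+)))?/g;
    var match;
    while ((match = attrPattern.exec(tag[1])) !== null) {
        var value = match[2] !== undefined ? match[2]
                  : match[3] !== undefined ? match[3]
                  : match[4] !== undefined ? match[4]
                  : ""; // attribute given without a value
        attrs.push({ name: match[1], value: value });
    }
    return attrs;
}

// Hypothetical helper: copy those attributes onto the parsed document's root.
function applyRootAttributes(doc, markup) {
    extractRootAttributes(markup).forEach(function (attr) {
        doc.documentElement.setAttribute(attr.name, attr.value);
    });
    return doc;
}

console.log(JSON.stringify(
    extractRootAttributes('<!DOCTYPE html><html lang="en" dir=ltr><head></head></html>')
));
```

In a browser this would be called as applyRootAttributes(doc, markup) on the document returned by parseFromString, using the same markup string.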

Since this polyfill assumes DOMParser is defined, you should add the following wrapper for your code:

if (window.DOMParser !== undefined){
    [   your code  ]
}

That way, your polyfill will just be ignored in browsers that don't support DOMParser (which happens to include IE8) instead of generating an error.

@jslegers: The code excerpt below is solving for that. You can't just do a check against the window for "DOMParser" because Safari supports DOMParser for XML, just not HTML. This can't be determined any other way besides using a try/catch.

try {
    // WebKit returns null on unsupported types
    if ((new DOMParser).parseFromString("", "text/html")) {
        // text/html parsing is natively supported
        return;
    }
} catch (ex) {}
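
The two points in this exchange (DOMParser may be missing entirely, and it may exist without text/html support) can be folded into one check. This is only a sketch: `supportsHtmlParsing` is a hypothetical helper name, and the constructor is passed in as a parameter so that the check never touches an undefined global.

```javascript
// supportsHtmlParsing is a hypothetical helper, not part of the gist.
function supportsHtmlParsing(ParserCtor) {
    if (!ParserCtor) {
        return false; // DOMParser is not defined at all
    }
    try {
        // Some engines return null for unsupported types...
        return !!(new ParserCtor()).parseFromString("", "text/html");
    } catch (ex) {
        // ...while others throw instead.
        return false;
    }
}

// Guard the global lookup so this also runs where DOMParser is missing.
var ParserCandidate = typeof DOMParser !== "undefined" ? DOMParser : undefined;
console.log("native text/html parsing: " + supportsHtmlParsing(ParserCandidate));
```

A false result only says that native text/html parsing is unavailable. Whether the polyfill can be installed still depends on DOMParser existing, since it patches DOMParser.prototype.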
