The most secure way to strip script tags is to use the browser's built-in methods. 134 bytes.
function(a,b){
  with(new Option){ // Temp element
    innerHTML=a; // Create a real element from the string
    for(a=getElementsByTagName('script');b=a[0];) // Loop through all the scripts
      b.parentNode.removeChild(b); // Remove each script
    return innerHTML // Return the cleaned HTML
  }
}
function(a,b){with(new Option){innerHTML=a;for(a=getElementsByTagName('script');b=a[0];)b.parentNode.removeChild(b);return innerHTML}}
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2012 Sindre Sorhus <http://sindresorhus.com>
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
{
  "name": "stripScripts",
  "description": "Strips script tags from an HTML string",
  "keywords": [
    "140bytes",
    "strip",
    "scripts",
    "html",
    "sanitize"
  ]
}
<!doctype html>
<title>stripScripts</title>
<div>Expected value: <b>&lt;div&gt;&lt;span&gt;&lt;/span&gt;&lt;/div&gt;</b></div>
<div>Actual value: <b id="ret"></b></div>
<script>
var stripScripts = function(a,b,c){b=new Option;b.innerHTML=a;for(a=b.getElementsByTagName('script');c=a[0];)c.parentNode.removeChild(c);return b.innerHTML}
var textInsertion = 'textContent' in document.body ? 'textContent' : 'innerText';
document.getElementById('ret')[textInsertion] = stripScripts('<div><span></span><script><\/script></div>')
</script>
You could simply use
function(a){return a.replace(/<script[^>]*>.*?<\/script>/gi,'')}
to achieve the same effect.
@tsaniel Thanks, didn't think of using `with`. Implemented your suggestion ;)
@atk Sure, but you shouldn't parse or modify HTML with regex; it's error-prone and insecure.
I have yet to encounter a real-life case where this matters...
@atk What about this?
var s = function(a){return a.replace(/<script[^>]*>.*?<\/script>/gi,'')};
s('<s<script></script>cript>alert("Repeat after me; Regex should never be used to parse HTML")</script>');
will return:
<script>alert("Repeat after me; Regex should never be used to parse HTML")</script>
- I don't see a real-life case there.
- If you don't insert valid HTML, YMMV, of course.
@atk I don't think you see the issue here. This is not about HTML I control; if it were, I would just leave out the script tags in the first place. This is about opening yourself up to possible XSS attacks when getting unknown HTML from an external source. My method is not foolproof against this either, but it's orders of magnitude more secure.
Even then this can be fixed easily: function(a,b){for(b=/<script[^>]*>.*?<\/script>/gi;b.test(a);)a=a.replace(b,'');return a}
@atk Didn't fix it.
var s = function(a,b){for(b=/<script[^>]*>.*?<\/script>/gi;b.test(a);)a=a.replace(b,'');return a}
s('<s<script></script>cript>alert("Repeat after me; Regex should never be used to parse HTML")</script>');
still outputs:
<script>alert("Repeat after me; Regex should never be used to parse HTML")</script>
But even if it had worked, that's beside the point. I could easily find another way to bypass the regex...
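To make that last claim concrete, here is a minimal sketch of two further inputs that slip past the looped regex. The names `stripLooped`, `multiline`, and `spacedClose` are illustrative only and not part of the gist. The first input relies on `.` not matching line breaks; the second relies on HTML parsers accepting whitespace before the `>` of an end tag, which the literal `<\/script>` in the pattern does not.

```javascript
// The looped regex stripper from the comment above, unchanged apart from the name.
var stripLooped = function (a, b) {
  for (b = /<script[^>]*>.*?<\/script>/gi; b.test(a);) a = a.replace(b, '');
  return a;
};

// 1. A line break inside the script body: `.` does not match "\n" without the `s` flag.
var multiline = '<script>\nalert(1)\n</script>';

// 2. Whitespace before the closing `>`: browsers still treat this as an end tag.
var spacedClose = '<script>alert(1)</script >';

console.log(stripLooped(multiline) === multiline);     // true: nothing was removed
console.log(stripLooped(spacedClose) === spacedClose); // true: nothing was removed
```

Each of these could be patched in the pattern, but every patch invites the next variant, which is the point being made here.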
Strangely enough, for me it worked and returned an empty string (as I had intended). And I don't think it would be that easy to bypass the regex (Challenge accepted?).
In Firefox it returns an empty string, yes, but not in Chrome...
Feel free to submit an improved regex, and I'll try to bypass it :)
Just remember, it has to do its thing correctly in all supported browsers.
It's also possible to use `/* comments */` to trick the regular expression. @sindresorhus is right. It's impossible to replace a real parser with a regular expression (it's hard to explain, but there is a proof). However, in real life it's not a problem if you are dealing with **visualization**. But it's dangerous if you are dealing with security.
PS: By the way, here is another way to trick all the scripts listed here: <img src="http://valid.uri/here.jpg" onload="alert('vulnerable')">.
The original script should let that through, too.
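A quick way to see why: the payload contains no `<script>` element for either approach to remove. The sketch below runs it through the single-pass regex from earlier in the thread; `payload` and `stripByRegex` are illustrative names, not part of the gist. The DOM-based version removes only `script` elements as well, so the `onload` attribute survives there too.

```javascript
// Payload from the comment above: no <script> element, only an inline event handler.
var payload = '<img src="http://valid.uri/here.jpg" onload="alert(\'vulnerable\')">';

// The single-pass regex stripper suggested earlier in the thread.
var stripByRegex = function (a) {
  return a.replace(/<script[^>]*>.*?<\/script>/gi, '');
};

console.log(stripByRegex(payload) === payload); // true: the handler is left in place
```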
Save some bytes:
function(a,b){with(new Option){innerHTML=a;for(a=getElementsByTagName('script');b=a[0];)b.parentNode.removeChild(b);return innerHTML}}