@sindresorhus
Forked from 140bytes/LICENSE.txt
Created March 7, 2012 13:34
stripScripts - Strip script tags from an HTML string (140byt.es)

stripScripts (140byt.es)

Strips script tags by relying on the browser's built-in HTML parser, which is more secure than matching tags with a regex. 134 bytes.

function(a,b){
with(new Option){ // Temporary <option> element; `with` puts its properties and methods in scope
innerHTML=a; // Let the browser parse the string into real child nodes
for(a=getElementsByTagName('script');b=a[0];) // Live collection: each removal shifts the next script to index 0
b.parentNode.removeChild(b); // Remove each script
return innerHTML // Return the cleaned HTML
}
}
function(a,b){with(new Option){innerHTML=a;for(a=getElementsByTagName('script');b=a[0];)b.parentNode.removeChild(b);return innerHTML}}
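A quick way to check the 134-byte figure is to measure the minified source as UTF-8. A minimal sketch in plain JavaScript; `entrySource` is just a name chosen here, and `TextEncoder` assumes a browser or Node 11+:

```javascript
// The minified entry, stored as a string so its size can be measured.
const entrySource = "function(a,b){with(new Option){innerHTML=a;for(a=getElementsByTagName('script');b=a[0];)b.parentNode.removeChild(b);return innerHTML}}";

// 140byt.es entries must fit in 140 bytes of UTF-8.
const entryBytes = new TextEncoder().encode(entrySource).length;
console.log(entryBytes); // 134
console.log(entryBytes <= 140); // true
```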
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2012 Sindre Sorhus <http://sindresorhus.com>
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
{
"name": "stripScripts",
"description": "Strips script tags from an HTML string",
"keywords": [
"140bytes",
"strip",
"scripts",
"html",
"sanitize"
]
}
<!doctype html>
<title>stripScripts</title>
<div>Expected value: <b>&lt;div&gt;&lt;span&gt;&lt;/span&gt;&lt;/div&gt;</b></div>
<div>Actual value: <b id="ret"></b></div>
<script>
var stripScripts = function(a,b,c){b=new Option;b.innerHTML=a;for(a=b.getElementsByTagName('script');c=a[0];)c.parentNode.removeChild(c);return b.innerHTML}
var textInsertion = 'textContent' in document.body ? 'textContent' : 'innerText';
document.getElementById('ret')[textInsertion] = stripScripts('<div><span></span><script><\/script></div>')
</script>
@tsaniel

tsaniel commented Mar 7, 2012

Save some bytes:
function(a,b){with(new Option){innerHTML=a;for(a=getElementsByTagName('script');b=a[0];)b.parentNode.removeChild(b);return innerHTML}}

@atk

atk commented Mar 8, 2012

You could simply use

function(a){return a.replace(/<script[^>]*>.*?<\/script>/gi,'')}

to achieve the same effect.
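On the simple input used by the test page, the regex does give the expected value. A minimal check (the name `stripScriptsRegex` is chosen here for illustration):

```javascript
// atk's one-line regex stripper, given a name so it can be called.
const stripScriptsRegex = function (a) {
  return a.replace(/<script[^>]*>.*?<\/script>/gi, '');
};

// The same input the test page passes to the DOM-based version.
const result = stripScriptsRegex('<div><span></span><script><\/script></div>');
console.log(result); // <div><span></span></div>
```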

@sindresorhus
Author

@tsaniel Thanks, didn't think of using with. Implemented your suggestion ;)

@atk Sure, but you shouldn't parse or modify HTML with a regex; it's error-prone and insecure.

@atk

atk commented Mar 13, 2012

I have yet to encounter a real-life case where this matters...
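Scripts that span several lines are one everyday case: without the `s` (dotAll) flag, `.` does not match newline characters, so `.*?` never reaches a closing tag on a later line and the whole script survives. A minimal sketch (`stripScriptsOnePass` and `multiLine` are illustrative names):

```javascript
const stripScriptsOnePass = function (a) {
  return a.replace(/<script[^>]*>.*?<\/script>/gi, '');
};

// `.` does not match line terminators, so the lazy `.*?` cannot reach a
// closing tag on a later line. Most real-world scripts span several lines.
const multiLine = '<p>hi</p><script>\nalert(1);\n</script>';
console.log(stripScriptsOnePass(multiLine) === multiLine); // true: nothing was removed
```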

@sindresorhus
Author

@atk What about this?

var s = function(a){return a.replace(/<script[^>]*>.*?<\/script>/gi,'')};
s('<s<script></script>cript>alert("Repeat after me; Regex should never be used to parse HTML")</script>');

will return:

<script>alert("Repeat after me; Regex should never be used to parse HTML")</script>
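The trick works because `replace` makes a single left-to-right pass over the original string and never re-scans the text it produces. Removing the inner `<script></script>` splices `<s` and `cript>` into a new opening tag. A runnable reproduction (`stripOnce` and `payload` are illustrative names):

```javascript
const stripOnce = function (a) {
  return a.replace(/<script[^>]*>.*?<\/script>/gi, '');
};

// The inner <script></script> is the only match; removing it splices
// "<s" and "cript>" together into a brand-new opening tag.
const payload = '<s<script></script>cript>alert("Repeat after me; Regex should never be used to parse HTML")</script>';
console.log(stripOnce(payload));
// <script>alert("Repeat after me; Regex should never be used to parse HTML")</script>
```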

@atk

atk commented Mar 14, 2012

  1. I don't see a real-life case there.
  2. If you don't insert valid HTML, YMMV, of course.

@sindresorhus
Author

@atk I don't think you see the issue here. This is not about HTML I control; if it were, I would just leave out the script tags in the first place. It is about exposing yourself to XSS attacks when you receive untrusted HTML from an external source. My method is not foolproof against this either, but it is far more secure.

@atk

atk commented Mar 14, 2012

Even then this can be fixed easily: function(a,b){for(b=/<script[^>]*>.*?<\/script>/gi;b.test(a);)a=a.replace(b,'');return a}

@sindresorhus
Author

@atk Didn't fix it.

var s = function(a,b){for(b=/<script[^>]*>.*?<\/script>/gi;b.test(a);)a=a.replace(b,'');return a}
s('<s<script></script>cript>alert("Repeat after me; Regex should never be used to parse HTML")</script>');

still outputs:

<script>alert("Repeat after me; Regex should never be used to parse HTML")</script>

But even if it had worked, that's beside the point. I could easily find another way to bypass the regex...
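Even where the loop does catch the nested-tag trick, the regex only recognizes the literal text `</script>`. The HTML tokenizer also accepts whitespace before the `>` of an end tag, so a browser treats `</script >` as closing the element while the regex never matches it. A sketch using atk's loop as written in the thread (`stripLoop` and `spacedEndTag` are illustrative names):

```javascript
// atk's looped version: keep replacing until the regex no longer matches.
const stripLoop = function (a, b) {
  for (b = /<script[^>]*>.*?<\/script>/gi; b.test(a);) a = a.replace(b, '');
  return a;
};

// A browser treats "</script >" as a valid end tag, but the regex requires
// the literal text "</script>", so nothing is matched or removed.
const spacedEndTag = '<script>alert(1)</script >';
console.log(stripLoop(spacedEndTag) === spacedEndTag); // true
```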

@atk

atk commented Mar 15, 2012

Strangely enough, it worked for me and returned an empty string (as I had intended). And I don't think it would be that easy to bypass the regex (challenge accepted?).

@sindresorhus
Author

In Firefox it returns an empty string, yes, but not in Chrome...

Feel free to submit an improved regex, and I'll try to bypass it :)

Just remember, it has to do its thing correctly in all supported browsers.
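The thread does not establish why the browsers disagreed, but one property of atk's loop is worth knowing: its regex has the `g` flag, and `test` on a global regex is stateful. Each successful match stores its end position in `lastIndex`, and the next `test` starts searching from there. The sketch below demonstrates that, then shows a hypothetical variant (`stripLoopNoState`) that uses a separate non-global regex for the loop condition, so no `lastIndex` state carries over between iterations:

```javascript
const re = /a/g;
console.log(re.test('a'), re.lastIndex); // true 1
console.log(re.test('a'), re.lastIndex); // false 0 (search began at index 1)

// Hypothetical variant: a non-global regex for the loop condition ignores
// lastIndex; the global one is only used by replace, which resets it.
const stripLoopNoState = function (html) {
  const probe = /<script[^>]*>.*?<\/script>/i;
  const all = /<script[^>]*>.*?<\/script>/gi;
  while (probe.test(html)) html = html.replace(all, '');
  return html;
};

console.log(JSON.stringify(stripLoopNoState('<s<script></script>cript>alert(1)</script>')));
```

This only removes the dependence on `lastIndex`; it is still a regex, so it remains open to the kinds of bypass raised in this thread.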

@maettig

maettig commented Mar 20, 2012

It's also possible to use /* comments */ to trick the regular expression. @sindresorhus is right: a regular expression cannot replace a real parser. HTML is not a regular language, so no regular expression can recognize it in general (there is a formal proof, though it is hard to explain briefly). In real life this is not a problem if you only use it for [b]visualization[/b], but it is dangerous when security is at stake.

PS: By the way, here is another way to trick all the scripts listed here: <img src="http://valid.uri/here.jpg" onload="alert('vulnerable')">.
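For the regex version this is easy to confirm: the input contains no `<script>` element, so it comes back unchanged. The DOM-based version likewise removes only `script` elements and keeps the `onload` attribute. A minimal check (`stripTags` and `imgPayload` are illustrative names):

```javascript
const stripTags = function (a) {
  return a.replace(/<script[^>]*>.*?<\/script>/gi, '');
};

// No <script> element is involved: the JavaScript lives in an event-handler
// attribute, so a script-tag stripper has nothing to remove.
const imgPayload = '<img src="http://valid.uri/here.jpg" onload="alert(\'vulnerable\')">';
console.log(stripTags(imgPayload) === imgPayload); // true
```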

@atk

atk commented Mar 20, 2012

The original script would let that through, too.
