XSS filtering in PHP (cleans various UTF encodings & nested exploits)
<?php
/**
 * XSS filter, recursively handles HTML tags & UTF encoding
 * Optionally handles base64 encoding
 *
 * ***DEPRECATION RECOMMENDED*** Not updated or maintained since 2011
 *
 * This was built from numerous sources
 * (thanks all, sorry I didn't track to credit you)
 *
 * It was tested against *most* exploits here:
 * WARNING: Some weren't tested!!!
 * Those include the Actionscript and SSI samples, or any newer than Jan 2011
 */
class xssClean {
    /**
     * Recursive worker to strip risky elements
     *
     * @param string $input      Content to be cleaned. It MAY be modified in output
     * @param mixed  $safe_level Base64 filtering runs for any value other than 0 or '0'
     * @return string $output Modified $input string
     */
    public function clean_input( $input, $safe_level = 0 ) {
        $output = $input;
        do {
            // Treat $input as buffer on each loop, faster than new var
            $input = $output;

            // Remove unwanted tags
            $output = $this->strip_tags( $input );
            $output = $this->strip_encoded_entities( $output );

            // Use 2nd input param if not empty or '0'
            // (empty() treats both 0 and '0' as "off"; a strict !== 0 would not)
            if ( ! empty( $safe_level ) ) {
                $output = $this->strip_base64( $output );
            }
        } while ( $output !== $input );

        return $output;
    }

    /**
     * Focuses on stripping encoded entities
     * *** This appears to be why people use this sample code. Unclear how well Kses does this ***
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @return string $input Modified $input string
     */
    private function strip_encoded_entities( $input ) {
        // Fix &entity\n;
        $input = str_replace(array('&amp;','&lt;','&gt;'), array('&amp;amp;','&amp;lt;','&amp;gt;'), $input);
        $input = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $input);
        $input = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $input);
        $input = html_entity_decode($input, ENT_COMPAT, 'UTF-8');

        // Remove any attribute starting with "on" or xmlns
        $input = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+[>\b]?#iu', '$1>', $input);

        // Remove javascript: and vbscript: protocols
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $input);
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $input);
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $input);

        // Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $input);
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $input);
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $input);

        return $input;
    }

    /**
     * Focuses on stripping unencoded HTML tags & namespaces
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @return string $input Modified $input string
     */
    private function strip_tags( $input ) {
        // Remove tags
        $input = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $input);

        // Remove namespaced elements
        $input = preg_replace('#</*\w+:\w[^>]*+>#i', '', $input);

        return $input;
    }

    /**
     * Focuses on stripping entities from Base64 encoded strings
     *
     * NOT ENABLED by default!
     * To enable, the 2nd param of clean_input() can be set to anything other than 0 or '0':
     * ie: xssClean->clean_input( $input_string, 1 )
     *
     * @param string $input Maybe Base64 encoded string
     * @return string $output Modified & re-encoded $input string
     */
    private function strip_base64( $input ) {
        $decoded = base64_decode( $input );
        $decoded = $this->strip_tags( $decoded );
        $decoded = $this->strip_encoded_entities( $decoded );
        $output = base64_encode( $decoded );

        return $output;
    }
}
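
One thing worth knowing before enabling the base64 option: strip_base64() decodes whatever it is handed, whether or not it was base64 to begin with. The sketch below is not part of the gist; `looks_like_base64()` is a hypothetical helper, and it only illustrates how a strict decode plus a round-trip comparison could be used to tell the two cases apart.

```php
<?php
// Hypothetical helper, not part of the gist: treat input as base64 only when
// it survives a strict decode and re-encodes to exactly the same string.
function looks_like_base64(string $input): bool
{
    $decoded = base64_decode($input, true); // strict mode: false on invalid characters
    return $decoded !== false && base64_encode($decoded) === $input;
}

$plain   = 'Hello, world!';
$encoded = base64_encode('<script>alert(1)</script>');

// Non-strict decoding silently drops characters outside the base64 alphabet,
// so plain text does not survive a decode/re-encode round trip.
$roundTrip = base64_encode(base64_decode($plain));

var_dump($roundTrip === $plain);       // plain text comes back altered
var_dump(looks_like_base64($plain));   // not valid base64
var_dump(looks_like_base64($encoded)); // valid base64
```

Under that assumption, a caller could skip the base64 pass for input that fails the check instead of rewriting it.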

isunilk commented Apr 14, 2014

Thanks for it man! it helped :)

opya commented May 2, 2014

Thanks man, this helped me a lot

Thanks for your effort

Does this also cover tags with uppercase text? I know that's not proper formatting for a tag, but what if someone is inserting data into their DB after turning it lowercase?
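
On the uppercase question: the tag pattern in strip_tags() ends with the `i` modifier, so it matches regardless of case. A small check, using that pattern copied verbatim and two made-up inputs:

```php
<?php
// Tag pattern copied from strip_tags() in the class above; the trailing "i"
// modifier makes the match case-insensitive.
$pattern = '#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i';

// Illustrative inputs (not from the gist's test set).
$upper = preg_replace($pattern, '', '<SCRIPT>alert(1)</SCRIPT>');
$mixed = preg_replace($pattern, '', '<ScRiPt src="x.js"></sCrIpT>ok');

echo $upper, "\n"; // the tags are gone, the text between them stays
echo $mixed, "\n";
```

Note that only the tags are removed; the text between them is left in place.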

Lofanmi commented Jan 2, 2015

Thanks for your effort:)

soaj1664 commented Feb 6, 2015

Here is the bypass, I think:

<img src =x onerror=confirm(document.cookie);

The regular expression expects that the attacker will use the closing angle bracket, which is missing in the above vector, and all browsers will render this ...

mrhsce commented Mar 6, 2015

Thanks a lot, my fellow developer.
I think if you take a look at the anti-XSS library in the link
and compare it with yours, you can improve your code even further.

shyandsy commented Jul 2, 2015

Thanks a lot


mbijon commented Jul 22, 2015

@ALL -- FYI, I haven't updated this function since 2011, and that's likely one of MANY VULNS in this lib. Unless you have performance constraints, I recommend HTML Purifier instead of this quick & dirty method.

It might be time to update or just pull this lib down:

  • @soaj1664 Thanks for catching that <img src =x onerror=confirm(document.cookie);.
  • @voku I'll try running your anti-xss tester from bc003bf against this.


I tried adding xss_clean("< a href="#">a</ a>") (without the spaces) to a test DB. It was added, but some data was destroyed. Is this a bug, or is it not in the XSS category?

MrQuiet commented Sep 29, 2015

Thanks a lot

ymakux commented Oct 26, 2015

A better alternative to this class
Ready for production. Based on Kses and Drupal 7 filter

Please, I need a case that this filter cannot catch. I tried most cases, but they were all caught. I need at least one case that bypasses this filter.

Asynth commented Nov 11, 2015

rola2010 : test" onmouseover="alert(document.cookie);"

Asynth: unfortunately the code catches this; it seems it can't be broken.
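
Whatever the filter does with that payload, it is only dangerous when it is echoed into an attribute value without escaping. A minimal sketch of escaping at output time with htmlspecialchars() and ENT_QUOTES; the surrounding `<input>` markup is an assumption made up for illustration:

```php
<?php
// Attribute-breakout payload quoted in the comment above.
$payload = 'test" onmouseover="alert(document.cookie);"';

// ENT_QUOTES escapes both double and single quotes, so the value cannot
// terminate the attribute it is placed in.
$escaped = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');

// Hypothetical output context: a quoted attribute value.
echo '<input type="text" value="' . $escaped . '">', "\n";
```

This only covers quoted attribute values; other output contexts (URLs, inline scripts, CSS) need their own handling.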


mbijon commented Mar 8, 2016

I still recommend HTML Purifier or kses instead of this gist.


  • UPDATED: This class-based version of xssClean recurses BOTH the encoded entity & tag removal routines. This solves a vulnerability found by 0xmitsurugi & reported privately.
  • UPDATED: The exploit reported by @soaj1664 in this comment is fixed. It could only have been effective at the end of the file, or if there were no other ">" characters after the exploit. The problem with filtering that exploit when there is no closing ">" is that users would see the entire rest of the input body removed. Instead of removing the entire message body after the exploit, I've chosen to remove up to the next non-word character (such as a newline or end-of-file character). This could still remove the rest of the input if there are no non-word characters before the end, but it keeps some of the message in MOST cases while still making this filter more secure.
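
To see what the event-handler pattern does with the unclosed vector, here is a small check using the pattern exactly as it appears in strip_encoded_entities() in the listing. As written there, `[^>]*+` runs to the next ">" or to the end of the input, and `\b` inside a character class means a backspace character rather than a word boundary, so the second case below also loses the trailing text. The trailing-text input is made up for illustration.

```php
<?php
// Pattern copied verbatim from strip_encoded_entities() in the listing.
$pattern = '#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+[>\b]?#iu';

// Vector reported by @soaj1664: no closing ">".
$vector  = '<img src =x onerror=confirm(document.cookie);';
$cleaned = preg_replace($pattern, '$1>', $vector);
echo $cleaned, "\n";

// Same vector followed by more text on the next line (illustrative input).
// [^>] also matches newlines, so the match continues to the end of the input.
$withTrailingText = $vector . "\nrest of the message";
$cleanedTrailing  = preg_replace($pattern, '$1>', $withTrailingText);
echo $cleanedTrailing, "\n";
```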


yxs1987 commented Jun 22, 2016

thank you so much

Hello, I am new to all this XSS stuff.
How do I prevent this?

$_GET['a'] = 'javascript:alert(document.cookie)';
$href = xssClean($_GET['a']);
echo '<a href="'.$href.'">XSS link</a>';

properties commented Aug 2, 2016


Just echo it with htmlspecialchars(). Example:

$_GET['a'] = 'javascript:alert(document.cookie)';
$href = $_GET['a'];
echo '<a href="'.htmlspecialchars($href).'">XSS link</a>';

Piece of cake ;)
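
A caveat on the htmlspecialchars() suggestion: it escapes &, <, >, and quotes, and `javascript:alert(document.cookie)` contains none of those, so the value reaches the href unchanged. For URLs, the scheme has to be checked as well. The sketch below uses a hypothetical helper, `safe_href()`, that assumes only absolute http(s) links are wanted and falls back to "#" for anything else:

```php
<?php
// Hypothetical helper (not part of the gist): allow only absolute http(s)
// URLs in an href and fall back to "#" for everything else.
function safe_href(string $url): string
{
    if (preg_match('#\Ahttps?://#i', $url) !== 1) {
        return '#';
    }
    // Escape for the HTML attribute context after the scheme check.
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

$href = 'javascript:alert(document.cookie)';

// htmlspecialchars() alone leaves the payload intact.
var_dump(htmlspecialchars($href, ENT_QUOTES, 'UTF-8') === $href);

echo '<a href="' . safe_href($href) . '">XSS link</a>', "\n";
echo '<a href="' . safe_href('https://example.com/?a=1&b=2') . '">OK link</a>', "\n";
```

An allowlist is used here rather than a blocklist of schemes, since anything not explicitly permitted (including URLs with leading whitespace or control characters) is rejected.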

nat4tq commented Feb 8, 2018


I was testing your filter against a set of XSS test inputs. It seems that your filter is still vulnerable to XSS such as with inputs that contain XSS payloads in the comment-type, anchor and image tags. Examples of one of each are:

<!--#exec cmd=""/usr/X11R6/bin/xterm ?display &""-->
<a href="jAvAsCrIpT&colon;alert&lpar;1&rpar;">X</a>
/><img/onerror=\x09javascript:alert(1)\x09src=xxx:x />

A full report can be read in our paper, "Assessment of Dynamic Open-source Cross-site Scripting Filters as Security Devices in Web Applications". I kindly suggest that you add these tags to the blacklist to make it more robust against XSS.
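
One plausible reason the second vector gets through (an assumption, not verified against the full class): strip_encoded_entities() calls html_entity_decode() with ENT_COMPAT only, which uses the HTML 4.01 entity table, and `&colon;`, `&lpar;`, and `&rpar;` are HTML5 named entities. The sketch compares the two decode modes on the reported payload:

```php
<?php
// href value from the second vector reported above.
$vector = 'jAvAsCrIpT&colon;alert&lpar;1&rpar;';

// Flags as used in strip_encoded_entities(): HTML 4.01 entity table.
$html4 = html_entity_decode($vector, ENT_COMPAT, 'UTF-8');

// Same call with the HTML5 entity table enabled.
$html5 = html_entity_decode($vector, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $html4, "\n"; // entities left as-is, so no literal "javascript:" appears
echo $html5, "\n"; // entities decoded, exposing the scheme to later checks
```

Decoding with ENT_HTML5 would only expose the scheme to the existing javascript: pattern; it does not address the other two vectors.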

Thank you.
