XSS filtering in PHP (cleans various UTF encodings & nested exploits)
<?php
/*
 * XSS filter
 *
 * This was built from numerous sources
 * (thanks all, sorry I didn't track to credit you)
 *
 * It was tested against *most* exploits here:
 * WARNING: Some weren't tested!!!
 * Those include the ActionScript and SSI samples, or any newer than Jan 2011
 *
 * TO-DO: compare to SymphonyCMS filter:
 * (Symphony's is probably faster than my hack)
 */
function xss_clean($data)
{
    // Fix &entity\n;
    // Double-encode &amp;, &lt; and &gt; so they survive html_entity_decode() below
    $data = str_replace(array('&amp;','&lt;','&gt;'), array('&amp;amp;','&amp;lt;','&amp;gt;'), $data);
    $data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
    $data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
    $data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');

    // Remove any attribute starting with "on" or xmlns
    $data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);

    // Remove javascript: and vbscript: protocols
    $data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
    $data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
    $data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);

    // Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
    $data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
    $data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
    $data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);

    // Remove namespaced elements (we do not need them)
    $data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);

    // Remove really unwanted tags; repeat until nothing changes so nested tags are caught
    do {
        $old_data = $data;
        $data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
    } while ($old_data !== $data);

    // we are done...
    return $data;
}
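
The `while ($old_data !== $data)` loop around the last pattern is what handles the "nested exploits" mentioned in the title: removing an inner tag can splice a new unwanted tag together from the leftovers. The sketch below illustrates this with the tag pattern copied from the function. The constant `UNWANTED_TAGS` and the helpers `strip_unwanted_tags_once` and `strip_unwanted_tags` are hypothetical names introduced only for this illustration, not part of the gist.

```php
<?php
// Pattern copied from the "really unwanted tags" step of xss_clean().
const UNWANTED_TAGS = '#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i';

// Hypothetical helper: a single pass of the pattern.
function strip_unwanted_tags_once(string $data): string
{
    return preg_replace(UNWANTED_TAGS, '', $data);
}

// Hypothetical helper: repeat the pass until the string stops changing.
function strip_unwanted_tags(string $data): string
{
    do {
        $old_data = $data;
        $data = strip_unwanted_tags_once($data);
    } while ($old_data !== $data);

    return $data;
}

// Removing the inner <script> joins "<scr" and "ipt>" into a new tag.
$payload = '<scr<script>ipt>alert(1)</scr</script>ipt>';

echo strip_unwanted_tags_once($payload), "\n"; // <script>alert(1)</script>
echo strip_unwanted_tags($payload), "\n";      // alert(1)
```

A single pass leaves a working `<script>` element behind; the repeated pass removes it. This only shows the tag step in isolation and says nothing about the other patterns.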

Thanks for it, man! It helped :)


Thanks man, it helped me a lot


Thanks for your effort


Does this also cover tags with uppercase text? I know that's not proper formatting for a tag, but what if someone is inserting data into their DB after turning it lowercase?
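
On the uppercase question: the tag and attribute patterns in the function carry the `i` modifier, so they should match regardless of case; the `-moz-binding` pattern is the one case-sensitive exception I can see. A quick check, assuming the tag-stripping pattern exactly as posted (the variable names are only for illustration):

```php
<?php
// Tag-stripping pattern as posted in the gist; the trailing "i" makes it case-insensitive.
$pattern = '#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i';

$upper = preg_replace($pattern, '', '<SCRIPT>alert(1)</SCRIPT>');
$mixed = preg_replace($pattern, '', '<ScRiPt>alert(1)</sCrIpT>');

echo $upper, "\n"; // alert(1)
echo $mixed, "\n"; // alert(1)
```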


Thanks for your effort:)


Here is the bypass, I think:

<img src =x onerror=confirm(document.cookie);

The regular expression expects that the attacker will use the closing angle bracket, which is missing in the above vector, and all browsers will render this ...
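
The reported bypass can be reproduced against the "on"/xmlns pattern alone. This is a minimal sketch, assuming the pattern exactly as posted in the gist; the variable names are hypothetical, and only this one step of the filter is exercised:

```php
<?php
// "on*" attribute pattern as posted in the gist; note the mandatory ">" at the end.
$pattern = '#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu';

$closed   = '<img src =x onerror=confirm(document.cookie);>';
$unclosed = '<img src =x onerror=confirm(document.cookie);';

$closedResult   = preg_replace($pattern, '$1>', $closed);
$unclosedResult = preg_replace($pattern, '$1>', $unclosed);

echo $closedResult, "\n";   // <img src =x >
echo $unclosedResult, "\n"; // unchanged: there is no ">" for the pattern to end on
```

With the closing bracket present the handler is stripped; without it the pattern never matches, so the `onerror` attribute passes through this step untouched.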


Thanks a lot, my fellow developer.
I think if you take a look at the anti-XSS library in the link
and compare it with yours, you can improve your code even further.


Thanks a lot


@all: FYI, I haven't updated this function since 2011, so that's likely one of MANY VULNS in this lib. Unless you have performance constraints, I recommend HTML Purifier instead of this quick & dirty method.

It might be time to update or just pull this lib down:

  • @soaj1664 Thanks for catching that <img src =x onerror=confirm(document.cookie);.
  • @voku I'll try running your anti-xss tester from bc003bf against this.

I tried to add xss_clean('<a href="#">a</a>') to a test DB. It was inserted, but some of the data was destroyed. Is that a bug, or is this outside the XSS category?


Thanks a lot


A better alternative to this class
Ready for production. Based on Kses and Drupal 7 filter


Please, I need a case that this filter cannot catch. I tried most of the cases, but they were all caught. I need at least one case that bypasses this filter.


rola2010 : test" onmouseover="alert(document.cookie);"


Asynth: unfortunately the code catches this; it seems it can't be broken
