XSS filtering in PHP (cleans various UTF encodings & nested exploits)
<?php
/*
* XSS filter, recursively handles HTML tags & UTF encoding
* Optionally handles base64 encoding
*
* ***DEPRECATION RECOMMENDED*** Not updated or maintained since 2011
* A MAINTAINED & BETTER ALTERNATIVE => kses
* https://github.com/RichardVasquez/kses/
*
* This was built from numerous sources
* (thanks all, sorry I didn't track to credit you)
*
* It was tested against *most* exploits here: http://ha.ckers.org/xss.html
* WARNING: Some weren't tested!!!
* Those include the Actionscript and SSI samples, or any newer than Jan 2011
*
*/
class xssClean {
    /*
     * Recursive worker to strip risky elements
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @param mixed $safe_level Optional. Anything other than 0 or '0' also enables Base64 filtering
     * @return string $output Modified $input string
     */
    public function clean_input( $input, $safe_level = 0 ) {
        $output = $input;
        do {
            // Treat $input as buffer on each loop, faster than new var
            $input = $output;
            // Remove unwanted tags
            $output = $this->strip_tags( $input );
            $output = $this->strip_encoded_entities( $output );
            // Use 2nd input param if not empty or '0'
            // (empty() treats both 0 and '0' as "off"; a strict !== 0 would let '0' through)
            if ( ! empty( $safe_level ) ) {
                $output = $this->strip_base64( $output );
            }
        } while ( $output !== $input );
        return $output;
    }
    /*
     * Focuses on stripping encoded entities
     * *** This appears to be why people use this sample code. Unclear how well Kses does this ***
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @return string $input Modified $input string
     */
    private function strip_encoded_entities( $input ) {
        // Fix &entity\n;
        $input = str_replace(array('&amp;','&lt;','&gt;'), array('&amp;amp;','&amp;lt;','&amp;gt;'), $input);
        $input = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $input);
        $input = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $input);
        $input = html_entity_decode($input, ENT_COMPAT, 'UTF-8');
        // Remove any attribute starting with "on" or xmlns
        $input = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+[>\b]?#iu', '$1>', $input);
        // Remove javascript: and vbscript: protocols
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $input);
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $input);
        // "i" modifier added: CSS property names are case-insensitive, so -MOZ-BINDING must match too
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#iu', '$1=$2nomozbinding...', $input);
        // Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $input);
        // IE spells the property "behavior" and writes it as "behavior: url(...)", so accept both spellings and ":" as well as "("
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviou?r[\x00-\x20]*[:(][^>]*+>#i', '$1>', $input);
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $input);
        return $input;
    }
    /*
     * Focuses on stripping unencoded HTML tags & namespaces
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @return string $input Modified $input string
     */
    private function strip_tags( $input ) {
        // Remove tags
        $input = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $input);
        // Remove namespaced elements
        $input = preg_replace('#</*\w+:\w[^>]*+>#i', '', $input);
        return $input;
    }
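    /*
     * Focuses on stripping entities from Base64 encoded strings
     *
     * NOT ENABLED by default!
     * To enable, the 2nd param of clean_input() can be set to anything other than 0 or '0':
     * e.g.: $xss->clean_input( $input_string, 1 )
     *
     * @param string $input Maybe Base64 encoded string
     * @return string $output Modified & re-encoded $input string, or $input unchanged if it is not Base64
     */
    private function strip_base64( $input ) {
        // Strict mode returns false for anything that is not valid Base64
        $decoded = base64_decode( $input, true );
        // Leave the input alone when it is not Base64 or does not decode to valid UTF-8 text,
        // otherwise plain text would be mangled by a decode/re-encode round trip
        if ( $decoded === false || $decoded === '' || ! preg_match( '//u', $decoded ) ) {
            return $input;
        }
        $cleaned = $this->strip_tags( $decoded );
        $cleaned = $this->strip_encoded_entities( $cleaned );
        // Re-encode only when cleaning changed something
        return ( $cleaned === $decoded ) ? $input : base64_encode( $cleaned );
    }
}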
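
To illustrate the entity handling in strip_encoded_entities(), here is a minimal standalone sketch. It reuses the two normalization patterns from that method, followed by html_entity_decode(). The function name normalize_entities() is hypothetical and not part of the class. It only shows how numeric entities written without a trailing semicolon get decoded, so that later patterns see the plain characters.

```php
<?php
// Hypothetical helper (not part of xssClean): normalizes sloppy entities, then decodes them.
function normalize_entities( $input ) {
    // Drop whitespace between an entity and its semicolon: "&#x6A ;" becomes "&#x6A;"
    $input = preg_replace( '/(&#*\w+)[\x00-\x20]+;/u', '$1;', $input );
    // Make numeric entities end with exactly one semicolon: "&#106" becomes "&#106;"
    $input = preg_replace( '/(&#x*[0-9A-F]+);*/iu', '$1;', $input );
    // Decode so that later filters see the real characters
    return html_entity_decode( $input, ENT_COMPAT, 'UTF-8' );
}

echo normalize_entities( '&#106&#97&#118&#97' ), "\n";   // decimal entities, no semicolons
echo normalize_entities( '&#x6A&#x61&#x76&#x61' ), "\n"; // hex entities, no semicolons
```

Both calls should print the word java, which is why the protocol patterns run only after this step.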
@isunilk
isunilk commented Apr 14, 2014

Thanks for it man! it helped :)

@opya
opya commented May 2, 2014

Thanks man, this helped me a lot.

@sitthykun

Thanks for your effort.

@donovanjackson

Does this also cover tags with uppercase text? I know that's not proper formatting for a tag, but what if someone is inserting data into their DB after turning it lowercase?
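
On the uppercase question: the tag pattern in strip_tags() ends with the i modifier, so tag names should match in any letter case. A minimal standalone sketch, using a shortened copy of that pattern in a hypothetical function named strip_risky_tags():

```php
<?php
// Hypothetical standalone copy of the idea in strip_tags(), with a shortened tag list.
function strip_risky_tags( $input ) {
    // The trailing "i" modifier makes the tag names match regardless of letter case
    return preg_replace( '#</*(?:iframe|object|script|style)[^>]*+>#i', '', $input );
}

echo strip_risky_tags( '<SCRIPT>alert(1)</ScRiPt>' ), "\n"; // both tags are removed
echo strip_risky_tags( '<B>ok</B>' ), "\n";                 // tags outside the list are kept
```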

@Lofanmi
Lofanmi commented Jan 2, 2015

Thanks for your effort:)

@soaj1664
soaj1664 commented Feb 6, 2015

Here is the bypass, I think:

<img src =x onerror=confirm(document.cookie);

The regular expression (https://gist.github.com/mbijon/1098477#file-xss_clean-php-L26) expects the attacker to include the closing angle bracket, which is missing from the vector above, and all browsers will render this ...
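
The difference can be sketched with two simplified patterns. These are hypothetical stand-ins rather than the gist's exact regex, and the function names are made up for illustration. The strict one requires a closing ">", and the lenient one treats it as optional.

```php
<?php
// Simplified, hypothetical stand-ins for the handler-stripping pattern; not the gist's exact regex.

// Strict variant: only matches when the tag is terminated by ">"
function strip_handlers_strict( $input ) {
    return preg_replace( '#(<[^>]+?[\x00-\x20"\'])on[^>]*+>#i', '$1>', $input );
}

// Lenient variant: the closing ">" is optional, so an unterminated tag is cut as well
function strip_handlers_lenient( $input ) {
    return preg_replace( '#(<[^>]+?[\x00-\x20"\'])on[^>]*+>?#i', '$1>', $input );
}

$vector = '<img src =x onerror=confirm(document.cookie);';
var_dump( strip_handlers_strict( $vector ) === $vector ); // true: the strict pattern misses the vector
echo strip_handlers_lenient( $vector ), "\n";             // handler removed and the tag closed
```

The trade-off is that the lenient form discards everything up to the next ">" or, if there is none, up to the end of the input.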

@mrhsce
mrhsce commented Mar 6, 2015

Thanks a lot, my fellow developer.
I think that if you take a look at the anti-XSS library in the link below
and compare it with yours, you can improve your code even further:
https://code.google.com/p/php-antixss/downloads/detail?name=AntiXSS.php&can=2&q=

@shyandsy
shyandsy commented Jul 2, 2015

Thanks a lot,
helpful.

@mbijon
Owner
mbijon commented Jul 22, 2015

@all: FYI, I haven't updated this function since 2011, so that's likely one of MANY VULNS in this lib. Unless you have performance constraints, I recommend HTML Purifier instead of this quick & dirty method.

It might be time to update or just pull this lib down:

  • @soaj1664 Thanks for catching that <img src =x onerror=confirm(document.cookie);.
  • @voku I'll try running your anti-xss tester from bc003bf against this.
@github-wuzhh

Tks

@BobJack
BobJack commented Aug 7, 2015

I tried to add xss_clean("< a href="#">a</ a>") (without the spaces) to a test DB. It was added, but some of the data was destroyed. Is that a bug, or is it not in the XSS category?

@MrQuiet
MrQuiet commented Sep 29, 2015

Thanks a lot,
helpful.

@ymakux
ymakux commented Oct 26, 2015

A better alternative to this class https://github.com/ymakux/xss
Ready for production. Based on Kses and Drupal 7 filter

@rola2010

Please, I need a case that this filter cannot catch. I tried most of the cases, but they were all caught. I need at least one case that bypasses this filter.

@Asynth
Asynth commented Nov 11, 2015

rola2010 : test" onmouseover="alert(document.cookie);"
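
Whether a payload like the one above is harmful depends on where the value is written: it contains no angle brackets, and it only does damage if it lands inside an HTML attribute unescaped. Here is a minimal sketch of escaping at output time with PHP's built-in htmlspecialchars(). It assumes the value is echoed inside a double-quoted attribute, and the variable names are illustrative.

```php
<?php
// Illustrative value: the payload from the comment above
$value = 'test" onmouseover="alert(document.cookie);"';

// Escape for an HTML attribute context; ENT_QUOTES converts both double and single quotes
$escaped = htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' );

// The quotes can no longer terminate the attribute, so no new attribute is injected
echo '<input type="text" value="' . $escaped . '">', "\n";
```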

@rola2010

Asynth: unfortunately the code catches this; it seems it can't be broken.

@mbijon
Owner
mbijon commented Mar 8, 2016

I still recommend HTML Purifier or kses instead of this gist.

However:

  • UPDATED: This class-based version of xssClean recurses BOTH the encoded entity & tag removal routines. This solves a vulnerability found by 0xmitsurugi & reported privately.

  • UPDATED: The exploit reported by @soaj1664 in this comment is fixed. It could only have been effective at the end of the file, or if there were no other ">" characters after the exploit. The problem with filtering that exploit without requiring a closing ">" is that users would see the entire rest of the input body removed. Instead of removing the entire message body after the exploit, I've chosen to remove only up to the next non-word character (such as a newline or end-of-file character). This can still remove the rest of the input if there are no non-word characters before the end, but in MOST cases it keeps some of the message while still making this filter more secure.
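
The first update above, re-running the removal routines until nothing changes, can be sketched as a fixed-point loop. This is a minimal standalone illustration with a shortened tag list; strip_once() and clean_until_stable() are hypothetical names, not methods of the class.

```php
<?php
// Hypothetical single-pass filter with a shortened tag list (a stand-in for the class's routines)
function strip_once( $input ) {
    return preg_replace( '#</*(?:iframe|object|script|style)[^>]*+>#i', '', $input );
}

// Re-run the filter until a pass changes nothing (a fixed point)
function clean_until_stable( $input ) {
    do {
        $previous = $input;
        $input    = strip_once( $previous );
    } while ( $input !== $previous );
    return $input;
}

$nested = '<scr<script>ipt>alert(1)</scr</script>ipt>';
echo strip_once( $nested ), "\n";         // a single pass reassembles a script tag
echo clean_until_stable( $nested ), "\n"; // repeated passes remove it
```

A single pass is not enough for nested input because removing the inner tag joins the outer fragments into a new tag, which only the next pass can see.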

@webhacking

👍

@yxs1987
yxs1987 commented Jun 22, 2016

thank you so much
