XSS filtering in PHP (cleans various UTF encodings & nested exploits)
<?php
/*
* XSS filter, recursively handles HTML tags & UTF encoding
* Optionally handles base64 encoding
*
* ***DEPRECATION RECOMMENDED*** Not updated or maintained since 2011
* A MAINTAINED & BETTER ALTERNATIVE => kses
* https://github.com/RichardVasquez/kses/
*
* This was built from numerous sources
* (thanks all, sorry I didn't track to credit you)
*
* It was tested against *most* exploits here: http://ha.ckers.org/xss.html
* WARNING: Some weren't tested!!!
* Those include the Actionscript and SSI samples, or any newer than Jan 2011
*
*/
class xssClean {

    /*
     * Recursive worker to strip risky elements
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @param mixed $safe_level Optional. Anything other than 0 or '0' also enables Base64 stripping
     * @return string $output Modified $input string
     */
    public function clean_input( $input, $safe_level = 0 ) {
        $output = $input;
        do {
            // Reuse $input as the comparison buffer on each pass, faster than a new var
            $input = $output;

            // Remove unwanted tags
            $output = $this->strip_tags( $input );
            $output = $this->strip_encoded_entities( $output );

            // Base64 stripping runs only when the 2nd param is not empty (0 and '0' both disable it)
            if ( ! empty( $safe_level ) ) {
                $output = $this->strip_base64( $output );
            }
        } while ( $output !== $input );

        return $output;
    }

    /*
     * Focuses on stripping encoded entities
     * *** This appears to be why people use this sample code. Unclear how well Kses does this ***
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @return string $input Modified $input string
     */
    private function strip_encoded_entities( $input ) {
        // Fix &entity\n;
        $input = str_replace(array('&amp;','&lt;','&gt;'), array('&amp;amp;','&amp;lt;','&amp;gt;'), $input);
        $input = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $input);
        $input = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $input);
        $input = html_entity_decode($input, ENT_COMPAT, 'UTF-8');

        // Remove any attribute starting with "on" or xmlns
        $input = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+[>\b]?#iu', '$1>', $input);

        // Remove javascript: and vbscript: protocols
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $input);
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $input);
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $input);

        // Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $input);
        // IE spells the property "behavior" and writes it as "behavior: url(...)",
        // so accept both spellings and either ":" or "(" after the name
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviou?r[\x00-\x20]*[:(][^>]*+>#i', '$1>', $input);
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $input);

        return $input;
    }

    /*
     * Focuses on stripping unencoded HTML tags & namespaces
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @return string $input Modified $input string
     */
    private function strip_tags( $input ) {
        // Remove tags
        $input = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $input);

        // Remove namespaced elements
        $input = preg_replace('#</*\w+:\w[^>]*+>#i', '', $input);

        return $input;
    }

    /*
     * Focuses on stripping entities from Base64 encoded strings
     *
     * NOT ENABLED by default!
     * To enable, set the 2nd param of clean_input() to anything other than 0 or '0':
     * ie: $xss->clean_input( $input_string, 1 ), where $xss = new xssClean()
     *
     * @param string $input Maybe Base64 encoded string
     * @return string $output Modified & re-encoded $input string, or $input unchanged if it is not valid Base64
     */
    private function strip_base64( $input ) {
        // Strict mode returns false on characters outside the Base64 alphabet,
        // so ordinary text is passed through instead of being mangled
        $decoded = base64_decode( $input, true );
        if ( false === $decoded ) {
            return $input;
        }

        $decoded = $this->strip_tags( $decoded );
        $decoded = $this->strip_encoded_entities( $decoded );
        $output = base64_encode( $decoded );

        return $output;
    }
}
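
The do/while loop in clean_input() matters because a single stripping pass can itself assemble a new tag out of the fragments it leaves behind. A minimal standalone sketch of that effect, using a simplified script-only pattern and two hypothetical helpers (strip_script_once and strip_script_until_stable are illustration names, not part of the class):

```php
<?php
// Hypothetical helper: one pass of a simplified, script-only tag stripper.
function strip_script_once(string $html): string
{
    // Fall back to the input if PCRE reports an error (preg_replace returns null).
    return preg_replace('#</*script[^>]*+>#i', '', $html) ?? $html;
}

// Hypothetical helper: repeat the pass until the output stops changing,
// mirroring the do/while structure of clean_input().
function strip_script_until_stable(string $html): string
{
    do {
        $before = $html;
        $html   = strip_script_once($before);
    } while ($html !== $before);

    return $html;
}

// A nested payload: removing the inner tags glues the outer fragments together.
$nested = '<scr<script>ipt>alert(1)</scr</script>ipt>';

echo strip_script_once($nested), "\n";         // <script>alert(1)</script>
echo strip_script_until_stable($nested), "\n"; // alert(1)
```

One pass leaves a working script tag behind; only the repeated pass reaches a stable, tag-free result.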

isunilk commented Apr 14, 2014

Thanks for this, man! It helped :)

opya commented May 2, 2014

Thanks man, this helped me a lot

sitthykun commented Jun 26, 2014

Thanks for your effort

donovanjackson commented Jul 3, 2014

Does this also cover tags with uppercase text? I know that's not proper formatting for a tag, but what if someone inserts data into their DB after converting it to lowercase?
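
On the case question: the tag pattern in strip_tags() carries the `i` modifier, so it should match regardless of case. A small standalone check, with the pattern copied from the class and a made-up sample input:

```php
<?php
// Pattern copied from strip_tags() in the class above; the trailing "i" makes it case-insensitive.
$pattern = '#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i';

// Sample input invented for this illustration.
$upper = '<SCRIPT SRC="http://example.com/x.js"></SCRIPT><B>bold</B>';
$lower = strtolower($upper);

$clean_upper = preg_replace($pattern, '', $upper);
$clean_lower = preg_replace($pattern, '', $lower);

echo $clean_upper, "\n"; // <B>bold</B>
echo $clean_lower, "\n"; // <b>bold</b>
```

Both the uppercase and the lowercased input lose the script tags, while the harmless bold tag is left alone.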

Lofanmi commented Jan 2, 2015

Thanks for your effort:)

soaj1664 commented Feb 6, 2015

Here is a bypass, I think:

<img src =x onerror=confirm(document.cookie);

The regular expression (https://gist.github.com/mbijon/1098477#file-xss_clean-php-L26) expects the attacker to use the closing angle bracket, which is missing in the above vector, and all browsers will render this ...
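
A standalone sketch of why the missing ">" matters. `$requires_close` is a hypothetical variant written for illustration that insists on a closing bracket (it is not necessarily the exact pattern the gist used at the time); `$current` is the attribute pattern copied from strip_encoded_entities() in the class above, where the bracket is optional:

```php
<?php
$vector = '<img src =x onerror=confirm(document.cookie);';

// Hypothetical stricter pattern: the match only succeeds if a ">" follows.
$requires_close = '#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu';

// Pattern copied from the class above: the closing ">" is optional.
$current = '#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+[>\b]?#iu';

$strict_result  = preg_replace($requires_close, '$1>', $vector);
$current_result = preg_replace($current, '$1>', $vector);

echo $strict_result, "\n";  // unchanged: the handler survives
echo $current_result, "\n"; // <img src =x >
```

With no ">" in the input, the stricter pattern never matches and the onerror handler passes through untouched; the optional-bracket pattern removes it, along with everything up to the next ">" or the end of the input.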

mrhsce commented Mar 6, 2015

Thanks a lot, my fellow developer.
I think if you take a look at the anti-XSS library at the link below
and compare it with yours, you can improve your code even further:
https://code.google.com/p/php-antixss/downloads/detail?name=AntiXSS.php&can=2&q=

voku commented Jun 17, 2015

shyandsy commented Jul 2, 2015

Thanks a lot, this is helpful.

Owner

mbijon commented Jul 22, 2015

@ALL -- FYI, I haven't updated this function since 2011, so that's likely one of MANY VULNS in this lib. Unless you have performance constraints, I recommend HTML Purifier instead of this quick & dirty method.

It might be time to update this lib or just pull it down:

  • @soaj1664 Thanks for catching that <img src =x onerror=confirm(document.cookie);.
  • @voku I'll try running your anti-xss tester from bc003bf against this.

github-wuzhh commented Aug 1, 2015

Tks

SergeSysoev commented Aug 7, 2015

I tried adding xss_clean("< a href="#">a</ a>") (without the spaces) to a test DB. It was added, but it destroyed some of the data. Is this a bug? Or is it outside the XSS category?

MrQuiet commented Sep 29, 2015

Thanks a lot, this is helpful.

ymakux commented Oct 26, 2015

A better alternative to this class: https://github.com/ymakux/xss
Ready for production. Based on Kses and the Drupal 7 filter.

rola2010 commented Oct 31, 2015

Please, I need a case that this filter cannot catch. I tried most of the cases, but they were all caught. I need at least one case that bypasses this filter.

Asynth commented Nov 11, 2015

rola2010 : test" onmouseover="alert(document.cookie);"

rola2010 commented Nov 24, 2015

Asynth: unfortunately the code catches this; it seems it can't be broken.
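
The payload above contains no "<" at all, so it only becomes dangerous when the value is written into an HTML attribute. For that context, the usual defence is escaping at output time rather than tag filtering. A minimal sketch, assuming the value ends up in an input's value attribute (the surrounding markup is made up for illustration):

```php
<?php
$payload = 'test" onmouseover="alert(document.cookie);"';

// Unescaped: the first double quote closes the attribute and the rest becomes a new attribute.
$unsafe = '<input type="text" value="' . $payload . '">';

// Escaped: ENT_QUOTES converts both quote styles, so the value cannot leave the attribute.
$escaped = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');
$safe    = '<input type="text" value="' . $escaped . '">';

echo $unsafe, "\n";
echo $safe, "\n";
```

After escaping, every double quote in the payload is emitted as `&quot;`, so the browser treats the whole string as attribute text.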

Owner

mbijon commented Mar 8, 2016

I still recommend HTML Purifier or kses instead of this gist.

However:

  • UPDATED: This class-based version of xssClean recurses BOTH the encoded entity & tag removal routines. This solves a vulnerability found by 0xmitsurugi & reported privately.
  • UPDATED: The exploit reported by @soaj1664 in this comment is fixed. It could only have been effective at the end of the file, or if there were no other ">" characters after the exploit. The problem with filtering that exploit when there is no closing ">" is that users would see the entire input body removed. Instead of removing the entire message body after that exploit, I've chosen to remove up to the next non-word character (such as a newline or end-of-file character). This could still remove the rest of the input if there are no non-word characters before the end, but it keeps some of the message in MOST cases while still making this filter more secure.

webhacking commented Mar 31, 2016

👍

yxs1987 commented Jun 22, 2016

thank you so much

petrospap commented Jul 13, 2016

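Hello, I am new to all this XSS stuff.
How do I prevent this?

<?php
$_GET['a'] = 'javascript:alert(document.cookie)';
// xssClean is a class, so call its clean_input() method on an instance
$xss = new xssClean();
$href = $xss->clean_input( $_GET['a'] );
echo '<a href="'.$href.'">XSS link</a>';
?>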

properties commented Aug 2, 2016

@petrospap

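Escape it with htmlspecialchars() when you echo it. Note that this only stops the value from breaking out of the attribute; the javascript: scheme itself survives escaping, so the URL scheme also has to be validated. Example:

<?php
$_GET['a'] = 'javascript:alert(document.cookie)';
$href = $_GET['a'];
echo '<a href="'.htmlspecialchars($href).'">XSS link</a>';
?>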
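
A minimal sketch of scheme validation for the href case, assuming that only absolute http and https links are acceptable. `safe_href()` is a hypothetical helper name, and the "#" fallback is an arbitrary choice for illustration:

```php
<?php
// Hypothetical helper: return an attribute-safe URL, or "#" when the scheme is not allowed.
function safe_href(string $url): string
{
    // parse_url() returns null when there is no scheme and false on badly malformed input.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // Allow-list instead of block-list: anything that is not plainly http/https is rejected,
    // which also rejects relative URLs and obfuscated schemes.
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return '#';
    }

    // Escape for the attribute context after the scheme check has passed.
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

$bad  = safe_href('javascript:alert(document.cookie)');
$good = safe_href('https://example.com/?a=1&b=2');

echo '<a href="' . $bad . '">XSS link</a>', "\n";
echo '<a href="' . $good . '">Safe link</a>', "\n";
```

Rejecting everything outside the allow-list is deliberately strict. An application that needs relative links would have to handle them separately and with care.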

rudSarkar commented Jun 15, 2017

Nice piece of code ;)

nat4tq commented Feb 8, 2018

Hi,

I was testing your filter against a set of XSS test inputs. It seems that your filter is still vulnerable to XSS payloads carried in comment-type, anchor and image tags. One example of each:

<!--#exec cmd=""/usr/X11R6/bin/xterm -display 127.0.0.1:0 &""-->
<a href="jAvAsCrIpT&colon;alert&lpar;1&rpar;">X</a>
/><img/onerror=\x09javascript:alert(1)\x09src=xxx:x />

A full report can be read in our paper, "Assessment of Dynamic Open-source Cross-site Scripting Filters as Security Devices in Web Applications". I kindly suggest that you add these tags to the blacklist to make it more robust against XSS.

Thank you.
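
One likely reason the `&colon;` vector gets through (an assumption, not verified against the class itself): html_entity_decode() called with ENT_COMPAT alone uses the HTML 4.01 entity table, which lacks HTML5 named references such as `&colon;`, `&lpar;` and `&rpar;`, while browsers do decode them. A standalone comparison of the two tables:

```php
<?php
$payload = 'jAvAsCrIpT&colon;alert&lpar;1&rpar;';

// Same flags as the class: HTML 4.01 entity table.
$html4 = html_entity_decode($payload, ENT_COMPAT, 'UTF-8');

// Adding ENT_HTML5 switches to the HTML5 entity table.
$html5 = html_entity_decode($payload, ENT_COMPAT | ENT_HTML5, 'UTF-8');

echo $html4, "\n"; // still encoded
echo $html5, "\n"; // jAvAsCrIpT:alert(1)
```

Only the HTML5 table turns the payload back into a recognisable javascript: URL that a later pattern could then match.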
