Per-Context Sanitizer Functions
<?php
/**
* XSS protection function for HTML context only
* @usecases
* <title>use this function if output reflects here or as a content of any HTML tag.</title>
* e.g., <span>use this function if output reflects here</span>
* e.g., <div>use this function if output reflects here</div>
* @description
* Sanitize/Filter < and > so that an attacker cannot use them for JavaScript execution.
* @author Ashar Javed
* @Link https://twitter.com/soaj1664ashar
* @demo http://xssplaygroundforfunandlearn.netai.net/final.html
*/
function htmlContextCleaner($input) {
    $bad_chars = array("<", ">");
    $safe_chars = array("&lt;", "&gt;");
    $output = str_replace($bad_chars, $safe_chars, $input);
    return stripslashes($output);
}
/**
* XSS protection function for script context only. It does not support unquoted string literals.
* @usecases
* @double quoted case e.g.,
* <script> var searchquery = "use this function if output reflects here"; </script>
* @single quoted case e.g.,
* <script> var searchquery = 'use this function if output reflects here'; </script>
* @back-tick quoted case (ES6 Template Strings or Multi-line Strings) e.g.,
* <script> var searchquery = `use this function if output reflects here`; </script>
* see https://leanpub.com/understandinges6/read#leanpub-auto-template-strings for reference
* @description
* Sanitize/Filter meta or control characters that an attacker may use to break out of the context, e.g.,
* "; confirm(1); " OR '; prompt(1); // OR </script><script>alert(1)</script> OR `;alert(1);`
* \ and % are filtered because they may break the page, e.g., \n or %0a.
* An attacker may use \ when there are two injection points in a JavaScript string literal context.
* % only causes a syntax error.
* & is sanitized because of complex or nested contexts (if in use).
* File Descriptor (@filedescriptor) reported an ES6-based bypass that uses ES6 template string substitutions.
* Thanks to him for the heads-up. A template string substitution starts with ${, so
* both characters are now converted into their harmless form.
* @author Ashar Javed
* @Link https://twitter.com/soaj1664ashar
* @demo http://xssplaygroundforfunandlearn.netai.net/final.html
*/
function scriptContextCleaner($input) {
    $bad_chars = array("\"", "<", "'", "`", "$", "{", "\\\\", "%", "&");
    $safe_chars = array("&quot;", "&lt;", "&apos;", "&grave;", "&dollar;", "&lbrace;", "&bsol;", "&percnt;", "&amp;");
    $output = str_replace($bad_chars, $safe_chars, $input);
    return stripslashes($output);
}
/**
* XSS protection function for an attribute context only. It does not support unquoted attribute values.
* Use quotes (either single or double) around attribute values.
* @usecases
* @double quoted case e.g.,
* <div class="use this function if output reflects here">attribute context</div>
* In the above example the class attribute is used, but it can be any attribute, such as id or alt.
* @single quoted case e.g.,
* <input type='text' value='use this function if output reflects here'>
* @description
* Sanitize/Filter meta or control characters that an attacker may use to break out of the context, e.g.,
* "onmouseover="alert(1) OR 'onfocus='confirm(1) OR ``onmouseover=prompt(1)
* The back-tick ` is filtered because old IE browsers treat it as a valid separator. An attacker may use
* `` or `\` to break out of the quoted attribute value, but exploitation requires an innerHTML assignment.
* Even in old IE compat view, ` may be used to quote an attribute value. Credit to @garethheyes for finding and bypassing
* the old implementation of the attributeContextCleaner function in old IE browsers.
* @author Ashar Javed
* @Link https://twitter.com/soaj1664ashar
* @demo http://xssplaygroundforfunandlearn.netai.net/final.html
*/
function attributeContextCleaner($input) {
    $bad_chars = array("\"", "'", "`");
    $safe_chars = array("&quot;", "&apos;", "&grave;");
    $output = str_replace($bad_chars, $safe_chars, $input);
    return stripslashes($output);
}
/**
* XSS protection function for style context only. It does not support unquoted style attribute values.
* @usecases
* @double quoted case e.g.,
* <span style="use this function if output reflects here"></span>
* @single quoted case e.g.,
* <div style='use this function if output reflects here'></div>
* OR <style>use this function if output reflects here</style>
* @description
* Sanitize/Filter meta or control characters that an attacker may use to execute JavaScript, e.g.,
* ( is filtered because of width:expression(alert(1))
* & is filtered in order to stop decimal, hex, and HTML5 entity encoding
* < is filtered in case developers use <style></style> tags instead of the style attribute,
* because an attacker may close the </style> tag and then execute JavaScript.
* The function allows simple styles e.g., color:red, height:100px etc.
* @author Ashar Javed
* @Link https://twitter.com/soaj1664ashar
* @demo http://xssplaygroundforfunandlearn.netai.net/final.html
*/
function styleContextCleaner($input) {
    $bad_chars = array("\"", "'", "(", "\\\\", "<", "&");
    $safe_chars = array("&quot;", "&apos;", "&lpar;", "&bsol;", "&lt;", "&amp;");
    $output = str_replace($bad_chars, $safe_chars, $input);
    return stripslashes($output);
}
/**
* XSS protection function for URL context. Please use quoted (either single or double) attribute.
* @usecases
* <a href="use this function if output reflects here">click</a>
* <img src='use this function if output reflects here'>
* <iframe src="use this function if output reflects here">
* @description
* Only allows URLs that start with http(s) or ftp, e.g.,
* https://www.google.com
* Protection against JavaScript, VBScript and Data URI JavaScript code execution etc.
* @author Ashar Javed
* @Link https://twitter.com/soaj1664ashar
* @demo http://xssplaygroundforfunandlearn.netai.net/final.html
*/
function urlContextCleaner($url) {
    if (preg_match("#^(?:(?:https?|ftp):{1})\/\/[^\"\'\s\\\\]*.[^\"\'\s\\\\]*$#iu", (string)$url, $match)) {
        return $match[0];
    }
    else {
        $noxss = 'javascript:void(0)';
        return $noxss;
    }
}
@cure53

cure53 commented Jan 4, 2015

It is strongly recommended not to use this approach in any real-world application - aside from e.g. "XSS injection training grounds".

Three of the five functions allow for known and documented XSS bypasses and do not do their job as described. Furthermore, three (different) of the five functions cripple perfectly harmless and sane user input and degrade the user experience for no reason, without giving any feedback.

The design and philosophy of this approach is bad. Using black-listing to protect against injection attacks, especially in an ever-evolving field of web applications and browsers is pointless and dangerous.
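
As a contrast to black-listing, here is a minimal sketch of an allow-list check for the URL context. It is not from the gist or from the commenter: the helper name `safeHrefValue` and the default scheme list are assumptions made for illustration. Only `filter_var()`, `parse_url()` and `htmlspecialchars()` are PHP built-ins.

```php
<?php
// Hypothetical helper (not part of the gist): accept a URL only if it parses
// as a URL and its scheme is on an explicit allow-list, then escape it for
// use inside a quoted HTML attribute. Returns null when the URL is rejected.
function safeHrefValue(string $url, array $schemes = ['http', 'https']): ?string {
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), $schemes, true)) {
        return null;
    }
    // Escaping happens at output time, whatever characters the URL contains.
    return htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

var_dump(safeHrefValue('https://www.google.com'));
var_dump(safeHrefValue('javascript:alert(1)'));
```

The design choice here is that anything not explicitly permitted is rejected, and the caller decides what to render when `null` comes back.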

There are many well-tested anti-XSS solutions available for free. This one, however, does not qualify and should be avoided.
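
The comment does not name a specific solution. As one possible point of comparison, PHP's own built-in functions cover the HTML and script string contexts. The sketch below uses hypothetical helper names (`escapeHtml`, `jsStringLiteral`) and assumes PHP 7.3 or later for `JSON_THROW_ON_ERROR`.

```php
<?php
// Hypothetical helper names; only htmlspecialchars() and json_encode()
// are PHP built-ins.

// Escape a value for HTML element content or a quoted attribute value.
function escapeHtml(string $input): string {
    return htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Produce a complete JavaScript string literal (quotes included) that can be
// placed inside a <script> block. The JSON_HEX_* flags encode <, >, &, ' and "
// as \uXXXX escapes, so the value cannot close the script element.
function jsStringLiteral(string $input): string {
    return json_encode(
        $input,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

echo escapeHtml('<b>bold & "quoted"</b>'), PHP_EOL;
echo jsStringLiteral('</script><script>alert(1)</script>'), PHP_EOL;
```

Unlike the functions in the gist, neither helper calls `stripslashes()`, so backslashes in legitimate input are preserved.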

@ikarius6

ikarius6 commented Jan 7, 2015

cure53: Can you prove your point?

@cure53

cure53 commented Jan 7, 2015

@ikarius6 Yes, that point can be proven. I do believe however that the code itself proves the point. Study the regular expressions and have a closer look at the demo vectors at the HTML5 Security Cheatsheet and you will have all the proof you need.

If you have trouble spotting the bypasses as well as the unnecessary crippling of legitimate content feel free to get in touch.
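
The thread does not spell out the cases it has in mind. The sketch below is therefore only one possible illustration, not the commenter's own proof. It copies two functions from the gist so that it runs on its own; the two test inputs are assumptions chosen for this example.

```php
<?php
// Copied from the gist above so the sketch is self-contained.
function htmlContextCleaner($input) {
    $bad_chars = array("<", ">");
    $safe_chars = array("&lt;", "&gt;");
    $output = str_replace($bad_chars, $safe_chars, $input);
    return stripslashes($output);
}

function urlContextCleaner($url) {
    if (preg_match("#^(?:(?:https?|ftp):{1})\/\/[^\"\'\s\\\\]*.[^\"\'\s\\\\]*$#iu", (string)$url, $match)) {
        return $match[0];
    }
    return 'javascript:void(0)';
}

// Harmless input is altered: stripslashes() removes the backslashes.
$path = htmlContextCleaner('C:\temp\file.txt');

// A value containing a double quote passes the check, because the
// unescaped "." between the two character classes matches any character.
$url = urlContextCleaner('http://a"onmouseover=alert(1)');

echo $path, PHP_EOL; // C:tempfile.txt
echo $url, PHP_EOL;  // http://a"onmouseover=alert(1)
```

When the second value is placed in `<a href="...">`, the double quote ends the attribute value early, which is the kind of context break the function is documented to prevent.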
