@seanconaty
Last active August 29, 2015 14:04
Escape From HyperText

Escaping HTML can be tricky. There are many different ways to do it and many different ways to send HTML to the browser: as part of the original document, as a string in a JavaScript <script> tag, in a template script tag, or as a string in an AJAX response. Strings can also be user-generated or not, and you should be aware of which ones are and which ones aren't. Below I talk about the different ways HTML can be escaped using our technology stack.

If you are going to put JSON in embedded <script> tags like:

Case 1

<html>
<script>
var myJson = {{ json }};
</script>
...
</html>

Then it NEEDS to be JS-escaped.

If you are rendering JSON to a response via AJAX or via an external JS file, escaping is not necessary because it won't be parsed as HTML.

Case 2

<script src="external.js"></script><!-- this is OK unescaped -->
<script>
    // the response is handed to the callback as data, so this is OK unescaped
    $.get('/api/content/', function (data) {
        var myJson = data;
    });
</script>

You should NEVER EVER do Case 1 by hand. If you need to pass in values from the server, use JSContext, which takes care of the escaping for you.

How Does the Escaping Work?

Let's say I was being malicious and an uninformed developer did Case 1 with unescaped JSON. It could look like this after it was rendered by the server.

<html>
<script>
// in the web app I maliciously set my title
var myJson = [{title: '</script><script src=evil.js />'}];
</script>
...
</html>

Since it's part of an HTML document, things are parsed as HTML first! The browser sees the closing script tag (and the subsequent opening script tag) in the JSON as different (albeit malformed) HTML <script> tags. Then it goes on to download and execute the evil JS.
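
To make the "parsed as HTML first" point concrete, here is a rough sketch of where the HTML parser considers the script to end. The function name scriptBodySeenByHtmlParser is made up for this illustration, and the logic is a simplification of the real tokenizer (which has more states), not a faithful implementation of it.

```javascript
// Simplified model: the HTML parser ends a <script> element at the first
// "</script" it finds, before any JavaScript is parsed.
// Hypothetical helper for illustration; real tokenizers handle more cases.
function scriptBodySeenByHtmlParser(htmlAfterOpeningTag) {
  var end = htmlAfterOpeningTag.search(/<\/script[\s\/>]/i);
  return end === -1 ? htmlAfterOpeningTag : htmlAfterOpeningTag.slice(0, end);
}

// Everything that follows the opening <script> tag in the malicious example.
var rendered =
  "var myJson = [{title: '</script><script src=evil.js />'}];\n</script>";

console.log(scriptBodySeenByHtmlParser(rendered));
// prints: var myJson = [{title: '
```

The JS engine only ever receives the truncated text, and the rest of the line is handed back to the HTML parser, which is how the second script tag gets in.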

Escaping works by replacing the dangerous HTML-parsing characters < > & with Unicode escape sequences (\u003c, \u003e, \u0026). Inside a JS string literal they behave just like \n does for a newline: the JS parser turns them back into the real character. Open up your JS console and try '\u0026' === '&' -- the strings are triple-equals equal. If you do document.write('\u0026') you will see a & on the page. If you do window.location = '\u0026' you will see & in the address bar. So long as the string is actually being used in JS as a string, things are OK.
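
As a minimal sketch of that replacement, assuming nothing about how JSContext actually does it, the function below serializes a value to JSON and swaps the three characters for their Unicode escapes. The name toScriptSafeJson is invented for this example.

```javascript
// Serialize a value as JSON that can sit inside an inline <script> block.
// Assumption: this only illustrates the idea; it is not JSContext's code.
function toScriptSafeJson(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')   // "<" can no longer start "</script>"
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026');
}

var data = [{ title: '</script><script src=evil.js />' }];
var safe = toScriptSafeJson(data);

console.log(safe.indexOf('<') === -1);                     // true
console.log(JSON.parse(safe)[0].title === data[0].title);  // true
```

In JSON.stringify output these characters can only appear inside string literals, so a blanket replace does not change what the JSON means once it is parsed.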

Could I Just Use HTML-escaping Instead?

You could, but there are some limitations and it can get messy. HTML-escaping means using HTML entities such as &gt; and &lt; in lieu of the Unicode escapes. This is what Jinja2's autoescaping does for you. That way, if my title is </script><script src=evil.js /> and I'm outputting it as part of the HTML document (not as JSON, and outside of a <script> tag), I can simply do <p>Name: {{ title }}</p> and I'm safe.
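
For comparison, HTML-escaping can be sketched roughly like this. The escapeHtml helper is a hand-rolled illustration, not Jinja2's implementation, and the exact set of entities a given library emits may differ.

```javascript
// Replace the characters that are special in HTML with entities.
// Hypothetical helper for illustration only.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

var title = '</script><script src=evil.js />';
console.log('<p>Name: ' + escapeHtml(title) + '</p>');
// prints: <p>Name: &lt;/script&gt;&lt;script src=evil.js /&gt;</p>
```

The browser shows the original characters to the reader, but never treats them as markup.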

This is limiting because sometimes you want to pass in legit HTML strings, like say a template:

{
   contactTemplate: '<h1><%- title %></h1>', // we don't want this escaped, it's not user-generated
   contactName: 'blah' // we want this escaped, it is user-generated
}

We could also be getting data via AJAX, where the HTML isn't escaped. Then we have to keep track of what's escaped and what's not, and remember that when we're updating the DOM. It's the having to keep track that makes this solution messy.

The rule of thumb is, when escaping strings meant to be part of the HTML document, use HTML-escaping (Jinja2 does this for you automatically). When escaping JS strings to be used as part of a <script> tag in an HTML document, do not HTML-escape but do JS-escape with Unicode (JSContext does this for you automatically).

Other Random Escaping, a Refresher

jQuery

There are 2 methods in jQuery to manipulate the DOM: one will escape HTML and one won't.

$('.somediv').html('<b>hi</b>');  // 'hi' will show as bolded. String is not HTML-escaped

$('.somediv').text('<b>hi</b>'); // string will be escaped, you will see the literal bold tags (or HTML entities if you read the markup back with .html())

// We don't want to strip the tags because a lot of the time you want the HTML to show.
// Think about when you're talking about code or saying I <3 you.

Underscore

Likewise, Underscore templates have two tags for specifying whether their content should be HTML-escaped or not.

<script type="text/html">
   <%- unsafeString %><!-- will be escaped -->
   
   <%= safeString %><!-- will not be escaped -->
</script>

@octobeard actually added a git pre-commit hook that makes sure you're not using <%= unless you explicitly mark it as safe. This one is tough because prior to this hook it required you to remember this little nuance.
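
To show the difference between the two tags without pulling in a library, here is a toy renderer. It is a hypothetical stand-in, not Underscore's implementation: the names renderTiny and escapeHtml are invented, and it only understands a bare variable name inside each tag.

```javascript
// Hypothetical helper: replace HTML-special characters with entities.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Toy renderer: "<%- name %>" is escaped, "<%= name %>" is inserted as-is.
// Illustration only; real Underscore templates evaluate arbitrary expressions.
function renderTiny(template, data) {
  return template.replace(/<%([-=])\s*(\w+)\s*%>/g, function (match, mode, name) {
    var value = String(data[name]);
    return mode === '-' ? escapeHtml(value) : value;
  });
}

var html = renderTiny(
  '<li><%- unsafeString %></li><li><%= safeString %></li>',
  { unsafeString: '<b>hi</b>', safeString: '<b>hi</b>' }
);
console.log(html);
// prints: <li>&lt;b&gt;hi&lt;/b&gt;</li><li><b>hi</b></li>
```

The same input string ends up as visible text in the first item and as live markup in the second, which is why the unescaped tag should be reserved for strings you know are safe.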

Jinja2

If you don't want Jinja2 to autoescape, simply do {{ some_var | safe }}. But make sure that the string is actually safe (either escaped before this point or not user-generated and not variable). You could also use the {% autoescape false %}{% endautoescape %} blocks.

Summary

Escape JSON if and only if it is embedded in a <script> tag in an HTML document. But NEVER do this by hand, because JSContext does it for you.

@akshayjshah

<script>alert("Thanks!")</script>