mikesamuel/README.md

# Options for Hardening React &| JSX


- Background
- Problem
- Options
  - Before Desugar
  - After Desugar
  - Mutate CreateElement
  - Mutate Reconciler
  - Hybrid of Before Desugar and Mutate Reconciler
- Prior Art

## Background

There is an impression that React resists XSS. For example:

> If you use React.js, you never manipulate the DOM, so your app is secured against XSS attacks, too.

Many have pointed out that it's not perfect. Bernhard Mueller summarizes it well:

> Script injection issues can result from bad programming practices including the following:
>
> 1. Creating React components from user-supplied objects;
> 2. Rendering links with user-supplied `href` attributes, or other HTML tags with injectable attributes (`link` tag, HTML5 imports);
> 3. Explicitly setting the `dangerouslySetInnerHTML` prop of an element;
> 4. Passing user-supplied strings to `eval()`.


- Number 1 has since been mitigated by using a well-known symbol (`$$typeof`) to mark React elements.
- Number 4 is neither specific to nor (AFAICT) especially bad among React apps.
- Similarly, number 3 is no worse than the `HTMLElement.prototype.innerHTML` setter, so it is not really specific to React either. (It could be further mitigated by changing `dangerouslySetInnerHTML` to use a symbol, to avoid mass assignment, and/or by requiring its input to be trademarked à la Trusted Types.)

The rest of this document addresses number 2 and Emilia Smith's related vector. Specifically:

```jsx
const payload = 'javascript:alert(1)'
const problems = [
  <script>{payload}</script>,
  <a href={payload}>link</a>,
]
```

## Problem

JSX avoids many quoting-confusion problems because it uses an abstract syntax tree as its intermediate form. Instead of producing a string, it produces a tree:

```jsx
// Pseudocode: read `===` as "is equivalent to".
(<script>{payload}</script>)
=== React.createElement("script", null, payload)
===  // roughly
{
    "type": "script",
    "key": null,
    "ref": null,
    "props": {
        "children": "javascript:alert(1)"
    },
    "_owner": null,
    "_store": {}
}
```
The problem is that many text nodes and attribute values in HTML are written in micro-languages, but the abstract syntax tree approach does not extend past language embedding boundaries. Embedded languages include:


element
member
language


<a>
href attribute
URL and via data: others


*
style attribute
CSS


<script>
#text children
JavaScript


<iframe>
srcdoc attribute
HTML


...
...
...


(And some of these languages can specify code that is trusted to operate alongside sensitive data and authority-granting credentials.)
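To make the table concrete, here is a minimal sketch of how a checker might look up which embedded language a given slot carries. The names `EMBEDDED_LANGUAGES` and `embeddedLanguageOf` and the `"element|member"` key format are hypothetical choices, and the entries only mirror the rows of the table; a real policy would need many more.

```javascript
// Hypothetical lookup mirroring the embedded-languages table.
// Keys are "element|member"; "*" matches any element.
const EMBEDDED_LANGUAGES = new Map([
  ['a|href', 'URL'],            // and, via data: URLs, other languages
  ['*|style', 'CSS'],
  ['script|#text', 'JavaScript'],
  ['iframe|srcdoc', 'HTML'],
]);

// Returns the embedded language for a value placed in `member` of
// `element`, or null if the table has no entry (e.g. <a title>).
function embeddedLanguageOf(element, member) {
  const el = element.toLowerCase();
  const mem = member.toLowerCase();
  return EMBEDDED_LANGUAGES.get(`${el}|${mem}`)
    ?? EMBEDDED_LANGUAGES.get(`*|${mem}`)
    ?? null;
}

console.log(embeddedLanguageOf('a', 'href'));        // URL
console.log(embeddedLanguageOf('DIV', 'style'));     // CSS
console.log(embeddedLanguageOf('script', '#text'));  // JavaScript
console.log(embeddedLanguageOf('a', 'title'));       // null
```

A `null` result only means the sketch has no entry for that slot, so an incomplete table errs toward not checking.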
## Options

The typical flow of control from specifying a document fragment to inserting it into the DOM is:

1. Babel (or an equivalent) transpiles JSX into JavaScript calls: the JSX parser plugin feeds the corresponding transform plugin.
2. `React.createElement` produces a tree-like structure. Attribute processing seems to happen at

   ```js
   // Remaining properties are added to a new props object
   for (propName in config) { ...
   ```

3. ReactDOM, or the React Native equivalent, updates part of the DOM/View to mirror the tree structure.

   ```js
   const DOMRenderer = ReactFiberReconciler({ ... })
   ```

This suggests four broad strategies, which I summarize here and address individually, with pros and cons, below.
I then recommend a hybrid approach that might balance the concerns raised.

- before desugar: Hook into the transpilation pipeline to find `JSXAttributeValue`, `JSXSpreadAttribute`, and `JSXChildren` nodes.
- after desugar: Hook into the transpilation pipeline to find calls to `React.createElement`.
- mut create element: Hook into or monkey-patch `React.createElement` to either intercept the inputs or check the output.
- mut reconciler: Hook into or monkey-patch ReactDOM and the React Native equivalent to check inputs.

### Before Desugar

If we see

```jsx
(<a href={payload}>link</a>)
```

we can use our knowledge of HTML tags and attributes to wrap `{x}` interpolations in functions that can check values:

```jsx
(<a href={someLibrary.requireSafeUrl(payload)}>link</a>)
```

The context (that `payload` is interpolated at the beginning of an `href` attribute in an `a` tag) could be passed
to a generic intercession function, or could be implicit in the function chosen.
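A minimal sketch of such a pass, assuming a simplified JSX AST: the node shapes are loosely modeled on Babel's `JSXAttribute`, `StringLiteral`, and `JSXExpressionContainer` but trimmed down, so this is not a working Babel plugin. It takes the generic-intercession route, passing the element and attribute names to a hypothetical `someLibrary.checkValue`, and leaves string literals alone because a trusted developer wrote them.

```javascript
// Toy "before desugar" rewrite over a simplified JSX AST (not Babel's exact
// shapes). `someLibrary.checkValue` is a hypothetical intercession function.
function callCheckValue(elementName, attrName, expression) {
  return {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'someLibrary' },
      property: { type: 'Identifier', name: 'checkValue' },
    },
    // Context first, then the original interpolated expression.
    arguments: [
      { type: 'StringLiteral', value: elementName },
      { type: 'StringLiteral', value: attrName },
      expression,
    ],
  };
}

function wrapInterpolations(openingElement) {
  const elementName = openingElement.name.name;
  for (const attr of openingElement.attributes) {
    // Spread attributes ({...props}) are skipped; a real pass must handle them.
    if (attr.type !== 'JSXAttribute' || attr.value == null) continue;
    // Anything but {expr}, e.g. a developer-written string literal, is kept.
    if (attr.value.type !== 'JSXExpressionContainer') continue;
    attr.value.expression =
      callCheckValue(elementName, attr.name.name, attr.value.expression);
  }
  return openingElement;
}

// <a href={payload} title="docs">
const opening = {
  name: { name: 'a' },
  attributes: [
    { type: 'JSXAttribute', name: { name: 'href' },
      value: { type: 'JSXExpressionContainer',
               expression: { type: 'Identifier', name: 'payload' } } },
    { type: 'JSXAttribute', name: { name: 'title' },
      value: { type: 'StringLiteral', value: 'docs' } },
  ],
};
wrapInterpolations(opening);
console.log(opening.attributes[0].value.expression.type); // CallExpression
console.log(opening.attributes[1].value.type);            // StringLiteral
```

Passing context explicitly keeps the transpiler-side logic small; the alternative of picking a specialized function such as `requireSafeUrl` per slot moves the HTML knowledge into the transpiler, which is one of the cons listed below.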

#### Pros

- Could differentiate literal values that come from a trusted developer from interpolated ones. For example, we could trust the URL in `<a href="javascript:alert(1)">` even if we wouldn't trust `<a href={x}>` when `x` has the same value at runtime.
  - Polymer resin does this.
  - Pug uses constantinople to distinguish constant attribute values from expressions, but mainly to identify opportunities for optimization.

#### Cons

- Requires coordinating changes across multiple transpilers. Clients of legacy transpilers default to unsafe.
- Does not support React without JSX.
- Lacks context to handle `<>var x = {payload}</>` when it is injected into an existing `<script>` element.
- Logic that specifies that `<a title>` is safe but that `<a href>` needs mitigation would have to be available to the transpiler.
  - This would require the transpiler to know whether to assume a React DOM or Native target.
- Might interfere with elements that are never destined for the DOM, e.g. if an element is used to compose an XML POST body.

### After Desugar

Find `React.createElement` calls and rewrite their arguments to achieve the same effect as before desugar.

#### Cons

- Most of the same as before desugar.

#### Meh

- Slightly better support for React without JSX, if clients happen to use a transpiler with the plugin turned on. This seems of marginal benefit.

### Mutate CreateElement

We could monkey-patch or edit `React.createElement` to apply filter functions like `requireSafeUrl` to either the input or the output.
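A minimal sketch of the monkey-patching variant. To keep it self-contained, `fakeReact` stands in for React with the same `createElement(type, props, ...children)` call shape, and `requireSafeUrl` is a hypothetical filter that allows only `http:`, `https:`, and `mailto:` URLs (relative URLs resolve to `https:` against a dummy base). It relies on the WHATWG `URL` parser, which strips tabs and newlines the way browsers do, so `java\tscript:` is caught too.

```javascript
// Stand-in for React so the sketch runs without dependencies.
const fakeReact = {
  createElement(type, props, ...children) {
    return { type, props: { ...props, children } };
  },
};

const SAFE_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);
const INNOCUOUS_URL = 'about:invalid#unsafe-url';

// Hypothetical filter: returns the value if its scheme is allowlisted,
// otherwise an inert placeholder URL.
function requireSafeUrl(value) {
  const url = String(value);
  try {
    // Relative URLs resolve against the dummy base and so count as https:.
    const { protocol } = new URL(url, 'https://example.invalid/');
    return SAFE_PROTOCOLS.has(protocol) ? url : INNOCUOUS_URL;
  } catch {
    return INNOCUOUS_URL; // unparseable URL
  }
}

// Monkey-patch: filter <a href> on the way into createElement.
const originalCreateElement = fakeReact.createElement;
fakeReact.createElement = function (type, props, ...children) {
  if (type === 'a' && props != null && 'href' in props) {
    props = { ...props, href: requireSafeUrl(props.href) };
  }
  return originalCreateElement.call(this, type, props, ...children);
};

const fromUser = fakeReact.createElement('a', { href: 'javascript:alert(1)' }, 'link');
console.log(fromUser.props.href); // about:invalid#unsafe-url
```

The wrapper sees `'javascript:alert(1)'` identically whether it came from a developer-written `href="..."` literal or from user input, which is the first con listed below.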

#### Pros

- Handles the React-without-JSX use case.

#### Cons

- Lacks context about which attribute and element names are specified literally and which might be attacker-controlled inputs.
- Similar to before desugar, we need to know at runtime whether elements are destined for the DOM or for a native View.
- Again similarly, we need to make this assumption globally.

### Mutate Reconciler

We could monkey-patch or edit ReactDOM and React Native's root view to check the received output of `React.createElement`.
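A minimal sketch, assuming the reconciler routes every property write through a single hook. `commitProp`, `defaultDomPolicy`, and the `URL_SINKS` list are hypothetical (ReactDOM exposes no such hook), and `fakeNode` stands in for a DOM element so the code runs outside a browser. Because the hook knows the real tag and property, it can leave `title` alone while vetting `href`.

```javascript
// Illustrative, incomplete list of "tag.property" sinks parsed as URLs.
const URL_SINKS = new Set(['a.href', 'area.href', 'form.action', 'iframe.src']);

function isSafeUrl(value) {
  try {
    const { protocol } = new URL(String(value), 'https://example.invalid/');
    return protocol === 'http:' || protocol === 'https:' || protocol === 'mailto:';
  } catch {
    return false;
  }
}

// Default DOM policy: vet URL sinks, pass everything else through unchanged.
function defaultDomPolicy(tagName, propName, value) {
  const sink = `${tagName.toLowerCase()}.${propName}`;
  if (URL_SINKS.has(sink) && !isSafeUrl(value)) {
    throw new TypeError(`Blocked unsafe value for ${sink}`);
  }
  return value;
}

// A reconciler would call this instead of assigning node[propName] directly.
function commitProp(node, propName, value, policy = defaultDomPolicy) {
  node[propName] = policy(node.tagName, propName, value);
}

const fakeNode = { tagName: 'A' }; // HTML tagName is upper-case in the DOM
commitProp(fakeNode, 'title', 'javascript:alert(1)'); // not a URL sink: allowed
commitProp(fakeNode, 'href', 'https://example.com/'); // allowed
try {
  commitProp(fakeNode, 'href', 'javascript:alert(1)');
} catch (e) {
  console.log(e.message); // Blocked unsafe value for a.href
}
```

A React Native renderer would plug a different policy into the same kind of hook, which is the concern about two separate sets of vetting logic noted below.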

#### Pros

- Have definitive context on the kind of syntax tree we need to vet.

#### Cons

- Lack context to differentiate attribute values from a trusted developer from those that might be attacker-controlled.

#### Meh

- Need to maintain two separate sets of vetting-logic hooks: one for sensitive IDL properties and one for View sinks that can be targeted by malicious intents. There's unlikely to be much overlap in the actual vetting logic, so this might not be a con.

### Suggestion: Hybrid of Before Desugar and Mutate Reconciler

To avoid the cons of both Before Desugar and Mutate Reconciler, we could:

1. Desugar string literals in `JSXAttributeValue` and `JSXText` nodes so that they are clearly marked as specified by a trusted developer.
   Perhaps transpile

   ```jsx
   (<a href="javascript:alert(1)">link</a>)
   ```

   to

   ```js
   React.createElement(
     "a",
     { href: React.literal("javascript:alert(1)") },
     React.literal("link")
   );
   ```

   where we define a new runtime function like

   ```js
   // Wraps developer-written text in a frozen object marked with a registered symbol.
   React.literal = (content) => Object.freeze({
     content: String(content),
     [Symbol.for('React.literal')]: true,
     toString () { return this.content }
   })
   // content should be the value after decoding HTML character references.
   ```

2. Add hooks to reconcilers so that a policy can intercept and check values before they reach IDL/Native sinks,
   but treat values marked with the `Symbol.for('React.literal')` key as privileged.
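A minimal sketch of how the two halves might fit together, assuming a reconciler-side hook that hands each value to a policy before it reaches a sink. `literal`, `isLiteral`, and `hybridPolicy` are hypothetical stand-ins (the first mirrors the proposed `React.literal` without needing React), and the URL check allowing only `http:`, `https:`, and `mailto:` is an illustrative choice.

```javascript
const LITERAL = Symbol.for('React.literal');

// Stand-in for the proposed React.literal.
const literal = (content) => Object.freeze({
  content: String(content),
  [LITERAL]: true,
  toString() { return this.content; },
});

// JSON.parse cannot produce symbol-keyed properties, so a JSON payload
// cannot forge this marker.
const isLiteral = (value) =>
  value !== null && typeof value === 'object' && value[LITERAL] === true;

function isSafeUrl(value) {
  try {
    const { protocol } = new URL(String(value), 'https://example.invalid/');
    return ['http:', 'https:', 'mailto:'].includes(protocol);
  } catch {
    return false;
  }
}

// Policy a reconciler hook would apply before writing a property.
function hybridPolicy(tagName, propName, value) {
  if (isLiteral(value)) return value.content;   // trusted developer text
  if (propName === 'href' && !isSafeUrl(value)) {
    return 'about:invalid#unsafe-url';           // untrusted and unsafe
  }
  return value;
}

console.log(hybridPolicy('A', 'href', literal('javascript:alert(1)'))); // javascript:alert(1)
console.log(hybridPolicy('A', 'href', 'javascript:alert(1)'));          // about:invalid#unsafe-url
```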

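This still has one con:

#### Cons

- Requires coordinating changes across multiple transpilers.

But clients of legacy transpilers that use a modern reconciler default to safe: a legacy transpiler emits no `React.literal` markers, so every value goes through the reconciler's policy checks.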

## Prior Art

Contextually autoescaped template systems solve the problem of nested languages by using the context of each interpolation to decide how to escape or filter values that involve nested languages.

Polymer resin uses the same kind of logic to intercept and check values before they reach powerful IDL properties like `HTMLAnchorElement.prototype.href`, but in a way that allows for flexible, type-safe exceptions based on "Securing the Tangled Web". It has been shown to be a viable migration target within Google and outside, notably on Gerrit, where it protects source-base integrity.