Skip to content

Instantly share code, notes, and snippets.

@sverweij
Last active July 24, 2023 18:55
Show Gist options
  • Save sverweij/422a38153bcc1c3c3c6ec7d0a0a3e8ce to your computer and use it in GitHub Desktop.
Save sverweij/422a38153bcc1c3c3c6ec7d0a0a3e8ce to your computer and use it in GitHub Desktop.
Fixing a security problem: Polynomial regular expression used on uncontrolled data

Fixing a security problem: Polynomial regular expression used on uncontrolled data

Using cookies as an attack vector on your server

You have a cookie that stores a client's session id. To validate the session you probably need to check it against a server. An attacker might exploit this. E.g. with the big list of naughty strings [^0], or with a big string crafted for the occasion.

We're going to look at one such example and try to find a way to prevent these attacks from becoming a problem.

A sample cookie

A sample client session cookie might look like this:

s:98840c37-438d-4696-81aa-cc98857935e9.fitd2ove0zc5izs61r618p9xrs2se6bmi
  • a fixed string,
  • a :,
  • a uuidv4,
  • a . and
  • some junk.

For #reasons you only need the uuidv4 from this cookie to validate the customer's session on the server.

Initial approach

As the uuidv4 is neatly between a : and a . a regular expression that susses out anything sandwiched between those looks reasonable at first glance: /:(.*)\./. You test it and presto:

const cookieValue = `secret:98840c37-438d-4696-81aa-cc98857935e9.fitd2ove0zc5izs61r618p9xrs2se6bmi`;
console.dir(cookieValue.match(/:(.*)\./));
// [
//   ':98840c37-438d-4696-81aa-cc98857935e9.',
//   '98840c37-438d-4696-81aa-cc98857935e9', <== whoop whoop, we extracted a uuid !!
//   index: 6,
//   input: 's:98840c37-438d-4696-81aa-cc98857935e9.fitd2ove0zc5izs61r618p9xrs2se6bmi',
//   groups: undefined
// ]

Is it a safe regular expression though? Security conscious as you are you run it through safe-regex a regular expression safety checker used by many eslint plugins, a.o. eslint-plugin-security and eslint-plugin-unicorn.

import safeRegex from 'safe-regex'

safeRegex(/:(.*)\./);
// true <= it's safe according to safe-regex's checks

There's various things wrong with the expression, though:

  • it'll match the empty string (:. => ""), which might trigger unhandled edge cases
  • it'll include stuffin the junk part of the cookie we have no control over (s:98840c37-438d-4696-81aa-cc98857935e9.fitd2.ove0.zc5.izs61r618p9xrs2se6bmi => 98840c37-438d-4696-81aa-cc98857935e9.fitd2.ove0.zc5), which might make that a valid session is flagged as an erroneous one

That's not the worst, though. Even though this expression doesn't do evidently unsecure things by itself, it still can hang up your server.

An evil string

Your static code analyzer (if it's a good one) will flag the regular expression, and trace the string back to where it came from. It might even explain that a string with many repetitions of :a will take a long time to run - it might even use words like polynomial.

Because of the way the regular expression engine works, with this particular expression, it will evaluate every instance of :a.

We can validate this by running the following code and looking at the timing numbers it emits, (on a 2017 macbook pro: 7.5 seconds)

// node --test --test-reporter spec the.test.js
import { test } from "node:test";
import { equal } from "node:assert";

const STRING_FROM_HELL = `${":a".repeat(50000)}`;

test("baseline - /:(.*)\\./i", () => {
  const lMatch = STRING_FROM_HELL.match(/:(.*)\./i);
  equal(Boolean(lMatch), false);
});

// ✔ baseline - /:(.*)\./i (7734.777159ms)

Fixing this, attempt 1 - match the UUID

A first thought might be to attack the problem by making the pattern more specific, e.g. by replacing .* (match any character zero ore more times) with something that only matches uuid's1:

const moreSpecificRE = /:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i

Against the initial attack with many repetitions of :a, this indeed works wonders. With this new pattern the test above takes about 1 millisecond - ~7000x faster. However Evil Alice never sleeps and injecting 50k (or 500k) uuid's is just as easy as injecting 50k :a's. It turns out that 50k uuid's make the regex run for 18ms, and when you crank up the repetitions even higher, that number will rise as well, burning your precious AWS budget.

Fixing this, attempt 2 - prevent repetition

Being more specific evidently helps a bit, but doesn't attack the root cause, which is the regex engine repeatedly trying to match. By requiring it only matches from the start of the string, the problem should be gone:

const moreSpecificRE = /^s:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i

Against 50k :a's this runs in ~0.2 milliseconds, against 50k uuid's ~0.3 milliseconds, which is much more reasonable.

Try it for yourself

This gist includes a JavaScript test file you can run with node's native test runner (node 18 or up):

node --test --test-reporter spec polynomial.test.mjs

Which will yield something like this for output:

✔ alternative 1 - /^s:(.*)\./i (1.881229ms)
✔ alternative 2 - /^s:(.+)\./i (0.189199ms)
✔ alternative 3 - /^s:([^.]+)\./i (0.175063ms)
✔ alternative 4 - /^s:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i (0.198885ms)
✔ alternative 5 - /:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i (1.190747ms)
✔ alternative 5 against a 50k UUID's - /:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i (18.42299ms)
✔ alternative 4 against 50k UUID's - /^s:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i (0.35321ms)
✔ baseline - /:(.*)\./i (7818.769965ms)
ℹ tests 8
ℹ suites 0
ℹ pass 8
ℹ fail 0
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 7942.484351

Footnotes

  1. This expression could be even more specific as not all hex characters can occur on any position in a uuidv4. E.g. the second group of four will always start with a 4 - if you search for uuidv4 regex with your favourite search engine more elaborate ones will spring up like daisies in the summer.

/*
run with
node --test --test-reporter spec polynomial.test.mjs
*/
import { test } from "node:test";
import { equal } from "node:assert";
/*
Typical input strings would look like
secret:98840c37-438d-4696-81aa-cc98857935e9.fitd2ove0zc5izs61r618p9xrs2se6bmi
But Evil Alice is not interested in that:
*/
const STRING_FROM_HELL = ":a".repeat(50000);
const STRING_WITH_REPEATING_UUIDS =
":98840c37-438d-4696-81aa-cc98857935e9".repeat(50000);
// ensure it only matches the :yadda. at the start of the string,
// and not those in further down.
test("alternative 1 - /^s:(.*)\\./i", () => {
const lMatch = STRING_FROM_HELL.match(/^s:(.*)\./i);
equal(Boolean(lMatch), false);
});
// same as before, but disallow empty strings
test("alternative 2 - /^s:(.+)\\./i", () => {
const lMatch = STRING_FROM_HELL.match(/^s:(.+)\./i);
equal(Boolean(lMatch), false);
});
// same as before, but prevent matching against dots in the junk part of the cookie
// s:98840c37-438d-4696-81aa-cc98857935e9.fitd2.ove0.zc5.izs61r618p9xrs2se6bmi => 98840c37-438d-4696-81aa-cc98857935e9
test("alternative 3 - /^s:([^.]+)\\./i", () => {
const lMatch = STRING_FROM_HELL.match(/^s:([^.]+)\./i);
equal(Boolean(lMatch), false);
});
// same as before, but prevents that something _not_ a uuidv4 is sent to the server
// for validating s:hahaha-not-a-uuid-at-all:fitd2ove0zc5izs61r618p9xrs2se6bmi
test("alternative 4 - /^s:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\\./i", () => {
const lMatch = STRING_FROM_HELL.match(
/^s:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i
);
equal(Boolean(lMatch), false);
});
// When we now remove the ^s:, seemingly there's no problem with exponential
// timing anymore, given just our string from hell. This next test runs just fine:
test("alternative 5 - /:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\\./i", () => {
const lMatch = STRING_FROM_HELL.match(
/:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i
);
equal(Boolean(lMatch), false);
});
// However, Evil Alice never sleeps, and she's now sending 50k UUID's to the server:
test("alternative 5 against a 50k UUID's - /:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\\./i", () => {
const lMatch = STRING_WITH_REPEATING_UUIDS.match(
/:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i
);
equal(Boolean(lMatch), false);
});
// With the ^s: in place, there's no problem with exponential timing anymore, not even with 50k UUID's:
test("alternative 4 against 50k UUID's - /^s:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\\./i", () => {
const lMatch = STRING_WITH_REPEATING_UUIDS.match(
/^s:([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})\./i
);
equal(Boolean(lMatch), false);
});
test("baseline - /:(.*)\\./i", () => {
const lMatch = STRING_FROM_HELL.match(/:(.*)\./i);
equal(Boolean(lMatch), false);
});
// the baseline against 50k UUID's is so slow that it's not even funny.
// On my 2016 macbook this test took about 2.5 _minutes_.
// test("baseline against 50k UUID's - /:(.*)\\./i", () => {
// const lMatch = STRING_WITH_REPEATING_UUIDS.match(/:(.*)\./i);
// equal(Boolean(lMatch), false);
// });
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment