@mathiasbynens
Last active October 27, 2016 14:46
Let’s create a JavaScript-compatible regular expression that matches any URL code point, as per the URL Standard.
// “The URL code points are ASCII alphanumeric, "!", "$", "&", "'", "(", ")",
// "*", "+", ",", "-", ".", "/", ":", ";", "=", "?", "@", "_", "~", and code
// points in the ranges U+00A0 to U+D7FF, U+E000 to U+FDCF, U+FDF0 to U+FFEF,
// U+10000 to U+1FFFD, U+20000 to U+2FFFD, U+30000 to U+3FFFD, U+40000 to
// U+4FFFD, U+50000 to U+5FFFD, U+60000 to U+6FFFD, U+70000 to U+7FFFD, U+80000
// to U+8FFFD, U+90000 to U+9FFFD, U+A0000 to U+AFFFD, U+B0000 to U+BFFFD,
// U+C0000 to U+CFFFD, U+D0000 to U+DFFFD, U+E1000 to U+EFFFD, U+F0000 to
// U+FFFFD, U+100000 to U+10FFFD.”
// — http://url.spec.whatwg.org/#url-code-points
// Let’s create a JavaScript-compatible regular expression that matches any URL
// code point, as per the above definition.
var regenerate = require('regenerate'); // http://mths.be/regenerate
var set = regenerate()
	.addRange(0x0030, 0x0039) // ASCII digits
	.addRange(0x0041, 0x005A).addRange(0x0061, 0x007A) // ASCII alpha
	.add(
		'!', '$', '&', '\'', '(', ')', '*', '+', ',', '-', '.', '/', ':', ';',
		'=', '?', '@', '_', '~'
	)
	.addRange(0x00A0, 0xD7FF)
	.addRange(0xE000, 0xFDCF)
	.addRange(0xFDF0, 0xFFEF)
	.addRange(0x10000, 0x1FFFD)
	.addRange(0x20000, 0x2FFFD)
	.addRange(0x30000, 0x3FFFD)
	.addRange(0x40000, 0x4FFFD)
	.addRange(0x50000, 0x5FFFD)
	.addRange(0x60000, 0x6FFFD)
	.addRange(0x70000, 0x7FFFD)
	.addRange(0x80000, 0x8FFFD)
	.addRange(0x90000, 0x9FFFD)
	.addRange(0xA0000, 0xAFFFD)
	.addRange(0xB0000, 0xBFFFD)
	.addRange(0xC0000, 0xCFFFD)
	.addRange(0xD0000, 0xDFFFD)
	.addRange(0xE1000, 0xEFFFD)
	.addRange(0xF0000, 0xFFFFD)
	.addRange(0x100000, 0x10FFFD);
console.log(set.toString());
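
To sanity-check the generated pattern against the quoted definition, it can help to have a plain range lookup that doesn't depend on regenerate. The following is only a minimal sketch: `isUrlCodePoint`, `URL_CODE_POINT_RANGES`, and `URL_PUNCTUATION` are hypothetical names that are not part of the gist, and the ranges are copied from the comment quoted above, not re-checked against the current URL Standard.

```javascript
// Hypothetical helper, not part of the gist: the quoted ranges as a lookup table.
var URL_PUNCTUATION = '!$&\'()*+,-./:;=?@_~';

var URL_CODE_POINT_RANGES = [
	[0x0030, 0x0039], // ASCII digits
	[0x0041, 0x005A], // ASCII upper alpha
	[0x0061, 0x007A], // ASCII lower alpha
	[0x00A0, 0xD7FF],
	[0xE000, 0xFDCF],
	[0xFDF0, 0xFFEF]
];

// Planes 1 through 16: each runs from xx0000 to xxFFFD, except that the
// quoted definition starts plane 14 at U+E1000.
for (var plane = 0x1; plane <= 0x10; plane++) {
	var start = plane === 0xE ? 0xE1000 : plane * 0x10000;
	URL_CODE_POINT_RANGES.push([start, plane * 0x10000 + 0xFFFD]);
}

function isUrlCodePoint(codePoint) {
	// The punctuation characters are all ASCII, so only look them up below U+0080.
	if (
		codePoint < 0x80 &&
		URL_PUNCTUATION.indexOf(String.fromCharCode(codePoint)) !== -1
	) {
		return true;
	}
	return URL_CODE_POINT_RANGES.some(function(range) {
		return codePoint >= range[0] && codePoint <= range[1];
	});
}

console.log(isUrlCodePoint(0x41)); // 'A'
console.log(isUrlCodePoint(0x20)); // space
console.log(isUrlCodePoint(0x1F4A9)); // an astral code point in plane 1
```

A lookup like this works on code points, whereas the string printed by `set.toString()` has to work on UTF-16 code units, which is why the two look so different.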
@rhgb
rhgb commented Aug 8, 2014

Thanks, this script helps me a lot.
There's something in the generated string I don't understand: it has the form [.....]|[.....]. Is that alternation necessary, or can I simply replace it with a single [..........]?
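
On the question above, a minimal sketch of why the alternation is likely needed, assuming the pattern is used in a regular expression without the `u` flag. Such a regex matches UTF-16 code units, so a code point above U+FFFF takes two units (a surrogate pair) and cannot be matched by one character class. The patterns below are hand-written for illustration and cover only U+00A0 to U+D7FF and U+10000 to U+1FFFD; they are not the actual output of the script.

```javascript
// Alternation: one class for BMP code points, plus surrogate-pair sequences
// for U+10000..U+1FFFD (high surrogates D800..D83F; the last one stops at DFFD).
var split = /^(?:[\u00A0-\uD7FF]|[\uD800-\uD83E][\uDC00-\uDFFF]|\uD83F[\uDC00-\uDFFD])$/;

// Naive merge of everything into a single class: matches exactly one code unit.
var merged = /^[\u00A0-\uD7FF\uD800-\uD83F\uDC00-\uDFFD]$/;

var bmp = '\u00E9'; // U+00E9, one code unit
var astral = '\uD83D\uDCA9'; // U+1F4A9, two code units
var loneSurrogate = '\uD83D'; // half of a pair, not a URL code point

console.log(split.test(bmp), merged.test(bmp)); // both match
console.log(split.test(astral), merged.test(astral)); // only `split` matches
console.log(split.test(loneSurrogate), merged.test(loneSurrogate)); // only `merged` matches
```

So the merged class both misses astral code points and accepts lone surrogates. In engines that support the ES2015 `u` flag, astral ranges can be written inside a single class with `\u{...}` escapes, but that is a different pattern from the one discussed here.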
