@kangax
Last active August 29, 2015 13:57

RegExp.escape(string)

Computes a new version of a String value in which certain characters have been escaped, so that the regular expression engine will interpret any metacharacters that it may contain as character literals.

When the escape function is called with one argument string, the following steps are taken:

  1. Let string be ToString(string).
  2. ReturnIfAbrupt(string).
  3. Let length be the number of characters in string.
  4. Let R be the empty string.
  5. Let k be 0.
  6. Repeat, while k < length,
    1. Let C be the character at position k within string.
    2. If C is one of the 16 nonblank characters "-[]{}()*+?.,\^$|" then,
      1. Let S be a String containing two characters "\x" where x is a C character.
    3. Else,
      1. Let S be a String containing the single C character.
    4. Let R be a new String value computed by concatenating the previous value of R and S.
    5. Increase k by 1.
  7. Return R.
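
The steps above can be sketched in plain JavaScript roughly as follows. This is a minimal illustration, not a reference implementation: the name `escapeRegExp` is hypothetical (the gist proposes exposing the function as `RegExp.escape`), and `String(string)` only approximates the abstract ToString operation.

```javascript
// Hypothetical helper name; the gist proposes exposing this as RegExp.escape.
var ESCAPED = "-[]{}()*+?.,\\^$|"; // the 16 characters from step 6.2

function escapeRegExp(string) {
  string = String(string);            // step 1 (approximates ToString)
  var length = string.length;         // step 3 (counted in UTF-16 code units)
  var R = "";                         // step 4
  for (var k = 0; k < length; k++) {  // steps 5, 6 and 6.5
    var C = string.charAt(k);         // step 6.1
    // steps 6.2-6.4: prefix C with a backslash if it is in the list
    R += ESCAPED.indexOf(C) !== -1 ? "\\" + C : C;
  }
  return R;                           // step 7
}

var pattern = new RegExp("^" + escapeRegExp("1+1=2 (really?)") + "$");
console.log(pattern.test("1+1=2 (really?)")); // true
console.log(pattern.test("11=2 really"));     // false
```

Without the escaping step, `+`, `(`, `)` and `?` in the input would be read as regular expression syntax rather than as literal text.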
@inexorabletash

Rather than the explicit list in 6.2 you should define it in terms of SyntaxCharacter production in the RegExp grammar https://people.mozilla.org/~jorendorff/es6-draft.html#sec-patterns

Also, best to avoid "character" when discussing strings. Use "code units" like B.2.1.1 escape does.

@cscott

cscott commented Mar 22, 2014

There's also a precedent in the EscapeRegExpPattern production, which is careful to specify the results without constraining the exact implementation too closely. On the other hand, specifying an exact result might be better...

@zloirock

What about the backslash character?
Upd: It's lost in Markdown. Fix, please

@kangax
Author

kangax commented Apr 10, 2014

@inexorabletash

> Rather than the explicit list in 6.2 you should define it in terms of SyntaxCharacter production in the RegExp grammar

Good point. Only SyntaxCharacter doesn't have - and ,.

> Also, best to avoid "character" when discussing strings. Use "code units" like B.2.1.1 escape does.

I thought about this as well, but saw that escape gives a list of characters as well, so left it as is. Or do you mean to rephrase "If char is the code point of one of the 16 nonblank characters"?

@zloirock

> What about the backslash character?

Thanks, fixed :)
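
To make the difference between the two lists concrete, here is a small sketch that compares the gist's 16 characters with the 14 characters of the SyntaxCharacter production as it appears in the ES6 RegExp grammar. The variable names are illustrative only.

```javascript
// The 16 characters listed in step 6.2 of the gist.
var GIST_LIST = "-[]{}()*+?.,\\^$|";
// SyntaxCharacter as given in the ES6 RegExp grammar (14 characters).
var SYNTAX_CHARACTERS = "^$\\.*+?()[]{}|";

// Characters the gist escapes that SyntaxCharacter does not cover.
var onlyInGistList = GIST_LIST.split("").filter(function (c) {
  return SYNTAX_CHARACTERS.indexOf(c) === -1;
});
console.log(onlyInGistList); // [ '-', ',' ]
```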

@cscott

cscott commented May 22, 2014

@kangax re code unit, i think @inexorabletash was referring to step 3 and 6i. In particular, since regular expressions now have a "unicode" mode, you probably need to decide how you want to represent astral characters -- is that single "code point" encoded as two UTF-16 "code units" in an ordinary regexp, or as a single code point in a unicode mode regexp?
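
A short sketch of the code unit versus code point distinction raised here, using U+1D306 as an arbitrary astral character. It only illustrates how the two regexp modes count; it does not settle which wording the proposal should use.

```javascript
// U+1D306: one code point stored as two UTF-16 code units (a surrogate pair).
var astral = "\uD834\uDF06";

console.log(astral.length);             // 2 (code units)
console.log(Array.from(astral).length); // 1 (code points)

// In an ordinary regexp "." consumes one code unit;
// with the "u" flag it consumes one code point.
console.log(/^.$/.test(astral));  // false
console.log(/^.$/u.test(astral)); // true
```

Because all 16 listed characters are ASCII and a surrogate code unit can never equal one of them, walking the string by code unit or by code point should produce the same escaped output. The distinction mainly affects how steps 3 and 6.1 are worded.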

@rwaldron

rwaldron commented Sep 8, 2014

@arv can you take a look at this?

@jdalton

jdalton commented Feb 2, 2015

For reference here is lodash's implementation of lodash.escaperegexp.
The primary difference is it doesn't escape - or , because they fall out of escaping {} and [].
It also escapes / though I need to dig as to why it was added.
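
The trade-off around `-` can be sketched as follows. The helper `escapeWith` and both character lists are hypothetical and are not lodash's actual source; they only contrast escaping with and without `-` and `,`.

```javascript
// Hypothetical helper: backslash-escape every character of `string` found in `chars`.
function escapeWith(chars, string) {
  return String(string).split("").map(function (c) {
    return chars.indexOf(c) !== -1 ? "\\" + c : c;
  }).join("");
}

var withDash = "-[]{}()*+?.,\\^$|";  // the gist's 16 characters
var withoutDash = "[]{}()*+?.\\^$|"; // same list minus "-" and ","

// Used as a whole pattern, the shorter list is enough: with "[", "]", "{" and "}"
// escaped, "-" and "," have no special meaning.
console.log(new RegExp("^" + escapeWith(withoutDash, "[a-z]{1,2}") + "$").test("[a-z]{1,2}")); // true

// Spliced into a character class written by the caller, an unescaped "-" forms a range.
console.log(new RegExp("^[" + escapeWith(withoutDash, "a-z") + "]$").test("m")); // true (range a-z)
console.log(new RegExp("^[" + escapeWith(withDash, "a-z") + "]$").test("m"));    // false (a, - or z only)
```

So whether `-` needs escaping depends on where the escaped string ends up: not when it forms the pattern by itself, but possibly when it is placed inside a character class that the caller writes.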

@zloirock

@kangax any news about state of this proposal? I don't see it in this list, but it would be great to see it in ES7.

@benjamingr

I made a repo here: https://github.com/benjamingr/RexExp.escape/blob/master/README.md any help would be appreciated.