@slevithan
Created June 13, 2012 01:15
Dot matches code point
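// Updated for XRegExp 3.0.0
// Make unescaped dots outside of character classes match any code point rather
// than code unit. Accounts for XRegExp's flag s (aka dotall or singleline).
XRegExp.addToken(/\./, function(match, scope, flags) {
    // Each branch tries a surrogate pair (one astral code point) first,
    // then falls back to a single code unit.
    return flags.indexOf("s") > -1 ?
        // Flag s: any code unit, including line breaks
        "(?:[\ud800-\udbff][\udc00-\udfff]|[\0-\uffff])" :
        // No flag s: any code unit except \n, \r, U+2028, and U+2029
        "(?:[\ud800-\udbff][\udc00-\udfff]|[\0-\x09\x0b\x0c\x0e-\u2027\u202a-\uffff])";
});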
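To see what the token changes, the two strings it emits can be tried directly with a native `RegExp`, without loading XRegExp. This is only a sketch: the constant names and the `matchesOneDot` helper are hypothetical, and the two pattern strings are copied verbatim from the handler above.

```javascript
// The two strings the custom token emits, copied from the handler.
// ANY_CODE_POINT is the flag-s variant; CODE_POINT_NO_NEWLINE is the default.
const ANY_CODE_POINT = "(?:[\ud800-\udbff][\udc00-\udfff]|[\0-\uffff])";
const CODE_POINT_NO_NEWLINE =
  "(?:[\ud800-\udbff][\udc00-\udfff]|[\0-\x09\x0b\x0c\x0e-\u2027\u202a-\uffff])";

// Hypothetical helper: does the whole string match exactly one "dot"?
function matchesOneDot(dotPattern, str) {
  return new RegExp("^" + dotPattern + "$").test(str);
}

const emoji = "\u{1F600}"; // one code point, two UTF-16 code units

console.log(/^.$/.test(emoji));                           // false: native dot sees two code units
console.log(matchesOneDot(CODE_POINT_NO_NEWLINE, emoji)); // true
console.log(matchesOneDot(CODE_POINT_NO_NEWLINE, "\n"));  // false
console.log(matchesOneDot(ANY_CODE_POINT, "\n"));         // true
```

The surrogate-pair alternative is listed first so that a valid pair is consumed as a unit before the single-code-unit fallback gets a chance to take only the high surrogate.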
@slevithan (author):

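It should be obvious, but this requires XRegExp (available on GitHub). Surrogate pairs are used to match code points beyond the Basic Multilingual Plane (BMP), i.e. U+10000 to U+10FFFF. In UTF-16, each such code point is stored as a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF).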
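The surrogate-pair ranges in the token, `[\ud800-\udbff]` followed by `[\udc00-\udfff]`, follow from how UTF-16 encodes a supplementary code point: subtract 0x10000, then add the top 10 bits of the resulting 20-bit value to 0xD800 and the bottom 10 bits to 0xDC00. A minimal sketch of that arithmetic (the `toSurrogatePair` name is hypothetical, not part of XRegExp):

```javascript
// Split a supplementary code point (U+10000 to U+10FFFF) into the UTF-16
// surrogate pair that a code-unit-based regex sees as two characters.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> U+D800..U+DBFF
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> U+DC00..U+DFFF
  return [high, low];
}

const [hi, lo] = toSurrogatePair(0x1f600);
console.log(hi.toString(16), lo.toString(16)); // d83d de00
console.log(String.fromCharCode(hi, lo) === String.fromCodePoint(0x1f600)); // true
```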

The following four line breaks are not matched by dots unless flag s is used:

  • U+000A: Line feed (\n)
  • U+000D: Carriage return (\r)
  • U+2028: Line separator
  • U+2029: Paragraph separator
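These four are exactly ECMAScript's line terminators, which a native dot (without flag s) also refuses to match. The sketch below checks, over every BMP code unit, that the class the token uses when flag s is off rejects only these four and agrees with a native dot everywhere else; the variable names are illustrative, not part of XRegExp.

```javascript
// The single-code-unit class the token emits when flag s is off (from the gist).
const nonLineBreak = /^[\0-\x09\x0b\x0c\x0e-\u2027\u202a-\uffff]$/;
// A native dot without flags s or u, for comparison.
const nativeDot = /^.$/;

const rejected = [];
let mismatches = 0;
for (let unit = 0; unit <= 0xffff; unit++) {
  const ch = String.fromCharCode(unit);
  const inClass = nonLineBreak.test(ch);
  if (!inClass) {
    // Record rejected code units in U+XXXX notation.
    rejected.push("U+" + unit.toString(16).toUpperCase().padStart(4, "0"));
  }
  if (inClass !== nativeDot.test(ch)) {
    mismatches++;
  }
}

console.log(rejected.join(", ")); // U+000A, U+000D, U+2028, U+2029
console.log(mismatches);          // 0
```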
