Unicode-aware JavaScript regex (Unicode property escapes, /\p{..}\P{..}/u) cheat sheet
Browser support: ✅ Chrome 64 and Edge 79, ✅ Safari 11.1, ✅ Firefox 78, ✅ Node.js 10.0, ✅ Babel
High-level intro
\p{...} in a regex with the /u flag is a positive match.
\P{...} in a regex with the /u flag is a negative match.
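A minimal sketch of the two forms side by side, assuming an engine from the support list above. The sample string and variable names are made up for illustration; `Lu` is the short alias for the `Uppercase_Letter` general category.

```javascript
// Hypothetical sample input: a, B, 1, é (U+00E9), É (U+00C9).
const sample = 'aB1\u00E9\u00C9';

// \p{Lu} matches characters that HAVE the Uppercase_Letter property.
const upper = sample.match(/\p{Lu}/gu);    // ["B", "É"]

// \P{Lu} matches characters that do NOT have it.
const notUpper = sample.match(/\P{Lu}/gu); // ["a", "1", "é"]

// Without the /u flag, \p is not a property escape:
// the pattern matches the literal text "p{Lu}" instead.
const noFlagOnLetter = /\p{Lu}/.test('\u00C9'); // false
const noFlagOnLiteral = /\p{Lu}/.test('p{Lu}'); // true
```

Forgetting the /u flag does not throw, so the last two lines are the failure mode worth checking for first.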
[a-zA-Z]
Unicode-aware:
'Gérard'.match(/\p{Letter}+/gu) // ["Gérard"]
'Łódź'.match(/\p{Letter}+/gu) // ["Łódź"]
'Википедия'.match(/\p{Letter}+/gu) // ["Википедия"]
\w ([a-zA-Z0-9_]+)
Unicode-aware: /([\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+)/gu
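To see what the longer character class buys you, here is a small comparison against plain `\w+`. It is only a sketch: the sample text and variable names are invented, and the input is normalized to NFC so that each accented letter is a single code point.

```javascript
// The Unicode-aware \w+ pattern from above (capture group dropped, not needed here).
const unicodeWord = /[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+/gu;

// Hypothetical sample text, normalized so "é" is U+00E9 rather than "e" + combining accent.
const text = 'Łódź, naïve_café 42 Википедия'.normalize('NFC');

// ASCII-only \w+ breaks words apart at every non-ASCII letter.
const asciiWords = text.match(/\w+/g);
// ["d", "na", "ve_caf", "42"]

// The Unicode-aware class keeps each word whole.
const unicodeWords = text.match(unicodeWord);
// ["Łódź", "naïve_café", "42", "Википедия"]
```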
\W ([^\w])
Unicode-aware: /[^\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]/gu
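One place the negated class is handy is collapsing everything that is not a word character. The `slugify` helper below is hypothetical and only a sketch; it adds a `+` quantifier to the class above so that each run of non-word characters becomes a single dash.

```javascript
// The Unicode-aware \W class from above, with "+" added to match whole runs.
const nonWord = /[^\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+/gu;

// Hypothetical helper: turn a title into a dash-separated slug
// while keeping non-ASCII letters intact.
function slugify(title) {
  return title
    .normalize('NFC')          // one code point per accented letter
    .toLowerCase()
    .replace(nonWord, '-')     // each run of non-word characters becomes one dash
    .replace(/^-+|-+$/g, '');  // trim dashes left at either end
}

const slug = slugify('  Łódź: Miasto & Ludzie! ');
// "łódź-miasto-ludzie"

// For contrast, ASCII-only \W treats "Ł", "ó" and "ź" as separators.
const asciiAttempt = 'Łódź'.replace(/\W+/g, '-');
// "-d-"
```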
\b
Unicode-aware: \b is a word boundary for words as defined by \w, so it only recognizes ASCII word characters. AFAIU there is no short, good equivalent that works for non-ASCII text.
You probably want to be more specific with your regex and rely on ^, $, \s and other characters.
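If you do need something boundary-like, one possible workaround (not short, as noted above) is to spell the boundary out with lookarounds around the Unicode-aware word class. This is a sketch under assumptions: `W`, `B` and `wholeWord` are hypothetical names, and it relies on lookbehind assertions, which some engines shipped later than property escapes, so check your targets before using it.

```javascript
// Same character class as the Unicode-aware \w above, as a string for reuse.
const W = '[\\p{Alphabetic}\\p{Mark}\\p{Decimal_Number}\\p{Connector_Punctuation}\\p{Join_Control}]';

// Hypothetical stand-in for \b: a position with a word character
// on exactly one side.
const B = `(?:(?<=${W})(?!${W})|(?<!${W})(?=${W}))`;

// Hypothetical helper: build a regex that matches `word` only as a whole word.
function wholeWord(word) {
  // Escape regex syntax characters in the search term.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`${B}${escaped}${B}`, 'u');
}

const standalone = wholeWord('кот').test('это кот'); // true
const insideWord = wholeWord('кот').test('коты');    // false

// For contrast, plain \b sees no boundary next to Cyrillic letters.
const asciiBoundary = /\bкот\b/.test('это кот');     // false
```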
Matching emoji
/\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu
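A short usage sketch for the pattern above. The sample strings are invented and written with escapes so the code points are unambiguous. Note one limitation the sketch makes visible: ZWJ sequences are matched as separate pieces rather than as a single emoji.

```javascript
// The emoji pattern from above.
const emoji = /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;

// Hypothetical message: waving hand with skin tone, grinning face,
// red heart (U+2764 followed by the U+FE0F variation selector), and a plain digit.
const message = 'Hi \u{1F44B}\u{1F3FD} there \u{1F600} \u2764\uFE0F 2';

const found = message.match(emoji);
// 3 matches: the hand with its modifier, the face, the heart with U+FE0F.
// The digit "2" has the Emoji property but is skipped: no U+FE0F follows it.

// Woman technologist: woman (U+1F469) + ZWJ (U+200D) + laptop (U+1F4BB).
const zwjParts = '\u{1F469}\u200D\u{1F4BB}'.match(emoji);
// 2 matches: the woman and the laptop are matched separately.
```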
Sources
- https://github.com/tc39/proposal-regexp-unicode-property-escapes#illustrative-examples
- https://mathiasbynens.be/notes/es-unicode-property-escapes
- https://github.com/mathiasbynens/regexpu-core/blob/master/property-escapes.md
- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Unicode_Property_Escapes
- https://javascript.info/regexp-unicode
- https://github.com/mathiasbynens/es-regexp-unicode-character-class-escapes/blob/master/d-w-b.md
- https://exploringjs.com/es2018-es2019/ch_regexp-unicode-property-escapes.html