andjc/1-unicode-js-regex.md

## 1-unicode-js-regex.md

      
Unicode-aware JavaScript regex (Unicode property escapes /\p{..}\P{..}/u) cheat sheet

Browser support MDN


✅ Chrome 64 & Edge 79
✅ Safari 11.1
✅ Firefox 78
✅ nodejs: 10.0
✅ babel

High level intro


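\p{...} in a regex with the /u flag is a positive match: it matches a character that has the given Unicode property
\P{...} in a regex with the /u flag is a negative match: it matches a character that does not have the property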
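
A minimal sketch of the two escapes side by side. The sample string is made up for illustration; the last line shows why the /u flag matters, since without it \p is not treated as a property escape.

```javascript
// Sample input chosen for illustration only.
const text = 'Łódź 2024!';

// \p{Letter}+ collects runs of letters, \P{Letter}+ collects runs of everything else.
const letters = text.match(/\p{Letter}+/gu);
const nonLetters = text.match(/\P{Letter}+/gu);

console.log(letters);    // ["Łódź"]
console.log(nonLetters); // [" 2024!"]

// Without the u flag the pattern no longer means "any letter".
const withoutFlag = /\p{Letter}/.test('é');
console.log(withoutFlag); // false
```
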

Unicode-aware [a-zA-Z]

'Gérard'    .match(/\p{Letter}+/gu)     // ["Gérard"]
'Łódź'      .match(/\p{Letter}+/gu)     // ["Łódź"]
'Википедия' .match(/\p{Letter}+/gu)     // ["Википедия"]

Unicode-aware \w ([a-zA-Z0-9_]+)

/([\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+)/gu
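
To see the difference from plain \w, the sketch below runs both patterns over the same made-up sentence. The variable names are illustrative; the pattern is the one above without the capture group.

```javascript
// Unicode-aware replacement for \w+ (same character class as above).
const unicodeWord = /[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+/gu;

// Sample input chosen for illustration only.
const sentence = 'Gérard naïve_user42 Википедия';

// ASCII-only \w splits words at every accented letter and skips Cyrillic.
const asciiWords = sentence.match(/\w+/g);
console.log(asciiWords); // ["G", "rard", "na", "ve_user42"]

const unicodeWords = sentence.match(unicodeWord);
console.log(unicodeWords); // ["Gérard", "naïve_user42", "Википедия"]
```
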

Unicode-aware \W ([^\w])

/[^\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]/gu
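
One common use of a non-word class is splitting text into words. The sketch below assumes a + quantifier (not in the pattern above) so that runs of separators collapse into one split point; the input string is made up.

```javascript
// Negated class from above, with an added + so consecutive separators are treated as one.
const nonWord = /[^\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+/u;

// Sample input chosen for illustration only.
const input = 'Łódź, Gérard & Википедия!';

// filter(Boolean) drops the empty string left after the trailing "!".
const words = input.split(nonWord).filter(Boolean);
console.log(words); // ["Łódź", "Gérard", "Википедия"]

// The ASCII-only \W treats most of "Łódź" as separators.
const asciiSplit = 'Łódź'.split(/\W+/).filter(Boolean);
console.log(asciiSplit); // ["d"]
```
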

Unicode-aware \b

⚠ WIP ⚠
\b is a word boundary defined in terms of \w, so it only recognizes ASCII word characters. AFAIU there's no short and good equivalent which works for non-ASCII text.
You probably want to be more specific with your regex and rely on ^, $, \s and other characters.
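
One possible workaround, offered as a sketch rather than a drop-in replacement for \b: wrap the word in a negative lookbehind and a negative lookahead built from the Unicode-aware \w class above. The helper name wholeWord is hypothetical, and lookbehind assertions need a newer engine than property escapes do in some browsers (Safari in particular), so check support before relying on this.

```javascript
// Character class source shared by both lookarounds (same class as the Unicode-aware \w).
const WORD_CHAR = '[\\p{Alphabetic}\\p{Mark}\\p{Decimal_Number}\\p{Connector_Punctuation}\\p{Join_Control}]';

// Hypothetical helper: builds a regex matching `word` only when it is not
// preceded or followed by another word character.
function wholeWord(word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
  return new RegExp(`(?<!${WORD_CHAR})${escaped}(?!${WORD_CHAR})`, 'u');
}

// \b gets both cases wrong because "é" is not an ASCII word character.
console.log(/\bélan\b/.test('élan')); // false (missed match)
console.log(/\blan\b/.test('élan'));  // true  (false positive)

// The lookaround version treats "é" as part of the word.
console.log(wholeWord('élan').test('un élan vital')); // true
console.log(wholeWord('lan').test('élan'));           // false
```
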

Matching emoji

/\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu
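
A short sketch of the pattern in use. The sample message is made up and the emoji are written as escapes so the code points are unambiguous. The last line shows why plain \p{Emoji} alone is too broad: ASCII digits carry the Emoji property. Note that multi-code-point sequences joined with a zero-width joiner are not matched as a single unit by this pattern.

```javascript
const emojiRe = /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;

const wave = '\u{1F44B}\u{1F3FD}'; // waving hand + medium skin tone modifier
const party = '\u{1F389}';         // party popper (has Emoji_Presentation)
const heart = '\u2764\uFE0F';      // heavy black heart + variation selector-16

// Sample input chosen for illustration only.
const message = `Hi ${wave} party ${party} 4 ever ${heart}`;

const found = message.match(emojiRe);
console.log(found.length); // 3 (wave with its modifier, party popper, heart)

// The digit in the message is skipped by emojiRe, but plain \p{Emoji} would match it.
const digitIsEmoji = /\p{Emoji}/u.test('4');
console.log(digitIsEmoji); // true
```
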

Sources


https://github.com/tc39/proposal-regexp-unicode-property-escapes#illustrative-examples
https://mathiasbynens.be/notes/es-unicode-property-escapes
https://github.com/mathiasbynens/regexpu-core/blob/master/property-escapes.md
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Unicode_Property_Escapes
https://javascript.info/regexp-unicode
https://github.com/mathiasbynens/es-regexp-unicode-character-class-escapes/blob/master/d-w-b.md
https://exploringjs.com/es2018-es2019/ch_regexp-unicode-property-escapes.html


## 2-js-regex-named-catpure-groups.md

      
JavaScript RegEx named capture groups

Browser support caniuse


✅ Chrome 64 & Edge 79
✅ Safari 11.1
✅ Firefox 78
✅ nodejs: 10.0
✅ babel

Description

It's possible to name a capturing group (...) via the (?<name>...) syntax and later access the captured text via match.groups.name, instead of by position (match[1], match[2] etc.; match[0] is the whole match).

const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
console.log(re.exec("1999-02-29").groups.year); // "1999"
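
Named groups can also be destructured, referenced in replacement strings via $<name>, and back-referenced inside the pattern via \k<name>. A minimal sketch with made-up variable names and sample strings:

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Destructure the named groups straight from the match.
const { year, month, day } = dateRe.exec('1999-02-29').groups;
console.log(year, month, day); // 1999 02 29

// $<name> refers to a named group in the replacement string.
const reordered = '1999-02-29'.replace(dateRe, '$<day>/$<month>/$<year>');
console.log(reordered); // "29/02/1999"

// \k<name> is a backreference to a named group inside the pattern itself.
const quoted = /(?<quote>['"]).*?\k<quote>/.exec(`say "hi" now`)[0];
console.log(quoted); // "hi" including the surrounding double quotes
```
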