@andjc
Forked from jakub-g/1-unicode-js-regex.md
Created December 22, 2022 08:03
Unicode-aware JavaScript regex cheat sheet

Unicode-aware JavaScript regex (Unicode property escapes /\p{..}\P{..}/u) cheat sheet

Browser support MDN

High level intro

  • \p{...} in a regex with the /u flag matches any character that has the given Unicode property
  • \P{...} in a regex with the /u flag is the negation: it matches any character that lacks the property
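
A small sketch of the two escapes side by side, and of what happens when the /u flag is forgotten. The sample string and variable names are only illustrative.

```javascript
const sample = 'Википедия 2024!';

// \p{Letter} matches each character that has the Letter property...
const letters = sample.match(/\p{Letter}/gu);     // the 9 Cyrillic letters
// ...and \P{Letter} matches every character that lacks it.
const nonLetters = sample.match(/\P{Letter}/gu);  // [" ", "2", "0", "2", "4", "!"]

// Without the u flag, \p is not a property escape: the pattern
// matches the literal text "p{Letter}" instead.
const withoutFlag = /\p{Letter}/.test('я');            // false
const literalMatch = /\p{Letter}/.test('p{Letter}');   // true

console.log(letters.join(''), nonLetters.join(''), withoutFlag, literalMatch);
```

The silent fallback in the last two lines is easy to miss, because a regex without /u does not throw on \p{...}.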

Unicode-aware [a-zA-Z]

'Gérard'    .match(/\p{Letter}+/gu)     // ["Gérard"]
'Łódź'      .match(/\p{Letter}+/gu)     // ["Łódź"]
'Википедия' .match(/\p{Letter}+/gu)     // ["Википедия"]
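
For comparison, here is a sketch of how the ASCII class behaves on the same kind of input, plus one caveat: \p{Letter} does not cover combining marks, so text in decomposed form (NFD) gets split. The strings are written with \u escapes so the normalization form is unambiguous; the variable names are illustrative.

```javascript
// Precomposed "é" (U+00E9) is a single Letter code point.
const composed = 'G\u00E9rard';
// Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT (a Mark, not a Letter).
const decomposed = 'Ge\u0301rard';

const ascii = composed.match(/[a-zA-Z]+/g);          // ["G", "rard"]
const letters = composed.match(/\p{Letter}+/gu);     // ["Gérard"]
const split = decomposed.match(/\p{Letter}+/gu);     // ["Ge", "rard"]

// Two possible ways to handle decomposed input:
// allow marks in the class, or normalize to NFC before matching.
const withMarks = decomposed.match(/[\p{Letter}\p{Mark}]+/gu);
const normalized = decomposed.normalize('NFC').match(/\p{Letter}+/gu);

console.log(ascii, letters, split, withMarks, normalized);
```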

Unicode-aware \w+ ([a-zA-Z0-9_]+)

/([\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+)/gu
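
A quick usage sketch comparing plain \w+ with the class above. The capturing group is dropped here because match() with the g flag ignores it anyway; the sample text and names are illustrative.

```javascript
const unicodeWord = /[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+/gu;

const text = 'na\u00EFve caf\u00E9_42';          // "naïve café_42"

// \w only knows ASCII word characters, so accented letters split the words.
const asciiWords = text.match(/\w+/g);           // ["na", "ve", "caf", "_42"]
// The property-based class keeps each word in one piece.
const unicodeWords = text.match(unicodeWord);    // ["naïve", "café_42"]

console.log(asciiWords, unicodeWords);
```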

Unicode-aware \W ([^\w])

/[^\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]/gu
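
One possible use of the negated class is splitting text into words. In this sketch a + is added (an addition, not part of the pattern above) so that runs of separators collapse into a single split point; the sample strings are illustrative.

```javascript
const nonWord = /[^\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+/u;

// "Łódź, Kraków; Gdańsk"
const input = '\u0141\u00F3d\u017A, Krak\u00F3w; Gda\u0144sk';

// Splitting on \W+ breaks words at every non-ASCII letter.
const asciiSplit = 'Krak\u00F3w'.split(/\W+/);   // ["Krak", "w"]
// Splitting on the property-based class keeps the words whole.
const cities = input.split(nonWord);             // ["Łódź", "Kraków", "Gdańsk"]

console.log(asciiSplit, cities);
```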

Unicode-aware \b

⚠ WIP ⚠

\b is a word boundary defined in terms of \w, so it treats non-ASCII letters such as é or ź as non-word characters. As far as I understand, there is no short, good equivalent that works for non-ASCII text. You probably want to be more specific with your regex and rely on ^, $, \s and other characters.
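
It is not short, but one possible workaround is to emulate the boundary with lookbehind and lookahead assertions built from the Unicode-aware \w class. The sketch below assumes an engine that supports lookbehind; escapeRegExp and wholeWord are hypothetical helper names, not built-ins.

```javascript
// Character class from the "Unicode-aware \w" section.
const W = '[\\p{Alphabetic}\\p{Mark}\\p{Decimal_Number}\\p{Connector_Punctuation}\\p{Join_Control}]';

// Hypothetical helper: escape regex syntax characters in a literal string.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: match `word` only when it is not directly
// preceded or followed by another Unicode word character.
function wholeWord(word) {
  return new RegExp(`(?<!${W})${escapeRegExp(word)}(?!${W})`, 'gu');
}

const text = '\u00FCber \u00FCberall';               // "über überall"

// ASCII \b sees no boundary before "ü", so this never matches.
const asciiBoundary = /\b\u00FCber\b/.test(text);    // false

// The lookaround version finds the standalone word but not the prefix of "überall".
const matches = [...text.matchAll(wholeWord('\u00FCber'))];

console.log(asciiBoundary, matches.map((m) => m.index));
```

This only covers the "whole word" use of \b; it does not implement the full word segmentation rules that Unicode defines.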

Matching emoji

/\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu
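
A short sketch of what this pattern does and does not match. The sample strings use \u escapes so the exact code points are visible; the variable names are illustrative.

```javascript
const emoji = /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;

// thumbs up + skin tone modifier, grinning face, heart + variation selector, plain digits
const text = 'ok \u{1F44D}\u{1F3FD} \u{1F600} \u2764\uFE0F 123';
const found = text.match(emoji);
// Three matches. The digits are skipped: they have the Emoji property,
// but are neither Emoji_Presentation nor followed by U+FE0F here.

// Caveat: a ZWJ sequence (man + ZWJ + woman + ZWJ + girl) is matched piece by piece.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';
const familyParts = family.match(emoji);

console.log(found.length, familyParts.length);
```

So the pattern is useful for detecting or stripping emoji, but counting matches does not give the number of emoji a user sees.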

Sources

JavaScript RegEx named capture groups

Browser support caniuse

Description

It's possible to name a capturing group (...) with the (?<name>...) syntax and later access the match via match.groups.name, instead of by index (match[1], match[2], etc.).

// The pattern only checks the shape of the string, not whether the date exists.
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
console.log(re.exec("1999-02-29").groups.year); // logs 1999
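
Named groups can also be used in a few other places. The sketch below shows destructuring, the $<name> replacement syntax, and the \k<name> backreference; the variable names are illustrative.

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

// Destructure the named groups directly from the match.
// exec() returns null when nothing matches, so real code should check for that first.
const { year, month, day } = isoDate.exec('1999-02-28').groups;

// Refer to groups by name in a replacement string with $<name>.
const european = '1999-02-28'.replace(isoDate, '$<day>/$<month>/$<year>'); // "28/02/1999"

// Refer to an earlier group inside the pattern itself with \k<name>.
const doubledWord = /(?<word>\p{Letter}+) \k<word>/u.test('the the');      // true

console.log(year, month, day, european, doubledWord);
```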