@CMCDragonkai
Last active November 13, 2024 04:06
Regular Expression Engine Comparison Chart

Many different applications claim to support regular expressions. But what does that even mean?

Well, there are lots of different regular expression engines, and each has its own feature set and its own time and space efficiency.

The information here is just copied from: http://regular-expressions.mobi/refflavors.html

But for some reason, it's not accessible unless you have a mobile phone user agent.

Go to the main site, regular-expressions.info, for much more regular expression information and for their commercial product, RegexBuddy.

Regular Expression Flavors (Engines)

  • JGsoft: This flavor is used by the Just Great Software products, including PowerGREP and EditPad Pro.
  • .NET: This flavor is used by programming languages based on the Microsoft .NET framework versions 1.x, 2.0 or 3.x. It is generally also the regex flavor used by applications developed in these programming languages.
  • Java: The regex flavor of the java.util.regex package, available in Java 4 (JDK 1.4.x) and later. A few features were added in Java 5 (JDK 1.5.x) and Java 6 (JDK 1.6.x). It is generally also the regex flavor used by applications developed in Java.
  • Perl: The regex flavor used in the Perl programming language, versions 5.6 and 5.8. Versions prior to 5.6 do not support Unicode.
  • PCRE: The open source PCRE library. The feature set described here is available in PCRE 5.x and 6.x. PCRE is the regex engine used by the TPerlRegEx Delphi component and the RegularExpressions and RegularExpressionsCore units in Delphi XE and C++Builder XE.
  • ECMA (JavaScript): The regular expression syntax defined in the 3rd edition of the ECMA-262 standard, which defines the scripting language commonly known as JavaScript.
  • Python: The regex flavor supported by Python's built-in re module.
  • Ruby: The regex flavor built into the Ruby programming language.
  • Tcl ARE: The regex flavor developed by Henry Spencer for the regexp command in Tcl 8.2 and 8.4, dubbed Advanced Regular Expressions.
  • POSIX BRE: Basic Regular Expressions as defined in the IEEE POSIX standard 1003.2.
  • POSIX ERE: Extended Regular Expressions as defined in the IEEE POSIX standard 1003.2.
  • GNU BRE: GNU Basic Regular Expressions, which are POSIX BRE with GNU extensions, used in the GNU implementations of classic UNIX tools.
  • GNU ERE: GNU Extended Regular Expressions, which are POSIX ERE with GNU extensions, used in the GNU implementations of classic UNIX tools.
  • XML: The regular expression flavor defined in the XML Schema standard.
  • XPath: The regular expression flavor defined in the XQuery 1.0 and XPath 2.0 Functions and Operators standard.

Applications Implementing a Regular Expression Flavor

  • AceText: Version 2 and later use the JGsoft engine. Version 1 did not support regular expressions at all.
  • awk: The awk UNIX tool and programming language uses POSIX ERE.
  • C#: As a .NET programming language, C# can use the System.Text.RegularExpressions classes, listed as ".NET" below.
  • Delphi for .NET: As a .NET programming language, the .NET version of Delphi can use the System.Text.RegularExpressions classes, listed as ".NET" below.
  • Delphi for Win32: Delphi for Win32 does not have built-in regular expression support. Many free PCRE wrappers are available.
  • EditPad Pro: Version 6 and later use the JGsoft engine. Earlier versions used PCRE, without Unicode support.
  • egrep: The traditional UNIX egrep command uses the "POSIX ERE" flavor, though not all implementations fully adhere to the standard. Linux usually ships with the GNU implementation, which uses "GNU ERE".
  • grep: The traditional UNIX grep command uses the "POSIX BRE" flavor, though not all implementations fully adhere to the standard. Linux usually ships with the GNU implementation, which uses "GNU BRE".
  • Emacs: The GNU implementation of this classic UNIX text editor uses the "GNU ERE" flavor, except that POSIX classes, collations and equivalences are not supported.
  • Java: The regex flavor of the java.util.regex package is listed as "Java" in the table below.
  • JavaScript: JavaScript's regex flavor is listed as "ECMA" in the table below.
  • MySQL: MySQL uses POSIX Extended Regular Expressions, listed as "POSIX ERE" in the table below.
  • Oracle: Oracle Database 10g implements POSIX Extended Regular Expressions, listed as "POSIX ERE" in the table below. Oracle supports backreferences \1 through \9, though these are not part of the POSIX ERE standard.
  • Perl: Perl's regex flavor is listed as "Perl" in the table below.
  • PHP: PHP's ereg functions implement the "POSIX ERE" flavor, while the preg functions implement the "PCRE" flavor.
  • PostgreSQL: PostgreSQL 7.4 and later uses Henry Spencer's "Advanced Regular Expressions" flavor, listed as "Tcl ARE" in the table below. Earlier versions used POSIX Extended Regular Expressions, listed as POSIX ERE.
  • PowerGREP: Version 3 and later use the JGsoft engine. Earlier versions used PCRE, without Unicode support.
  • PowerShell: PowerShell's built-in -match and -replace operators use the .NET regex flavor. PowerShell can also use the System.Text.RegularExpressions classes directly.
  • Python: Python's regex flavor is listed as "Python" in the table below.
  • R: The regular expression functions in the R language for statistical programming use either the POSIX ERE flavor (default), the PCRE flavor (perl = TRUE) or the POSIX BRE flavor (perl = FALSE, extended = FALSE).
  • REALbasic: REALbasic's RegEx class is a wrapper around PCRE.
  • RegexBuddy: Version 3 and later use a special version of the JGsoft engine that emulates all the regular expression flavors in this comparison. Version 2 supported the JGsoft regex flavor only. Version 1 used PCRE, without Unicode support.
  • Ruby: Ruby's regex flavor is listed as "Ruby" in the table below.
  • sed: The sed UNIX tool uses POSIX BRE. Linux usually ships with the GNU implementation, which uses "GNU BRE".
  • Tcl: Tcl's Advanced Regular Expression flavor, the default flavor in Tcl 8.2 and later, is listed as "Tcl ARE" in the table below. Tcl's Extended Regular Expression and Basic Regular Expression flavors are listed as "POSIX ERE" and "POSIX BRE" in the table below.
  • VBScript: VBScript's RegExp object uses the same regex flavor as JavaScript, which is listed as "ECMA" in the table below.
  • Visual Basic 6: Visual Basic 6 does not have built-in support for regular expressions, but can easily use the "Microsoft VBScript Regular Expressions 5.5" COM object, which implements the "ECMA" flavor listed below.
  • Visual Basic.NET: As a .NET programming language, VB.NET can use the System.Text.RegularExpressions classes, listed as ".NET" below.
  • wxWidgets: The wxRegEx class supports 3 flavors. wxRE_ADVANCED is the "Tcl ARE" flavor, wxRE_EXTENDED is "POSIX ERE" and wxRE_BASIC is "POSIX BRE".
  • XML Schema: The XML Schema regular expression flavor is listed as "XML" in the table below.
  • XPath: The regex flavor used by XPath functions is listed as "XPath" in the table below.
  • XQuery: The regex flavor used by XQuery functions is listed as "XPath" in the table below.

Feature Comparison

Characters
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
Backslash escapes one metacharacter YES YES YES YES YES YES YES YES YES YES YES YES YES YES YES
\Q...\E escapes a string of metacharacters YES no Java 6 YES YES no no no no no no no no no no
\x00 through \xFF (ASCII character) YES YES YES YES YES YES YES YES YES no no no no no no
\n (LF), \r (CR) and \t (tab) YES YES YES YES YES YES YES YES YES no no no no YES YES
\f (form feed) and \v (vtab) YES YES YES YES YES YES YES YES YES no no no no no no
\a (bell) YES YES YES YES YES no YES YES YES no no no no no no
\e (escape) YES YES YES YES YES no no YES YES no no no no no no
\b (backspace) and \B (backslash) no no no no no no no no YES no no no no no no
\cA through \cZ (control character) YES YES YES YES YES YES no no YES no no no no no no
\ca through \cz (control character) YES YES no YES YES YES no no YES no no no no no no
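To see the Python column of the table above in practice, here is a minimal sketch using Python 3's built-in `re` module. It assumes Python 3.7 or later, where unknown ASCII-letter escapes became compile errors. The helper `compiles` is a hypothetical name used only for illustration.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Python has no \Q...\E; re.escape() escapes the metacharacters of a literal string.
literal_pattern = re.escape("1+1=2?")
literal_ok = re.fullmatch(literal_pattern, "1+1=2?") is not None

# \xHH (exactly two hex digits) and \a (bell) are supported.
hex_ok = re.fullmatch(r"\x41", "A") is not None
bell_ok = re.fullmatch(r"\a", "\a") is not None

# \e and \cA are unknown letter escapes in Python, so compiling them fails.
escape_ok = compiles(r"\e")
control_ok = compiles(r"\cA")

print(literal_ok, hex_ok, bell_ok, escape_ok, control_ok)  # True True True False False
```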
Character Classes or Character Sets [abc]
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
[abc] character class YES YES YES YES YES YES YES YES YES YES YES YES YES YES YES
[^abc] negated character class YES YES YES YES YES YES YES YES YES YES YES YES YES YES YES
[a-z] character class range YES YES YES YES YES YES YES YES YES YES YES YES YES YES YES
Hyphen in [\d-z] is a literal YES YES YES YES YES no no no no no no no no no no
Hyphen in [a-\d] is a literal YES no no no YES no no no no no no no no no no
Backslash escapes one character class metacharacter YES YES YES YES YES YES YES YES YES no no no no YES YES
\Q...\E escapes a string of character class metacharacters YES no Java 6 YES YES no no no no no no no no no no
\d shorthand for digits YES YES ascii YES ascii ascii option ascii YES no no no no YES YES
\w shorthand for word characters YES YES ascii YES ascii ascii option ascii YES no no YES YES YES YES
\s shorthand for whitespace YES YES ascii YES ascii YES option ascii YES no no YES YES ascii ascii
\D, \W and \S shorthand negated character classes YES YES YES YES YES YES YES YES YES no no YES YES YES YES
[\b] backspace YES YES YES YES YES YES YES YES YES no no no no no no
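Here is a similar sketch for the character class rows, again with Python 3's `re`. The "option" entries for Python reflect Python 2, where Unicode matching needed the `re.UNICODE` flag. In Python 3, `str` patterns are Unicode-aware by default and `re.ASCII` is the opt-out. The `compiles` helper is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


ARABIC_INDIC_THREE = "\u0663"  # a decimal digit (Unicode category Nd) outside 0-9

# \d is Unicode-aware by default for str patterns; re.ASCII limits it to [0-9].
unicode_digit = re.fullmatch(r"\d", ARABIC_INDIC_THREE) is not None
ascii_digit = re.fullmatch(r"\d", ARABIC_INDIC_THREE, re.ASCII) is not None

# A hyphen next to a shorthand class is not a literal in Python: it is a
# "bad character range" compile error in both positions.
hyphen_after = compiles(r"[\d-z]")
hyphen_before = compiles(r"[a-\d]")

# Putting the hyphen last avoids the ambiguity; [\b] is a backspace.
trailing_hyphen = re.fullmatch(r"[\d-]+", "12-3") is not None
backspace = re.fullmatch(r"[\b]", "\b") is not None

print(unicode_digit, ascii_digit, hyphen_after, hyphen_before, trailing_hyphen, backspace)
# True False False False True True
```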
Dot
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
. (dot; any character except line break) YES YES YES YES YES YES YES YES YES YES YES YES YES YES YES
Anchors
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
^ (start of string/line) YES YES YES YES YES YES YES YES YES YES YES YES YES no YES
$ (end of string/line) YES YES YES YES YES YES YES YES YES YES YES YES YES no YES
\A (start of string) YES YES YES YES YES no YES YES YES no no no no no no
\Z (end of string, before final line break) YES YES YES YES YES no no YES YES no no no no no no
\z (end of string) YES YES YES YES YES no \Z YES no no no no no no no
\` (start of string) no no no no no no no no no no no YES YES no no
\' (end of string) no no no no no no no no no no no YES YES no no
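The anchor rows can be checked the same way. In this sketch using Python 3's `re`, Python spells the absolute end-of-string anchor `\Z`, which behaves like `\z` in the other flavors. Plain `$` also matches just before a final line break.

```python
import re

text = "first line\nlast line\n"

# $ may match just before a final newline; \Z never does.
dollar = re.search(r"line$", text)
backslash_z = re.search(r"line\Z", text)

# With MULTILINE, ^ matches after every line break, but \A only at position 0.
caret_hits = re.findall(r"^\w+", text, re.MULTILINE)
a_hits = re.findall(r"\A\w+", text, re.MULTILINE)

print(dollar is not None, backslash_z is None)  # True True
print(caret_hits, a_hits)  # ['first', 'last'] ['first']
```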
Word Boundaries
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
\b (at the beginning or end of a word) YES YES YES YES ascii ascii option ascii no no no YES YES no no
\B (NOT at the beginning or end of a word) YES YES YES YES ascii ascii option ascii no no no YES YES no no
\y (at the beginning or end of a word) YES no no no no no no no YES no no no no no no
\Y (NOT at the beginning or end of a word) YES no no no no no no no YES no no no no no no
\m (at the beginning of a word) YES no no no no no no no YES no no no no no no
\M (at the end of a word) YES no no no no no no no YES no no no no no no
\< (at the beginning of a word) no no no no no no no no no no no YES YES no no
\> (at the end of a word) no no no no no no no no no no no YES YES no no
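For word boundaries, this sketch using Python 3's `re` shows that `\b` is Unicode-aware for `str` patterns. It also shows that GNU-style `\<` and `\>` silently mean something else in Python: a backslash before a non-letter simply makes that character literal. Tcl-style `\y`, `\m` and `\M` fail to compile instead. The `compiles` helper is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


words = re.findall(r"\b\w+\b", "caf\u00e9, na\u00efve!")

# \< and \> are literal "<" and ">" here, so this pattern looks for the text "<cat>".
gnu_style = re.compile(r"\<cat\>")
plain_hit = gnu_style.search("a cat here")
bracket_hit = gnu_style.search("<cat>")

# Tcl-style word boundary escapes are unknown letter escapes in Python.
tcl_style = [compiles(p) for p in (r"\y", r"\m", r"\M")]

print(words)                               # ['café', 'naïve']
print(plain_hit, bracket_hit is not None)  # None True
print(tcl_style)                           # [False, False, False]
```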
Alternation
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
| (alternation) YES YES YES YES YES YES YES YES YES no YES \| YES YES YES
Quantifiers
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
? (0 or 1) YES YES YES YES YES YES YES YES YES no YES \? YES YES YES
* (0 or more) YES YES YES YES YES YES YES YES YES YES YES YES YES YES YES
+ (1 or more) YES YES YES YES YES YES YES YES YES no YES \+ YES YES YES
{n} (exactly n) YES YES YES YES YES YES YES YES YES \{n\} YES \{n\} YES YES YES
{n,m} (between n and m) YES YES YES YES YES YES YES YES YES \{n,m\} YES \{n,m\} YES YES YES
{n,} (n or more) YES YES YES YES YES YES YES YES YES \{n,\} YES \{n,\} YES YES YES
? after any of the above quantifiers to make it "lazy" YES YES YES YES YES YES YES YES YES no no no no no YES
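The quantifier rows, sketched with Python 3's `re`, cover three points:

* greedy versus lazy matching;
* bare-brace interval quantifiers;
* a GNU BRE pattern such as `ab\+` (one or more `b` in GNU grep or sed) means a literal plus sign to Python.

That last point is the kind of silent portability trap this table is about.

```python
import re

html = "<b>bold</b>"
greedy = re.search(r"<.+>", html).group()  # runs to the last ">"
lazy = re.search(r"<.+?>", html).group()   # stops at the first ">"

# Interval quantifiers use bare braces, as in ERE and the Perl-style flavors.
repeat = re.fullmatch(r"ab{2,3}c", "abbbc") is not None

# GNU grep reads ab\+ as "a, then one or more b"; Python reads \+ as a literal "+".
bre_as_python = re.fullmatch(r"ab\+", "ab+") is not None

print(greedy, lazy, repeat, bre_as_python)  # <b>bold</b> <b> True True
```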
Grouping and Backreferences
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
(regex) (numbered capturing group) YES YES YES YES YES YES YES YES YES \( \) YES \( \) YES YES YES
(?:regex) (non-capturing group) YES YES YES YES YES YES YES YES YES no no no no no no
\1 through \9 (backreferences) YES YES YES YES YES YES YES YES YES YES no YES YES no YES
\10 through \99 (backreferences) YES YES YES YES YES YES YES YES YES no n/a no no n/a YES
Forward references \1 through \9 YES YES YES YES YES no no YES no no n/a no no n/a no
Nested references \1 through \9 YES YES YES YES YES YES no YES no no n/a no no n/a no
Backreferences to non-existent groups are an error YES YES YES YES YES no YES no YES YES n/a YES YES n/a YES
Backreferences to failed groups also fail YES YES YES YES YES no YES YES YES YES n/a YES YES n/a YES
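Grouping and backreference behavior in Python 3's `re` can be sketched in three cases:

* a non-capturing group takes no group number;
* a reference to a group that does not exist is a compile error;
* a reference to a group that did not participate makes the match fail.

In ECMAScript, such a reference matches the empty string instead, which is why ECMA shows "no" in the last row. The `compiles` helper is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# (?:ab)+ is not numbered, so (c) is group 1.
groups = re.fullmatch(r"(?:ab)+(c)", "ababc").groups()

# \1 finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# Referring to group 2 when only one group exists does not compile.
missing_group_ok = compiles(r"(a)\2")

# (a)? did not participate, so \1 fails and the whole match fails.
failed_ref = re.match(r"(a)?b\1", "b")

print(groups, doubled, missing_group_ok, failed_ref)  # ('c',) is False None
```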
Modifiers
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
(?i) (case insensitive) YES YES YES YES YES /i only YES YES YES no no no no no flag
(?s) (dot matches newlines) YES YES YES YES YES no YES (?m) no no no no no no flag
(?m) (^ and $ match at line breaks) YES YES YES YES YES /m only YES always on no no no no no no flag
(?x) (free-spacing mode) YES YES YES YES YES no YES YES YES no no no no no flag
(?n) (explicit capture) YES YES no no no no no no no no no no no no no
(?-ismxn) (turn off mode modifiers) YES YES YES YES YES no no YES no no no no no no no
(?ismxn:group) (mode modifiers local to group) YES YES YES YES YES no no YES no no no no no no no
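Here are inline mode modifiers in Python 3's `re`, as a sketch. One caveat to the table: the scoped forms `(?i:...)` and `(?-i:...)` were added in Python 3.6, so the "no" entries in the Python column for the last two rows reflect older versions. Python 3.11 and later also require global inline flags such as `(?i)` to appear at the very start of the pattern.

```python
import re

ci = re.fullmatch(r"(?i)hello", "HeLLo") is not None

# Without (?s) the dot stops at "\n"; with it, the dot matches the newline too.
no_dotall = re.search(r"a.b", "a\nb")
dotall = re.search(r"(?s)a.b", "a\nb") is not None

# (?m) lets ^ match after each line break.
lines = re.findall(r"(?m)^\w+", "one\ntwo")

# (?x): unescaped whitespace and # comments in the pattern are ignored.
spaced = re.fullmatch(r"(?x) \d+ - \d+  # digits, hyphen, digits", "12-34") is not None

# Scoped flag (Python 3.6+): only "ab" is case-insensitive, "c" is not.
scoped = [re.fullmatch(r"(?i:ab)c", s) is not None for s in ("ABc", "ABC")]

print(ci, no_dotall, dotall, lines, spaced, scoped)
# True None True ['one', 'two'] True [True, False]
```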
Atomic Grouping and Possessive Quantifiers
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
(?>regex) (atomic group) YES YES YES YES YES no no YES no no no no no no no
?+, *+, ++ and {m,n}+ (possessive quantifiers) YES no YES no YES no no no no no no no no no no
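This sketch, using Python 3's `re`, shows what atomic grouping changes. The lookahead-plus-backreference trick `(?=(...))\1` is a common emulation of `(?>...)` in engines that have lookahead and backreferences. It works because lookarounds do not backtrack into themselves. The table predates Python 3.11, which added native atomic groups and possessive quantifiers; those lines only run on 3.11 or later.

```python
import re
import sys

# A plain greedy group gives back an "a" so the final "a" can match.
plain = re.fullmatch(r"(?:a+)a", "aaa") is not None

# Emulated atomic group: the lookahead captures "aaa" and never backtracks,
# \1 consumes exactly that, and nothing is left for the final "a".
emulated = re.fullmatch(r"(?=(a+))\1a", "aaa") is not None

print(plain, emulated)  # True False

# Native atomic groups and possessive quantifiers exist only in Python 3.11+.
if sys.version_info >= (3, 11):
    native = re.fullmatch(r"(?>a+)a", "aaa") is not None
    possessive = re.fullmatch(r"a++a", "aaa") is not None
    print(native, possessive)  # False False
```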
Lookaround
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
(?=regex) (positive lookahead) YES YES YES YES YES YES YES YES YES no no no no no no
(?!regex) (negative lookahead) YES YES YES YES YES YES YES YES YES no no no no no no
(?<=text) (positive lookbehind) full regex full regex finite length fixed length fixed + alternation no fixed length no no no no no no no no
(?<!text) (negative lookbehind) full regex full regex finite length fixed length fixed + alternation no fixed length no no no no no no no no
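Python's fixed-length lookbehind is sketched below with Python 3's `re`. Alternatives are accepted only when every branch has the same length; anything variable-length is rejected when the pattern is compiled. The `compiles` helper is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-length lookbehind: digits preceded by a dollar sign.
price = re.search(r"(?<=\$)\d+", "cost: $42").group()

# Alternatives are fine when every branch has the same length...
same_len = re.findall(r"(?<=ab|cd)x", "abx cdx efx")

# ...but variable-length lookbehind is a compile-time error.
variable_ok = compiles(r"(?<=a+)b")
mixed_ok = compiles(r"(?<=a|bc)x")

print(price, same_len, variable_ok, mixed_ok)  # 42 ['x', 'x'] False False
```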
Continuing from The Previous Match
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
\G (start of match attempt) YES YES YES YES YES no no YES no no no no no no no
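Python's `re` has no `\G`. However, a compiled pattern's `match(string, pos)` method anchors the attempt at `pos`, which is enough to emulate the usual `\G` loop of contiguous matches. Here is a minimal sketch; `contiguous_matches` is a hypothetical helper, not a library function.

```python
import re


def contiguous_matches(pattern: str, text: str) -> list:
    r"""Hypothetical helper: collect matches that each start exactly where the
    previous one ended, like a \G-anchored loop in Perl or PCRE."""
    regex = re.compile(pattern)
    pos = 0
    found = []
    while pos < len(text):
        m = regex.match(text, pos)  # Pattern.match anchors the attempt at pos
        if m is None or m.end() == pos:  # stop on failure or an empty match
            break
        found.append(m.group())
        pos = m.end()
    return found


print(contiguous_matches(r"\d+,?", "12,34,x56"))  # ['12,', '34,']
```

By comparison, `re.findall(r"\d+,?", "12,34,x56")` skips over the `x` and also returns `'56'`.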
Conditionals
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
(?(?=regex)then|else) (using any lookaround) YES YES no YES YES no no no no no no no no no no
(?(regex)then|else) no YES no no no no no no no no no no no no no
(?(1)then|else) YES YES no YES YES no YES no no no no no no no no
(?(group)then|else) YES YES no no YES no YES no no no no no no no no
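Python supports the group-based conditionals `(?(1)then|else)` and `(?(name)then|else)`, but not the lookaround form. Here is a sketch with Python 3's `re` that makes a closing `>` or `)` mandatory only when the opening one matched.

```python
import re

# (?(1)>) requires a closing ">" only if group 1 (the opening "<") matched.
tag = re.compile(r"(<)?\w+(?(1)>)")
results = {s: tag.fullmatch(s) is not None for s in ["<user>", "user", "<user", "user>"]}
print(results)
# {'<user>': True, 'user': True, '<user': False, 'user>': False}

# The named-group form (?(name)then|else) works the same way.
named = re.compile(r"(?P<open>\()?\d+(?(open)\))")
print([named.fullmatch(s) is not None for s in ["(42)", "42", "(42"]])
# [True, True, False]
```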
Comments
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
(?#comment) YES YES no YES YES no YES YES YES no no no no no no
Free-Spacing Syntax
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
Free-spacing syntax supported YES YES YES YES YES no YES YES YES no no no no no YES
Character class is a single token YES YES no YES YES n/a YES YES YES n/a n/a n/a n/a n/a YES
# starts a comment YES YES YES YES YES n/a YES YES YES n/a n/a n/a n/a n/a no
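Free-spacing mode in Python is the `re.VERBOSE` flag (or `(?x)`). This sketch, using Python 3's `re`, illustrates the two rows above: whitespace inside a character class is still significant, and `#` starts a comment. It also shows the `(?#...)` inline comment, which works without the flag.

```python
import re

phone = re.compile(r"""
    (\d{3})    # area code
    [-. ]      # separator; the space inside the class is kept even in verbose mode
    (\d{4})    # line number
""", re.VERBOSE)

print(phone.fullmatch("555 1234").groups())  # ('555', '1234')

# (?#...) is an inline comment that needs no flag.
inline = re.fullmatch(r"\d+(?#digits only)", "42") is not None
print(inline)  # True
```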
Unicode Characters
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
\X (Unicode grapheme) YES no no YES option no no no no no no no no no no
\u0000 through \uFFFF (Unicode character) YES YES YES no no YES u"string" no YES no no no no no no
\x{0} through \x{FFFF} (Unicode character) YES no no YES option no no no no no no no no no no
Unicode Properties, Scripts and Blocks
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
\pL through \pC (Unicode properties) YES no YES YES option no no no no no no no no no no
\p{L} through \p{C} (Unicode properties) YES YES YES YES option no no no no no no no no YES YES
\p{Lu} through \p{Cn} (Unicode property) YES YES YES YES option no no no no no no no no YES YES
\p{L&} and \p{Letter&} (equivalent of [\p{Lu}\p{Ll}\p{Lt}] Unicode properties) YES no no YES option no no no no no no no no no no
\p{IsL} through \p{IsC} (Unicode properties) YES no YES YES no no no no no no no no no no no
\p{IsLu} through \p{IsCn} (Unicode property) YES no YES YES no no no no no no no no no no no
\p{Letter} through \p{Other} (Unicode properties) YES no no YES no no no no no no no no no no no
\p{Lowercase_Letter} through \p{Not_Assigned} (Unicode property) YES no no YES no no no no no no no no no no no
\p{IsLetter} through \p{IsOther} (Unicode properties) YES no no YES no no no no no no no no no no no
\p{IsLowercase_Letter} through \p{IsNot_Assigned} (Unicode property) YES no no YES no no no no no no no no no no no
\p{Arabic} through \p{Yi} (Unicode script) YES no no YES option no no no no no no no no no no
\p{IsArabic} through \p{IsYi} (Unicode script) YES no no YES no no no no no no no no no no no
\p{BasicLatin} through \p{Specials} (Unicode block) YES no no YES no no no no no no no no no no no
\p{InBasicLatin} through \p{InSpecials} (Unicode block) YES no YES YES no no no no no no no no no no no
\p{IsBasicLatin} through \p{IsSpecials} (Unicode block) YES YES no YES no no no no no no no no no YES YES
Part between {} in all of the above is case insensitive YES no no YES no no no no no no no no no no no
Spaces, hyphens and underscores allowed in all long names listed above (e.g. BasicLatin can be written as Basic-Latin or Basic_Latin or Basic Latin) YES no Java 5 YES no no no no no no no no no no no
\P (negated variants of all \p as listed above) YES YES YES YES option no no no no no no no no YES YES
\p{^...} (negated variants of all \p{...} as listed above) YES no no YES option no no no no no no no no no no
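The Python column is almost all "no" in the Unicode tables. This sketch shows the practical consequences in Python 3's `re`, assuming 3.7 or later:

* `\p{...}` does not compile.
* `\uXXXX` escapes do work in `str` patterns (since Python 3.3).
* `[^\W\d_]` is a common approximation of `\p{L}`. It is not exact: for example, it still admits some numeric characters that are not decimal digits.

The third-party `regex` package on PyPI supports `\p{...}` if you need the real thing. The `compiles` helper is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# \p is an unknown letter escape in Python's re.
has_properties = compiles(r"\p{L}")

# \uXXXX escapes are accepted in str patterns.
e_acute = re.fullmatch(r"caf\u00e9", "caf\u00e9") is not None

# Approximate \p{L}: word characters minus digits and underscore.
letters = re.findall(r"[^\W\d_]+", "na\u00efve caf\u00e9 42_x")

print(has_properties, e_acute, letters)  # False True ['naïve', 'café', 'x']
```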
Named Capture and Backreferences
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
(?<name>regex) (.NET-style named capturing group) YES YES no no no no no no no no no no no no no
(?'name'regex) (.NET-style named capturing group) YES YES no no no no no no no no no no no no no
\k<name> (.NET-style named backreference) YES YES no no no no no no no no no no no no no
\k'name' (.NET-style named backreference) YES YES no no no no no no no no no no no no no
(?P<name>regex) (Python-style named capturing group) YES no no no YES no YES no no no no no no no no
(?P=name) (Python-style named backreference) YES no no no YES no YES no no no no no no no no
multiple capturing groups can have the same name YES YES n/a n/a no n/a no n/a n/a n/a n/a n/a n/a n/a n/a
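Named groups in Python use the `(?P<name>...)` and `(?P=name)` spellings from the table, and replacement strings refer to them as `\g<name>`. Here is a sketch with Python 3's `re`, which also shows the "no" for reusing a group name. The `compiles` helper is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date.fullmatch("2024-06-25")
print(m.group("year"), m.groupdict())
# 2024 {'year': '2024', 'month': '06', 'day': '25'}

# (?P=name) is the named backreference; \g<name> is used in replacement strings.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "this is is fine").group("word")
us_style = date.sub(r"\g<month>/\g<day>/\g<year>", "2024-06-25")

# Two groups with the same name are rejected ("redefinition of group name").
duplicate_names_ok = compiles(r"(?P<a>x)|(?P<a>y)")

print(doubled, us_style, duplicate_names_ok)  # is 06/25/2024 False
```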
XML Character Classes
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
\i, \I, \c and \C shorthand XML name character classes no no no no no no no no no no no no no YES YES
[abc-[abc]] character class subtraction YES 2.0 no no no no no no no no no no no YES YES
POSIX Bracket Expressions
Feature JGsoft .NET Java Perl PCRE ECMA Python Ruby Tcl ARE POSIX BRE POSIX ERE GNU BRE GNU ERE XML XPath
[:alpha:] POSIX character class YES no no YES ascii no no YES YES YES YES YES YES no no
\p{Alpha} POSIX character class YES no ascii no no no no no no no no no no no no
\p{IsAlpha} POSIX character class YES no no YES no no no no no no no no no no no
[.span-ll.] POSIX collation sequence no no no no no no no no YES YES YES YES YES no no
[=x=] POSIX character equivalence no no no no no no no no YES YES YES YES YES no no
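Python supports neither character class subtraction nor POSIX bracket expressions. This sketch, using Python 3's `re`, shows a common workaround for subtraction: a negative lookahead placed before the class. The same trick covers Java-style intersection such as `[a-z&&[^aeiou]]`. It also shows that `[[:alpha:]]` is not an error in Python but means something else entirely. Python 3.7+ emits a `FutureWarning` for it, which is silenced here.

```python
import re
import warnings

# Emulates [a-z-[aeiou]]: the lookahead rejects vowels before [a-z] consumes a letter.
consonants = re.findall(r"(?![aeiou])[a-z]", "regex")
print(consonants)  # ['r', 'g', 'x']

# In Python, [[:alpha:]] is a class of the characters '[', ':', 'a', 'l', 'p', 'h'
# followed by a literal ']', so it needs two characters to match.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    posix_like = re.compile(r"[[:alpha:]]")

print(posix_like.fullmatch("a"), posix_like.fullmatch("a]") is not None)  # None True
```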
@atnak

atnak commented Feb 9, 2018

FYI: the rows "Hyphen in [\d-z] is a literal" and "Hyphen in [a-\d] is a literal" both appear to be YES for ECMA:

'.-0Aa'.replace(/[a-\d]/g, 'x') --> ".xxAx"
'.-0Aa'.replace(/[\d-a]/g, 'x') --> ".xxAx"

@jandk

jandk commented Aug 24, 2018

This looks pretty awesome. Could you add && for character classes? This is apparently a feature in Java. I was wondering which engines support this. Thanks!

@hamidb80

From ECMAScript 2018 onwards, lookbehind assertions (even unbounded) are supported natively. https://github.com/tc39/proposal-regexp-lookbehind

@nuclight

nuclight commented Feb 5, 2021

Hey, <name>regex things are not .NET, they are from Perl!

@masi

masi commented Apr 11, 2021

ECMA has named groups (?<name>...), which can be back-referenced with \k<name>:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges

PHP supports a variety of styles:

Back references to the named subpatterns can be achieved by (?P=name) or, since PHP 5.2.2, also by \k<name> or \k'name'. Additionally, PHP 5.2.4 added support for \k{name} and \g{name}, and PHP 5.2.7 for \g<name> and \g'name'.

@mahmoud-seleem

I think Java supports named capturing groups 🤔

@george-computer-science

george-computer-science commented Sep 5, 2022

I have basic knowledge of regex. It seems that, as of 2022, there is no implementation that complies exactly with the standards; the engines look more like supersets of them. Maybe those in charge of standardization are not keeping up with the industry, or they refuse to include features when there is only minimal agreement among the developers of these software products. What does not look good is that no two engines are identical. That means developers building applications that use regex, and the users of those applications, must always keep this in mind, especially when applications interoperate or interact with other applications.
The problem is greater for users as implementations can be (and usually are) black boxes for them.
Even if two people use the same product, say the same web browser, version "a" of the browser may use one "flavor" of regex engine while version "b" uses another flavor, or simply introduces a change. They will not get the same result if the expression relies on a feature that the engines handle differently. T E R R I B L E !!!
But, if the application does not interact with anything that requires a matching engine, well, there is a benefit, a great one. Certain tasks can be done very quickly with such a compact notation.
Thanks to all the contributors. This page is UNIQUE and extremely HELPFUL. The people in charge of the standardization of regex should pay you all.

@dariocc

dariocc commented Jan 17, 2023

This comparison table has been very useful. Thank you very much!

@dlqqq

dlqqq commented Dec 11, 2023

Wow, thank you so much for documenting this. I do not see the comparison table on the regular-expressions.info website linked, even on mobile view. Is this now really the only freely-available reference that compares & outlines the features of each regex flavor?

I'm willing to help lead an effort to get this documented and hosted on a domain name. This information is too important to be lost!

@sancarn

sancarn commented Jan 15, 2024

VBScript: VBScript's RegExp object uses the same regex flavor as JavaScript, which is listed as "ECMA" in the table below.

I'm not sure that's true anymore... There are many things VBScript does not allow which ECMA does I think...?

In fact, the regular expression flavor used in the version 5.5 VBScript object is the same one used by JavaScript and JScript. The regex flavor is part of the ECMA-262 standard for JavaScript. Therefore, everything said about JavaScript’s regular expression flavor in this book also applies to VBScript. JavaScript and VBScript implement Perl-style regular expressions. However, they lack quite a number of advanced features available in Perl and other modern regular expression flavors:
• No \A or \Z anchors to match the start or end of the string. Use a caret or dollar instead.
• Lookbehind is not supported at all. Lookahead is fully supported.
• No atomic grouping or possessive quantifiers
• No Unicode support, except for matching single characters with \uFFFF
• No named capturing groups. Use numbered capturing groups instead.
• No mode modifiers to set matching options within the regular expression.
• No conditionals.
• No regular expression comments. Describe your regular expression with VBScript apostrophe comments instead, outside the regular expression string.
Version 1.0 of the RegExp object even lacks basic features like lazy quantifiers. This is the main reason this book does not discuss VBScript RegExp 1.0. All versions of Internet Explorer prior to 5.5 include version 1.0 of the RegExp object. There are no other versions than 1.0 and 5.5.

Src: Regular Expressions - The Complete Tutorial - Jan Goyvaerts

@1951FDG

1951FDG commented Feb 1, 2024

Hi, what about re2?

@jidanni

jidanni commented Apr 23, 2024

Perhaps add "what characters does \w match for each language."

@jubilatious1

jubilatious1 commented Jun 8, 2024

What about Raku, the language formerly known as Perl 6? Larry Wall and the rest of the crew rewrote Perl's regex implementation to be more intuitive and powerful in Perl 6. See:

https://docs.raku.org/language/regexes
https://docs.raku.org/
https://raku.org

Rakudo, the principal Raku interpreter/compiler, was released in 2015.
See: https://www.rakudo.org

@xzel23

xzel23 commented Jun 25, 2024

Thanks for making this. The Java part seems outdated. Java has named capturing groups since Java 7. There's also \R that matches all line ends (LF, CR, and the different combinations in use) - I don't know if that exists in other regex dialects though.

Since I could not find any page listing the changes, I asked AI Assistant. I hope the information is valid and helps you update this gist. This is what I got:

Here are some notable changes and additions to the Java Regex Implementation by various versions of Java:

  • Java 7 (2011): Named capturing groups are added. You can name a capturing group with the form (?<name>X). A named capturing group can still be accessed by its group index.

    Example: Pattern.compile("(?<area>\\d+)-(?<prefix>\\d+)-(?<line>\\d+)")

  • Java 7 (2011): The UNICODE_CHARACTER_CLASS flag is introduced. This makes \b, \s, \w, \B, \S, \W, \d, \D match unicode characters, not only ASCII.

    Example: Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS)

  • Java 8 (2014): A final new feature added to the regular expressions was the word boundary \b that now works on the Unicode.

  • Java 8 (2014): Introduction of \R as a universal line-end matcher. You can use \R to match any linebreak sequence.

    Example: Pattern.compile("\\R")

    This will match these line endings: \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]

This information is obtained from the different Java API documentations for these versions. Let me know if you need further information about how to use them etc.
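Python's `re` has no `\R`. Below is a minimal sketch of the same idea in Python 3 that spells out the alternation Java documents for `\R`; `LINEBREAK` is a hypothetical name. The `\r\n` branch must come first so that CR LF is consumed as a single break.

```python
import re

# Hypothetical stand-in for Java's \R: CR LF as one unit, or one vertical break character.
LINEBREAK = re.compile(r"\r\n|[\n\v\f\r\x85\u2028\u2029]")

parts = LINEBREAK.split("a\r\nb\nc\rd\u2028e")
print(parts)  # ['a', 'b', 'c', 'd', 'e']
```

Python's `str.splitlines()` uses a somewhat larger set of boundaries (it also splits on `\x1c`, `\x1d` and `\x1e`), so the two are not interchangeable.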
