trungnt13/regex.md

## regex.md

      
            
          
    Regular Expressions


Python Regex

Basics:


.                    any character, except newline characters.


\d, \w, and \s   a digit, word, or space character, respectively.


\D, \W, and \S   anything except a digit, word, or space, respectively.


[abc]                any character between the brackets (such as a, b, or c).


[^abc]               any character that isn’t between the brackets.


a|b                  matches either a or b.


[a-zA-Z]             any ASCII letter, lowercase or uppercase.
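The character classes above can be tried out with Python's re module. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

text = "Room 42, floor B-7"

# \d matches one digit; adding + collects runs of digits.
digits = re.findall(r"\d+", text)

# \w matches letters, digits, and underscore, so "-" splits "B-7".
words = re.findall(r"\w+", text)

# [abc] matches one character from the set, [^abc] one character outside it.
in_set = re.findall(r"[abc]", "cabbage")
not_in_set = re.findall(r"[^abc]", "cabbage")

# a|b matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, cow")

# . stops at a newline, so each line is matched separately.
per_line = re.findall(r".+", "ab\ncd")

# [a-zA-Z] matches ASCII letters only.
letters = re.findall(r"[a-zA-Z]+", "abc123DEF")

# \D is the negation of \d: remove everything that is not a digit.
only_digits = re.sub(r"\D", "", "+1 (555) 010-9999")

print(digits, words, in_set, not_in_set, pets, per_line, letters, only_digits)
```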


\b  Matches the boundary (or empty string) at the start and end of a word, that is, between \w and \W.


\B  Matches wherever \b does not, that is, at any position that is not a word boundary (between two \w characters or between two \W characters).


\A  Matches the expression to its right at the absolute start of a string whether in single or multi-line mode.


\Z  Matches the expression to its left at the absolute end of a string whether in single or multi-line mode.
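A small sketch of how \b, \B, \A, and \Z behave in Python, compared with ^ and $ in multi-line mode. The sample text is an assumption chosen to make the differences visible.

```python
import re

text = "cat concat cats\ncat"

# \bcat\b matches "cat" only as a whole word (start offsets are collected).
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat matches "cat" only when it is NOT at the start of a word ("concat").
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

# In multi-line mode ^ matches at the start of every line,
# while \A still matches only at the very start of the string.
caret_hits = re.findall(r"^cat", text, re.MULTILINE)
abs_start_hits = re.findall(r"\Acat", text, re.MULTILINE)

# Likewise $ matches at the end of every line, \Z only at the end of the string.
dollar_hits = re.findall(r"\w+$", text, re.MULTILINE)
abs_end_hits = re.findall(r"\w+\Z", text, re.MULTILINE)

print(whole_word, inside_word, caret_hits, abs_start_hits, dollar_hits, abs_end_hits)
```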


Quantifiers:

?                    zero or one of the preceding group.
*                    zero or more of the preceding group.
+                    one or more of the preceding group.
{n}                  exactly n of the preceding group.
{n,}                 n or more of the preceding group.
{,m}                 0 to m of the preceding group.
{n,m}                at least n and at most m of the preceding group.
{n,m}?, *?, +?       performs a non-greedy (shortest possible) match of the preceding group.
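The quantifiers above, and the difference between greedy and non-greedy matching, can be sketched like this in Python. The input strings are illustrative only.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as possible, so both tags end up in one match.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first closing tag.
lazy = re.findall(r"<b>.*?</b>", html)

# ? makes the preceding item optional.
spellings = re.findall(r"colou?r", "color colour")

# Counted repetition, checked against the whole string with fullmatch.
exactly_three = re.fullmatch(r"\d{3}", "123") is not None
too_many = re.fullmatch(r"\d{3}", "1234") is not None
two_or_more = re.fullmatch(r"a{2,}", "aaaa") is not None
up_to_two = re.fullmatch(r"a{,2}", "") is not None
over_range = re.fullmatch(r"a{2,3}", "aaaa") is not None

# {n,m} takes as many as allowed, {n,m}? as few as allowed.
greedy_range = re.match(r"a{2,4}", "aaaa").group()
lazy_range = re.match(r"a{2,4}?", "aaaa").group()

print(greedy, lazy, spellings, greedy_range, lazy_range)
```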

Anchors:

^spam                means the string (or, in multi-line mode, a line) must begin with spam.
spam$                means the string (or, in multi-line mode, a line) must end with spam.
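A short sketch of ^ and $ in Python, with and without re.MULTILINE. The strings are examples only.

```python
import re

# ^ anchors the match at the start, $ at the end.
starts = re.search(r"^spam", "spam and eggs") is not None
not_at_start = re.search(r"^spam", "eggs and spam") is not None
ends = re.search(r"spam$", "eggs and spam") is not None

# $ also matches just before a single trailing newline.
before_newline = re.search(r"spam$", "spam\n") is not None

# Without re.MULTILINE the anchors refer to the whole string;
# with it they refer to each line.
lines = "eggs\nspam\nham"
whole_string = re.search(r"^spam$", lines) is not None
per_line = re.search(r"^spam$", lines, re.MULTILINE) is not None

print(starts, not_at_start, ends, before_newline, whole_string, per_line)
```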

Groups:

( )                  matches whatever regular expression is inside the parentheses.
(? )                 ? acts as an extension notation; its meaning depends on the character immediately to its right.

(?:...)              non-capturing version of regular parentheses.
(?P<name>...)        matches whatever regular expression is inside the parentheses and gives the group the name name.
(?P=name)            matches whatever text was matched by the earlier group named name.
(?aiLmsux)           Here, a, i, L, m, s, u, and x are flags:

a  Matches ASCII only
i  Ignore case
L  Locale dependent
m  Multi-line
s  Dot matches all (. also matches newline)
u  Matches Unicode (the default for str patterns in Python 3)
x  Verbose
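The group syntax and inline flags above could be exercised as follows. This is a sketch with invented sample data; note that in recent Python versions a global inline flag such as (?i) has to appear at the very start of the pattern.

```python
import re

# Named groups can be read by name or by position.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-05")
year, month, fields = m.group("year"), m.group(2), m.groupdict()

# (?:...) groups without capturing, so findall returns only the (\d) group.
captured = re.findall(r"(?:ab)+(\d)", "abab1 ab2")

# (?i) ignores case.
any_case = re.findall(r"(?i)spam", "Spam SPAM spam")

# (?s) lets . match a newline as well.
dot_plain = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"(?s)a.b", "a\nb") is not None

# (?m) makes ^ match at the start of every line.
line_starts = re.findall(r"(?m)^\w+", "one\ntwo")

# (?a) restricts \w to ASCII, so the accented letter splits the word.
ascii_words = re.findall(r"(?a)\w+", "na\u00efve")

# (?x) ignores whitespace in the pattern and allows # comments.
phone = re.compile(r"""(?x)
    (\d{3})   # first block of digits
    -         # literal separator
    (\d{4})   # second block of digits
""")
parts = phone.search("call 555-0199").groups()

print(year, month, fields, captured, any_case, line_starts, ascii_words, parts)
```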


(?#...)    A comment. Contents are for us to read, not for matching.
A(?=B)     Lookahead assertion. This matches the expression A only if it is followed by B.
A(?!B)     Negative lookahead assertion. This matches the expression A only if it is not followed by B.
(?<=B)A    Positive lookbehind assertion. This matches the expression A only if B is immediately to its left. B can only be a fixed-length expression.
(?<!B)A    Negative lookbehind assertion. This matches the expression A only if B is not immediately to its left. B can only be a fixed-length expression.
(...)\1    The number 1 refers to the first captured group. To match the same text that a group matched again, use the group's number instead of writing out the whole expression. Group numbers from 1 up to 99 can be used this way.
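The lookahead and lookbehind assertions listed above can be sketched in Python like this. The sample string is invented. One detail worth noting: a negative lookahead after \d+ can still succeed on a shorter run of digits through backtracking, so the example also excludes a following digit.

```python
import re

prices = "cost: $30, weight: 30kg, discount: 15%"

# Lookahead: digits only if "kg" follows (the "kg" itself is not consumed).
kilograms = re.findall(r"\d+(?=kg)", prices)

# Negative lookahead: digits not followed by "%".
# \d is excluded too, otherwise "1" in "15%" would match after backtracking.
not_percent = re.findall(r"\d+(?![\d%])", prices)

# Lookbehind: digits only if "$" is immediately to the left.
dollars = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: digits not preceded by "$" (or by another digit).
not_dollars = re.findall(r"(?<![\d$])\d+", prices)

# A variable-length lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(kilograms, not_percent, dollars, not_dollars, variable_width_ok)
```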


\Y                   matches the same text as the Y'th captured group earlier in the regex.

\d+([_\-\.])\d+\1\d+: This matches 12-34-56, 12_34_56, 12.34.56, but not 12-34_56
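The separator pattern above can be checked directly in Python. The sketch also shows a named backreference with (?P=name) and a backreference inside a replacement string; the extra sample strings are made up for illustration.

```python
import re

# \1 must repeat whatever separator the first group captured.
sep = re.compile(r"\d+([_\-\.])\d+\1\d+")
same_separator = [bool(sep.fullmatch(s)) for s in ("12-34-56", "12_34_56", "12.34.56")]
mixed_separator = sep.fullmatch("12-34_56") is not None

# (?P=q) refers back to the named group q: the closing quote must match the opening one.
quoted = re.compile(r"(?P<q>['\"]).*?(?P=q)")
matched_quote = quoted.search("say 'hi' now").group()
mismatched_quote = quoted.search("mixed 'quote\" here")

# Backreferences are handy for finding doubled words ...
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")

# ... and \1 can also be used in the replacement text of re.sub.
deduplicated = re.sub(r"\b(\w+) \1\b", r"\1", "this is is a test test")

print(same_separator, mixed_separator, matched_quote, doubled, deduplicated)
```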


GitLab/Google Regex

https://ruby-doc.org/core-3.1.2/Regexp.html

A regular expression must start and end with /.
=~ evaluates to true when a match is found.
!~ evaluates to true when no match is found.
Join variable expressions together with && or ||.