@sreevinithaa
Last active June 20, 2022 03:31

Regex tutorial

This tutorial gives an overview of regular expressions (regex) and covers their main features. It explains how you can use regular expressions in your own projects.

Regular expressions (regex or regexp) are extremely useful for extracting information from text by searching for one or more matches of a specific search pattern (i.e. a specific sequence of ASCII or Unicode characters). Regex can be used in almost all programming languages, such as JavaScript, Java, VB, C#, C/C++, Python, Perl, Ruby, Delphi, R, Tcl, and many others.

Regular expressions use both basic and special characters. Basic characters are standard letters, numbers, and general keyboard characters, while all other characters are considered special. The special characters are:

   ?         Question Mark               Matches zero or one of the preceding character.
   *         Asterisk                    Matches zero or more of the preceding character.
   +         Plus Sign                   Matches one or more of the preceding character.
   \         Backslash                   BRE: Indicates the following character is special.
                                         ERE: Indicates the following character is basic.
   []        Square Brackets             Creates a character group or range.
   ( )       Parentheses                 Creates a sequence or sub-expression.
   { }       Curly Braces                Creates a specific numerical quantifier range.
   ^         Caret                       Matches the beginning of a line.
   $         Dollar Sign                 Matches the end of a line.
   \b        Word Boundary               Matches a word boundary (the position between a word character and a non-word character such as a space, tab, or period).
   .         Period                      Matches any single character (except a line break, by default).
   |         Pipe                        Logical OR operator.

Regular expressions are a series of characters that define a search pattern. Take the following example of a regular expression, which we’ll call “Matching an Email”:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

This series of characters might look like nonsense, but it’s actually a search pattern meant for basic email validation. That is, it checks to see if a string fulfills the requirements for an email. In a nutshell, here’s how it breaks down (we’ll explore it in more detail later):

  • ^ anchors the match at the beginning of the string
  • 1st Capturing Group ([a-z0-9_\.-]+)
    • [a-z0-9_\.-] means
      • The string can contain any lowercase letter between a-z
      • The string can contain any digit between 0-9
      • The string can contain an underscore, a dot or a hyphen
    • + matches the previous token between one and unlimited times, as many times as possible, giving back as needed
  • @ matches the character @
  • 2nd Capturing Group ([\da-z\.-]+)
    • [\da-z\.-] means
      • \d matches a digit (equivalent to [0-9])
      • The string can contain any lowercase letter between a-z
      • The string can contain a hyphen
      • \. matches the character .
    • + matches the previous token between one and unlimited times, as many times as possible, giving back as needed
  • \. matches the character .
  • 3rd Capturing Group ([a-z\.]{2,6})
    • [a-z\.] means
      • The string can contain any lowercase letter between a-z
      • \. matches the character .
    • {2,6} matches the previous token between 2 and 6 times, as many times as possible, giving back as needed
  • $ anchors the match at the end of the string
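
The breakdown above can be tried out in code. The following is a minimal sketch that assumes Python's `re` module as the host language (the tutorial itself does not name one), so the pattern is written without the surrounding slashes. The helper name `split_email` is made up for this illustration.

```python
import re

# The "Matching an Email" pattern, written as a Python raw string
# (no surrounding slashes are needed in Python).
EMAIL = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")


def split_email(address):
    """Return the three captured groups as a tuple, or None if there is no match."""
    match = EMAIL.match(address)
    return match.groups() if match else None


print(split_email("john.doe@example.com"))  # ('john.doe', 'example', 'com')
print(split_email("John@example.com"))      # None: uppercase J is not in [a-z0-9_\.-]
```

The second call shows that this pattern is case-sensitive: it only accepts lowercase addresses unless a case-insensitive flag is added.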

Regular expressions can feel like their own language at times, but they are supported by nearly every programming language, although the details of the syntax vary between flavors. Let's break down the preceding “Matching an Email” regex in order to explore regex components in general.

Summary

I hope this gives you a basic idea of regular expressions and their syntax. Some useful regular expressions are listed below.

Digit

  • Whole Numbers – /^\d+$/
  • Decimal Numbers – /^\d*\.\d+$/
  • Whole + Decimal Numbers – /^\d*(\.\d+)?$/
  • Negative, Positive Whole + Decimal Numbers – /^-?\d*(\.\d+)?$/
  • Whole + Decimal + Fractions – /[-]?[0-9]+[,.]?[0-9]*([\/][0-9]+[,.]?[0-9]*)*/
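
As a quick check of the number patterns above, here is a small sketch assuming Python's `re` module (the slashes are dropped because Python has no regex literal syntax). The helper `is_match` is a hypothetical name used only for this example.

```python
import re

WHOLE = re.compile(r"^\d+$")
DECIMAL = re.compile(r"^\d*\.\d+$")
SIGNED = re.compile(r"^-?\d*(\.\d+)?$")


def is_match(pattern, text):
    """True if the compiled pattern matches the text."""
    return pattern.match(text) is not None


print(is_match(WHOLE, "42"))      # True
print(is_match(WHOLE, "4.2"))     # False
print(is_match(DECIMAL, ".5"))    # True
print(is_match(DECIMAL, "3."))    # False: at least one digit must follow the dot
print(is_match(SIGNED, "-0.75"))  # True
print(is_match(SIGNED, ""))       # True: every part of this pattern is optional
```

The last line is worth noting: patterns built only from `*` and `?` also accept the empty string, so an extra length check may be needed when validating input.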

Alphanumeric Characters

  • Alphanumeric without space – /^[a-zA-Z0-9]*$/
  • Alphanumeric with space – /^[a-zA-Z0-9 ]*$/

Email

  • Common email IDs – /^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})*$/
  • Uncommon email IDs – /^([a-z0-9_\.\+-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

Password Strength

  • Complex: Should have at least 1 lowercase letter, 1 uppercase letter, 1 number, 1 special character and be at least 8 characters long
    /(?=(.*[0-9]))(?=.*[\!@#$%^&*()\\[\]{}\-_+=~`|:;"'<>,./?])(?=.*[a-z])(?=(.*[A-Z]))(?=(.*)).{8,}/ 
    
  • Moderate: Should have at least 1 lowercase letter, 1 uppercase letter, 1 number, and be at least 8 characters long
    /(?=(.*[0-9]))((?=.*[A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z]))^.{8,}$/
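
The password patterns rely on lookaheads, `(?=...)`, which check a condition without consuming any characters. Below is a minimal sketch of the moderate pattern, assuming Python's `re` module; the function name `is_moderate` is invented for this example.

```python
import re

# The "Moderate" pattern from the list above, without the surrounding slashes.
MODERATE = re.compile(
    r"(?=(.*[0-9]))((?=.*[A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z]))^.{8,}$"
)


def is_moderate(password):
    """True if the password satisfies the moderate strength pattern."""
    return MODERATE.search(password) is not None


print(is_moderate("Passw0rd"))   # True
print(is_moderate("password1"))  # False: no uppercase letter
print(is_moderate("Pw0rd"))      # False: shorter than 8 characters
```

Each lookahead scans the whole string for one required kind of character, and `^.{8,}$` then enforces the minimum length.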
    

Username

  • Alphanumeric string that may include _ and - having a length of 3 to 16 characters – /^[a-z0-9_-]{3,16}$/

URL

  • Include http(s) Protocol
    /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#()?&//=]*)/ 
    
  • Protocol Optional
    /(https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/ 
    

Dates

  • Date Format YYYY-MM-dd using separator - /([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))/

  • Date Format dd-MM-YYYY using separators - or . or /

    /^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)0?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$/
    
  • Date Format dd-mmm-YYYY using separators - or . or /

       /^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)(?:0?2|(?:Feb))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$/
       
    

Time

  • Time Format HH:MM 12-hour, optional leading 0 - /^(0?[1-9]|1[0-2]):[0-5][0-9]$/
  • Time Format HH:MM 12-hour, optional leading 0, Meridiems (AM/PM) - /((1[0-2]|0?[1-9]):([0-5][0-9]) ?([AaPp][Mm]))/
  • Time Format HH:MM 24-hour with leading 0 - /^(0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]$/
  • Time Format HH:MM 24-hour, optional leading 0 - /^([0-9]|0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]$/
  • Time Format HH:MM:SS 24-hour - /(?:[01]\d|2[0123]):(?:[012345]\d):(?:[012345]\d)/
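
Some of the time patterns above are anchored with ^ and $ and some are not, which changes how they behave. The sketch below, assuming Python's `re` module, contrasts the anchored 24-hour pattern (suited to validating a whole input) with the unanchored HH:MM:SS pattern (suited to finding a time inside longer text). The sample log text is made up.

```python
import re

# Anchored: the whole input must be a time.
TIME_24H = re.compile(r"^(0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]$")
# Unanchored: finds a time anywhere in the input.
TIME_HMS = re.compile(r"(?:[01]\d|2[0123]):(?:[012345]\d):(?:[012345]\d)")


def is_time_24h(text):
    """True if the text is exactly HH:MM in 24-hour format with a leading 0."""
    return TIME_24H.match(text) is not None


print(is_time_24h("09:30"))  # True
print(is_time_24h("24:00"))  # False: hours stop at 23
print(is_time_24h("9:30"))   # False: the leading 0 is required

found = TIME_HMS.search("Logged at 13:45:09 UTC")
print(found.group())         # 13:45:09
```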

Enjoy writing your own regular expressions and test them on https://regex101.com/.

Regex Components

In JavaScript, a regex can be written as a literal, in which case the pattern is wrapped in slash characters (/). The “Matching an Email” regex is written this way:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

Now let's take a look at the components of a regex.

Grouping Constructs

Grouping constructs describe sub-expressions of a regular expression and capture substrings of an input string. The following lists the grouping constructs.

(expr)          Capturing group. Captures the text that matches the expression in parentheses.
(?:expr)        Non-capturing group. Groups the contained expressions together (e.g., to apply a quantifier to multiple symbols at once) without capturing the matched text.
(?=expr)        Positive lookahead. Matches only if the text that follows matches expr; the text matched by expr is not part of the result.
(?<name>expr)   Named capture group.
\k<name>        Named back reference.

Example

a(bc)          parentheses create a capturing group with value bc 
a(?:bc)*       using ?: we disable the capturing group
a(?<foo>bc)    using ?<foo> we put a name to the group 
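
The grouping constructs can be compared side by side in code. This is a sketch under the assumption that Python's `re` module is used. Python spells named groups `(?P<foo>...)` and named back references `(?P=foo)` rather than `(?<foo>...)` and `\k<foo>`. The sample strings are made up.

```python
import re

captured = re.fullmatch(r"a(bc)", "abc")
non_capturing = re.fullmatch(r"a(?:bc)*", "abcbc")
named = re.fullmatch(r"a(?P<foo>bc)", "abc")

print(captured.group(1))       # bc
print(non_capturing.groups())  # (): nothing was captured
print(named.group("foo"))      # bc

# Lookahead: digits are matched only when " dollars" follows them.
amounts = re.findall(r"\d+(?= dollars)", "5 dollars, 7 euros")
print(amounts)                 # ['5']

# Named back reference: the same letter twice in a row.
doubled = re.search(r"(?P<ch>[a-z])(?P=ch)", "hello")
print(doubled.group())         # ll
```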

If we examine the “Matching an Email” regex, you'll see there are three groups, each describing a sub-expression: ([a-z0-9_\.-]+), ([\da-z\.-]+) and ([a-z\.]{2,6}).

In the example, the first group follows the ^ at the start of the pattern. After the first group comes a literal @ sign, and then the second group. After the second group, \. matches the character ., followed by the third group, and the pattern ends with the $ sign. The next section explains what ^ at the beginning and $ at the end mean.

Anchors

Anchors do not match any character at all. Instead, they match a position before, after, or between characters. They can be used to “anchor” the regex match at a certain position.

  ^       Caret         Matches the beginning of a line.
  $       Dollar sign   Matches the end of a line.
  \b      Word Boundary Matches a word boundary (the position between a word character and a non-word character such as a space, tab, or period).
  

Example 1

Below is a regular expression that matches any input that starts with 25.

^25

Example 2

Below is a regular expression that matches any input that ends with 0.

0$

Example 3

Below is a regular expression that matches any input containing a character from a to z.

Expression: [a-z] Input: 123test!

Since there are no anchors in the expression, the input 123test! matches because it includes letters within the a-z range: t, e, s and t. If your goal is to reject any input containing characters outside the a to z range, add anchors and a quantifier to the expression. The updated expression below matches only when everything between the beginning and the end of the input is a lowercase letter.

^[a-z]+$
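
The three anchor examples can be verified with a short script. This is a sketch assuming Python's `re` module; `matches` is a hypothetical helper. It uses `^[a-z]+$` as the letters-only pattern, because `^[a-z]$` on its own accepts exactly one letter.

```python
import re


def matches(pattern, text):
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


print(matches(r"^25", "2500"))            # True: starts with 25
print(matches(r"^25", "1250"))            # False: 25 is not at the start
print(matches(r"0$", "1250"))             # True: ends with 0
print(matches(r"[a-z]", "123test!"))      # True: a letter occurs somewhere
print(matches(r"^[a-z]+$", "123test!"))   # False: non-letters are present
print(matches(r"^[a-z]+$", "test"))       # True: letters only
print(matches(r"^[a-z]$", "test"))        # False: only a single letter is allowed
```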

If we examine the “Matching an Email” regex, you'll see the characters ^ and $, which are both anchors. The ^ anchor signifies a string that begins with the characters that follow it. So in our “Matching an Email” regex, the string must start with something that matches the pattern ([a-z0-9_\.-]+). The $ anchor signifies a string that ends with the characters that precede it. So in our “Matching an Email” regex, the string must end with something that matches the pattern ([a-z\.]{2,6}). The next sections split up the group expressions and look at what [a-z0-9_\.-] means.

Bracket Expressions

A bracket expression is either a matching list expression or a non-matching list expression. It consists of one or more expressions: ordinary characters, collating elements, collating symbols, equivalence classes, character classes, or range expressions.

Example

[abc]          matches a string that has either an a or a b or c
[a-c]          matches a string that has either an a or a b or c
[a-fA-F0-9]    a string that represents a single hexadecimal digit, case insensitively
[0-9]%         a string that has a character from 0 to 9 before a %
[^a-zA-Z]      a string that has not a letter from a to z or from A to Z. In this case the ^ is used as negation of the expression
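
Each bracket expression above matches exactly one character at a time. The sketch below, assuming Python's `re` module and made-up sample strings, shows what each one picks out.

```python
import re

# [abc] matches one character that is a, b or c.
letters = re.findall(r"[abc]", "cabbage")
print(letters)        # ['c', 'a', 'b', 'b', 'a']

# A single hexadecimal digit, upper or lower case.
is_hex_f = re.fullmatch(r"[a-fA-F0-9]", "F") is not None
is_hex_g = re.fullmatch(r"[a-fA-F0-9]", "g") is not None
print(is_hex_f, is_hex_g)  # True False

# A digit directly followed by a percent sign.
percent = re.search(r"[0-9]%", "save 5% today")
print(percent.group())     # 5%

# ^ inside the brackets negates the set: everything that is not a letter.
non_letters = re.findall(r"[^a-zA-Z]", "a1b2!")
print(non_letters)         # ['1', '2', '!']
```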

Character Classes

With a “character class”, also called “character set”, you can tell the regex engine to match only one out of several characters. Simply place the characters you want to match between square brackets. If you want to match an a or an e, use [ae]. You could use this in gr[ae]y to match either gray or grey.

If we examine the “Matching an Email” regex, you'll see the bracket expression [a-z0-9_\.-]. It contains the range a-z (any lowercase letter from a to z), the range 0-9 (any digit from 0 to 9), and the individual characters _, . and -.

Quantifiers

Quantifiers specify how many instances of a character, group, or character class must be present in the input for a match to be found.

 *	         Asterisk         Match zero or more times.
 +	         Plus Sign        Match one or more times.
 ?	         Question Mark    Match zero or one time.
 { n }         Curly Braces     Match exactly n times.
 { n ,}        Curly Braces     Match at least n times.
 { n , m }     Curly Braces     Match from n to m times.
 

Example 1

For example, you might want to match as many characters in the group [a-zA-Z] as possible, as long as at least two letters are present. In this case, you can use the {2,} quantifier after the group to match two or more of the preceding characters.

[a-zA-Z]{2,}

Example 2

As another example, consider a filter for American Express card numbers, which start with 34 or 37. No quantifier is needed for the [47] group since it matches one of those characters by default. There are 15 digits in an American Express card number, so excluding the two already covered by 3 and [47], 13 digits remain. Therefore, the final group should match 13 digits exactly. The full expression is below.

^3[47][0-9]{13}$

If we examine the “Matching an Email” regex, you'll see the quantifiers + and {2,6}. + means the preceding group of characters must occur one or more times. {2,6} means the preceding group of characters must occur from 2 to 6 times.
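
The quantifiers can be exercised with a few short checks. This sketch assumes Python's `re` module; the helpers `full` and `looks_like_amex` are hypothetical names, and the card numbers are dummy digit strings built only to have the right shape. It uses `{2,}` to express "at least two".

```python
import re


def full(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


print(full(r"ab*", "a"))             # True: * allows zero b's
print(full(r"ab+", "a"))             # False: + needs at least one b
print(full(r"colou?r", "color"))     # True: ? makes the u optional
print(full(r"[a-zA-Z]{2,}", "a"))    # False: fewer than two letters
print(full(r"[a-zA-Z]{2,}", "abc"))  # True

AMEX = re.compile(r"^3[47][0-9]{13}$")


def looks_like_amex(number):
    """True if the text has the shape 3, then 4 or 7, then exactly 13 digits."""
    return AMEX.match(number) is not None


print(looks_like_amex("34" + "0" * 13))  # True: 15 digits starting with 34
print(looks_like_amex("35" + "0" * 13))  # False: second digit is not 4 or 7
print(looks_like_amex("34" + "0" * 12))  # False: only 14 digits
```

This only checks the shape of the number. It does not validate the card's check digit.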

The OR Operator

The alternation operator operates on the largest possible surrounding regular expressions. (Put another way, it has the lowest precedence of any regular expression operator.) Thus, the only way you can delimit its arguments is to use grouping.

Example

If ( and ) are the open and close-group operators, then fo(o|b)ar matches either fooar or fobar, whereas foo|bar matches foo or bar.
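
The effect of grouping on alternation can be confirmed with a short sketch, assuming Python's `re` module. The helper name `full` is made up for this example.

```python
import re


def full(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# With a group, | chooses between o and b only.
print(full(r"fo(o|b)ar", "fooar"))   # True
print(full(r"fo(o|b)ar", "fobar"))   # True
print(full(r"fo(o|b)ar", "foobar"))  # False

# Without a group, | splits the whole pattern into foo and bar.
print(full(r"foo|bar", "foo"))       # True
print(full(r"foo|bar", "bar"))       # True
print(full(r"foo|bar", "fooar"))     # False
```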

Flags

Regular expression patterns are often used with modifiers (also called flags) that redefine regex behavior. Regex modifiers can be regular (e.g. /abc/i) and inline (or embedded) (e.g. (?i)abc). The most common modifiers are global, case-insensitive, multiline and dotall modifiers. However, regex flavors differ in the number of supported regex modifiers and their types.

Example

g          Global. Finds all matches instead of stopping after the first.
i          Ignore case. /[a-z]/i is equivalent to /[a-zA-Z]/.
m          Multiline. ^ and $ match the beginning and end of each line respectively, treating \n and \r as delimiters, instead of only the beginning and end of the entire string.
u          Unicode. If this flag is not supported you must match specific Unicode characters with \uXXXX where XXXX is the character's value in hexadecimal.
y          Sticky. Matches only at the position where the previous match ended, so it finds consecutive/adjacent matches.
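
The single-letter flags above follow the JavaScript spelling. Other flavors offer similar switches under different names. As a sketch, assuming Python's `re` module: `re.IGNORECASE` corresponds to i, `re.MULTILINE` to m, and finding all matches (g) is done by calling `re.findall` instead of passing a flag. The sample text is made up.

```python
import re

text = "First line\nsecond line"

# "Global" behavior: findall returns every match, not just the first.
all_lines = re.findall(r"line", text)
print(all_lines)            # ['line', 'line']

# Ignore case: [a-z]+ also matches uppercase letters.
shout = re.search(r"[a-z]+", "HELLO", re.IGNORECASE)
print(shout.group())        # HELLO

# Without multiline, ^ matches only at the start of the whole string.
first_only = re.findall(r"^\w+", text)
print(first_only)           # ['First']

# With multiline, ^ matches at the start of every line.
each_line = re.findall(r"^\w+", text, re.MULTILINE)
print(each_line)            # ['First', 'second']

# Inline modifier: (?i) at the start of the pattern switches on ignore case.
inline = re.search(r"(?i)abc", "xABCx")
print(inline.group())       # ABC
```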

Character Escapes

A backslash placed before a special character makes the regex treat that character as a literal. A backslash placed before certain letters forms a shorthand for a common character class, such as \w for a word character or \s for a whitespace character.

Example

\\             single backslash
\A             start of a string (not supported in JavaScript)
\b             word boundary. The zero-length string between \w and \W or \W and \w.
\B             not at a word boundary
\cX            ASCII control character
\d             single digit [0-9]
\D             single character that is NOT a digit [^0-9]
\l             single lowercase letter [a-z] (supported only in some flavors)
\L             single character that is not lowercase [^a-z] (supported only in some flavors)
\s             single whitespace character
\S             single character that is NOT white space
\u             single uppercase character [A-Z] (supported only in some flavors)
\U             single character that is not uppercase [^A-Z] (supported only in some flavors)
\w             word character [a-zA-Z0-9_]
\W             single character that is NOT a word character [^a-zA-Z0-9_]
\t             tab
\n             linefeed
\r             carriage return
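
A few of the escapes above in action. This sketch assumes Python's `re` module, which supports \d, \D, \w, \s and \b but not \l, \L, \u or \U as character classes. The sample strings are made up.

```python
import re

digits = re.findall(r"\d", "a1b22")
print(digits)       # ['1', '2', '2']

words = re.findall(r"\w+", "hello, world_1!")
print(words)        # ['hello', 'world_1']

# \s+ splits on any run of spaces, tabs or newlines.
parts = re.split(r"\s+", "a b\tc\nd")
print(parts)        # ['a', 'b', 'c', 'd']

# \b keeps "cat" from matching inside "concat".
cats = re.findall(r"\bcat\b", "cat concat cat.")
print(cats)         # ['cat', 'cat']

# \D removes everything that is not a digit.
only_digits = re.sub(r"\D", "", "(555) 123-4567")
print(only_digits)  # 5551234567

# \. matches a literal dot, while an unescaped . matches any character.
literal_dots = re.findall(r"\.", "a.b-c")
any_chars = re.findall(r".", "a.b")
print(literal_dots, any_chars)  # ['.'] ['a', '.', 'b']
```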

Author

Written by Vinitha Gowtheepan - Full stack developer
