kiranparajuli589/REGEX_TRAINING_20210204.md

## REGEX_TRAINING_20210204.md

      
Regular Expression (Regex)

Regex is one of the most powerful, flexible, and efficient approaches to text processing. It has its own terminology, conditions, and syntax; it is, in a sense, a mini programming language.
Regex can be used to add, remove, isolate, and manipulate all kinds of text and data. It can be used as a simple text editor command, e.g. search and replace, or as its own powerful text-processing language. Because of that, Regex has many applications in technology today, such as extracting useful information with web crawlers, data scraping and web scraping, data wrangling, and machine learning, namely natural language processing and speech recognition.
Regex is not specific to one programming language; in fact, it can be used in most programming languages today. Programming languages provide support for Regex, but all the magic and strength comes from Regex itself.
Using Regex can save the programmer precious time that would otherwise be wasted on mundane tasks, such as:

- looking for emails in a folder of files
- removing repetition from a bunch of text files
- analyzing the syntax of a specific language
- highlighting some context in a file
- and much, much more

[-a-z0-9]+(\[-a-z0-9]+)*
Language Analogy

A full Regex is usually composed of two basic kinds of characters:

- metacharacters: the grammar of Regex
- literals: the words of the language


Metacharacters

Types:

- Metacharacters Class
- Quantifiers
- Position Metacharacters

Metacharacters Class

These metacharacters each match a single character, and they all start with `\` to distinguish them from literals. Here is a table of the six metacharacter classes:


| METACHARACTER | NAME | WHAT IT MATCHES |
| --- | --- | --- |
| `\w` | Word | Any word character: a-z, A-Z, digits 0-9, or underscore |
| `\W` | Non word | Any non-word character |
| `\d` | Digit | Any digit between 0-9 |
| `\D` | Non digit | Anything that is not a digit between 0-9 |
| `\s` | Whitespace | Whitespace characters: space, tab, newline |
| `\S` | Non whitespace | Non-whitespace characters |
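As a rough illustration of the classes in the table, here is a minimal sketch using Python's built-in `re` module. The sample string and variable names are made up for this example. Note that in Python 3 these classes are Unicode-aware by default for text patterns, so `\d` and `\w` can match more than the ASCII ranges listed above unless `re.ASCII` is passed.

```python
import re

# Hypothetical sample text, chosen only to exercise each class.
text = "Order 66 shipped to Bay_7 at 10:45"

# \d matches one digit; findall returns every non-overlapping match.
digits = re.findall(r"\d", text)

# \w+ matches runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", text)

# \s matches one whitespace character; \S+ matches runs of non-whitespace.
spaces = re.findall(r"\s", text)
chunks = re.findall(r"\S+", text)

# \W+ is the negated form of \w+: runs of non-word characters.
non_words = re.findall(r"\W+", text)

print(digits)
print(words)
print(chunks)
```

Because `_` counts as a word character, `Bay_7` comes back as a single word, while `10:45` is split at the colon.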


Quantifiers

These metacharacters indicate how many times a character may occur in the pattern we are trying to match. Say we want to match both “Jessy” and “Jesy”; we would use a quantifier to indicate that both options are acceptable. There are four types of quantifiers.


| METACHARACTER | NAME | WHAT IT MATCHES |
| --- | --- | --- |
| `?` | Question | Characters appearing zero or one time only |
| `*` | Star | Characters appearing zero or more times |
| `+` | Plus | Characters appearing one or more times |
| `{min,max}` | Specific Range | Characters appearing a number of times within a range |
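The “Jessy” / “Jesy” case from above can be sketched in Python as follows. The two extra names in the list are invented test inputs, and `fullmatch` is used here (an assumption of this sketch) so that the whole string has to fit the pattern.

```python
import re

# "Jesssy" and "Jey" are hypothetical extra inputs for contrast.
names = ["Jesy", "Jessy", "Jesssy", "Jey"]

# {1,2}: one or two "s" characters.
ranged = [n for n in names if re.fullmatch(r"Jes{1,2}y", n)]

# ?: zero or one extra "s" after the first one.
optional = [n for n in names if re.fullmatch(r"Jess?y", n)]

# *: zero or more "s" characters.
star = [n for n in names if re.fullmatch(r"Jes*y", n)]

# +: one or more "s" characters.
plus = [n for n in names if re.fullmatch(r"Jes+y", n)]

print(ranged, optional, star, plus, sep="\n")
```

Only the star version accepts `Jey`, because it is the only one that allows zero occurrences of `s`.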


Position Metacharacters

Position metacharacters indicate where in the text the character we are looking for is located. Is it at the beginning of the text, at the end of the line, or at the edge of a word? To get this specific, we use position metacharacters.


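| METACHARACTER | NAME | WHAT IT MATCHES |
| --- | --- | --- |
| `^` | Caret | The start of the line |
| `$` | Dollar | The end of the line |
| `\<` | Upper word boundary | The start of a word (GNU-style syntax; PCRE, Python, and JavaScript use `\b` instead) |
| `\>` | Lower word boundary | The end of a word (GNU-style syntax; PCRE, Python, and JavaScript use `\b` instead) |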
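A small sketch of the position metacharacters in Python. Python's `re` module does not support `\<` and `\>`, so this example substitutes `\b`, which matches at either edge of a word. The sample text is invented for illustration.

```python
import re

# Hypothetical two-line sample text.
text = "cat concatenate\ncatalog bobcat"

# ^ with re.MULTILINE anchors at the start of every line.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)

# $ with re.MULTILINE anchors at the end of every line.
last_words = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \b before "cat": words that start with "cat".
starts_with_cat = re.findall(r"\bcat\w*", text)

# \b after "cat": words that end with "cat".
ends_with_cat = re.findall(r"\w*cat\b", text)

# \b on both sides: "cat" as a whole word only.
whole_word = re.findall(r"\bcat\b", text)

print(first_words, last_words, starts_with_cat, ends_with_cat, whole_word, sep="\n")
```

Without `re.MULTILINE`, `^` and `$` in Python anchor only at the start and end of the whole string (with `$` also matching just before a trailing newline).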


Meta-extras

That’s just what I call them (definitely not the official name). These are some extra metacharacters that are used to combine other metacharacters and literals.


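| METACHARACTER | NAME | WHAT IT MATCHES |
| --- | --- | --- |
| `[]` | Square Bracket | Any one character from a set of characters |
| `.` | Dot | Any one character |
| `\|` | Or Operator | One of two or more options |
| `()` | Parentheses | Groups part of a pattern, e.g. so that a quantifier applies to the whole group |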
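The four meta-extras can be tried out with a short Python sketch like the one below. All sample strings are made up for this example.

```python
import re

# []: one character from a set; "gr[ae]y" accepts either spelling.
grey = re.findall(r"gr[ae]y", "grey gray groy")

# .: any one character (except a newline, by default in Python).
dot = re.findall(r"c.t", "cat cot c-t ct")

# |: alternation between two or more options.
pets = re.findall(r"cat|dog|bird", "a dog chased a cat")

# (): grouping, so that + applies to the whole group "ha".
laugh = re.fullmatch(r"(ha)+", "hahaha")

print(grey, dot, pets, laugh.group(0), sep="\n")
```

Note that `ct` is not matched by `c.t`, since the dot must consume exactly one character.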


Literals

Literals are all words and characters that are not metacharacters. For example,
“Automation”, “Regex”, and “Hello” are all literals.
Problem

A problem arises if I want to match one of the metacharacters themselves. For example, say I want to match the * or ^ character; what should I do?
In this case, we use the escape character `\` to tell Regex explicitly that we want to match that character literally. So we type `\^` or `\*` instead of just ^ and *.
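Here is a minimal Python sketch of escaping. The arithmetic string is an invented sample. Besides writing the backslashes by hand, Python's `re.escape()` can add them for an arbitrary literal string.

```python
import re

# Hypothetical sample text containing two metacharacters.
text = "2 * 3 ^ 2 = 18"

# Escaped with a backslash, * and ^ are matched as ordinary characters.
operators = re.findall(r"\*|\^", text)

# re.escape() inserts the backslashes needed to match a string literally.
literal = "3 ^ 2"
found = re.search(re.escape(literal), text)

print(operators)
print(found.group(0), found.start())
```

`re.escape()` is mostly useful when the literal text comes from user input or a file and may contain metacharacters.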
EXAMPLES:


Distro Enrollment

/(distroEnrollment Question )(\d{1,2})/g

distroEnrollment Question 2",
distroEnrollment Question 2
distroEnrollment Question 3
distroEnrollment Question 31
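As a sketch, the same pattern can be run in Python against the sample lines above. Python has no `/g` flag; `findall` plays that role here by returning every match. The variable names are illustrative only.

```python
import re

sample = '''distroEnrollment Question 2",
distroEnrollment Question 2
distroEnrollment Question 3
distroEnrollment Question 31'''

# Group 1 is the fixed prefix, group 2 is a one- or two-digit number.
pattern = re.compile(r"(distroEnrollment Question )(\d{1,2})")

# With two groups, findall returns (group 1, group 2) tuples.
numbers = [int(number) for _prefix, number in pattern.findall(sample)]

print(numbers)
```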


This regex validates German vehicle registration numbers. It includes 'H' for oldtimers (historic vehicles) and 'E' for electric vehicles. Furthermore, it validates optional seasonal plates, for example for motorcycles or recreational vehicles.
/^([A-ZäÄÖÜ]{1,3})\-[ ]{0,1}([A-Z]{0,2})[ ]{0,1}([0-9]{1,4}[HE]{0,1})[ ]{0,1}([0-9]{0,2})[ ]{0,1}([0-9]{0,2})$/gm

ABC-DE 1234
ABC-DE 1234H
ABC-DE 1234E
ABC-DE 1234 04 10
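The pattern above can be tried in Python roughly as follows. The pattern is copied as written, only split across two string literals for readability; the `PLATE` name and the rejected input are assumptions of this sketch.

```python
import re

# Same pattern as above; re.MULTILINE stands in for the /m flag.
PLATE = re.compile(
    r"^([A-ZäÄÖÜ]{1,3})\-[ ]{0,1}([A-Z]{0,2})[ ]{0,1}"
    r"([0-9]{1,4}[HE]{0,1})[ ]{0,1}([0-9]{0,2})[ ]{0,1}([0-9]{0,2})$",
    re.MULTILINE,
)

samples = ["ABC-DE 1234", "ABC-DE 1234H", "ABC-DE 1234E", "ABC-DE 1234 04 10"]

# Map each sample to its five captured groups.
parsed = {s: PLATE.match(s).groups() for s in samples}

# Hypothetical invalid input: four letters before the hyphen.
rejected = PLATE.match("ABCD-DE 1234")

for plate, groups in parsed.items():
    print(plate, groups)
```

The last two groups capture the optional seasonal months and are empty strings when the plate has none.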


Retails CSV date-wise validator
/Retails_OB_(?<YYYY>\d{4})(?<MM>\d{2})(?<DD>\d{2})\.csv/gm

Retails OB.csv
Retails_OB_20200717.csv
Retails_OB_20200723.csv
Retails_OB_20200804.csv
Retails_OB_20200814.csv
Retails_OB_20200821.csv
Retails_OB_20200825.csv
Retails_OB_20200902.csv
Retails_OB_20200910.csv
Retails_OB_20200917.csv
Retails_OB_20200924.csv
Retails_OB_20200929.csv
Retails_OB_20201006.csv
Retails_OB_20201013.csv
Retails_OB_20201021.csv
Retails_OB_20201028.csv
Retails_OB_20201119.csv
Retails_OB_20201125.csv
Retails_OB_20201214.csv
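A possible Python version of this validator is sketched below. Python spells named groups as `(?P<name>...)`, and this sketch escapes the dot before `csv` so it only matches a literal dot. The `parse_date` helper is hypothetical; it additionally checks that the digits form a real calendar date, which the regex alone cannot do.

```python
import re
from datetime import date

FILENAME = re.compile(
    r"Retails_OB_(?P<YYYY>\d{4})(?P<MM>\d{2})(?P<DD>\d{2})\.csv"
)

def parse_date(filename):
    """Return the date encoded in the filename, or None if it is invalid."""
    match = FILENAME.fullmatch(filename)
    if match is None:
        return None
    try:
        return date(int(match["YYYY"]), int(match["MM"]), int(match["DD"]))
    except ValueError:
        # Digits matched the pattern but are not a valid date (e.g. month 13).
        return None

print(parse_date("Retails_OB_20200717.csv"))
print(parse_date("Retails OB.csv"))
```

Using `fullmatch` rather than a search is a design choice of this sketch: it rejects filenames with extra text before or after the expected name.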


Extract image path from thumbnail
/(?:image:\/\/)(?<Thumnail>.+)\/transform\?size=thumb$/gm

image://%2fhome%2fkeana%2fPictures%2fWallpapers%2fpawel-czerwinski-6lQDFGOB1iw-unsplash.jpg/transform?size=thumb
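In Python, the extraction might look like the sketch below. The group name is kept exactly as spelled in the pattern above, with Python's `(?P<name>...)` syntax. Decoding the percent-encoded path with `urllib.parse.unquote` is an extra step added here for illustration, not part of the original pattern.

```python
import re
from urllib.parse import unquote

THUMB = re.compile(
    r"(?:image:\/\/)(?P<Thumnail>.+)\/transform\?size=thumb$",
    re.MULTILINE,
)

url = (
    "image://%2fhome%2fkeana%2fPictures%2fWallpapers%2f"
    "pawel-czerwinski-6lQDFGOB1iw-unsplash.jpg/transform?size=thumb"
)

match = THUMB.search(url)

# The named group holds the still percent-encoded path.
encoded_path = match.group("Thumnail")

# unquote turns %2f back into "/".
decoded_path = unquote(encoded_path)

print(decoded_path)
```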


Validate the URI scheme (although this is not recommended) and split the URI into multiple named groups.
/(?:(?<Protocol>https?):\/\/)?(?:(?<Subdomain>[\w\.]+)?\.)?(?<Hostname>\w+)\.(?<Domain>\w+)\:?(?<Port>\d+)?(?<Path>\/.*)?/g

https://regex101.com:8000/api?test
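A sketch of the same pattern in Python, using `(?P<name>...)` for the named groups, shows how the sample URL is split. The second URL is a made-up input added to show the `Subdomain` group being filled.

```python
import re

URI = re.compile(
    r"(?:(?P<Protocol>https?):\/\/)?(?:(?P<Subdomain>[\w\.]+)?\.)?"
    r"(?P<Hostname>\w+)\.(?P<Domain>\w+)\:?(?P<Port>\d+)?(?P<Path>\/.*)?"
)

# groupdict() returns every named group; unmatched groups are None.
parts = URI.search("https://regex101.com:8000/api?test").groupdict()

# Hypothetical second input with a subdomain and no port.
with_subdomain = URI.search("https://www.example.com/path").groupdict()

print(parts)
print(with_subdomain)
```

For production code, Python's standard `urllib.parse.urlsplit` is usually a more robust way to take a URL apart than a hand-written regex, which fits the "not recommended" caveat above.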


Nums in filename
/(?<=_)(\d+)(?=\.jpg)/gm

tesT_3_1312.jpg
test3_v_32.jpg
v_32.jpg
wow_123_1234.jpg
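The lookbehind `(?<=_)` and lookahead `(?=\.jpg)` in this pattern assert what surrounds the digits without including it in the match. A minimal Python sketch over the sample filenames above:

```python
import re

filenames = ["tesT_3_1312.jpg", "test3_v_32.jpg", "v_32.jpg", "wow_123_1234.jpg"]

# Digits must be preceded by "_" and followed by ".jpg";
# neither the underscore nor the extension is part of the match.
pattern = re.compile(r"(?<=_)(\d+)(?=\.jpg)")

numbers = [pattern.search(name).group(1) for name in filenames]

print(numbers)
```

In `tesT_3_1312.jpg` the `3` is skipped because it is followed by `_` rather than `.jpg`, so only the final number is captured.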