Read the basics of regular expressions on the Devopedia site.
The rest of this document shows regex examples for the purpose of learning. We follow the PCRE (PHP) flavour. You may use Regex101 to try out these examples online.
For the purpose of this tutorial, we use the following format:
- Search:
input --> /regex/modifier --> result
- Search & Replace:
input --> /regex/replace/modifier --> output
Where input, output or result strings end or begin with whitespace, we'll put them within a pair of double quotes for readability. These quotes are not part of the strings.
For most examples, we use the global modifier g.
Characters and character classes:
1. Hello World --> /o/g --> 2 matches: o, o
2. Hello World --> /l/g --> 3 matches: l, l, l
3. Hello World --> /[A-Z][a-z]/g --> 2 matches: He, Wo
4. Hello World_Or_Planet --> /\W/g --> 1 match: 1 space
5. $22.50 --> /\d/g --> 4 matches: 2, 2, 5, 0
6. R G B --> /\S/g --> 3 matches: R, G, B
7. R G B --> /\s/g --> 2 matches: 2 spaces
8. Off 20%! --> /[^\d]/g --> 6 matches: O, f, f, " ", %, !
9. Off 20%! --> /[\D]/g --> 6 matches: O, f, f, " ", %, !
10. Mr. & Mrs. --> /Mr./g --> 2 matches: Mr., Mrs
11. Mr. & Mrs. --> /Mr\./g --> 1 match: Mr.
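A few of the examples above can be checked outside Regex101. The sketch below uses Python's `re` module, which is an assumption on our part: it is not PCRE, though for these simple patterns the two behave alike. `re.findall` stands in for the g modifier.

```python
import re

# re.findall returns all non-overlapping matches, much like the g modifier.
upper_lower = re.findall(r"[A-Z][a-z]", "Hello World")
non_word = re.findall(r"\W", "Hello World_Or_Planet")
non_digit = re.findall(r"\D", "Off 20%!")
any_char = re.findall(r"Mr.", "Mr. & Mrs.")      # unescaped dot: any character
literal_dot = re.findall(r"Mr\.", "Mr. & Mrs.")  # escaped dot: a literal "."

print(upper_lower)   # ['He', 'Wo']
print(non_word)      # [' ']  (underscore counts as a word character)
print(non_digit)     # ['O', 'f', 'f', ' ', '%', '!']
print(any_char)      # ['Mr.', 'Mrs']
print(literal_dot)   # ['Mr.']
```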
Anchors:
1. abc --> /^bc/g --> 0 matches
2. abc --> /^.bc/g --> 1 match: abc
3. abcdef --> /.bcd$/g --> 0 matches
4. abcdef --> /bcd\w\w$/g --> 1 match: bcdef
5. abcdef --> /\w\w$/g --> 1 match: ef
Boundaries:
1. This is a name --> /\bis\b/g --> 1 match: is
2. catfish concatenate kitty-catty --> /\w+\Bcat\w+/g --> 1 match: concatenate
3.
First line
Another line at the end --> /\A./mg --> 1 match: F
4.
First line
Another line at the end --> /.\Z/mg --> 1 match: d
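To see how \A and \Z differ from ^ and $ in multiline mode, here is a small sketch in Python's `re` module (an assumption, since this document follows PCRE; for this input the results agree). Even with the m modifier, \A and \Z stay tied to the whole string, while ^ matches at every line start.

```python
import re

text = "First line\nAnother line at the end"

line_starts = re.findall(r"^.", text, re.MULTILINE)   # start of every line
text_start = re.findall(r"\A.", text, re.MULTILINE)   # start of the string only
text_end = re.findall(r".\Z", text, re.MULTILINE)     # end of the string only
inner_cat = re.findall(r"\w+\Bcat\w+", "catfish concatenate kitty-catty")

print(line_starts)  # ['F', 'A']
print(text_start)   # ['F']
print(text_end)     # ['d']
print(inner_cat)    # ['concatenate']
```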
Alternation:
1. grey or gray --> /grey|gray/g --> 2 matches: grey, gray
/gr(e|a)y/g
/gr[ea]y/g
2. cats and dogs --> /^cat|dog/g --> 2 matches: cat, dog
3. cats and dogs --> /^(cat|dog)/g --> 1 match: cat
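The last two examples turn on precedence: alternation binds more loosely than the anchor, so ^cat|dog means "cat at the start, or dog anywhere". A minimal sketch in Python's `re` (assumed here as a stand-in for PCRE), using a non-capturing group so that `findall` returns whole matches:

```python
import re

text = "cats and dogs"

# The anchor applies only to the first alternative.
loose = re.findall(r"^cat|dog", text)
# Grouping makes the anchor apply to both alternatives.
grouped = re.findall(r"^(?:cat|dog)", text)

print(loose)    # ['cat', 'dog']
print(grouped)  # ['cat']
```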
Quantifiers:
1. Hello World --> /.*/ --> 1 match: Hello World
2. Hello World --> /.*/g --> 2 matches: Hello World, ""
3. Hello World --> /.+/g --> 1 match: Hello World
4. Hello World --> /\w+$/g --> 1 match: World
5. $22.50 --> /\d+/g --> 2 matches: 22, 50
6. colour or color --> /colou?r/g --> 2 matches: colour, color
7. bbc bcci mcc dcccx cccccd --> /c{2,3}./g --> 4 matches: cci, "cc ", cccx, cccc
8. bbc bcci mcc dcccx cccccd --> /c{2,}./g --> 4 matches: cci, "cc ", cccx, cccccd
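9. bbc bcci mcc dcccx cccccd --> /c{,3}./g --> 0 matches, because classic PCRE treats {,3} as literal text rather than a quantifier; write c{0,3} to mean "up to three"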
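The bounded quantifiers can be tried in Python's `re` as a rough check (an assumption: Python is not PCRE). One flavour difference matters here: Python reads {,3} as {0,3}, so the last example above would behave differently there.

```python
import re

text = "bbc bcci mcc dcccx cccccd"

two_to_three = re.findall(r"c{2,3}.", text)
two_or_more = re.findall(r"c{2,}.", text)
optional_u = re.findall(r"colou?r", "colour or color")
# In Python, {,3} is a valid quantifier meaning "zero to three".
python_up_to_three = re.findall(r"c{,3}.", text)

print(two_to_three)  # ['cci', 'cc ', 'cccx', 'cccc']
print(two_or_more)   # ['cci', 'cc ', 'cccx', 'cccccd']
print(optional_u)    # ['colour', 'color']
print(len(python_up_to_three) > 0)  # True, unlike the PCRE result
```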
Match metacharacters:
1. -3.2 + 4.3 = 1.1 --> /[-\d\.]+/g --> 3 matches: -3.2, 4.3, 1.1
2. My IP address: 192.168.12.44 --> /\d+\.|\d+$/g --> 4 matches: 192., 168., 12., 44
3. Price is $22.50 --> /\$\d+/g --> 1 match: $22
/[$]\d+/g
4. /var/html/www --> /\/\w+/g --> 3 matches: /var, /html, /www
5. Cost (in Rupees) --> /\([^\)]+\)/g --> 1 match: (in Rupees)
/\([^)]+\)/g
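Escaping by hand is error-prone, so many languages offer a helper. The sketch below uses Python's `re.escape` (Python is our assumption; PHP has `preg_quote` for the same purpose) alongside the hand-escaped patterns from the list.

```python
import re

price = re.findall(r"\$\d+", "Price is $22.50")      # backslash escape
price_class = re.findall(r"[$]\d+", "Price is $22.50")  # character-class escape
paths = re.findall(r"/\w+", "/var/html/www")         # "/" needs no escape in Python
parens = re.findall(r"\([^)]+\)", "Cost (in Rupees)")

# re.escape builds a pattern that matches the given text literally.
literal = re.findall(re.escape("$22.50"), "Price is $22.50")

print(price)        # ['$22']
print(price_class)  # ['$22']
print(paths)        # ['/var', '/html', '/www']
print(parens)       # ['(in Rupees)']
print(literal)      # ['$22.50']
```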
Modifiers or flags:
1. Hello hello HELLO --> /hello/ig --> 3 matches: Hello, hello, HELLO
2. Help me!! Quick!!! --> /\w(\w|\s)+ !+/gx --> 2 matches: Help me!!, Quick!!!
/\w[\w\s]+ !+/gx
/\w[\w ]+ !+/gx
3.
First line
Second one --> /.*/g --> 4 matches: First line, "", Second one, ""
4.
First line
Second one --> /.*/sg --> 2 matches: "First line\nSecond one", ""
5.
east to west
best is better
better than best --> /[a-z]est$/mg --> 2 matches: west, best
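The modifiers map onto flag constants in most regex libraries. A sketch with Python's `re` follows (an assumption, as the document targets PCRE): i, s, m and x correspond to `re.I`, `re.S`, `re.M` and `re.X`. The empty-match results below assume Python 3.7 or later.

```python
import re

text = "First line\nSecond one"
lines = "east to west\nbest is better\nbetter than best"

ignore_case = re.findall(r"hello", "Hello hello HELLO", re.I)
default_dot = re.findall(r".*", text)        # dot stops at the newline
dot_all = re.findall(r".*", text, re.S)      # dot also matches the newline
line_ends = re.findall(r"[a-z]est$", lines, re.M)
# With re.X the space inside the pattern is ignored.
extended = re.findall(r"\w[\w\s]+ !+", "Help me!! Quick!!!", re.X)

print(ignore_case)  # ['Hello', 'hello', 'HELLO']
print(default_dot)  # ['First line', '', 'Second one', '']
print(dot_all)      # ['First line\nSecond one', '']
print(line_ends)    # ['west', 'best']
print(extended)     # ['Help me!!', 'Quick!!!']
```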
Groups (capturing and non-capturing):
1. My name is John Smith --> /My name is (\w+) (\w+)/ --> 1 match: (My name is John Smith, John, Smith)
2. Color is #12de87 --> /#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})/ig --> 1 match: (#12de87, 12, de, 87)
3. Color is #12de87 --> /#([0-9a-f]{2}){3}/ig --> 1 match: (#12de87, 87)
4. Color is #12de87 --> /#(?:[0-9a-f]{2}){3}/ig --> 1 match: #12de87
5. Color is #12de87 --> /[0-9a-f]{2}/ig --> 3 matches: 12, de, 87
6. abcd abcD aBc abcdefG --> /[a-z]{2}(?:[a-z]{2,3})?/ig --> 5 matches: abcd, abcD, aB, abcde, fG
7. bbc bcbi mkk deeex fffffd --> /([a-z]).\1+/g --> 3 matches: bcb, eee, fffff
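In code, captured groups are read from the match object. The sketch below uses Python's `re` (our assumption, not the PCRE engine) to pull the three colour components apart, and shows that a repeated group keeps only its last iteration. The conversion to integers is added purely for illustration.

```python
import re

text = "Color is #12de87"

m = re.search(r"#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})", text, re.I)
parts = m.groups()
rgb = tuple(int(p, 16) for p in parts)  # hex pairs to integers

# A quantified group retains only the last repetition.
last_only = re.search(r"#([0-9a-f]{2}){3}", text, re.I).group(1)

# A backreference \1 repeats whatever group 1 captured.
repeats = [x.group(0) for x in re.finditer(r"([a-z]).\1+", "bbc bcbi mkk deeex fffffd")]

print(parts)      # ('12', 'de', '87')
print(rgb)        # (18, 222, 135)
print(last_only)  # 87
print(repeats)    # ['bcb', 'eee', 'fffff']
```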
Search and replace with capturing groups:
1. My name is John Smith
--> /My name is (\w+) (\w+)/First name: \1; Last name: \2/
--> First name: John; Last name: Smith
2. My name is John Smith
--> /My name is (?P<first>\w+) (?P<last>\w+)/First name: \g<first>; Last name: \g<last>/
--> First name: John; Last name: Smith
3. "What", "when' and 'who'
--> /(["'])\w+\1/g
--> 2 matches: "What", 'who'
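Named groups make a replacement string easier to read. The following sketch uses Python's `re.sub`, where \g<name> refers to a named group; Python is an assumption here, and other engines may spell named references in replacements differently.

```python
import re

sentence = "My name is John Smith"

numbered = re.sub(r"My name is (\w+) (\w+)",
                  r"First name: \1; Last name: \2", sentence)
named = re.sub(r"My name is (?P<first>\w+) (?P<last>\w+)",
               r"First name: \g<first>; Last name: \g<last>", sentence)

# \1 inside the pattern forces the closing quote to equal the opening one.
quoted = [m.group(0) for m in re.finditer(r"([\"'])\w+\1", "\"What\", \"when' and 'who'")]

print(numbered)  # First name: John; Last name: Smith
print(named)     # First name: John; Last name: Smith
print(quoted)    # ['"What"', "'who'"]
```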
Change date format from yyyy-mm-dd to dd-mm-yyyy:
/(\d{4})-(\d{1,2})-(\d{1,2})/\3-\2-\1/
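To turn yyyy-mm-dd into dd-mm-yyyy, the replacement has to emit the captured groups in reverse order. A minimal sketch with Python's `re.sub` (Python and the sample dates are our assumptions):

```python
import re

def to_dd_mm_yyyy(text: str) -> str:
    """Rewrite every yyyy-mm-dd date in text as dd-mm-yyyy."""
    # Group 1 = year, group 2 = month, group 3 = day.
    return re.sub(r"(\d{4})-(\d{1,2})-(\d{1,2})", r"\3-\2-\1", text)

converted = to_dd_mm_yyyy("Published on 2015-08-15, updated 2016-1-5")
print(converted)  # Published on 15-08-2015, updated 5-1-2016
```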
Match an email address:
/[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,4}/
Match an IPv4 address:
# Without capture
/\b(?:\d{1,3}\.){3}\d{1,3}\b/
# Capture the parts
/\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b/
# Capture the parts with range checking
/\b(25[0-5]|2[0-4]\d|[01]?\d{1,2})\.
(25[0-5]|2[0-4]\d|[01]?\d{1,2})\.
(25[0-5]|2[0-4]\d|[01]?\d{1,2})\.
(25[0-5]|2[0-4]\d|[01]?\d{1,2})\b/x
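The range-checked pattern repeats the same octet expression four times, which is easier to maintain when built from one piece. A sketch in Python (an assumption; the names `OCTET` and `IPV4` are hypothetical):

```python
import re

# One octet: 250-255, 200-249, or 0-199 with an optional leading 0 or 1.
OCTET = r"(25[0-5]|2[0-4]\d|[01]?\d{1,2})"
IPV4 = re.compile(rf"\b{OCTET}\.{OCTET}\.{OCTET}\.{OCTET}\b")

valid = IPV4.search("My IP address: 192.168.12.44")
invalid = IPV4.search("Not an address: 256.1.1.1")

print(valid.groups())  # ('192', '168', '12', '44')
print(invalid)         # None
```

For production use, a dedicated parser such as Python's `ipaddress` module is usually a safer choice than a regex.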
Redirect a URL request:
https://techcrunch.com/2015/08/15/the-future-of-marketplace
--> /\/(\d{4})\/(\d{2})\/(\d{2})\//\/\3\/\2\/\1\//
--> https://techcrunch.com/15/08/2015/the-future-of-marketplace
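The same rewrite can be sketched with Python's `re.sub` (Python is our assumption). Since Python patterns have no / delimiters, the slashes need no escaping.

```python
import re

url = "https://techcrunch.com/2015/08/15/the-future-of-marketplace"

# Swap /yyyy/mm/dd/ to /dd/mm/yyyy/ by reordering the three groups.
redirected = re.sub(r"/(\d{4})/(\d{2})/(\d{2})/", r"/\3/\2/\1/", url)

print(redirected)  # https://techcrunch.com/15/08/2015/the-future-of-marketplace
```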
Extract middle names, if any:
John Max Smith
Karthik Kumar
Jane McDonald
Salman al-Uzza Khan
Aparna Ranjan Roy
--> /^\S+ (\S+) \S+$/gm
--> 3 matches in group 1: Max, al-Uzza, Ranjan
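When a pattern has exactly one capturing group, Python's `re.findall` returns that group's text for each match, which suits this task. A sketch, assuming Python in place of PCRE:

```python
import re

names = "\n".join([
    "John Max Smith",
    "Karthik Kumar",
    "Jane McDonald",
    "Salman al-Uzza Khan",
    "Aparna Ranjan Roy",
])

# re.M lets ^ and $ match at each line; only three-word lines qualify.
middle_names = re.findall(r"^\S+ (\S+) \S+$", names, re.M)

print(middle_names)  # ['Max', 'al-Uzza', 'Ranjan']
```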
Lazy (non-greedy) match:
1. rupee (INR), dollar (USD), pound (GBP) --> /\(.+?\)/g --> 3 matches: (INR), (USD), (GBP)
2. Hello World, here we come, again! --> /.*?,/g --> 2 matches: "Hello World,", " here we come,"
/[^,]*,/g
3. 12496 --> /\d{2,3}?/g --> 2 matches: 12, 49
4. 12496 --> /\d{2,3}?$/g --> 1 match: 496
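Comparing the greedy and lazy forms side by side makes the difference concrete. The sketch below uses Python's `re`, which is an assumption but shares this syntax with PCRE.

```python
import re

text = "rupee (INR), dollar (USD), pound (GBP)"

greedy = re.findall(r"\(.+\)", text)   # runs to the last ")"
lazy = re.findall(r"\(.+?\)", text)    # stops at the first ")"
lazy_digits = re.findall(r"\d{2,3}?", "12496")
lazy_anchored = re.findall(r"\d{2,3}?$", "12496")

print(greedy)         # ['(INR), dollar (USD), pound (GBP)']
print(lazy)           # ['(INR)', '(USD)', '(GBP)']
print(lazy_digits)    # ['12', '49']
print(lazy_anchored)  # ['496']
```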
Possessive quantifiers:
1. 1249698 --> /\d++9/ --> no match
2. 1249698 --> /\d+9/ --> 1 match: 124969
Atomic grouping of the form (?>...):
1. 1 and 4 are integers. --> /\b(?>integer|insert|in)\b/g --> no match but fails faster
2. Let's insert 3 at the start. --> /\b(in|insert)\b/g --> 1 match: insert
3. Let's insert 3 at the start. --> /\b(?>in|insert)\b/g --> 0 matches since we don't backtrack
Lookaround assertions of the forms (?=...), (?!...), (?<=...), (?<!...):
1. That Iraqi must be questioned. --> /q(?=u)\w+/ig --> 1 match: questioned
2. That Iraqi must be questioned. --> /q(?!u)\w+/ig --> 1 match: qi
3. _rabbit _dog _mouse DIC:cat:dog:mouse --> /_(\w+)\b(?=.*:\1\b)/g --> 2 matches: _dog, _mouse
4. He employs 1 cook, 5 waiters and 2 cleaners. --> /\b[a-z]+(?<!s)\b/ig --> 3 matches: He, cook, and
/\b[a-z]+[^s\s]\b/ig
5. He employs 1 cook, 5 waiters and 2 cleaners. --> /\b[a-z]+(?<=s)\b/ig --> 3 matches: employs, waiters, cleaners
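Lookarounds check context without consuming it, which the sketch below tries to make visible. It uses Python's `re` (an assumption; note that Python, like PCRE, requires fixed-width lookbehind).

```python
import re

sentence = "He employs 1 cook, 5 waiters and 2 cleaners."

not_plural = re.findall(r"\b[a-z]+(?<!s)\b", sentence, re.I)  # last letter is not s
plural = re.findall(r"\b[a-z]+(?<=s)\b", sentence, re.I)      # last letter is s

q_then_u = re.findall(r"q(?=u)\w+", "That Iraqi must be questioned.", re.I)
q_not_u = re.findall(r"q(?!u)\w+", "That Iraqi must be questioned.", re.I)

print(not_plural)  # ['He', 'cook', 'and']
print(plural)      # ['employs', 'waiters', 'cleaners']
print(q_then_u)    # ['questioned']
print(q_not_u)     # ['qi']
```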
Conditionals of the form (?(if)then|else):
# https://www.regular-expressions.info/conditional.html
1. bd bc abc abd --> /(a)?b(?(1)c|d)/g --> 3 matches: bd, abc, bd
2. bd bc abc abd --> /(a?)b(?(1)c|d)/g --> 2 matches: bc, abc
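The two conditionals differ only in where the ? sits, yet behave differently: with (a)? the group may not participate at all, while (a?) always participates, possibly with an empty match. A sketch in Python's `re`, which we assume handles (?(1)...) the same way for this input:

```python
import re

text = "bd bc abc abd"

# Group 1 is optional: if absent, the else branch (d) is used.
optional_group = [m.group(0) for m in re.finditer(r"(a)?b(?(1)c|d)", text)]
# Group 1 always matches (maybe empty), so the then branch (c) always applies.
always_set = [m.group(0) for m in re.finditer(r"(a?)b(?(1)c|d)", text)]

print(optional_group)  # ['bd', 'abc', 'bd']
print(always_set)      # ['bc', 'abc']
```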
Recursion of the form (?R):
# https://www.regular-expressions.info/recurse.html
1. aaazz azz aaazzz --> /a(?R)?z/g --> 3 matches: aazz, az, aaazzz
2.
(full || (half%3==0)) || (full && half)
--> /\((?>[^()]|(?R))*\)/g
--> 2 matches: (full || (half%3==0)), (full && half)
Find in Apache server log all requests between 4 AM and 8 AM that result in an error:
127.0.0.1 - frank [10/Oct/2000:07:55:36 -0700] "GET /logo.png HTTP/1.0" 201 2326
127.0.0.1 - john [10/Oct/2000:06:22:42 -0700] "GET /help HTTP/1.0" 404 -
127.0.0.1 - mary [10/Oct/2000:14:55:36 -0700] "GET /home HTTP/1.0" 500 120
127.0.0.1 - arun [10/Oct/2000:12:55:36 -0700] "GET /about HTTP/1.0" 200 4377
--> /:0[4-7](:\d{2}){2} -\d{4}] "([^"]+)" .*(?<=[45]\d{2}) (?:-|\d+)\s*$/gm
--> 1 match: second line will match
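The log pattern can be exercised in a short script. This is a sketch in Python's `re` (an assumption, as is the variable naming); the pattern's 0[4-7] accepts hours 04 through 07, and the lookbehind requires a 4xx or 5xx status just before the size field.

```python
import re

log = "\n".join([
    '127.0.0.1 - frank [10/Oct/2000:07:55:36 -0700] "GET /logo.png HTTP/1.0" 201 2326',
    '127.0.0.1 - john [10/Oct/2000:06:22:42 -0700] "GET /help HTTP/1.0" 404 -',
    '127.0.0.1 - mary [10/Oct/2000:14:55:36 -0700] "GET /home HTTP/1.0" 500 120',
    '127.0.0.1 - arun [10/Oct/2000:12:55:36 -0700] "GET /about HTTP/1.0" 200 4377',
])

pattern = re.compile(
    r':0[4-7](:\d{2}){2} -\d{4}\] "([^"]+)" .*(?<=[45]\d{2}) (?:-|\d+)\s*$',
    re.M,
)

# Group 2 holds the request line of each early-morning error.
failed_requests = [m.group(2) for m in pattern.finditer(log)]
print(failed_requests)  # ['GET /help HTTP/1.0']
```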
Add commas to numbers (thousands, millions, billions):
/\d{1,3}(?=(\d{3})+(?!\d))/
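The pattern only finds the digit runs that need a comma after them; the replacement appends the comma. A sketch with Python's `re.sub`, where \g<0> means "the whole match" (Python and the helper name `add_commas` are assumptions; the function is meant for plain integers, not decimals):

```python
import re

def add_commas(number: str) -> str:
    """Insert thousands separators into a string of digits."""
    # Match 1-3 digits that are followed by one or more complete groups of three.
    return re.sub(r"\d{1,3}(?=(\d{3})+(?!\d))", r"\g<0>,", number)

print(add_commas("1234567"))  # 1,234,567
print(add_commas("1000"))     # 1,000
print(add_commas("999"))      # 999
```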
Add commas to numbers (Indian convention of thousands, lacs, crores):
# Using alternation
/\d(?=(?:\d{2})+(\d{3})(?!\d)|(\d{3})(?!\d))/
# Simpler one
/\d{1,2}(?=(\d{2})*\d{3}(?!\d))/
More readable version of the above, using the x modifier that ignores whitespace in regex:
/\d(?=
(?:\d{2})+(\d{3})(?!\d) | # >=100000
(\d{3})(?!\d) # >=1000 && <100000
)/x
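Both Indian-convention patterns can be tried with a replacement that appends a comma to each match. The sketch below uses Python's `re` with `re.X` for the commented version (Python and the helper names are assumptions; inputs are plain integers).

```python
import re

SIMPLE = re.compile(r"\d{1,2}(?=(\d{2})*\d{3}(?!\d))")

VERBOSE = re.compile(r"""
    \d(?=
        (?:\d{2})+(\d{3})(?!\d) |   # >=100000
        (\d{3})(?!\d)               # >=1000 && <100000
    )""", re.X)

def indian_commas(number: str, pattern=SIMPLE) -> str:
    """Group digits as thousands, lacs, crores."""
    return pattern.sub(r"\g<0>,", number)

print(indian_commas("1234567"))           # 12,34,567
print(indian_commas("123456"))            # 1,23,456
print(indian_commas("1234567", VERBOSE))  # 12,34,567
```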
Match content within nested HTML tags:
He said, "<span>I <strong>really don't</strong> like <em>ginger</em> tea or
<b>black</b> coffee</span>", but he <span>was</span> lying.
--> /<(\w+)>(?=([^\1]+?)<\/\1>)/g
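--> 5 matches: group 2 contains the desired inner content of each tag. Inside a character class \1 is not a backreference, so [^\1] effectively matches any character, newlines included.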
Try the following and see why they aren't suitable:
1. /<(\w+)>.*<\/\1>/g
2. /<(\w+)>.*?<\/\1>/g
3. /<(\w+)>[^<]*<\/\1>/g
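The lookahead approach can be checked in a script. Because the match itself consumes only the opening tag, scanning resumes inside the element and finds the nested tags too. This sketch uses Python's `re`, which we assume treats \1 inside a character class as a plain character just as PCRE does.

```python
import re

html = (
    'He said, "<span>I <strong>really don\'t</strong> like <em>ginger</em> tea or\n'
    '<b>black</b> coffee</span>", but he <span>was</span> lying.'
)

pattern = re.compile(r"<(\w+)>(?=([^\1]+?)</\1>)")
found = [(m.group(1), m.group(2)) for m in pattern.finditer(html)]

print([tag for tag, _ in found])  # ['span', 'strong', 'em', 'b', 'span']
print(found[1][1])                # really don't
print(found[4][1])                # was
```

Regexes like this are fragile on real-world markup; an HTML parser is the more dependable tool.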
Match words (case insensitive) repeated within the same sentence:
Hello world, and hello again. I am a programmer, but I'm not a good at program-writing. I wish to learn it better.
--> /\b(\w+)\b(?=[^.?!]+\b\1\b)/ig
--> 3 matches: Hello, I, a
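As a final check, the repeated-word search can be run in Python's `re` (an assumption; the variable names are ours). The class [^.?!] keeps the lookahead from crossing a sentence boundary, and `re.I` makes the backreference case-insensitive as well.

```python
import re

text = ("Hello world, and hello again. I am a programmer, but I'm not a good "
        "at program-writing. I wish to learn it better.")

pattern = re.compile(r"\b(\w+)\b(?=[^.?!]+\b\1\b)", re.I)
# Each hit is the first occurrence of a word that recurs later in its sentence.
repeated = [m.group(1) for m in pattern.finditer(text)]

print(repeated)  # ['Hello', 'I', 'a']
```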