Hello, everyone! In this markdown file, we'll cover the usage of regular expressions in a few other languages, showing each language's specific syntax along with some useful regex-supporting API methods. Feel free to go through the whole file or use the navigation index below.
In JavaScript, you can define regex patterns using single-quoted, double-quoted, or template strings. The alternative to strings is a RegExp instance, which can be created with the RegExp constructor or with the literal notation (the pattern enclosed between a pair of forward slashes).
const sentence = "The quick brown fox jumped over the lazy dog"
// Regex Literal
const thePattern = /the/
// RegExp Instance
const thePattern1 = new RegExp("the")
const thePattern2 = new RegExp(thePattern)
The String.prototype.match() method tries to match the pattern against the target string. It returns an array, or null if no match is found. If the expression has the global (g) flag, the array contains every matched substring. If not, it contains only the first match (followed by its capturing groups, if any), along with additional properties describing it, such as index, input (the targeted string), and groups.
const sentence = "The quick brown fox jumped over the lazy dog"
const pattern = /the/
console.log(sentence.match(pattern));
The String.prototype.split() method splits the string and returns an array containing the resulting substrings. The separator parameter determines where the string is split and can be either a string or a regular expression. In the example below, we split the string on each vowel.
👀 Observation
Note that the resulting array does not include the separators. The exception is a regular-expression separator with capturing groups: in that case the captured text is included in the array.
const sentence = "The quick brown fox jumped over the lazy dog"
const separatorPattern = /[aeiou]/
console.log(sentence.split(separatorPattern));
The String.prototype.replace() method replaces a value in the string with another one, which can be a character, a word, or a full sentence. The first parameter (searchValue) can be either a string or a regular expression and determines which value or values will be replaced. The second parameter (replaceValue) is the value that replaces each match. Without the global flag, only the first match is replaced.
const sentence = "The quick brown fox jumped over the lazy dog"
const searchPattern = /o/
console.log(sentence.replace(searchPattern, "i"));
The RegExp.prototype.test() method checks whether the regular expression matches the given string and returns a boolean.
const sentence = "The quick brown fox jumped over the lazy dog"
const pattern = /quick/
console.log(pattern.test(sentence))
The RegExp.prototype.exec() method works similarly to match(), but with a few key differences:
- It returns only one match per call, regardless of the presence of the global flag.
- It modifies some properties of the RegExp instance after the match. Most notably, with the global or sticky flag set, lastIndex is updated so that the next call continues from where the previous match ended. If you are interested in knowing which values are modified, here's the documentation link for reference. Take a look at the Description section.
const sentence = "The quick brown fox jumped over the lazy dog"
const pattern = /o/
const pattern2 = /o/g
// Both produce the same output on the first call
console.log(pattern.exec(sentence))
console.log(pattern2.exec(sentence))
Regex patterns in Go can be defined with strings. You can use either regular double-quoted strings (which require double escaping, just like in JavaScript) or raw string literals, which are declared with backticks and do not require double escaping.
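package main

import (
	"fmt"
	"regexp"
)

func main() {
	sentence := "The quick brown fox jumped over the lazy dog"
	thePattern := `the`       // raw string literal (backticks)
	theSecondPattern := "the" // interpreted string literal (double quotes)
	// regexp.MatchString reports whether the pattern matches (errors ignored for brevity)
	matched, _ := regexp.MatchString(thePattern, sentence)
	matched2, _ := regexp.MatchString(theSecondPattern, sentence)
	fmt.Println(matched, matched2) // true true
}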
Alternatives to standalone strings are the regexp.Compile and regexp.MustCompile functions. Both take a pattern as a double-quoted or raw string and parse it into a *regexp.Regexp, which can then be used against text.
📘 Compile Vs MustCompile
There's a slight difference between these two functions. Compile returns two values, a *regexp.Regexp and an error. MustCompile returns only the *regexp.Regexp but panics if the expression cannot be parsed.
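package main

import (
	"fmt"
	"regexp"
)

func main() {
	sentence := "The quick brown fox jumped over the lazy dog"
	pattern := `the`
	// Compile returns the *regexp.Regexp plus an error that should be checked.
	cRegexpInstance, err := regexp.Compile(pattern)
	if err != nil {
		fmt.Println("invalid pattern:", err)
		return
	}
	mcRegexpInstance2 := regexp.MustCompile(pattern) // panics if the pattern is invalid
	fmt.Println(cRegexpInstance.FindString(sentence))    // the
	fmt.Println(mcRegexpInstance2.MatchString(sentence)) // true
}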
In my opinion, Go has really good documentation. In it, you will find not only each method's signature but also a well-written description along with a simple example. Having said that, there's not much I have to add in this section besides sharing the link to the original regexp docs and strongly recommending that you take a look at them.
Java's official documentation offers an overview of the parts that make up the regex package, along with its supported flags and metacharacters and each method's signature paired with a concise definition. Unfortunately, it lacks simple examples, and that is the gap this section targets.
In Java, patterns are declared with the Pattern.compile() method, which expects a String as the pattern parameter.
👀 Observation
Java has no equivalent of JavaScript's regex literals or Go's raw string literals, so backslashes in a pattern always have to be doubled (e.g. for shorthands such as \\d).
import java.util.regex.*;
class Regex {
public static void main(String args[]) {
String sentence = "The quick brown fox jumped over the lazy dog";
Pattern pattern = Pattern.compile("the");
}
}
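To make the double-escaping rule concrete, here is a small sketch. The class name EscapeDemo, its helper methods, and the sample strings are hypothetical and chosen only for this illustration. The regex \d{3} has to be written as "\\d{3}" in Java source, and matching a single literal backslash takes four backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // The regex \d{3} is written as "\\d{3}": the Java compiler turns "\\" into a
    // single backslash, so the regex engine receives \d{3}.
    static final Pattern THREE_DIGITS = Pattern.compile("\\d{3}");

    // A literal backslash is the regex \\, which in a Java string literal becomes "\\\\".
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    // Returns the first run of three digits, or null if there is none.
    static String firstThreeDigits(String input) {
        Matcher matcher = THREE_DIGITS.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    // Reports whether the input contains at least one backslash character.
    static boolean containsBackslash(String input) {
        return BACKSLASH.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(firstThreeDigits("Flight 815 to LA")); // 815
        System.out.println(containsBackslash("C:\\Users"));       // true
        System.out.println(containsBackslash("/home/user"));      // false
    }
}
```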
Pattern.compile(String regex, int flags) compiles the given regular expression into a pattern with the given flags.
This method has two parameters:
- String regex: the expression to be compiled.
- int flags: the flags that will be applied to the pattern. In Java, flags are represented as bitmasks, so several of them can be combined with the bitwise OR operator (|). The supported flags are listed and described in the official documentation.
This method returns a Pattern instance.
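import java.util.regex.*;

class Regex {
    public static void main(String args[]) {
        String sentence = "The quick brown fox jumped over the lazy dog";
        // Flags are bitmasks, so several can be combined with the bitwise OR operator.
        // MULTILINE has no visible effect on this single-line input; it only shows the combination.
        Pattern pattern = Pattern.compile("the", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(sentence);
        while (matcher.find()) {
            System.out.println(matcher.group()); // prints "The", then "the"
        }
    }
}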
Pattern.matcher(CharSequence input) creates a Matcher that will match the given input against this pattern.
This method has one parameter:
- CharSequence input: the test string.
This method returns a new Matcher for this pattern.
import java.util.regex.*;
public class Regex {
public static void main(String args[]) {
String ssn = "123-45-7890";
Pattern pattern = Pattern.compile("\\d{3}-\\d{2}-\\d{3,4}");
Matcher matcher = pattern.matcher(ssn);
}
}
The Matcher.matches() method attempts to match the entire input sequence (from beginning to end) against the pattern.
More info on JavaTPoint
This method returns a boolean.
import java.util.regex.*;
class Regex {
public static void main(String args[]) {
String ssn = "123-45-7890";
Pattern pattern = Pattern.compile("\\d{3}-\\d{2}-\\d{3,4}");
Matcher matcher = pattern.matcher(ssn);
System.out.println(matcher.matches()); // true
}
}
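The difference between matches() and Matcher.find() is easiest to see side by side. This sketch uses the hypothetical names MatchesVsFind, wholeInputMatches and occursSomewhere. It tests the SSN pattern above against an input that has extra text before the number. matches() fails because the pattern must cover the whole input. find() succeeds because it only needs one matching subsequence.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    static final Pattern SSN = Pattern.compile("\\d{3}-\\d{2}-\\d{3,4}");

    // matches() succeeds only if the pattern covers the entire input.
    static boolean wholeInputMatches(String input) {
        return SSN.matcher(input).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean occursSomewhere(String input) {
        return SSN.matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "SSN: 123-45-7890";
        System.out.println(wholeInputMatches(text));          // false, because of the "SSN: " prefix
        System.out.println(occursSomewhere(text));            // true
        System.out.println(wholeInputMatches("123-45-7890")); // true
    }
}
```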
The String.split(String regex) method splits a given String around matches of the regex, resulting in a new array of substrings.
import java.util.regex.*;
class Regex {
public static void main(String args[]) {
String html = "#001 Desmond#002 Jin#003 Kate#004 Locke";
String[] value = html.split("#\\d{3} ");
for (String string : value) {
System.out.println(string);
}
}
}
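The same split can also be done through a compiled Pattern. Its split() method optionally takes a limit argument that caps the number of elements in the resulting array, with the last element holding the unsplit remainder. The separator matches at the very start of the input, so both String.split and Pattern.split return an empty string as the first element. That is why the loop above prints a blank line before Desmond. SplitDemo and its helpers are hypothetical names for this sketch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    static final Pattern SEPARATOR = Pattern.compile("#\\d{3} ");

    // Splits on every occurrence of the separator.
    static String[] splitAll(String input) {
        return SEPARATOR.split(input);
    }

    // Applies the separator at most (limit - 1) times; the last element keeps the remainder.
    static String[] splitWithLimit(String input, int limit) {
        return SEPARATOR.split(input, limit);
    }

    public static void main(String[] args) {
        String html = "#001 Desmond#002 Jin#003 Kate#004 Locke";
        System.out.println(Arrays.toString(splitAll(html)));
        // [, Desmond, Jin, Kate, Locke]
        System.out.println(Arrays.toString(splitWithLimit(html, 3)));
        // [, Desmond, Jin#003 Kate#004 Locke]
    }
}
```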
The Matcher.find() method searches for the next subsequence of the input that matches the pattern.
Besides returning a boolean, this method also updates the match state of the previously created Matcher instance, which you can then read through methods such as:
- group()
- start()
- end()
Every new call moves on to the next matching subsequence and updates the Matcher state again. When no more matches remain, it returns false.
import java.util.regex.*;
class Regex {
public static void main(String args[]) {
String time = "[06-03-2021 17:08:40]";
Pattern pattern = Pattern.compile("\\d{2}");
Matcher matcher = pattern.matcher(time);
while(matcher.find()) {
System.out.println("Matched value: " + matcher.group());
System.out.println("Start at: " + matcher.start());
System.out.println("Ends at: " + matcher.end());
}
}
}
As the documentation states, Matcher.group() "Returns the input subsequence matched by the previous match." The numbered overload returns a String: "The (possibly empty) subsequence captured by the group during the previous match, or null if the group failed to match part of the input."
Besides the no-argument version, this method has two overloads:
- group(int group): the group parameter is the number of the capturing group to return, as in a numbered backreference. group(0) is the whole match, equivalent to group().
- group(String name): the name parameter returns the value of a named capturing group.
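Here is a minimal sketch of the group overloads, using hypothetical helper names (GroupDemo, byNumber, byName). It shows three things: a named group is still numbered, group(0) is the whole match, and group(name) returns null when that group did not take part in the match. For brevity, the helpers also return null when find() finds no match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // (?<area>...) is a named group, but it is still counted as group 1.
    static final Pattern SSN = Pattern.compile("(?<area>\\d{3})-(\\d{2})-(\\d{4})");

    // Alternation: on any given match only one of these two groups can participate.
    static final Pattern TOKEN = Pattern.compile("(?<digits>\\d+)|(?<letters>[a-z]+)");

    // Returns the numbered group of the first match, or null if nothing matches.
    static String byNumber(Pattern pattern, String input, int group) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group(group) : null;
    }

    // Returns the named group of the first match, or null if nothing matches.
    static String byName(Pattern pattern, String input, String name) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group(name) : null;
    }

    public static void main(String[] args) {
        String text = "SSN: 123-45-6789";
        System.out.println(byNumber(SSN, text, 0));          // 123-45-6789 (the whole match)
        System.out.println(byNumber(SSN, text, 2));          // 45
        System.out.println(byName(SSN, text, "area"));       // 123 (same value as group 1)
        System.out.println(byName(TOKEN, "abc", "letters")); // abc
        System.out.println(byName(TOKEN, "abc", "digits"));  // null: the group did not take part
    }
}
```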
Just like in Go, in Python you can either declare a pattern with the re.compile() method, which creates a regular expression object, or call the re module's functions directly, passing in the pattern as a String.
import re
string_pattern = "((?:T|t)he)"
regex_object = re.compile("((?:T|t)he)")
The re.search() method scans through the string, capturing the first occurrence that matches the defined pattern.
This method returns a Match object, or None if no match is found.
import re
string_pattern = "((?:T|t)he)"
regex_object = re.compile("((?:T|t)he)")
sentence = "The quick red fox jumped over the lazy dog"
match_object = re.search(string_pattern, sentence)
# Or...
# match_object = regex_object.search(sentence)
print(match_object)
The re.match() method works similarly to the re.search() method. The only difference is that it only looks for a match at the beginning of the string.
This method returns a Match object or None.
import re
string_pattern = "((?:T|t)he)"
regex_object = re.compile("((?:T|t)he)")
sentence = "The quick red fox jumped over the lazy dog"
result = re.match(string_pattern, sentence)
# Or...
# result = regex_object.match(sentence)
print(result)
⚠ Keep in mind
This method always restricts the search to the beginning of the string, even with the re.MULTILINE flag. If you need to search at the beginning of each line, use the re.search() method with a pattern anchored by ^ along with the re.MULTILINE flag.
The re.fullmatch() method returns a Match object if the whole string is matched by the defined pattern. Otherwise, it returns None.
import re
# Raw strings (r"...") keep backslashes as-is, so \d needs no double escaping
string_pattern = r"\d{3}-\d{2}-\d{4}"
regex_object = re.compile(r"\d{3}-\d{2}-\d{4}")
ssn = "123-45-6789"
result = re.fullmatch(string_pattern, ssn)
# Or...
# result = regex_object.fullmatch(ssn)
print(result)
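The re.split() method splits the string by the occurrences of the pattern. Use the maxsplit parameter to limit the number of splits.
It returns a list of the resulting substrings. If the pattern contains capturing groups, the text of all groups is included in the list as well.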
import re
string_pattern = "[aeiou]"
regex_object = re.compile("[aeiou]")
sentence = "The quick brown fox jumped over the lazy dog"
result = re.split(string_pattern, sentence)
# Or...
# result = regex_object.split(sentence)
print(result)
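The re.findall() method finds all non-overlapping matches in the string and returns them as a list of strings. If the pattern contains exactly one capturing group, as in the example below, the list holds that group's values. With several groups, it holds tuples.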
import re
string_pattern = "((?:T|t)he)"
regex_object = re.compile("((?:T|t)he)")
sentence = "The quick red fox jumped over the lazy dog"
result = re.findall(string_pattern, sentence)
print(result)
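In Ruby, a pattern can be created with Regexp.new (or its alias, Regexp.compile), or with the literal notation (the pattern enclosed between two forward slashes), which also produces a Regexp instance.
class Playground
sentence = "The quick brown fox jumped over the lazy dog"
pattern = Regexp.new(/((?:T|t)he)/)
pattern2 = Regexp.compile(/((?:T|t)he)/) # Alias for Regexp.new()
pattern3 = /((?:T|t)he)/ # pattern3.instance_of?(Regexp) => true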
end
The pattern-matching operator (=~) matches a regular expression against a string and returns an Integer representing the starting position of the match, or nil if no match was found.
class Playground
pattern = /\d{3}-\d{2}-\d{4}/
ssn = "123-45-6789"
puts ssn =~ pattern # 0
end
👀 Observation
The returned position can be stored in a regular variable, like so: variable = ssn =~ pattern. In addition, =~ stores the resulting MatchData object in the global $~ variable, which can also be accessed through Regexp.last_match. Calling begin(0) on it returns the same starting position.
The String#scan method expects a pattern (String or Regexp) and iterates through the string collecting every match. It returns an Array of matches (empty if none was found).
class Playground
sentence = "The quick red fox jumped over the lazy dog"
pattern = /(?:T|t)he/
results = sentence.scan(pattern)
print results
sentence.scan(pattern).each do |match|
puts "Found: #{match}"
end
end
Alternatively, you can pass a block directly to scan, which yields each match in turn:
class Playground
sentence = "The quick red fox jumped over the lazy dog"
pattern = /(?:T|t)he/
sentence.scan(pattern) do |match|
puts "Found: #{match}"
end
end
The String#match method matches a pattern against a string, capturing the first occurrence. This method returns a MatchData object, or nil if no match was found.
class Playground
sentence = "The quick red fox jumped over the lazy dog"
pattern = /(?:T|t)he/
matchedData = sentence.match(pattern)
print matchedData
end
The MatchData#named_captures method returns a Hash pairing each named capturing group (keys) with its captured value.
class Playground
pattern = /(?<fst>\d{3})-(?<scnd>\d{2})-(?<thrd>\d{4})/
ssn = "123-45-6789"
result = pattern.match(ssn)
result_as_hash = result.named_captures
print result_as_hash # {"fst"=>"123", "scnd"=>"45", "thrd"=>"6789"}
end
You can also iterate over the resulting Hash:
class Playground
pattern = /(?<fst>\d{3})-(?<scnd>\d{2})-(?<thrd>\d{4})/
ssn = "123-45-6789"
result_as_hash = pattern.match(ssn).named_captures
result_as_hash.each do |key, value|
puts "#{key}: #{value}"
end
# fst: 123
# scnd: 45
# thrd: 6789
end
The [] operator on a MatchData object gives you access to the matched data. Pass a number for a numbered backreference (0 is the full match), or the capturing group's name as a symbol (preceded by a colon) for a named one.
class Playground
pattern = /(?<fst>\d{3})-(?<scnd>\d{2})-(?<thrd>\d{4})/
ssn = "123-45-6789"
result = pattern.match(ssn)
puts "Full Match: #{result[0]}"
puts "First Group: #{result[1]}"
puts "Second Group: #{result[2]}"
puts "Third Group: #{result[3]}"
puts "Named Backreference: #{result[:fst]}"
end
The sub method is the equivalent of JavaScript's String.prototype.replace() method. The first parameter can be either a Regexp pattern or a String, and the second parameter is the replacement value.
class Playground
sentence = "The quick brown fox, jumped over the lazy dog"
pattern = /the/i
puts sentence.sub(pattern, "THE")
# THE quick brown fox, jumped over the lazy dog
end
Notice that only the first occurrence of the word "the" was replaced. The sub method only replaces the first occurrence, and Ruby (2.7.x) regular expressions do not support a global flag. To replace every occurrence, use the gsub method.