- What is Regex?
- Regular Expression Literal
- Quantifiers
- OR Operator
- Character Classes
- Flags
- Results
- Learn More
- About the Author
Regex, or Regular Expressions, can be daunting. The syntax can be hard to read, hard to write, and if not written correctly, can lead to unexpected or inconsistent results.
Regular expressions, as the MDN web docs describe them, are patterns used to match character combinations in strings. In other words, imagine the simple Find action in many word processors and other applications, but much more powerful. Regex is a means of writing a single search query that finds not just a letter-for-letter match, but combinations of string elements that may appear zero, one, or many times, in different formats, and in specific circumstances.
I'm not going to try to explain all of regex, nor how it can be implemented in different coding languages or computer systems. Far smarter people than me have already done that.
Instead, I'm going to take a made-up but semi-plausible scenario in which a relatively simple regular expression could be helpful, walk through the elements of that expression, and explain what each does.
Let's imagine a widget startup company with a DIY marketing strategy. As the product has been developed and the company set up, the owner has over time adopted several iterations of the product name and has finally decided on SonShine. Upon reviewing the press release moments before publication, he is bewildered that the draft document actually contains every version of the brand name that had been considered (along with a few clumsy typos). He doesn't have time to edit the document word for word, so he calls upon the web developer to come up with a solution quickly.
No problem! The developer comes back moments later with this cryptic code to find all the variations of the product name so they can easily be replaced.
/s[ou]n *sh(i|y)[nm]e?y?/ig
Let's take the regex and body copy and head over to VS Code or to Regexr.com.
Sunshine is great! Sonshine can do wonderful things, yes sun Shine is the best product you've ever seen. It will solve all your problems right here in River City. Trouble - I mean, dirt is no match for son shine. Use sunshiny to wash your car. Sonshiny can clean your deck safely, and sun shiny can even be used to dry clean your drapes. Sunshyne is the greatest development this year and son shyne has three patents pending. Sunshyme makes a wonderful gift or you can buy extra son shyme for your home office. Give it to your son or something. Stephen Sondheim inspired sonshyn look at that shine! Isn't it shiny?
One way to define a regular expression is as a regular expression literal. This is indicated by the forward slash / at the beginning and end of the expression. The slashes wrap the expression to be matched, much like the angle brackets in HTML tags.
But wait - the second slash isn't at the end of this code, you say. There are two letters, i and g, at the end. Those are flags, and we'll get to those in a moment.
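As a small illustration in JavaScript (the language the MDN docs describe), here is the same pattern written as a literal and, for comparison, built with the RegExp constructor. The variable names are my own, and this is only a sketch of the two notations, not something the developer in our story necessarily wrote.

```javascript
// The literal from the article: pattern between the slashes, flags after the closing slash.
const literal = /s[ou]n *sh(i|y)[nm]e?y?/ig;

// An equivalent object built with the RegExp constructor:
// the pattern and the flags are passed as two separate strings, without slashes.
const constructed = new RegExp("s[ou]n *sh(i|y)[nm]e?y?", "ig");

// Both hold the same pattern text.
console.log(literal.source === constructed.source);

// The flags property reports the flags in a normalized order.
console.log(literal.flags, constructed.flags);
```

The literal form is usually the more readable choice when the pattern is known ahead of time; the constructor is handy when the pattern has to be assembled from a string.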
Except where noted by modifiers (or escapes), a regex is read from left to right, character by character, in order. We'll get to the modifiers below, but in our example, the s, n, sh, e and y are going to match those characters in that order, literally.
Quantifiers are special characters that specify how many times the character or element before them may appear. The * after the space indicates that the space may appear 0 or more times. This captures the variations in our example that have no space, one space, or several spaces. The ? is a similar quantifier: it indicates that the e and the y in our expression may each appear 0 or 1 time (but not more than that).
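To see the two quantifiers in isolation, here is a short JavaScript sketch. The two test patterns are simplified pieces of our expression, and the ^ and $ anchors are my addition (they are not in the article's pattern); they force the whole string to match so that extra characters are rejected.

```javascript
// " *" allows zero or more spaces between the two halves of the name.
const anySpaces = /^sun *shine$/;
const spaceResults = ["sunshine", "sun shine", "sun   shine"].map((s) => anySpaces.test(s));
console.log(spaceResults); // all three variations match

// "e?" and "y?" each allow zero or one occurrence of that letter.
const optionalEnding = /^shine?y?$/;
const endingResults = ["shin", "shine", "shiny", "shiney"].map((s) => optionalEnding.test(s));
console.log(endingResults); // all four endings match

// Two e's is more than "?" allows, so this one does not match.
const doubleE = optionalEnding.test("shinee");
console.log(doubleE);
```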
The | pipe character is used as an OR operator. In our example, the i and y are grouped in parentheses, separated by the |. This means that, in this case, we're matching either i OR y after sh.
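Here is a minimal JavaScript sketch of the OR operator on its own, using a simplified pattern of my own that keeps only the sh(i|y) part followed by ne. It also shows that the parentheses remember which alternative was matched.

```javascript
// Either "i" or "y" may follow "sh".
const eitherVowel = /sh(i|y)ne/;

const matchesShine = eitherVowel.test("shine");
const matchesShyne = eitherVowel.test("shyne");
const matchesShane = eitherVowel.test("shane"); // "a" is not one of the alternatives

console.log(matchesShine, matchesShyne, matchesShane);

// The parentheses also capture the alternative that matched; it is available at index 1.
const captured = "shyne".match(eitherVowel)[1];
console.log(captured);
```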
A character class is a way to list a set of characters, any one of which can be matched at that position. The order of the characters inside the square brackets [] does not matter, and the class matches exactly one character. In our example, [ou] catches s followed by o as well as s followed by u, then n. In the second part of the word, [nm] catches the various spellings of shine as well as the typos where m appears instead of n.
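The following JavaScript sketch tries the two character classes separately. The shortened patterns and the test words are mine, chosen only to show that a class stands for exactly one character from the set.

```javascript
// [ou] stands for exactly one character: either "o" or "u".
const firstHalf = /s[ou]n/;
const firstHalfResults = ["son", "sun", "sin"].map((w) => firstHalf.test(w));
console.log(firstHalfResults); // "sin" fails because "i" is not in the class

// A class does not match two characters in a row, so "soun" is not a match.
const twoVowels = firstHalf.test("soun");
console.log(twoVowels);

// [nm] accepts the correct "n" as well as the typo "m".
const secondHalf = /shi[nm]e/;
const secondHalfResults = ["shine", "shime"].map((w) => secondHalf.test(w));
console.log(secondHalfResults);
```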
i and g
The i and g characters after the second / at the end of the expression operate as flags. A flag sets an option that affects how the entire search behaves. The i flag means that the search will return matches in a case Insensitive manner, so characters will match whether they are uppercase or lowercase (in the expression as well as the search copy).
The g flag indicates that we are searching Globally. Without this flag, the search would only return the first match it finds. With this flag, we can be confident of finding every matching text string.
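To compare the flags side by side, this JavaScript sketch runs the same simple pattern with each combination of i and g through String.prototype.replace. The sample sentence and variable names are made up for the illustration.

```javascript
const sample = "Sunshine, sunshine, SUNSHINE";

// No flags: case-sensitive, and only the first match is replaced.
const noFlags = sample.replace(/sunshine/, "SonShine");

// i only: case-insensitive, but still only the first match.
const insensitiveOnly = sample.replace(/sunshine/i, "SonShine");

// g only: every match, but only the exact lowercase spelling.
const globalOnly = sample.replace(/sunshine/g, "SonShine");

// i and g together: every match, regardless of case.
const bothFlags = sample.replace(/sunshine/ig, "SonShine");

console.log(noFlags);         // Sunshine, SonShine, SUNSHINE
console.log(insensitiveOnly); // SonShine, sunshine, SUNSHINE
console.log(globalOnly);      // Sunshine, SonShine, SUNSHINE
console.log(bothFlags);       // SonShine, SonShine, SonShine
```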
So how did we do? Using the replace tool on regexr.com, I ran our expression against the company owner's dreadful draft and replaced all matches with SonShine. It's still dreadful marketing copy, but at least the product name is now consistent!
SonShine is great! SonShine can do wonderful things, yes SonShine is the best product you've ever seen. It will solve all your problems right here in River City. Trouble - I mean, dirt is no match for SonShine. Use SonShine to wash your car. SonShine can clean your deck safely, and SonShine can even be used to dry clean your drapes. SonShine is the greatest development this year and SonShine has three patents pending. SonShine makes a wonderful gift or you can buy extra SonShine for your home office. Give it to your son or something. Stephen Sondheim inspired SonShine look at that shine! Isn't it shiny?
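I used regexr.com for the replacement above, but the same step can be sketched in JavaScript with String.prototype.replace. To keep it short, the draft below contains only a few sentences from the owner's copy, and the variable names are my own.

```javascript
// The developer's pattern, unchanged.
const brandPattern = /s[ou]n *sh(i|y)[nm]e?y?/ig;

// A shortened excerpt of the draft, taken from the copy above.
const draft =
  "Sunshine is great! Sonshine can do wonderful things, yes sun Shine is the best product you've ever seen. " +
  "Give it to your son or something. Stephen Sondheim inspired sonshyn look at that shine! Isn't it shiny?";

// With the g flag, match() returns every matched variation as an array.
const variations = draft.match(brandPattern);
console.log(variations); // [ 'Sunshine', 'Sonshine', 'sun Shine', 'sonshyn' ]

// replace() swaps each match for the final product name.
const cleaned = draft.replace(brandPattern, "SonShine");
console.log(cleaned);
```

Note that "son or something", "Sondheim", "shine" and "shiny" are left alone, because none of them contains the full sequence the pattern requires.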
Regex is an incredibly powerful tool for developers and even copywriters. I have used it within Adobe InDesign to apply different character and paragraph styles to text that met very specific conditions.
There are more special characters in regex that define other conditions, which we either did not need or only touched on in our example. These include anchors, grouping and capturing, bracket expressions, greedy and lazy matching, boundaries, back-references, and look-ahead and look-behind. To learn more about regular expressions, go to the MDN web docs or check out Marijn Haverbeke's Eloquent JavaScript, chapter 9.
Mike Johnson is a graphic designer turned web developer who is always looking for the better mousetrap. Visit him on GitHub.com/MikeWebPrint.