@rogerpence
Last active January 10, 2019 18:22
An old regex article
<p>Regular expressions (regex) provide a pattern matching scheme you can use to search and manipulate&nbsp;strings. Although regex has been around for almost 50 years, it is often either completely ignored or relegated to the &ldquo;will learn later&rdquo; pile by many programmers. Regex is indeed borderline witchcraft that at first glance looks like it was culled from the script of <a href="https://www.youtube.com/watch?v=Ln7WF78PolA">Plan 9 from Outer Space</a>. (It wasn&rsquo;t!).</p>
<p>There is an old programming adage that goes: &ldquo;One time I had a problem so I tried to solve it with regular expressions. Now I have two problems.&rdquo; If you&#39;re not careful, that old saying sums up regex well. If you get in a hurry&nbsp;or try to over-complicate things, regex will get you in more trouble than cheating on your taxes. But with a little effort and practice, knowing how and when to use a regular expression to solve a problem will make your code better, easier to read (really!), and easier to maintain (also really!).</p>
<p>So put your tray tables in their upright and locked positions and&nbsp;let&rsquo;s take regular expressions for a spin in ASNA Visual RPG.</p>
<h3>What are regular expressions?</h3>
<p>Regular expressions are a way to search, replace, and manipulate string values. Regular expression syntax provides a succinct grammar for identifying parts of a string. This concise grammar can be a bit overwhelming at first, but with a little effort it&rsquo;s not as bad as it initially seems, especially for basic regular expressions. And, remember, the trade-off for a regular expression&rsquo;s awkwardness is that it&rsquo;s doing what would otherwise take lots of logic and conditional testing.</p>
<p>Regular expressions can be used directly in at least four places with AVR for .NET:</p>
<ol>
<li>
In the AVR language to work with your own strings
</li>
<li>
In Visual Studio&rsquo;s editor to perform search and replace
</li>
<li>
With the regular expression validator for ASP.NET
</li>
<li>
In JavaScript for browser-based development
</li>
</ol>
<p>Regular expressions are generally a cross-platform facility; however, there are two broad families of regular expression syntax: Perl-style, typified by PCRE (Perl Compatible Regular Expressions), and POSIX (Portable Operating System Interface). .NET&rsquo;s regex engine uses Perl-style syntax, as do the PHP, Perl, and JavaScript regex engines. While there are some differences across implementations, most PCRE-based regex tutorials and online regex testing sites will work for AVR and .NET.</p>
<p>Throughout this article, you&rsquo;ll see several &ldquo;Try this pattern online&rdquo; buttons. Those buttons link to specific exercises on the excellent <a href="https://regex101.com">Regular Expression online playground</a>. It provides a superb way to experiment with regular expressions. There are a couple of examples below that iterate over a regex result and those don&rsquo;t work well in the online tester. To test them, copy and paste the code in an AVR project. Other online references include:</p>
<p><a href="https://msdn.microsoft.com/en-us/library/az24scfc%28v=vs.110%29.aspx">MS Regex quick reference</a></p>
<p><a href="https://msdn.microsoft.com/en-us/library/hs600312%28v=vs.110%29.aspx">.NET Framework Regular expressions</a></p>
<h3>A simple regex match</h3>
<p>The .NET namespace <a href="https://msdn.microsoft.com/en-us/library/system.text.regularexpressions%28v=vs.110%29.aspx">System.Text.RegularExpressions</a> provides a variety of classes to perform regex work in AVR. This article uses several of those classes. All of the following code assumes</p>
<pre>
<code>Using System.Text.RegularExpressions
</code></pre>
<p>is at the top of your class. Regular expressions attempt to match an input string against a given regex pattern. Let&rsquo;s assume that we have two variables declared:</p>
<pre>
<code>DclFld Source Type(*String)
DclFld Re Type(*String)
</code></pre>
<p><code>Source</code> is to contain the searched string and <code>Re</code> is to contain the regular expression. Let&rsquo;s start with something very simple using the <a href="https://msdn.microsoft.com/en-us/library/twcw2f1c%28v=vs.110%29.aspx"><code>Regex class&#39;s Match() method</code></a>. (Don&rsquo;t get confused, this <code>Match()</code> method returns an instance of the <a href="https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.match%28v=vs.110%29.aspx"><code>Match</code></a> class.)</p>
<pre>
<code>DclFld m Type(Match)
</code></pre>
<p>With the necessary declarations, here is a&nbsp;very simple regex example:</p>
<pre>
<code>Source = &#39;Hello, World&#39;
Re = &#39;ll&#39;
m = Regex.Match(Source, Re)
If m.Success
    // Occurrence found.
Else
    // Occurrence not found.
EndIf
</code></pre>
<p><a class="btn btn-block btn-asna-blue btn-xs" href="https://regex101.com/r/tN7gQ4/2">Try this pattern online</a>&nbsp;</p>
<p>The example above looks in the <code>Source</code> field for a match to the pattern in the <code>Re</code> field. The <code>Success</code> property of the <code>Match</code> instance that the <code>Match()</code> method returns reports whether the match succeeded. In this case, <code>Hello, World</code> contains <code>ll</code> so <code>m.Success</code> is true.</p>
<p>In addition to its <code>Success</code> property, the <code>Match</code> object that <code>Match()</code> returns provides additional information about the match, including its starting location, its length, and its value. If you don&rsquo;t need the <code>Match</code> instance, you can use this shorthand for the <code>Match()</code> method:</p>
<pre>
<code>If (Regex.Match(Source, Re).Success)
    // Occurrence found.
Else
    // Occurrence not found.
EndIf
</code></pre>
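<p>The extra match details mentioned above (starting location, length, and value) can be sketched as follows. This sketch uses Python&rsquo;s <code>re</code> module purely as a runnable stand-in; it is not AVR, and the variable names are illustrative. In .NET the comparable <code>Match</code> properties are <code>Index</code>, <code>Length</code>, and <code>Value</code>.</p>

```python
import re

source = "Hello, World"
pattern = "ll"

# re.search scans the whole string for the first occurrence,
# which is roughly what Regex.Match does in .NET.
m = re.search(pattern, source)

if m:
    index = m.start()             # zero-based start of the match
    length = m.end() - m.start()  # number of characters matched
    value = m.group()             # the matched text itself
    print(index, length, value)
else:
    print("Occurrence not found.")
```

<p>Note that the starting position is zero-based, so the first <code>l</code> in <code>Hello</code> is at position 2.</p>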
<p>As currently written, this match is case-sensitive. Passing an optional third argument to the <code>Match()</code> method makes the match case-insensitive (you&#39;ll later see another way to impose case insensitivity on regular expressions).</p>
<pre>
<code>Source = &#39;Hello, World&#39;
Re = &#39;LL&#39;
m = Regex.Match(Source, Re, RegexOptions.IgnoreCase)
</code></pre>
<p>The match above succeeds because the <code>RegexOptions.IgnoreCase</code> argument was provided. In subsequent <code>Match()</code> method examples, the code to execute when the match succeeds is omitted.</p>
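<p>To see the difference the option makes, here is a small side-by-side sketch. It is written in Python rather than AVR so it can be run anywhere; Python&rsquo;s <code>re.IGNORECASE</code> flag is assumed here to play the same role as <code>RegexOptions.IgnoreCase</code>.</p>

```python
import re

source = "Hello, World"

# Case-sensitive by default: 'LL' does not occur in 'Hello, World'.
sensitive = re.search("LL", source)

# With the ignore-case flag, 'LL' matches the lowercase 'll'.
insensitive = re.search("LL", source, re.IGNORECASE)

print(sensitive is None)
print(insensitive.group())
```

<p>The value reported by the second search is the text as it appears in the source string (<code>ll</code>), not the text of the pattern.</p>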
<h3>Some basic Regex special characters</h3>
<p>Mastering regular expressions requires you to master a lot of arcane special-case patterns. At first, this gobbledygook is indeed overwhelming. But, one bite at a time, it starts to make sense. This next example introduces some regex special characters. These are characters that, unless escaped, have special meaning to regular expressions. Let&rsquo;s start with regex <em>anchors</em>. Although <a href="https://msdn.microsoft.com/en-us/library/az24scfc%28v=vs.110%29.aspx#atomic_zerowidth_assertions">this page</a> shows eight of them, the first two, <code>^</code> and <code>$</code>, are all you need to know for now. <code>^</code> anchors a match at the beginning of a string, and <code>$</code> anchors a match at the end of the string. An example or two explains anchors best.</p>
<pre>
<code>Source = &#39;Hello, World&#39;
Re = &#39;^ll&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>The match above fails because it looks for <code>ll</code> at the beginning of the string.</p>
<pre>
<code>Source = &#39;Hello, World&#39;
Re = &#39;^He&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>The match above succeeds because it looks for <code>He</code> at the beginning of the string.</p>
<pre>
<code>Source = &#39;Hello, World&#39;
Re = &#39;ll$&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>The match above fails because it looks for <code>ll</code> at the end of the string.</p>
<pre>
<code>Source = &#39;Hello, World&#39;
Re = &#39;orld$&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>The match above succeeds because it looks for <code>orld</code> at the end of the string.</p>
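<p>The four anchor examples above can be checked in one pass. This is a minimal sketch in Python (used as a stand-in for AVR); the helper name <code>found</code> is made up for illustration.</p>

```python
import re

source = "Hello, World"

def found(pattern: str, text: str) -> bool:
    """Return True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

# The same four patterns used in the AVR examples above.
results = {p: found(p, source) for p in ["^ll", "^He", "ll$", "orld$"]}

for pattern, ok in results.items():
    print(pattern, "matches" if ok else "does not match")
```

<p>Only <code>^He</code> and <code>orld$</code> match, because only those patterns describe text that is actually at the start or the end of the string.</p>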
<p>To search a string for one of the anchor characters as a literal character, use the <code>\</code> escape character.</p>
<pre>
<code>Source = &#39;x = 6^2&#39;
Re = &#39;\^&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>The match above succeeds because the <code>\</code> escape character changes what would have otherwise been the <code>^</code> anchor character into a literal character to match. See if you can determine what this match is doing:</p>
<pre>
<code>Source = &#39;^I love wolverines&#39;
Re = &#39;^\^&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p><a class="btn btn-block btn-asna-blue btn-xs" href="https://regex101.com/r/pG1vT5/1">Try this pattern online</a>&nbsp;</p>
<p>The match above succeeds because it looks for a string that starts with <code>^</code>. Crazy, huh? Comprehending regular expressions takes a little practice. Don&rsquo;t get discouraged. What if you need to search for <code>\</code>? Escape the escaper!</p>
<pre>
<code>Source = &#39;\\mycomputer\document&#39;
Re = &#39;^\\\\&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>The match above succeeds by looking for a string that begins with two <code>\</code> characters; each one is matched by escaping the escape character.</p>
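<p>When the text to find is not known in advance, escaping by hand gets error-prone. Most regex libraries offer a helper that escapes every special character for you; .NET provides <code>Regex.Escape()</code>, and the sketch below uses Python&rsquo;s <code>re.escape()</code> as a stand-in. The helper name <code>find_literal</code> is hypothetical.</p>

```python
import re

def find_literal(needle: str, text: str):
    """Search for needle as plain text by escaping regex special characters."""
    return re.search(re.escape(needle), text)

# A caret is found as a literal character, not treated as an anchor.
caret = find_literal("^", "x = 6^2")

# Hand-escaped patterns from the examples above, written as raw strings
# so that Python itself leaves the backslashes alone.
starts_with_caret = re.search(r"^\^", "^I love wolverines")
unc = r"\\mycomputer\document"
starts_with_two_backslashes = re.search(r"^\\\\", unc)

print(caret.start())
print(starts_with_caret.group())
print(starts_with_two_backslashes.group())
```

<p>The raw-string prefix (<code>r"..."</code>) is a Python detail. AVR string literals, like the ones in this article, already treat a backslash as an ordinary character.</p>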
<p>To make regular expressions easier to use you might be inclined to use a little white space to make the regex pattern a little more readable:</p>
<pre>
<code>Source = &#39;^I love wolverines&#39;
Re = &#39;^ \^&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>However, that just broke the match. Now, the regex is looking for a string that starts with a space and then a <code>^</code> character. A single space is just that: a search for a single space. You have to be as careful with white space as you are with any other character in your regular expressions. Two other regex special characters are <code>(</code> and <code>)</code>. They are used to group expressions. Although we&rsquo;ll later see how grouping expressions can lead to some very sophisticated matching, for now, let&rsquo;s use grouping just to improve regex readability.</p>
<pre>
<code>Source = &#39;^I love wolverines&#39;
Re = &#39;^(\^)&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>Grouping&nbsp;the <code>\^</code> pattern makes this regular expression a little easier to read. You can almost think of parentheses as adding punctuation to regular expressions. This article series uses regex grouping for readability quite often. To search for the literal <code>(</code> or <code>)</code> characters, escape them.</p>
<pre>
<code>Source = &#39;(I can&#39;&#39;t get no) Satisfaction&#39;
Re = &#39;\)&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>The match above succeeds in finding the closing <code>)</code> by escaping it. Note the repeating single quote marks in the match above. It could have also been written as:</p>
<pre>
<code>Source = &quot;(I can&#39;t get no) Satisfaction&quot;
Re = &#39;\)&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>That single quote/double quote issue is an AVR issue, not a regular expression issue.</p>
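<p>Returning to the white-space pitfall described earlier in this section: many regex engines offer an option that tells the engine to ignore white space in the pattern, so a pattern can be spaced out safely. In .NET that option is <code>RegexOptions.IgnorePatternWhitespace</code>. The sketch below shows the same idea with Python&rsquo;s <code>re.VERBOSE</code> flag, used here only as a runnable stand-in for AVR.</p>

```python
import re

source = "^I love wolverines"

# A stray space in the pattern is matched literally, so this search fails.
broken = re.search(r"^ \^", source)

# With re.VERBOSE, unescaped white space in the pattern is ignored and
# '#' starts a comment, so the pattern can be laid out for readability.
readable = re.search(
    r"""
    ^      # start of the string
    \^     # a literal caret
    """,
    source,
    re.VERBOSE,
)

print(broken is None)
print(readable.group())
```

<p>Without such an option, every space in a pattern counts, which is why this article relies on grouping rather than spacing for readability.</p>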
<h3>Regex character classes and quantifiers</h3>
<p>Regex character classes provide a way to match any one of a set of characters. Let&rsquo;s first consider user-defined character classes. First, though, a simple match for &lsquo;get&rsquo;.</p>
<p>The search below searches the Source string for &lsquo;get&rsquo;.</p>
<pre>
<code>Source = &quot;(I can&#39;t get no) Satisfaction&quot;
Re = &#39;get&#39;
m = Regex.Match(Source, Re)
DoWhile m.Success
    Console.WriteLine(m.Index.ToString() + &#39;:&#39; + m.Value)
    m = m.NextMatch()
EndDo
</code></pre>
<p>It reports finding one occurrence that starts at position 9 (<code>m.Index</code> is zero-based). Take a look at what appears to be the nearly identical match below. Notice its regular expression is surrounded with brackets.</p>
<pre>
<code>Source = &quot;(I can&#39;t get no) Satisfaction&quot;
Re = &#39;[get]&#39;
m = Regex.Match(Source, Re)
DoWhile m.Success
    Console.WriteLine(m.Index.ToString() + &#39;:&#39; + m.Value)
    m = m.NextMatch()
EndDo
</code></pre>
<p>The brackets tell the regex engine that this is a character class that matches any single occurrence of <code>g</code>, <code>e</code>, or <code>t</code>, in any order. A character class is an implicit OR: this one matches any <code>g</code>, any <code>e</code>, or any <code>t</code>. This pattern reports six matches, at zero-based positions 7, 9, 10, 11, 19, and 25. To negate a character class, start the character class with a <code>^</code> symbol. For example, this match:</p>
<pre>
<code>Source = &quot;(I can&#39;t get no) Satisfaction&quot;
Re = &#39;[^aeiou]&#39;
m = Regex.Match(Source, Re)
DoWhile m.Success
    Console.WriteLine(m.Index.ToString() + &#39;:&#39; + m.Value)
    m = m.NextMatch()
EndDo
</code></pre>
<p>Reports the value and position of each character that isn&rsquo;t a lowercase vowel, including spaces, punctuation, and the uppercase <code>I</code>. Note the overloaded use of <code>^</code>. When <code>^</code> is the first character inside the brackets of a character class it negates the class; otherwise it anchors the search at the beginning of the string. You can also specify ranges with character classes. For example, the match below:</p>
<pre>
<code>Source = &quot;(I can&#39;t get no) Satisfaction&quot;
Re = &#39;[a-m]&#39;
m = Regex.Match(Source, Re)
DoWhile m.Success
    Console.WriteLine(m.Index.ToString() + &#39;:&#39; + m.Value)
    m = m.NextMatch()
EndDo
</code></pre>
<p>Reports the position and value of any character within the inclusive range of characters from <code>a</code> through <code>m</code>.</p>
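<p>The four searches above can be compared side by side. This sketch uses Python as a stand-in for the AVR <code>DoWhile</code> loops; the helper name <code>positions</code> is made up, and it collects the same zero-based index and value pairs that the loops print.</p>

```python
import re

source = "(I can't get no) Satisfaction"

def positions(pattern: str, text: str):
    """Return (zero-based index, value) for every match of the pattern."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, text)]

literal = positions("get", source)         # the three letters, in order
any_of = positions("[get]", source)        # any single g, e, or t
not_vowel = positions("[^aeiou]", source)  # anything but a lowercase vowel
a_to_m = positions("[a-m]", source)        # lowercase a through m only

print(literal)
print(any_of)
print(len(not_vowel))
print([index for index, _ in a_to_m])
```

<p>A character class always matches exactly one character per match, which is why <code>[get]</code> produces six one-character results while <code>get</code> produces a single three-character result.</p>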
<p>There are also several built-in character classes. Five that you&rsquo;ll use frequently are:</p>
<table>
<tbody>
<tr>
<th>
Class
</th>
<th>
Description
</th>
</tr>
<tr>
<td>
\s
</td>
<td>
matches any white-space character (e.g., space, tab, linefeed)
</td>
</tr>
<tr>
<td>
\S
</td>
<td>
matches any non-white-space character
</td>
</tr>
<tr>
<td>
\d
</td>
<td>
matches any decimal digit (e.g., 0 through 9)
</td>
</tr>
<tr>
<td>
\D
</td>
<td>
matches any character except a decimal digit
</td>
</tr>
<tr>
<td>
.
</td>
<td>
matches any single character except a new line (\n)
</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
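<p>Here is a quick sketch of the built-in classes in the table above at work. It is written in Python as a runnable stand-in for AVR, and the sample string is invented for illustration.</p>

```python
import re

# "\t" is a tab character, so the string contains two spaces and one tab.
source = "Room 42\tis open"

digits = re.findall(r"\d", source)           # every decimal digit
non_digit_count = len(re.findall(r"\D", source))
whitespace = re.findall(r"\s", source)       # spaces and the tab
non_space_count = len(re.findall(r"\S", source))
any_two = re.findall(r"R..m", source)        # R, any two characters, then m

print(digits)
print(non_digit_count)
print(whitespace)
print(non_space_count)
print(any_two)
```

<p>Each uppercase class is the exact opposite of its lowercase partner, so for any string the <code>\d</code> and <code>\D</code> counts add up to the string&rsquo;s length.</p>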
<p>Be sure to watch your case-sensitivity with regex special characters. Here are several frequently-used regex quantifiers (but there are several more):</p>
<table>
<tbody>
<tr>
<th>
Quantifier
</th>
<th>
Description
</th>
</tr>
<tr>
<td>
*
</td>
<td>
matches the previous element zero or more times
</td>
</tr>
<tr>
<td>
+
</td>
<td>
matches the previous element one or more times
</td>
</tr>
<tr>
<td>
?
</td>
<td>
matches the previous element zero or one time
</td>
</tr>
<tr>
<td>
{n}
</td>
<td>
matches the previous element exactly n times
</td>
</tr>
<tr>
<td>
{n,}
</td>
<td>
matches the previous element at least n times
</td>
</tr>
<tr>
<td>
{n,m}
</td>
<td>
matches the previous element at least n times but no more than m times
</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
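<p>The quantifiers in the table above can be explored with a few tiny patterns. This is a sketch in Python (a stand-in for AVR); the helper name <code>matches_whole</code> is hypothetical, and it uses <code>re.fullmatch()</code> so that each pattern must account for the entire string.</p>

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """True when the pattern matches the entire text, start to end."""
    return re.fullmatch(pattern, text) is not None

checks = [
    (r"ab*", "a"),          # * allows zero b's
    (r"ab*", "abbb"),       # ... or many
    (r"ab+", "a"),          # + needs at least one b
    (r"ab+", "abbb"),
    (r"ab?", "ab"),         # ? allows zero or one b
    (r"ab?", "abb"),        # ... but not two
    (r"\d{3}", "123"),      # exactly three digits
    (r"\d{3}", "1234"),
    (r"\d{2,}", "1"),       # at least two digits
    (r"\d{2,}", "12345"),
    (r"\d{2,4}", "123"),    # between two and four digits
    (r"\d{2,4}", "12345"),
]

outcomes = [matches_whole(pattern, text) for pattern, text in checks]

for (pattern, text), ok in zip(checks, outcomes):
    print(f"{pattern:10} {text:8} {ok}")
```

<p>A quantifier applies only to the single element immediately before it, which here is the <code>b</code> or the <code>\d</code>. That detail becomes important in the URL example later in this article.</p>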
<p>Let&rsquo;s take a look at some examples that put regex classes and quantifiers to work.</p>
<h3>US Social Security number</h3>
<p>First, we&rsquo;ll consider a US social security number, which&nbsp;is always in the format <code>nnn-nn-nnnn</code>, where n is a single digit.</p>
<pre>
<code>Source = &#39;345-15-4978&#39;
Re = &#39;\d&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>This produces a match because <code>Source</code> contains a digit. That pattern isn&rsquo;t specific enough; let&rsquo;s try something better.</p>
<pre>
<code>Source = &#39;345-15-4978&#39;
Re = &#39;\d\d\d-\d\d-\d\d\d\d&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>This match works. It looks for three digits, a dash, two digits, a dash, and then four digits. It&rsquo;s a little verbose, though. To tighten it up, the pattern could also be (and probably should be) written like this:</p>
<pre>
<code>Source = &#39;345-15-4978&#39;
Re = &#39;\d{3}-\d{2}-\d{4}&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>That seems like it might be the best pattern for the job, doesn&rsquo;t it? However, consider the match below. Does it return success?</p>
<pre>
<code>Source = &#39;Neil 345-15-4978 Young&#39;
Re = &#39;\d{3}-\d{2}-\d{4}&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>Be careful. This returns success because the pattern is in the string. The best pattern for a US social security number is probably:</p>
<pre>
<code>Source = &#39;345-15-4978&#39;
Re = &#39;^\d{3}-\d{2}-\d{4}$&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>In this case, we&rsquo;ve added the <code>^</code> character and <code>$</code> character to anchor the search at the beginning and end of the input string. This forces the search pattern to be the entire value being matched. In many (perhaps most) cases, it&rsquo;s necessary to add the <code>^</code> and <code>$</code> anchors for the most reliable results. This example is a reliable pattern with which to test the format of a US social security number. However, we can improve it by grouping its major patterns.</p>
<pre>
<code>Source = &#39;345-15-4978&#39;
Re = &#39;^(\d{3})-(\d{2})-(\d{4})$&#39;
m = Regex.Match(Source, Re)
</code></pre>
<p>Adding the groups above doesn&#39;t change what the pattern matches, but it makes the pattern easier for humans to read.</p>
<p><a class="btn btn-block btn-asna-blue btn-xs" href="https://regex101.com/r/dW1lG7/1">Try this pattern online</a>&nbsp;</p>
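<p>Putting the anchored, grouped pattern to work might look like the sketch below. It is written in Python as a stand-in for AVR; the function name <code>parse_ssn</code> is hypothetical. It checks only the <code>nnn-nn-nnnn</code> format, not whether a number has actually been issued.</p>

```python
import re

# The final pattern from above: anchored at both ends, with three groups.
SSN_PATTERN = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")

def parse_ssn(text: str):
    """Return the three number groups as a tuple, or None if the format is wrong."""
    m = SSN_PATTERN.search(text)
    return m.groups() if m else None

print(parse_ssn("345-15-4978"))
print(parse_ssn("Neil 345-15-4978 Young"))
print(parse_ssn("345-15-497"))
```

<p>Because of the anchors, the second call returns nothing even though the string contains a well-formed number. The parentheses also make each part of the number available separately, which is the other benefit of grouping.</p>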
<h3>URL</h3>
<p>URLs are great playgrounds for regular expressions. Alas, attempting to use URLs with regular expressions reveals what you&rsquo;ve probably been thinking all along: &ldquo;This regex stuff is cool, but I&rsquo;m betting it can get you in big trouble.&rdquo; That&rsquo;s true. It&rsquo;s possible to get carried away, very carried away, with regular expressions. <a href="https://mathiasbynens.be/demo/url-regex">This page shows at least 12 regular expressions</a> to parse a URL. One of them is 1,347 characters long!</p>
<p>Let&rsquo;s be clear about regular expression usage. If, after learning basic regex syntax, using a regular expression makes the problem harder to solve than an old-timey brute force method, you&rsquo;re doing something wrong. The qualifier &ldquo;after learning basic regex syntax&rdquo; is important. Without a little study and effort, regex won&rsquo;t make anything easier. But with basic regex skills under your belt, the declarative, concise nature of regular expressions is usually a much better approach than pretzel logic full of <code>substring()</code> and <code>indexOf()</code>. Give regex a chance&ndash;but don&#39;t go off the deep end!</p>
<p>With the sermon over, let&rsquo;s see what parsing a <em>simple</em> URL can teach us. The next several examples will only show the <code>Source</code> and <code>Re</code> for clarity. What follows below is intended to explain regular expressions and not intended to be the definitive way to define a URL with a regex (I don&rsquo;t think there is one!). Consider the example below. Does it match?</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;^http&#39;
</code></pre>
<p>It does. But how could we provide a pattern that matches URLs that start with either <code>http</code> or <code>https</code>?</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;^https?&#39;
</code></pre>
<p>The <code>s?</code> added to the pattern above makes the pattern specify the string start with <code>http</code> followed by an optional <code>s</code>. In this case, we&rsquo;re using the <code>?</code> quantifier to specify zero or one time. Let&rsquo;s add the slashes and the colon to make the first part of our regex include the full protocol specification:</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;^https?://&#39;
</code></pre>
<p>The pattern above successfully matches the full protocol, but forward slashes are one of regular expressions&rsquo; cross-platform weaknesses. Some implementations of regex (JavaScript and PHP, for example) don&rsquo;t like unescaped forward slashes. I&rsquo;d generally write the regex above like the one shown below:</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;^https?:\/\/&#39;
</code></pre>
<p>This makes the pattern start to look a little goofy, but if you write AVR <em>and</em> JavaScript, committing to escaping regex forward slashes will keep you out of trouble. Let&rsquo;s add one more bit of clarity to our pattern:</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;(^https?:\/\/)&#39;
</code></pre>
<p>This doesn&rsquo;t change the pattern for the regex engine, but adding the parentheses contributes to readability and, as we&rsquo;ll find, makes it possible later to easily identify and capture the enclosed match. Let&rsquo;s take the subdomain, which is optional. Does the following pattern work?</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;(^https?:\/\/)www?&#39;
</code></pre>
<p>Nope. At first glance, it looks like it might, but this illustrates the specificity demanded by regex. The pattern above doesn&rsquo;t look for an optional <code>www</code>. The <code>?</code> applies only to the final <code>w</code>, so the pattern checks to see if the subdomain starts with <code>ww</code> or <code>www</code>. In this case, we need to group the match:</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;(^https?:\/\/)(www)?&#39;
</code></pre>
<p>The question mark is a quantifier that promises to match zero or one occurrence of the previous element. The parentheses around <code>www</code> define it as the previous element. There is one more thing to do for the subdomain. Let&rsquo;s fully group it.</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;(^https?:\/\/)((www)?)&#39;
</code></pre>
<p>As before, when we grouped the protocol, these parentheses don&rsquo;t change the search pattern&nbsp;but help provide readability. They make it clearer that the <code>?</code> applies just to the <code>(www)</code>. And, again, they are going to later let the subdomain name be easily extracted. Now we need to add the dot between the subdomain name and the domain name. Does the pattern below do that? (Hint: What does the regex dot (or a period) character class specify?)</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;(^https?:\/\/)((www)?).&#39;
</code></pre>
<p>Alas, the pattern above doesn&rsquo;t look for a dot, but rather <em>any</em> character. To look for a literal dot, we need to escape it (and we&rsquo;ll also group it), as shown below:</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;(^https?:\/\/)((www)?)(\.)&#39;
</code></pre>
<p>Now we need to add the domain name, which can contain numbers, letters, dashes, and be between 2 and 63 characters long. To tackle this one we&rsquo;ll create our own character class.</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;(^https?:\/\/)((www)?)(\.)[\da-z\.-]&#39;
</code></pre>
<p>Our character class, the last group in brackets, specifies:</p>
<table>
<tbody>
<tr>
<th>
Regex&nbsp;
</th>
<th>
Description
</th>
</tr>
<tr>
<td>
\d
</td>
<td>
Any Digit
</td>
</tr>
<tr>
<td>
a-z
</td>
<td>
Any lowercase letter
</td>
</tr>
<tr>
<td>
\.
</td>
<td>
A literal dot
</td>
</tr>
<tr>
<td>
-
</td>
<td>
A dash
</td>
</tr>
</tbody>
</table>
<p><br>
This character class works but doesn&rsquo;t apply length constraints nor is it grouped. The example below does that:</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;(^https?:\/\/)((www)?)(\.)([\da-z\.-]{2,63})&#39;
</code></pre>
<p>The pattern above adds the <code>{n,m}</code> quantifier to allow domain name lengths of 2 through 63 characters. It also groups the domain name with parentheses. We&rsquo;re almost done. Let&rsquo;s add the dot to come after the domain name and before the top-level domain name (just like we did for the dot separating the subdomain and domain names).</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;(^https?:\/\/)((www)?)(\.)([\da-z\.-]{2,63})(\.)&#39;
</code></pre>
<p>With the dot added, we only need to add the pattern for the top-level domain name. I found varying rules on the Internet as to the&nbsp;maximum length of a top-level domain name, but apparently the longest in existence is 24 characters, so we&rsquo;ll use that constraint.</p>
<pre>
<code>Source = &#39;https://www.asna.com&#39;
Re = &#39;(^https?:\/\/)((www)?)(\.)([\da-z\.-]{2,63})(\.)([a-z]{2,24})$&#39;
</code></pre>
<p>The top-level domain is defined by the <code>[a-z]{2,24}</code> pattern, which looks for a string 2 to 24 characters long made up of the letters <code>a</code> through <code>z</code>. This pattern is grouped with parentheses and then the end-of-string anchor <code>$</code> is added to ensure there aren&rsquo;t any spurious trailing characters.</p>
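<p>To watch the pattern grow, the sketch below runs each stage of the build-up against the same URL and reports how much of the string that stage matches. It is written in Python as a stand-in for AVR, the helper name <code>matched_text</code> is made up, and it uses the 63-character domain limit stated in the text.</p>

```python
import re

source = "https://www.asna.com"

# Each entry adds one more piece to the pattern, in the order used above.
stages = [
    r"^https?",
    r"^https?:\/\/",
    r"(^https?:\/\/)((www)?)",
    r"(^https?:\/\/)((www)?)(\.)",
    r"(^https?:\/\/)((www)?)(\.)([\da-z\.-]{2,63})",
    r"(^https?:\/\/)((www)?)(\.)([\da-z\.-]{2,63})(\.)",
    r"(^https?:\/\/)((www)?)(\.)([\da-z\.-]{2,63})(\.)([a-z]{2,24})$",
]

def matched_text(pattern: str, text: str):
    """Return the part of text the pattern matched, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

progress = [matched_text(p, source) for p in stages]

for pattern, text in zip(stages, progress):
    print(f"{text!r:26} <- {pattern}")
```

<p>One stage is worth a closer look. When the domain character class is first added, it matches all of <code>asna.com</code>, because the class includes a dot and quantifiers take as much as they can. Only when the following <code>(\.)</code> is added does the engine give characters back so that the domain group ends at <code>asna</code>.</p>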
<p><a class="btn btn-block btn-asna-blue btn-xs" href="https://regex101.com/r/lG9yL7/3">Try this pattern online</a>&nbsp;</p>
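<p>As a preview of what the groups make possible, here is a sketch that pulls the pieces out of a matched URL. It is written in Python rather than AVR, and the function name <code>split_url</code> and the dictionary keys are made up for illustration. It also shows a limitation of the finished pattern: because the dot after the subdomain is always required, a URL with no <code>www</code> does not match. The second pattern is one possible variation (not from the original article) that moves that dot inside the optional group.</p>

```python
import re

# The finished pattern from above.
URL_PATTERN = re.compile(
    r"(^https?:\/\/)((www)?)(\.)([\da-z\.-]{2,63})(\.)([a-z]{2,24})$"
)

# Hypothetical variation: the subdomain and its dot are optional together.
RELAXED_PATTERN = re.compile(
    r"(^https?:\/\/)((www)\.)?([\da-z\.-]{2,63})(\.)([a-z]{2,24})$"
)

def split_url(url: str):
    """Return the named pieces of a URL, or None when it doesn't match."""
    m = URL_PATTERN.search(url)
    if m is None:
        return None
    # Groups are numbered by the position of their opening parenthesis.
    return {
        "protocol": m.group(1),
        "subdomain": m.group(2),
        "domain": m.group(5),
        "tld": m.group(7),
    }

print(split_url("https://www.asna.com"))
print(split_url("https://asna.com"))
print(RELAXED_PATTERN.search("https://asna.com") is not None)
print(RELAXED_PATTERN.search("https://www.asna.com") is not None)
```

<p>Whether the stricter or the more relaxed behavior is right depends on the URLs you expect to see. Either way, testing a pattern against a few inputs that should fail is as important as testing it against inputs that should succeed.</p>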
<h3>Summary</h3>
<p>Regular expressions are a powerful addition to the .NET programmer&#39;s kit bag. With a little time at Regex101.com and with MS&#39;s online regex help, you&#39;ll be quickly on your way to regular expression bliss.&nbsp;</p>
<p>Watch for other parts of this series. Once regex&nbsp;nuance is understood and its initial shock wears off, we&#39;ve got way cooler things to do with regular expressions!&nbsp;</p>