Regular Expressions ("regex" or "regexp") are a common way for web developers to use JavaScript to find specific patterns of characters -- letters, numbers, punctuation, special characters, and even whitespace such as spaces, tabs, and new lines. In this tutorial we will look at one such regex that will find an IPv4 address in a string in JavaScript.
When sifting through access and error logs, it's often useful to find the IP address in a string. Other times, you will want the user to enter this information into a form. You may also want to scrape a webpage to find any IP addresses contained within.
IPv4 addresses are four sets of numbers ranging from 0-255, separated by periods ("."). Any or all of the four numbers can be zero. They can have leading zeros followed by other numbers ("012" or even "000"). In this tutorial I also want to be able to pluck an IP address out of a larger body of text, such as an entry in an error log or the HTML of a webpage.
The regular expression below will find an IPv4 address at the beginning of, at the end of, or within a string. It does not matter what characters come before or after it, even if the address is not separated from those characters by whitespace. This is desirable because sometimes logs or user input is... bad.
/(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/g
- Anchors
- Quantifiers
- Grouping Constructs
- Bracket Expressions
- Character Classes
- The OR Operator
- Flags
- Character Escapes
JavaScript supports regex literals, which are written between slashes ("/"). In this tutorial, we are using the "g" flag at the end so we can use JavaScript's .matchAll() method to find all IP addresses in the string, not just the first one.
/(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/g
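As a minimal sketch of how the pattern might be used with .matchAll(), the snippet below pulls every address out of a single log line. The names ipv4Pattern and logLine, and the sample text itself, are assumptions made up for illustration.

```javascript
// The pattern from this tutorial, stored in a variable (the name is illustrative).
const ipv4Pattern = /(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/g;

// A made-up log line containing two addresses.
const logLine = "client=192.168.1.10 forwarded_for=10.0.0.5 status=200";

// matchAll() returns an iterator of match arrays; index 0 holds the matched text.
const found = [...logLine.matchAll(ipv4Pattern)].map((match) => match[0]);

console.log(found); // [ '192.168.1.10', '10.0.0.5' ]
```

The "200" at the end is ignored because it is not followed by three more dot-separated numbers.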
Let's dive in and see what each part does.
There are two anchor types for regexes: the start/end of the string and the boundaries between words. For this regex, we explicitly do not want either of these. We want to find all instances of the pattern regardless of their position in the string, so we leave the pattern unanchored.
However, if you want to find only IP addresses that are at the beginning of the string, you can place ^ at the beginning of the regex. This may be useful if you have a log and have already split it into an array at each line break, and the IP address is at the beginning of a line.
/^(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/g
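Here is one way the anchored variant could be applied to a log that has been split into lines. The log text and the names startsWithIp, logText, and leadingIps are hypothetical. The g flag is left off in this sketch because each line is checked once with .match().

```javascript
// Anchored variant: only matches an address at the very start of the string.
const startsWithIp = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/;

// A made-up two-line log, split into one string per line.
const logText = "203.0.113.7 GET /index.html\nGET /about.html from 198.51.100.2";
const lines = logText.split("\n");

// match() returns null when a line does not start with an address.
const leadingIps = lines.map((line) => {
  const match = line.match(startsWithIp);
  return match ? match[0] : null;
});

console.log(leadingIps); // [ '203.0.113.7', null ]
```

The second line does contain an address, but not at its start, so the anchored pattern skips it.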
Additionally, if you want to find only IP addresses that are not directly attached to other letters, digits, or underscores, you can place \b (a word boundary) at the beginning and end of the regex like so:
/\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\b/g
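The difference between the unanchored pattern and the \b variant is easiest to see side by side. This is a small sketch with made-up sample strings; loose, bounded, and allMatches are illustrative names.

```javascript
const loose = /(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/g;
const bounded = /\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\b/g;

// With the g flag, match() returns every match, or null when there are none.
const allMatches = (pattern, text) => text.match(pattern) || [];

console.log(allMatches(loose, "host 10.0.0.1, up"));   // [ '10.0.0.1' ]
console.log(allMatches(bounded, "host 10.0.0.1, up")); // [ '10.0.0.1' ]

// Letters glued to the address: only the unanchored pattern finds it.
console.log(allMatches(loose, "abc192.168.1.1xyz"));   // [ '192.168.1.1' ]
console.log(allMatches(bounded, "abc192.168.1.1xyz")); // []

// An out-of-range first number: the unanchored pattern matches part of it.
console.log(allMatches(loose, "256.1.1.1"));           // [ '56.1.1.1' ]
console.log(allMatches(bounded, "256.1.1.1"));         // []
```

Which behavior is preferable depends on the input: the unanchored pattern is more forgiving of messy text, while the \b variant avoids matching the tail of a longer number.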
With regex you can find subexpressions in the string as well. For our purposes, this is useful because we are looking for two subexpressions:
- Three sets of numbers followed by a period
- One set of numbers not followed by a period
This is done with grouping constructs. Since this isn't a particularly complex regex and we only need the full match, we are using non-capturing groups. The (?: opens a non-capturing group and the matching ) closes it. Then we can use quantifiers to match multiple occurrences of that subexpression. Notice we are creating two top-level groups. The first one contains a nested group of alternatives followed by a period, and it finds the numbers that are followed by a period. The second group contains the same alternatives and finds the one number that is not followed by a period.
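To illustrate what "non-capturing" means in practice, the sketch below compares the tutorial's pattern with a hypothetical variant that uses capturing groups instead. The capturing variant is not part of the tutorial; it is included only for contrast.

```javascript
// The tutorial's pattern (without the g flag, so match() returns group details).
const nonCapturing = /(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/;

// Hypothetical variant: the two top-level groups are capturing groups.
const capturing = /((?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(25[0-5]|2[0-4]\d|[01]?\d?\d{1})/;

const address = "192.168.1.10";

// Array.from() copies just the indexed entries of the match result.
const plainResult = Array.from(address.match(nonCapturing));
const capturedResult = Array.from(address.match(capturing));

console.log(plainResult);    // [ '192.168.1.10' ]
console.log(capturedResult); // [ '192.168.1.10', '1.', '10' ]
```

A repeated capturing group keeps only its last repetition, which is why the second entry is '1.' rather than all three numbers. Since only the full address is needed here, non-capturing groups keep the result simpler.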
Quantifiers tell the regex how many times to match a subexpression. Our regex uses two explicit counts: {1} and {3}. The {1} matches exactly one digit (0-9), so at least one digit is found in the subexpression; one is also the default count, so it could be omitted. The {3} matches exactly three instances of a number followed by a period. The regex also uses the optional quantifier ? a few times. It matches 0 or 1 instance of the preceding item, in our case a digit (number from 0 to 9), and it is greedy: it takes the digit when one is available. (A lazy quantifier is a ? placed after another quantifier, as in ?? or +?, which our regex does not use.)
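A quick way to check how these quantifiers behave is to run a few small patterns. The simplified patterns below are assumptions for illustration and are not taken from the tutorial's regex. Note that a single ? after a token means "zero or one" and prefers one, while the lazy form is written ??.

```javascript
// ? after a token is the optional quantifier: zero or one, preferring one (greedy).
const greedyOptional = "255".match(/\d?\d?\d/)[0];

// ?? is the lazy form: zero or one, preferring zero.
const lazyOptional = "255".match(/\d??\d??\d/)[0];

// {3} repeats the preceding group exactly three times.
const threeGroups = "1.2.3.4".match(/(?:\d\.){3}/)[0];

// {1} is the default repeat count, so \d{1} and \d behave the same.
const sameMatch = "7".match(/\d{1}/)[0] === "7".match(/\d/)[0];

console.log(greedyOptional); // 255
console.log(lazyOptional);   // 2
console.log(threeGroups);    // 1.2.3.
console.log(sameMatch);      // true
```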
Bracket expressions begin with [ and end with ] and contain a "set". They match any single character in the supplied set, which can list characters individually or as ranges of letters (A-Z, a-z) or digits (0-9). We are making use of bracket expressions to find the digits of numbers that are 0-255. We can't just put [0-255], because that means "one character that is 0-2, 5, or 5"; we have to check each digit. We are also using the character class for digits, \d. This is shorthand for [0-9]. We use it several times to check the digits of the numbers found. You'll notice that sometimes we use \d and other times we use [0-n], where n is the upper bound of the digit we want to find. And that leads us finally to the | operator...
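The pitfall with [0-255] can be confirmed directly. In this sketch, naiveRange and digitsAccepted are illustrative names; the anchors ^ and $ are added only so each test looks at the whole string.

```javascript
// [0-255] is a set of single characters: the range 0-2, plus 5, plus 5 again.
const naiveRange = /^[0-255]$/;

// Which single digits does the set accept?
const digitsAccepted = "0123456789".split("").filter((d) => naiveRange.test(d));

console.log(digitsAccepted);         // [ '0', '1', '2', '5' ]
console.log(naiveRange.test("255")); // false, a set matches exactly one character

// \d and [0-9] accept the same digits.
console.log(/^\d$/.test("7"), /^[0-9]$/.test("7")); // true true
```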
Because we only want numbers no greater than 255 and we can only check one digit at a time, we need to introduce the OR operator: |. This acts like the OR operation in JavaScript. In this tutorial, we want to find numbers that begin with "25" and end with a digit from 0 to 5 ([0-5]) OR numbers that begin with "2", continue with a digit from 0 to 4 ([0-4]), and end with any digit (\d) OR numbers that optionally begin with 0 or 1 ([01]?) and then have one or two digits (\d?\d{1}).
The subexpression 25[0-5] finds numbers from 250-255; 2[0-4]\d finds numbers from 200-249; and [01]?\d?\d{1} finds numbers from 0/000-199, with or without leading zeros. We need to look for each of these possibilities and return only one; therefore, we use the OR operator.
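One way to check that the three alternatives together cover exactly 0-255 is to test a single number in isolation. The sketch below wraps the alternation in ^ and $ so it has to match the whole string; that anchoring, and the names octet and accepted, are assumptions added for this test only.

```javascript
// The alternation for a single number, anchored so it must match the whole string.
const octet = /^(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})$/;

// Try every integer from 0 to 999 and keep the ones the pattern accepts.
const accepted = [];
for (let n = 0; n <= 999; n++) {
  if (octet.test(String(n))) accepted.push(n);
}

console.log(accepted.length);               // 256
console.log(accepted[0]);                   // 0
console.log(accepted[accepted.length - 1]); // 255

// Leading zeros are accepted; 256 is not.
console.log(octet.test("012"), octet.test("000"), octet.test("256")); // true true false
```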
As discussed previously, we are using the g flag to specify a "global search". With this flag, the regex keeps track of where the last match ended (in its lastIndex property) so that subsequent searches know to start looking after the previous match. This flag is required when using JavaScript's built-in String method .matchAll() to find all instances of the regex pattern in a string.
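The effect of the g flag can be observed with .exec(), which reads and updates lastIndex, and by calling .matchAll() with a pattern that lacks the flag. The sample text and variable names here are made up for illustration.

```javascript
const text = "a=1.1.1.1 b=8.8.8.8";

const globalPattern = /(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/g;
const nonGlobalPattern = /(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/;

// Each exec() call on a global regex resumes from lastIndex.
const first = globalPattern.exec(text);
const indexAfterFirst = globalPattern.lastIndex;
const second = globalPattern.exec(text);

console.log(first[0], indexAfterFirst); // 1.1.1.1 9
console.log(second[0], second.index);   // 8.8.8.8 12

// matchAll() refuses a regex without the g flag.
let errorName = null;
try {
  text.matchAll(nonGlobalPattern);
} catch (error) {
  errorName = error.name;
}
console.log(errorName); // TypeError
```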
Regexes can also search for the characters that make up their own operators (e.g., brackets, slashes, periods). Our regex uses one character escape to find the period at the end of each number that makes up the IP address: \. matches a literal period, whereas an unescaped . would match almost any character.
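To see why the escape matters, compare the tutorial's pattern with a hypothetical variant in which the backslash before the period is dropped. The unescaped variant is not something the tutorial recommends; it is here only to show the difference.

```javascript
// The tutorial's pattern: \. only matches a literal period.
const escaped = /(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/;

// Hypothetical variant with the backslash removed: . matches almost any character.
const unescaped = /(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1}).){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d{1})/;

// A made-up string that uses dashes instead of periods.
const notAnAddress = "10-0-0-1";

console.log(escaped.test(notAnAddress));   // false
console.log(unescaped.test(notAnAddress)); // true

// Both accept a real dotted address.
console.log(escaped.test("10.0.0.1"), unescaped.test("10.0.0.1")); // true true
```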
Mark Drummond