The RFCs allow almost anything in an email address, which is why /.+@.+/ is the only safe choice for a (simple) regex.
Most regular expressions do not cope with comments in the email address. The RFC allows comments to be arbitrarily nested, and a single regular expression cannot cope with this. The Perl module Mail::RFC822::Address (linked below) pre-processes email addresses to remove comments before applying the mail regular expression.
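The pre-processing step can be sketched as a loop that keeps removing the innermost comment until nothing changes. This is only an illustration in Python, not the Perl module's actual code; the name `strip_comments` is made up here, and the sketch assumes no backslash-escaped parentheses and no parentheses inside quoted strings.

```python
import re

# Matches an innermost comment: a pair of parentheses with no parentheses inside.
# Assumption: escaped parentheses and parentheses inside quoted strings do not
# occur; handling them would need a real tokenizer.
INNERMOST_COMMENT = re.compile(r"\([^()]*\)")


def strip_comments(address: str) -> str:
    """Repeatedly remove innermost comments until the address stops changing."""
    previous = None
    while previous != address:
        previous = address
        address = INNERMOST_COMMENT.sub("", address)
    return address


print(strip_comments("john(a (nested) comment)@example.com"))
```

Each pass peels off one nesting level, which is why a loop is needed where a single regex is not enough.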
HISTORICAL NOTE: Several of the mechanisms described in this set of documents may seem somewhat strange or even baroque at first reading. In particular, compatibility was always favored over elegance.
There is no point in trying to work out if an email address is ‘valid’. A user is far more likely to enter a wrong and valid email address than they are to enter an invalid one. Therefore, you are better off spending your time doing literally any other thing than trying to validate email addresses.
One approach could be to reduce misspellings using something like https://github.com/mailcheck/mailcheck.
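A rough sketch of the "did you mean" idea, in Python. This is not mailcheck's algorithm or API: it swaps in the standard library's `difflib` for the string comparison, and the domain list, the `suggest_address` name and the 0.8 cutoff are all assumptions made for illustration.

```python
import difflib
from typing import Optional

# Hypothetical list of popular domains; a real list would be much longer.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"]


def suggest_address(address: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a corrected address if the domain looks like a typo, else None."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None  # not even shaped like local@domain
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        return None  # already a known domain, nothing to suggest
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None


print(suggest_address("alice@gmial.com"))
```

The result is meant to be shown to the user as a suggestion, never applied silently, since an unusual domain can be perfectly correct.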
https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address
It's been said that it's impossible to parse email addresses using regular expressions alone. This is somewhat true. If you allow comments in email addresses, then nested comments cannot be matched with a single regexp - a simple loop applying a reducing regexp first is needed. Aside from that, the library (https://code.iamcal.com/php/rfc822/) uses some post-match checks instead of rolling everything into one regexp. This is not because it wouldn't be possible, but because it would make it huge - the number of IPv6 permutations alone would probably double the size. Aside from the practicality, it seems entirely possible to boil it down to a single regexp. However, the one used for HTML5 is not even close...
The requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.
The following JavaScript and Perl compatible regular expression is an implementation of that definition.
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
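For experimenting outside the browser, the same pattern can be tried in Python. This is a sketch: the function name `is_html5_email` is made up, the JavaScript escape `\/` is written as a plain `/`, and `fullmatch()` stands in for the `^...$` anchors.

```python
import re

# The pattern above as Python raw strings, split per part for readability.
HTML5_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"                      # local part
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"        # first domain label
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"  # further labels
)


def is_html5_email(address: str) -> bool:
    """True if the whole string matches the HTML-spec pattern."""
    return HTML5_EMAIL.fullmatch(address) is not None


for candidate in ["user@example.com", "user@localhost", ".email@test.com",
                  "john.doe@example..com", '"John Doe"@example.com']:
    print(candidate, is_html5_email(candidate))
```

Note that the pattern accepts .email@test.com and john..doe@example.com, which the RFC dot rules reject, and rejects quoted local parts, which the RFC allows.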
- A . in the domain is not required. A TLD can itself have email addresses, or the domain could be an IP address literal such as an IPv6 address.
- RFCs are not the end of the story: ICANN does not allow 'dotless' domains any more.
- https://www.icann.org/news/announcement-2013-08-30-en (New gTLD Dotless Domain Names Prohibited)
- The maximum length for an email address is 254 characters.
- The local part (before the @) is limited to 64 characters, and each part (label) of the domain name is limited to 63 characters. There is no direct limit on the number of subdomains, but the maximum length of an email address that can be handled by SMTP is 254 characters. So with a single-character local part, a two-letter top-level domain and single-character sub-domains, 125 is the maximum number of sub-domains.
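The length limits and the sub-domain arithmetic above can be checked with a few lines of Python. This is a sketch; the constant and function names are made up, and the check only looks at lengths, not at which characters are used.

```python
MAX_ADDRESS = 254  # whole address
MAX_LOCAL = 64     # part before the last "@"
MAX_LABEL = 63     # each dot-separated part of the domain


def within_length_limits(address: str) -> bool:
    """Check only the length limits; empty labels are rejected as a side effect."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False
    return (
        len(address) <= MAX_ADDRESS
        and 1 <= len(local) <= MAX_LOCAL
        and all(1 <= len(label) <= MAX_LABEL for label in domain.split("."))
    )


# One-character local part plus "@", a two-letter TLD, and every
# single-character sub-domain costs two characters ("x.").
max_subdomains = (MAX_ADDRESS - len("a@") - len("xx")) // len("x.")
print(max_subdomains)  # 125

longest = "a@" + "x." * max_subdomains + "xx"
print(len(longest), within_length_limits(longest))  # 254 True
```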
The local-part of the e-mail address may use any of these ASCII characters:
- Uppercase and lowercase English letters (a-z, A-Z)
- Digits 0 to 9
- Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
- Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
Additionally, quoted-strings (e.g. "John Doe"@example.com) are permitted, which allows characters that would otherwise be prohibited; however, they do not appear in common practice. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".
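The rules for an unquoted local part listed above translate fairly directly into a pattern: runs of allowed characters separated by single dots. The following Python sketch uses made-up names (`ATEXT`, `DOT_ATOM`, `is_unquoted_local_part`) and deliberately leaves quoted-strings and length limits out of scope.

```python
import re

# One allowed character: letters, digits and ! # $ % & ' * + - / = ? ^ _ ` { | } ~
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# Runs of allowed characters separated by single dots, so a dot can never be
# first, last, or doubled.
DOT_ATOM = re.compile(ATEXT + r"+(?:\." + ATEXT + r"+)*")


def is_unquoted_local_part(local: str) -> bool:
    """True if `local` follows the unquoted local-part rules listed above."""
    return DOT_ATOM.fullmatch(local) is not None


for candidate in ["john.doe", "user+tag", ".email", "john..doe", "John Doe"]:
    print(repr(candidate), is_unquoted_local_part(candidate))
```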
- Gmail ignores dots in the part before the @, so if your email is test@gmail.com you can also send emails to test.@gmail.com or test....@gmail.com. Both of those addresses are invalid according to the RFC, but valid in the real world.
- http://emailregex.com/
- http://emailregex.com/email-validation-summary/
- https://www.regular-expressions.info/email.html
- https://en.wikipedia.org/wiki/International_email
- https://en.wikipedia.org/wiki/Email_address#Examples
- https://en.wikibooks.org/wiki/JavaScript/Best_practices#Email_validation
- https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
- http://www.orsn.org/en/tech/tld/
- http://data.iana.org/TLD/tlds-alpha-by-domain.txt
- https://tools.ietf.org/html/rfc6530 (Overview and Framework for Internationalized Email)
- https://stackoverflow.com/questions/46155/how-to-validate-an-email-address-in-javascript
- https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression/201378#201378
- https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression/1917982#1917982
- https://stackoverflow.com/questions/760150/can-an-email-address-contain-international-non-english-characters/31066998#31066998
- https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address/2049510
- https://stackoverflow.com/questions/24973086/are-comments-allowed-in-email-address-domain-part
- https://superuser.com/questions/958156/what-is-the-purpose-of-allowing-comments-inside-email-addresses
- https://fightingforalostcause.net/content/misc/2006/compare-email-regex.php
- http://thedailywtf.com/articles/Validating_Email_Addresses
- https://github.com/manishsaraan/email-validator/blob/master/index.js (RegEx + Function)
- https://isemail.info/
- http://www.dominicsayers.com/isemail/
- https://github.com/dominicsayers/isemail
- https://code.iamcal.com/php/rfc822/
- https://code.iamcal.com/php/rfc822/demo.php
- https://github.com/iamcal/rfc822/blob/master/rfc822.php
- https://www.w3.org/TR/html5/forms.html#valid-e-mail-address
- https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address
- https://code.iamcal.com/php/rfc822/full_regexp.txt
- https://www.npmjs.com/package/isemail
- http://sphinx.mythic-beasts.com/~pdw/cgi-bin/emailvalidate
- http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
- w3c/html#538
- w3c/html#845
- https://www.w3.org/Bugs/Public/show_bug.cgi?id=15489
- https://www.icann.org/news/announcement-2013-08-30-en
- https://tools.ietf.org/html/rfc6531
- https://shkspr.mobi/blog/2014/01/poor-idn-support-from-major-webmail-providers/
- https://shkspr.mobi/blog/2016/09/why-cant-you-send-email-to-a-chinese-address/
- https://hackernoon.com/the-100-correct-way-to-validate-email-addresses-7c4818f24643
- https://hackernoon.com/how-to-reduce-incorrect-email-addresses-df3b70cb15a9
- http://blog.gerv.net/2011/05/html5_email_address_regexp/
- https://rgxdb.com/r/1JWKZ0PW
- https://jsfiddle.net/davidg707/835bxzas/
- https://github.com/mailcheck/mailcheck
- https://uasg.tech/wp-content/uploads/2017/04/Unleashing-the-Power-of-All-Domains-White-Paper.pdf
- http://www.potaroo.net/reports/Universal-Acceptance/UA-Report.pdf
- https://regexr.com/3dnsr
- https://www.youtube.com/watch?v=JENdgiAPD6c
- https://www.youtube.com/watch?v=4s9IjkMAmns
- http://jsfiddle.net/gerst20051/y1puhfmk/
- https://proofy.io/
id | valid | emailaddress | notes |
---|---|---|---|
2 | false | .email@test.com | a . is not allowed at the beginning and/or end |
3 | false | 1234567890123456789012345678901234567890123456789012345678901234+x@example.com | local part is longer than 64 characters |
5 | false | a"b(c)d,e:f;gi[jk]l@example.com | none of the special characters in this local-part are allowed outside quotation marks |
8 | false | email@test.com. | a . is not allowed at the beginning and/or end |
9 | false | john..doe@example.com | double dot before @ |
10 | false | john.doe@example..com | double dot after @ |
11 | false | john@aol...com | not valid due to consecutive dots |
12 | false | just"not"right@example.com | quoted strings must be dot separated or the only element making up the local-part |
16 | true | "()<>[]:,;@"!#$%&'-/=?^_`{}\| ~.a"@example.org | |
18 | true | "very.(),:;<>[]".VERY."very@\ "very".unusual"@strange.example.com | |