Use: for testing against email regex
ref: http://codefool.tumblr.com/post/15288874550/list-of-valid-and-invalid-email-addresses
List of Valid Email Addresses
email@example.com
firstname.lastname@example.com
email@subdomain.example.com
firstname+lastname@example.com
email@123.123.123.123
email@[123.123.123.123]
"email"@example.com
1234567890@example.com
email@example-one.com
_______@example.com
email@example.name
email@example.museum
email@example.co.jp
firstname-lastname@example.com
List of Strange Valid Email Addresses
much.”more\ unusual”@example.com
very.unusual.”@”.unusual.com@example.com
very.”(),:;<>[]”.VERY.”very@\\ "very”.unusual@strange.example.com
List of Invalid Email Addresses
plainaddress
#@%^%#$@#$@#.com
@example.com
Joe Smith <email@example.com>
email.example.com
email@example@example.com
.email@example.com
email.@example.com
email..email@example.com
あいうえお@example.com
email@example.com (Joe Smith)
email@example
email@-example.com
email@example.web
email@111.222.333.44444
email@example..com
Abc..123@example.com
List of Strange Invalid Email Addresses
”(),:;<>[\]@example.com
just”not”right@example.com
this\ is"really"not\allowed@example.com
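A minimal sketch of how a list like this might be used as a test fixture. The `checkList` helper and the deliberately naive `naiveRegex` are assumptions for illustration and are not part of the gist; the idea is to swap in whichever validator is under test.

```javascript
// Hypothetical harness: run a validator over labelled samples and report mismatches.
// Only a few entries from the lists above are included to keep the sketch short.
const samples = [
  { address: 'email@example.com', expected: true },
  { address: 'firstname+lastname@example.com', expected: true },
  { address: '"email"@example.com', expected: true },
  { address: 'plainaddress', expected: false },
  { address: '@example.com', expected: false },
  { address: 'email@example@example.com', expected: false },
];

// Deliberately naive stand-in validator (an assumption, not a recommended regex).
const naiveRegex = /^[^@\s]+@[^@\s]+\.[^@\s]+$/;
const naiveValidator = (address) => naiveRegex.test(address);

// Returns the samples whose actual result differs from the expected label.
function checkList(validator, list) {
  return list.filter(({ address, expected }) => validator(address) !== expected);
}

const mismatches = checkList(naiveValidator, samples);
console.log(`${samples.length - mismatches.length}/${samples.length} samples matched`);
```

Printing the mismatching entries rather than a pass/fail flag makes it easier to see which rule a given regex gets wrong.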
ok
email@123.123.123.123
email@[123.123.123.123]
These two email addresses are invalid.
According to RFC5322:
Parsing "email" local-part
local-part = dot-atom
dot-atom = dot-atom-text
dot-atom-text = 1*atext
atext = ALPHA
"email" is a valid local-part
Parsing "123.123.123.123" domain
domain = dot-atom
dot-atom = dot-atom-text
dot-atom-text = 1*atext *("." 1*atext)
atext = DIGIT
"123" is a valid 1*atext, so "123.123.123.123" is a valid domain.
Parsing "[123.123.123.123]" domain
domain = domain-literal
domain-literal = "[" *(dtext) "]"
dtext = %d33-90 / %d94-126 ; printable US-ASCII except "[", "\" and "]"
The digits and "." are all valid dtext, so "[123.123.123.123]" is a valid domain.
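The two derivations above can be approximated with a pair of regular expressions. This is a rough sketch only: `classifyDomain` is a hypothetical helper, and surrounding CFWS and the obsolete (obs-) productions of RFC 5322 are deliberately ignored.

```javascript
// atext per RFC 5322 section 3.2.3: letters, digits and a set of printable symbols.
const ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";

// dot-atom-text = 1*atext *("." 1*atext)
const dotAtomText = new RegExp(`^${ATEXT}+(?:\\.${ATEXT}+)*$`);

// domain-literal = "[" *dtext "]", with dtext = %d33-90 / %d94-126
const domainLiteral = /^\[[\x21-\x5A\x5E-\x7E]*\]$/;

// Hypothetical helper: which RFC 5322 domain production, if any, matches?
function classifyDomain(domain) {
  if (dotAtomText.test(domain)) return 'dot-atom';
  if (domainLiteral.test(domain)) return 'domain-literal';
  return null;
}

console.log(classifyDomain('123.123.123.123'));   // dot-atom
console.log(classifyDomain('[123.123.123.123]')); // domain-literal
console.log(classifyDomain('example..com'));      // null
```

Note that "-example.com" also matches dot-atom here, because "-" is atext. Grammar-level validity under RFC 5322 is not the same thing as being a usable hostname.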
"email@example.com (Joe Smith)" is valid according to RFC5322:
(local-part see previous comment)
domain = obs-domain
obs-domain = atom *("." atom)
atom = 1*atext [CFWS]
atext = ALPHA
CFWS = 1*([FWS] comment)
comment = "(" *([FWS] ccontent) ")"
ccontent = ctext
ctext = ... ; US-ASCII
"example" matches as atom
"com (Joe Smith)" matches as atext+CFWS
("com" is 1*atext, " " is FWS, "(Joe Smith)" is comment)
That being said, I think that "email@123.123.123.123" and "email@example.com (Joe Smith)" are invalid according to RFC5321.
(Sorry, the parser I implemented currently only checks for RFC5322 compliance.)
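One way a validator might reconcile the two readings is to treat the trailing comment as RFC 5322 syntax and strip it before applying a stricter RFC 5321 style check. The sketch below is an assumption about how that could look, not the commenter's parser: `stripTrailingComment` is a hypothetical helper that only handles a single, non-nested trailing comment without quoted-pairs.

```javascript
// Simplified: one trailing "(...)" comment, no nesting, no backslash escapes.
const trailingComment = /\s*\([^()\\]*\)\s*$/;

// Hypothetical helper: drop a trailing RFC 5322 comment, leaving the addr-spec.
function stripTrailingComment(address) {
  return address.replace(trailingComment, '');
}

console.log(stripTrailingComment('email@example.com (Joe Smith)')); // email@example.com
```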
Where's the regex to validate this list???
I tested this list with JMail and other Java email address validation libraries: https://www.rohannagar.com/jmail
How many RFCs exist to validate emails? There is another one used by the PHP language to validate emails → rfc822
@ekscrypto Thanks for answering, this is a really interesting topic for me.
hmmm.. ok I guess?
IMO:
Regarding the local part, the original rfc822 says:
... The domain-dependent string is uninterpreted,
except by the final sub-domain; the rest of the mail
service merely transmits it as a literal string.
And that's the way the world works: RFCs don't send emails. A "valid" local part is completely up to the email server hosting the address. The only limitations on that are the arbitrary ones placed there by your email-sending pipeline.
Max length is interesting: since the internet supports UTF-8 now, a "character" can be up to 4 bytes. With a limit of 255 bytes on the domain part and no conversion to punycode, the safest bet is to limit it to 63 characters. That way you don't have to worry about subdomains being more than 63 characters either, since they need to fit the whole thing in there.
So your final validation should:
- split at the final @-symbol
- local part between 1 and 16000 characters (DB varchar(max) is 65535 bytes, assume all unicode and leave room for domain)
- domain part between 4 and 63 characters
- subdomains can't start or end with "-"
- domain part matches one of: DNS, ipv4 or ipv6 regex
// These are the regexes I landed on for JavaScript.
// The following letter sets are added because Wikipedia insists
// they appear in valid email addresses, so they should be covered by \p{L} but aren't:
// Hindi character set: \u0900-\u097F
// Kannada character set: \u0C80-\u0CFF
const domainRegex =
/^((?!-)[\p{L}\p{N}\u0900-\u097F\u0C80-\u0CFF-]+(?<!-)\.)+[\p{L}\u0900-\u097F\u0C80-\u0CFF]{2,}$/iu;
const ipv4Regex =
/^\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\]$/;
const ipv6Regex =
/^\[ipv6:(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))\]$/i;
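The checklist and regexes above could be wired together roughly as follows. This is a sketch under stated assumptions: `looksLikeEmail` is a hypothetical name, `ipv6Placeholder` is a loose stand-in for the much longer `ipv6Regex` above, and lengths are counted in code points rather than bytes.

```javascript
// Copied from the comment above.
const domainRegex =
  /^((?!-)[\p{L}\p{N}\u0900-\u097F\u0C80-\u0CFF-]+(?<!-)\.)+[\p{L}\u0900-\u097F\u0C80-\u0CFF]{2,}$/iu;
const ipv4Regex =
  /^\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\]$/;
// Assumption: loose placeholder standing in for the full ipv6Regex above.
const ipv6Placeholder = /^\[ipv6:[0-9a-f:.%]+\]$/i;

function looksLikeEmail(input) {
  // Split at the final @-symbol.
  const at = input.lastIndexOf('@');
  if (at < 0) return false;
  const local = input.slice(0, at);
  const domain = input.slice(at + 1);

  // Spread iterates by code point, so a 4-byte character counts once.
  const localLength = [...local].length;
  const domainLength = [...domain].length;
  if (localLength < 1 || localLength > 16000) return false;
  if (domainLength < 4 || domainLength > 63) return false;

  // The "-" rule for subdomains is enforced by the lookarounds in domainRegex.
  return (
    domainRegex.test(domain) ||
    ipv4Regex.test(domain) ||
    ipv6Placeholder.test(domain)
  );
}

console.log(looksLikeEmail('email@example.com'));       // true
console.log(looksLikeEmail('email@123.123.123.123'));   // false
console.log(looksLikeEmail('email@[123.123.123.123]')); // true
```

Because the local part is treated as uninterpreted, an input such as "email@example@example.com" passes this check: everything before the final "@" is simply the local part.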
Don't forget names and utf8:
=?utf-8?B?R0FSQU5UIEJDIERNQ0M=?= <2342431@mycompany.com>
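The display name in that example is an RFC 2047 encoded-word (`=?charset?encoding?text?=`). Below is a rough sketch of separating the name from the address and decoding it. It assumes Node.js (for `Buffer`), handles only UTF-8 with the `B` (base64) encoding, and uses hypothetical helper names; it is not a full header parser.

```javascript
// Optional display name followed by <addr-spec>. Simplified on purpose.
const mailboxPattern = /^\s*(.*?)\s*<([^<>]+)>\s*$/;
// RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
const encodedWord = /=\?([^?]+)\?([bBqQ])\?([^?]*)\?=/g;

// Hypothetical helper: decode UTF-8 base64 encoded-words, leave the rest alone.
function decodeEncodedWords(text) {
  return text.replace(encodedWord, (match, charset, encoding, payload) => {
    const supported =
      charset.toLowerCase() === 'utf-8' && encoding.toUpperCase() === 'B';
    if (!supported) return match; // Q encoding and other charsets are untouched
    return Buffer.from(payload, 'base64').toString('utf8');
  });
}

// Hypothetical helper: split "Name <address>" into its two parts.
function parseMailbox(header) {
  const m = mailboxPattern.exec(header);
  if (!m) return { name: '', address: header.trim() };
  return { name: decodeEncodedWords(m[1]), address: m[2] };
}

console.log(
  parseMailbox('=?utf-8?B?R0FSQU5UIEJDIERNQ0M=?= <2342431@mycompany.com>')
);
// { name: 'GARANT BC DMCC', address: '2342431@mycompany.com' }
```

Only the part inside the angle brackets would then be passed on to an address validator.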
@frattaro If this is a 'humble' opinion, I don't want to know the 'supreme' opinion
@cizordj It was never mentioned that the opinion was humble. I think it was meant to be supreme from the beginning.
@RobKenis he literally said IMO 🐱
@cizordj And what does IMO stand for
"In My Office"
"Iguanas Making Omelets"
"Invisible Martian Orchestras"
I added "IMO" - "In my opinion" - lastly after writing that, because I finished and thought "you know, it wouldn't be that much more work to measure the byte lengths of the strings... and I'd bet you $5 that regex is missing some unicode ranges. I should probably package this up. I'm not gonna do that. You know what? I'll just say it's an opinion because it's good enough and I'm done with it."
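Measuring byte lengths, as mentioned above, is indeed not much work in JavaScript. A minimal sketch, assuming an environment with the standard `TextEncoder` global (modern browsers and Node.js); `utf8ByteLength` is a hypothetical helper name.

```javascript
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8

// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

console.log(utf8ByteLength('email'));      // 5
console.log(utf8ByteLength('あいうえお')); // 15 (3 bytes per character)
```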
@frattaro tldr
@kinduff ok
Interpolate my onions.
Iced Mocha Overload
Intellectuals Meeting Ogres
This answers all the regex questions about email: https://youtu.be/mrGfahzt-4Q?feature=shared&t=992
We don't want answers about email, we want answers about what IMO stands for
It's Means Ok
This is why English can be confusing sometimes, especially for foreigners.
really ok
I'll argue that if you list email@example.web as an invalid email address, then you should also list email@123.123.123.123 as invalid. As per RFC5321 Section 4.1.2, address literals can only be used if they start with "[" and end with "]", and .123 is not a valid TLD as per the Public Suffix List.
Also, based on that same RFC5321, I would argue that the addresses in your list of very unusual emails are actually all invalid. The Local-part definition clearly indicates that you have either a Dot-string or a Quoted-string, not both and not a combination of the two. None of your very unusual emails start with DQUOTE, so they would fall under the Dot-string rule, which only allows Atom *("." Atom), with Atom being only 1*atext.
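The Dot-string versus Quoted-string argument can be sketched as two regular expressions built from the RFC 5321 grammar. `classifyLocalPart` is a hypothetical helper, and the sample inputs assume the curly quotes in the list are read as ASCII double quotes.

```javascript
// Atom = 1*atext; Dot-string = Atom *("." Atom)
const ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";
const dotString = new RegExp(`^${ATEXT}+(?:\\.${ATEXT}+)*$`);

// Quoted-string = DQUOTE *QcontentSMTP DQUOTE
// qtextSMTP = %d32-33 / %d35-91 / %d93-126; quoted-pairSMTP = "\" %d32-126
const quotedString = /^"(?:[\x20\x21\x23-\x5B\x5D-\x7E]|\\[\x20-\x7E])*"$/;

// Hypothetical helper: which RFC 5321 Local-part production, if any, matches?
function classifyLocalPart(localPart) {
  if (dotString.test(localPart)) return 'Dot-string';
  if (quotedString.test(localPart)) return 'Quoted-string';
  return null;
}

console.log(classifyLocalPart('email'));                 // Dot-string
console.log(classifyLocalPart('"email"'));               // Quoted-string
console.log(classifyLocalPart('much."more\\ unusual"')); // null
```

A local part that mixes a bare prefix with a quoted segment matches neither production, which is the point made above.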