Use: for testing against email regex
ref: http://codefool.tumblr.com/post/15288874550/list-of-valid-and-invalid-email-addresses
List of Valid Email Addresses
email@example.com
firstname.lastname@example.com
email@subdomain.example.com
firstname+lastname@example.com
email@123.123.123.123
email@[123.123.123.123]
"email"@example.com
1234567890@example.com
email@example-one.com
_______@example.com
email@example.name
email@example.museum
email@example.co.jp
firstname-lastname@example.com
List of Strange Valid Email Addresses
much."more\ unusual"@example.com
very.unusual."@".unusual.com@example.com
very."(),:;<>[]".VERY."very@\\ "very".unusual@strange.example.com
List of Invalid Email Addresses
plainaddress
#@%^%#$@#$@#.com
@example.com
Joe Smith <email@example.com>
email.example.com
email@example@example.com
.email@example.com
email.@example.com
email..email@example.com
あいうえお@example.com
email@example.com (Joe Smith)
email@example
email@-example.com
email@example.web
email@111.222.333.44444
email@example..com
Abc..123@example.com
List of Strange Invalid Email Addresses
"(),:;<>[\]@example.com
just"not"right@example.com
this\ is"really"not\allowed@example.com
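One way to use the lists above is to run a candidate pattern over both and report where it disagrees. The following is only a sketch: `checkPattern` and `naivePattern` are hypothetical names, the pattern is deliberately naive, and the two arrays hold just a handful of entries from the lists. The pattern should not carry the `g` flag, since `test` would then keep state between calls.

```javascript
// Hypothetical harness: reports where a pattern disagrees with the lists.
function checkPattern(pattern, validList, invalidList) {
  // Addresses the list calls valid but the pattern rejects.
  const falseNegatives = validList.filter((address) => !pattern.test(address));
  // Addresses the list calls invalid but the pattern accepts.
  const falsePositives = invalidList.filter((address) => pattern.test(address));
  return { falseNegatives, falsePositives };
}

// Deliberately naive stand-in pattern, used only to show how the report reads.
const naivePattern = /^[^@\s]+@[^@\s]+\.[^@\s]+$/;

// A few entries copied from the lists above.
const valid = [
  "email@example.com",
  "firstname+lastname@example.com",
  "email@[123.123.123.123]",
  '"email"@example.com',
];
const invalid = [
  "plainaddress",
  "@example.com",
  "email@example@example.com",
  "email..email@example.com",
  "email@example",
];

const report = checkPattern(naivePattern, valid, invalid);
console.log(report);
// falseNegatives: []
// falsePositives: ["email..email@example.com"]
```

The naive pattern accepts the double-dot address because it never inspects the local part, which is the kind of gap the list is meant to expose.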
email@123.123.123.123
email@[123.123.123.123]
These two email addresses are invalid.
According to RFC5322:
Parsing "email" local-part
local-part = dot-atom
dot-atom = dot-atom-text
dot-atom-text = 1*atext
atext = ALPHA
"email" is a valid local-part
Parsing "123.123.123.123" domain
domain = dot-atom
dot-atom = dot-atom-text
dot-atom-text = 1*atext *("." 1*atext)
atext = DIGIT
"123" is a valid 1*atext, so "123.123.123.123" is a valid domain.
Parsing "[123.123.123.123]" domain
domain = domain-literal
domain-literal = "[" *(dtext) "]"
dtext = %d33-90 / %d94-126 ; includes "." (%d46) and the digits "1", "2", "3" (%d49-51)
"123.123.123.123" is a valid *(dtext), so "[123.123.123.123]" is a valid domain.
"email@example.com (Joe Smith)" is valid according to RFC5322:
(local-part see previous comment)
domain = obs-domain
obs-domain = atom *("." atom)
atom = 1*atext [CFWS]
atext = ALPHA
CFWS = 1*([FWS] comment)
comment = "(" *([FWS] ccontent) ")"
ccontent = ctext
ctext = ... ; US-ASCII
"example" matches as atom
"com (Joe Smith)" matches as atom, i.e. 1*atext [CFWS]
("com" is 1*atext, " " is FWS, "(Joe Smith)" is comment)
That being said, I think that "email@123.123.123.123" and "email@example.com (Joe Smith)" are invalid according to RFC5321.
(Sorry, the parser I implemented currently only checks for RFC5322 compliance.)
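The RFC 5322 productions quoted in the comments above can be approximated with a few regular expressions. This is a rough sketch, not the parser mentioned in the comment: `classifyDomain` and `parseAtom` are hypothetical names, CFWS is reduced to optional blanks plus one un-nested comment, and quoted-pairs and the other obsolete forms are ignored.

```javascript
// atext per RFC 5322 section 3.2.3: letters, digits and the listed specials.
const ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";

// dot-atom-text = 1*atext *("." 1*atext)
const DOT_ATOM_TEXT = new RegExp(`^${ATEXT}+(?:\\.${ATEXT}+)*$`);

// domain-literal = "[" *dtext "]", with dtext = %d33-90 / %d94-126
const DOMAIN_LITERAL = /^\[[\x21-\x5A\x5E-\x7E]*\]$/;

// atom = 1*atext [CFWS], simplified (assumption): optional blanks, then one
// un-nested comment whose ctext is %d33-39 / %d42-91 / %d93-126, with
// spaces and tabs standing in for FWS.
const ATOM_WITH_COMMENT = new RegExp(
  `^(${ATEXT}+)(?:[ \\t]*\\(([\\x21-\\x27\\x2A-\\x5B\\x5D-\\x7E \\t]*)\\))?$`
);

// Returns which production the domain matches, or null if neither does.
function classifyDomain(domain) {
  if (DOT_ATOM_TEXT.test(domain)) return "dot-atom";
  if (DOMAIN_LITERAL.test(domain)) return "domain-literal";
  return null;
}

// Splits an atom into its atext run and optional trailing comment.
function parseAtom(text) {
  const match = ATOM_WITH_COMMENT.exec(text);
  if (!match) return null;
  return { atom: match[1], comment: match[2] ?? null };
}

console.log(classifyDomain("123.123.123.123"));   // "dot-atom"
console.log(classifyDomain("[123.123.123.123]")); // "domain-literal"
console.log(parseAtom("com (Joe Smith)"));        // { atom: "com", comment: "Joe Smith" }
```

Both IP forms pass here because the sketch checks RFC 5322 syntax only; it says nothing about RFC 5321 or about whether the address is deliverable.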
Where's the regex to validate this list???
I tested this list with JMail and other Java email address validation libraries: https://www.rohannagar.com/jmail
How many RFCs exist to validate emails? There is another one used by the PHP language to validate emails → rfc822
@ekscrypto Thanks for answering, this is a really interesting topic for me.
IMO:
Regarding the local part, the original rfc822 says:
... The domain-dependent string is uninterpreted,
except by the final sub-domain; the rest of the mail
service merely transmits it as a literal string.
And that's the way the world works: RFCs don't send emails. A "valid" local part is completely up to the email server hosting the address. The only other limitations are the arbitrary ones imposed by your email-sending pipeline.
Max length is interesting: since the internet supports UTF-8 now, a "character" can be up to 4 bytes. With a limit of 255 bytes on the domain part, and without converting to Punycode, the safest bet is to limit it to 63 characters. That way you don't have to worry about subdomains being more than 63 characters either, since they have to fit within the whole thing.
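To make the character-versus-byte distinction concrete, here is a small sketch that compares the two counts for a few strings. `utf8ByteLength` and `codePointLength` are hypothetical helper names; `TextEncoder` is assumed to be available as a global, as it is in current browsers and Node.js.

```javascript
// Number of bytes the text occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Number of Unicode code points (not UTF-16 code units) in the text.
function codePointLength(text) {
  return [...text].length;
}

for (const sample of ["example", "あいうえお", "😀"]) {
  console.log(sample, codePointLength(sample), utf8ByteLength(sample));
}
// example 7 7
// あいうえお 5 15
// 😀 1 4
```

Counting code points rather than `text.length` matters for the last sample, where JavaScript's `length` would report 2.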
So your final validation should:
- split at the final @-symbol
- local part between 1 and 16000 characters (DB varchar(max) is 65535 bytes, assume all unicode and leave room for domain)
- domain part between 4 and 63 characters
- subdomains can't start or end with "-"
- domain part matches one of: DNS, ipv4 or ipv6 regex
// These are the regexes I landed on for javascript:
// The following letter sets are added because Wikipedia insists
// they're valid in email addresses, so they should be covered by \p{L} but aren't:
// Hindi character set: \u0900-\u097F
// Kannada character set: \u0C80-\u0CFF
const domainRegex =
/^((?!-)[\p{L}\p{N}\u0900-\u097F\u0C80-\u0CFF-]+(?<!-)\.)+[\p{L}\u0900-\u097F\u0C80-\u0CFF]{2,}$/iu;
const ipv4Regex =
/^\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\]$/;
const ipv6Regex =
/^\[ipv6:(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))\]$/i;
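The validation steps listed in this comment could be wired together roughly as follows. This is a sketch under assumptions: `validateAddress` is a hypothetical name, the domain patterns are passed in as an array, and only the DNS and IPv4 patterns from the comment are repeated here to keep the example short (the IPv6 pattern would be added to the same array).

```javascript
// Patterns copied from the comment above (IPv6 omitted for brevity).
const domainRegex =
  /^((?!-)[\p{L}\p{N}\u0900-\u097F\u0C80-\u0CFF-]+(?<!-)\.)+[\p{L}\u0900-\u097F\u0C80-\u0CFF]{2,}$/iu;
const ipv4Regex =
  /^\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\]$/;

// Hypothetical helper following the steps listed in the comment.
function validateAddress(address, domainPatterns) {
  // Split at the final @-symbol.
  const at = address.lastIndexOf("@");
  if (at === -1) return false;
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);

  // Lengths are counted in code points, not UTF-16 code units.
  const localLength = [...local].length;
  const domainLength = [...domain].length;
  if (localLength < 1 || localLength > 16000) return false;
  if (domainLength < 4 || domainLength > 63) return false;

  // Subdomains can't start or end with "-" (skipped for bracketed literals).
  if (!domain.startsWith("[")) {
    const labels = domain.split(".");
    if (labels.some((label) => label.startsWith("-") || label.endsWith("-"))) {
      return false;
    }
  }

  // The domain part must match one of the supplied patterns.
  return domainPatterns.some((pattern) => pattern.test(domain));
}

const patterns = [domainRegex, ipv4Regex];
console.log(validateAddress("email@example.com", patterns));         // true
console.log(validateAddress("email@[123.123.123.123]", patterns));   // true
console.log(validateAddress("email@-example.com", patterns));        // false
console.log(validateAddress("email@example@example.com", patterns)); // true
```

The last call returns true because everything before the final @ is treated as an opaque local part, which matches the position taken in the comment rather than the gist's list.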