Image: Joshua Tree Saloon by Wendy Bayer
Dealing with and representing email addresses is very common in Ruby on Rails and other web applications. This is especially true when representing and implementing an application's user (i.e. the User model). Often the user's email address is used as their login identifier (e.g. "username"), which means that the email address must be present, unique, and indexed for database performance.
Although representing email addresses in Rails applications is very common, it is not always done consistently and/or correctly according to the email address standards.
This post seeks to help with that.
This post presents an overview of the email address standards (particularly case-sensitivity, maximum length, and format), how to meet these standards in Rails and PostgreSQL, an example implementation of these standards, and finally how to test to ensure that the standards are being met.
The email address standards date back to the 1980s (RFC 822) and are maintained by the Internet Engineering Task Force (IETF). The current syntax is defined mostly in RFC 5322, specifically section 3.4.1.
🚢 Captain Obvious says that "RFC" is an abbreviation for Request for Comments
According to the RFC standard, there are three parts to an email address:
- The local-part (e.g. Bob.Dobbs in email address Bob.Dobbs@example.com)
- The symbol '@'
- A domain which is either a domain name (e.g. example.com) or an IP address enclosed in brackets
Most of the time, you are dealing with email addresses that look like Bob.Dobbs@example.com.
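As a rough illustration of these three parts, here is a minimal plain-Ruby sketch that splits an address at its last '@'. The method name split_email is made up for this example, and the method only separates the parts; it does not validate them.

```ruby
# Hypothetical helper (not from any library): split an address into its
# local-part and domain at the LAST '@', since a quoted local-part may
# itself contain '@'. This does not validate the address.
def split_email(address)
  local_part, at_sign, domain = address.rpartition('@')
  return nil if at_sign.empty? || local_part.empty? || domain.empty?

  { local_part: local_part, domain: domain }
end

parts = split_email('Bob.Dobbs@example.com')
puts parts[:local_part] # Bob.Dobbs
puts parts[:domain]     # example.com
```

Splitting at the last '@' rather than the first is a deliberate choice here, because the domain can never contain an '@' while a quoted local-part can.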
👉 TL;DR: The local-part of an email address is technically case sensitive, but most email services treat it as case-insensitive
The case sensitivity of an email address is one of the more confusing aspects and is where many implementations diverge from the standard.
It is a common misconception that email addresses are case-insensitive. That is not true according to the SMTP (Simple Mail Transfer Protocol) standard, RFC 5321 Section 2.4, which states...
The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. In particular, for some hosts, the user "smith" is different from the user "Smith". However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged.
Most major email providers like Gmail have case-insensitive implementations, treating bobdobbs@gmail.com as the same email address as BobDobbs@gmail.com.
Thus in your application, you should maintain the original case of an email address, but disregard case when considering whether or not an email is unique (e.g. treating bobdobbs@gmail.com as the same email address as BobDobbs@gmail.com).
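To make the "preserve case, compare without case" rule concrete, here is a small plain-Ruby sketch. The EmailRegistry class is a hypothetical in-memory stand-in for a users table, not something from Rails or from this post; it only illustrates the comparison rule.

```ruby
# Hypothetical in-memory stand-in for a users table: keep each address
# exactly as entered, but compare case-insensitively for uniqueness.
class EmailRegistry
  def initialize
    @emails = []
  end

  # Stores the address and returns true if no existing address matches
  # it ignoring case; returns false otherwise.
  def add(email)
    return false if taken?(email)

    @emails << email
    true
  end

  def taken?(email)
    @emails.any? { |existing| existing.casecmp?(email) }
  end

  def to_a
    @emails.dup
  end
end

registry = EmailRegistry.new
registry.add('BobDobbs@gmail.com') # => true
registry.add('bobdobbs@gmail.com') # => false, same address ignoring case
registry.to_a                      # => ["BobDobbs@gmail.com"]
```

Note that the stored value keeps its original capitalization; only the comparison ignores case.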
👉 TL;DR: The maximum length of an email address is 254 characters
Another area of confusion is the maximum length of an email address. This confusion is understandable considering that the IETF even got it wrong in its standards and had to issue an erratum. This Stack Overflow post sums it up very well with the supporting links...
An email address must not exceed 254 characters.
This was accepted by the IETF following submitted erratum. A full diagnosis of any given address is available online. The original version of RFC 3696 described 320 as the maximum length, but John Klensin subsequently accepted an incorrect value, since a Path is defined as
Path = "<" [ A-d-l ":" ] Mailbox ">"
So the Mailbox element (i.e., the email address) has angle brackets around it to form a Path, which a maximum length of 254 characters to restrict the Path length to 256 characters or fewer.
The maximum length specified in RFC 5321 states:
The maximum total length of a reverse-path or forward-path is 256 characters.
RFC 3696 was corrected here.
People should be aware of the errata against RFC 3696 in particular. Three of the canonical examples are in fact invalid addresses.
Thus in your application, you should enforce a maximum length of 254 characters for email addresses.
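The arithmetic behind the 254 limit can be sketched in a few lines of plain Ruby. The constant and method names below are made up for this example; the only facts taken from the standard are the 256-character Path limit and the two angle brackets.

```ruby
# Limits taken from the text above: a Path is at most 256 characters and
# wraps the address in "<" and ">", which leaves 254 for the address.
MAX_PATH_LENGTH = 256
MAX_EMAIL_LENGTH = MAX_PATH_LENGTH - '<>'.length # 254

# Hypothetical helper name; counts characters, not bytes.
def email_length_ok?(email)
  email.length <= MAX_EMAIL_LENGTH
end

longest  = "#{'a' * 245}@test.com" # 245 + 9 = 254 characters
too_long = "a#{longest}"           # 255 characters

email_length_ok?(longest)  # => true
email_length_ok?(too_long) # => false
```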
👉 TL;DR: The syntax rules for a valid email address are lengthy and complex, so validating the format both properly and inclusively is a challenge
Given the size and complexity of the standard for a valid email address, it will not be presented in this post.
However, for a good summary of the valid format for an email address, see the Email address entry in Wikipedia.
🤦 One interesting call out is that spaces and the special characters " ( ) , : ; < > @ [ \ ] are allowed in the local-part of an email address if they are enclosed in quotes. Thus, per the standard, "Some spaces! And @ sign too!"@some.server.com is a valid email address.
Given the complexity of the syntax for all valid email addresses and the complexity and amount of testing required for covering all cases of complex regular expressions, you should generally avoid writing your own email validation regular expressions.
This section presents how to implement the email address standards in Rails and PostgreSQL, particularly in the common case of an application User model with an email attribute.
In your Rails application, you want to keep the original case of the email addresses, but ignore differences in case when comparing email addresses for uniqueness.
Assuming your Rails application is using PostgreSQL, then most likely the best data type to use for an email column is citext. This is a case-insensitive text data type allowing you to store and preserve the original case-sensitive value but to efficiently index and compare it (i.e. for uniqueness) in a case-insensitive manner. Overall this is more correct, efficient, and performant than downcasing email addresses in your application.
However, there are a few things to be aware of when using citext columns in Rails.
- Although Rails (ActiveRecord) supports citext, Postgres does not enable the citext extension by default, so you must enable it in your migration by adding the line enable_extension(:citext)
- Because citext is a Postgres text field, there is no default maximum length constraint at the database level, so you should add one as part of your migration along with a maximum length validation at the model to enforce the 254 character limit of an email address.
The maximum length constraint for the database in the migration looks like this...
    t.check_constraint '(length(email) < 255)', name: 'email_length_check'
And the length validation at the model looks like this...
    validates :email, length: { maximum: 254 }
🦸 Special thanks and shoutout to @dmcsorley for providing the information on adding the database constraints
👀 For more information on citext fields, especially for email columns, see
- Case insensitive emails and usernames with Postgres - this is the great post that got me started on this whole topic
- PostgreSQL citext documentation
- Stack Overflow post on citext Performance
👀 For more information on adding check_constraints to your Rails migrations and doing it safely in production, see
Here is a Rails 7.0.4.2 example of a Rails migration for a User model with an email address field using citext where the email address must be present, have an index, be case-insensitive unique, and have a maximum length constraint of 254 characters.
The rails generate command to create this User model would look like this...
    bundle exec rails generate model User email:citext
After editing to...
- Enable the citext extension
  enable_extension(:citext)
- Ensure presence of a value
  null: false
- Add an index for performance and to ensure (case-insensitive) uniqueness
  index: { unique: true }
- Add a constraint to ensure a maximum length of 254 characters
  t.check_constraint '(length(email) < 255)', name: 'email_length_check'
the completed migration file would be...
class CreateUsers < ActiveRecord::Migration[7.0]
  def change
    enable_extension(:citext)
    create_table :users do |t|
      t.citext :email, null: false, index: { unique: true }
      t.check_constraint '(length(email) < 255)', name: 'email_length_check'
      t.timestamps
    end
  end
end
Running this migration would produce a schema that looks something like this...
ActiveRecord::Schema[7.0].define(version: 2023_02_16_195842) do
  # These are extensions that must be enabled in order to support this database
  enable_extension "citext"
  enable_extension "plpgsql"
  ...
  create_table "users", force: :cascade do |t|
    t.citext "email", null: false
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.index ["email"], name: "index_users_on_email", unique: true
    t.check_constraint "length(email::text) < 255", name: "email_length_check"
  end
  ...
end
After generating and running the migration for the example User model, the desired validations can be added for...
- Presence
  presence: true
- Uniqueness (although case insensitiveness is handled at the database level by the citext data type)
  uniqueness: true
- Maximum length
  length: { maximum: 254 }
- Format (which is covered in more detail below)
  format: { with: URI::MailTo::EMAIL_REGEXP, ... }
By far, the best validation and verification for an email address is the technique of actually sending an email to that address and requiring a response, usually through a unique link to a controller in the application. This approach is covered very well in the excellent Ruby On Rails Tutorial chapter on Account Activation.
However, even with this approach, some initial format validation of an email address is generally desired and warranted.
The post How to validate an email address in Ruby offers a rather easy, pragmatic, and fairly comprehensive approach using the Rails validates_format_of validation with the URI::MailTo::EMAIL_REGEXP regular expression, which is part of the standard Ruby library. Using this Ruby-provided regular expression is arguably better than "rolling your own" regular expression to account for all valid email addresses. You can also specify a custom message if the validation fails.
validates :email, format: {with: URI::MailTo::EMAIL_REGEXP, message: 'must match URI::MailTo::EMAIL_REGEXP' }
Note that the URI::MailTo::EMAIL_REGEXP will not match (i.e. successfully validate) valid but uncommon email addresses with the enclosed-in-quotes formats such as "Some spaces! And @ sign too!"@some.server.com. But, this may actually be a good thing 😉 by mitigating any potential injection attacks.
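You can check this behavior directly in plain Ruby, outside of Rails. The sketch below only uses the uri standard library; the two variable names are just for illustration.

```ruby
require 'uri'

common = 'Bob.Dobbs@example.com'
quoted = '"Some spaces! And @ sign too!"@some.server.com'

URI::MailTo::EMAIL_REGEXP.match?(common) # => true
URI::MailTo::EMAIL_REGEXP.match?(quoted) # => false (valid per the standard, but rejected)
```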
Given these desired validations for the User model example, the app/models/user.rb file would look something like this...
class User < ApplicationRecord
  validates :email, presence: true,
                    uniqueness: true,
                    length: {
                      maximum: 254
                    },
                    format: {
                      with: URI::MailTo::EMAIL_REGEXP,
                      message: 'must match URI::MailTo::EMAIL_REGEXP'
                    }
end
To make your testing easier, you can use the faker gem for fuzzing your test data, thoughtbot's factory_bot gem for your test data factories and their shoulda matchers gem for your validation assertions.
👉 Note that the examples shown here are using the RSpec test framework but you should be able to use the same gems, matchers, and approaches with the Rails default minitest.
Use a factory_bot test data factory for the User model and use faker for fuzzing the valid email address values. Also, define a trait in your :user factory for an email that is longer than the maximum 254 characters.
For example in file spec/factories/users.rb...
FactoryBot.define do
  factory :user do
    email { Faker::Internet.email }
    trait :email_255_chars do
      domain = '@test.com'
      email { "#{Faker::Lorem.characters(number: (255 - domain.length))}#{domain}" }
    end
  end
end
To test the email field presence validation, use the validate_presence_of ActiveModel shoulda matcher
it { is_expected.to validate_presence_of(:email) }
To test the maximum length check_constraint at the database, you will need to bypass the model validation and verify that the database raises an error when you attempt to save a user record with an email longer than 254 characters. Use the trait you defined in your :user factory for an email that is longer than the maximum 254 characters.
The error that you should see is an ActiveRecord::StatementInvalid exception with a message that contains the PG::CheckViolation exception.
user = build(:user, :email_255_chars)
expect { user.save!(validate: false) }.to raise_error(ActiveRecord::StatementInvalid, /PG::CheckViolation/)
To test the maximum length validation of the email field, use the validate_length_of ActiveModel shoulda matcher
it { is_expected.to validate_length_of(:email).is_at_most(254) }
To test the format validations of the email field, you will need to test both some positive and negative cases, in other words, some valid email addresses that should pass the validations and some invalid email addresses that should fail the validations.
Use the allow_value ActiveModel shoulda matcher for testing both the valid and invalid email addresses.
For the valid email address, use Faker::Internet.email to generate a random valid email...
it { is_expected.to allow_value(Faker::Internet.email).for(:email) }
For the invalid email addresses, you will need to create your own. Note that the allow_value shoulda matcher (aliased as allow_values) provides the ability to test a list of values. Splat the array so that each address is checked individually...
....
let(:valid_but_rejected_email) { '"Some spaces! And @ sign too!"@some.server.com' }
let(:invalid_emails) do
  %w[user@example,com user_at_foo.org user.name@example.
     foo@bar_baz.com foo@bar+baz.com foo@bar..com]
end
...
it { is_expected.not_to allow_values(*invalid_emails, valid_but_rejected_email).for(:email) }
Although there is a shoulda matcher for testing uniqueness, by default it tests case-sensitive uniqueness, which will not work for the case-insensitive implementation presented here.
Here the factory_bot User factory will be used as well as the Ruby downcase and upcase methods. The returned validation error message will also be verified to ensure that the validation is failing for uniqueness. Note that the error message check is done in the same test since errors is populated by the ActiveRecord valid? method.
it 'is expected to validate that :email is case-insensitive unique' do
  downcase_email = Faker::Internet.email.downcase
  user = create(:user, email: downcase_email)
  new_user = build(:user, email: user.email.upcase)
  expect(new_user.valid?).to be(false)
  expect(new_user.errors.full_messages).to include('Email has already been taken')
end
Putting together all of the testing of the User model validations in spec/models/user_spec.rb looks something like this...
require 'rails_helper'

RSpec.describe User do
  describe 'database constraints' do
    describe 'email' do
      it 'raises db error when email length greater than 254 characters' do
        user = build(:user, :email_255_chars)
        expect { user.save!(validate: false) }.to raise_error(ActiveRecord::StatementInvalid, /PG::CheckViolation/)
      end
    end
  end

  describe 'validations' do
    describe 'email' do
      # The quoted address is valid per the standard but rejected by EMAIL_REGEXP
      let(:valid_but_rejected_email) { '"Some spaces! And @ sign too!"@some.server.com' }
      let(:invalid_emails) do
        %w[user@example,com user_at_foo.org user.name@example.
           foo@bar_baz.com foo@bar+baz.com foo@bar..com]
      end

      it { is_expected.to validate_presence_of(:email) }
      it { is_expected.to validate_length_of(:email).is_at_most(254) }
      it { is_expected.to allow_value(Faker::Internet.email).for(:email) }
      # Splat the array so each invalid address is checked individually
      it { is_expected.not_to allow_values(*invalid_emails, valid_but_rejected_email).for(:email) }

      it 'is expected to validate that :email is case-insensitive unique' do
        downcase_email = Faker::Internet.email.downcase
        user = create(:user, email: downcase_email)
        new_user = build(:user, email: user.email.upcase)
        expect(new_user.valid?).to be(false)
        expect(new_user.errors.full_messages).to include('Email has already been taken')
      end
    end
  end
end
To enforce the length of the citext field in Postgres you can use a check constraint. Ideally add it when you create the table, but if you didn't, you can alter table. If altering, it should be done in two steps to avoid locking the table for a scan (and wrecking prod).
Or the equivalent Rails migration. See also Braintree's excellent post on safe postgres operations.