Detailed overview of the official email standards and how to more correctly implement them in Ruby on Rails and PostgreSQL

Representing Email Addresses in Rails

Image: Joshua Tree Saloon by Wendy Bayer


Dealing with and representing email addresses is very common in Ruby on Rails and other web applications. This is especially true when representing and implementing an application's user (i.e., the User model). Often the user's email address is used as their login identifier (e.g., "username"), which means that the email address must be present, unique, and indexed for database performance.

Although representing email addresses in Rails applications is very common, it is not always done consistently and/or correctly according to the email address standards.

This post seeks to help with that.

This post presents an overview of the email address standards (particularly case-sensitivity, maximum length, and format), how to meet these standards in Rails and PostgreSQL, an example implementation of these standards, and finally how to test to ensure that the standards are being met.


The Email Address Standards

The email address standards were developed by the Internet Engineering Task Force (IETF), beginning in the 1980s with RFC 822, and are now specified mostly in RFC 5322 (published in 2008), specifically section 3.4.1.

🚢 Captain Obvious says that "RFC" is an abbreviation for Request for Comments

The Three Parts of an Email Address

According to the RFC standard, there are three parts to an email address:

  1. The local-part (e.g. Bob.Dobbs in email address Bob.Dobbs@example.com)
  2. The symbol '@'
  3. A domain which is either a domain name (e.g. example.com) or an IP address enclosed in brackets

Most of the time, you are dealing with email addresses that look like Bob.Dobbs@example.com.
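
As a rough illustration of the three parts, here is a minimal Ruby sketch that splits an address at its last '@'. The split_email helper is a hypothetical name introduced for this example (it is not part of Ruby or Rails). It performs no validation and assumes the domain itself contains no '@'.

```ruby
# Hypothetical helper (not a Ruby or Rails API): splits an address into
# its local-part and domain at the LAST '@', since a quoted local-part
# may itself contain an '@'.
def split_email(address)
  local_part, separator, domain = address.rpartition('@')
  raise ArgumentError, "no '@' in #{address.inspect}" if separator.empty?

  [local_part, domain]
end

local_part, domain = split_email('Bob.Dobbs@example.com')
puts local_part # Bob.Dobbs
puts domain     # example.com
```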

Case Sensitivity of Email Addresses

👉 TL;DR: The local-part of an email address is technically case sensitive, but most email services treat addresses as case-insensitive

The case sensitivity of an email address is one of the more confusing aspects and is where many implementations diverge from the standard.

It is a common misconception that email addresses are case-insensitive. But that is not true according to the SMTP (Simple Mail Transfer Protocol) standard, RFC 5321 Section 2.4, which states...

The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. In particular, for some hosts, the user "smith" is different from the user "Smith". However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged.

Most major email providers like Gmail have case-insensitive implementations treating bobdobbs@gmail.com as the same email address as BobDobbs@gmail.com.

Thus in your application, you should maintain the original case of an email address, but disregard case when considering whether or not an email is unique (e.g. treating bobdobbs@gmail.com as the same email address as BobDobbs@gmail.com).
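
The idea of "keep the original case, compare without it" can be sketched in plain Ruby. This is only an illustration of the principle, not how the Rails and PostgreSQL implementation later in this post works. The same_email? helper is a hypothetical name introduced here.

```ruby
# Hypothetical helper: compares two addresses while ignoring case,
# without modifying either string. String#casecmp? can return nil for
# incompatible encodings, so the result is coerced to a strict boolean.
def same_email?(first, second)
  first.casecmp?(second) == true
end

stored   = 'BobDobbs@gmail.com' # original case is kept as entered
incoming = 'bobdobbs@gmail.com'

puts same_email?(stored, incoming) # true
puts stored                        # BobDobbs@gmail.com
```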

Maximum Length of an Email Address

👉 TL;DR: The maximum length of an email address is 254 characters

Another area of confusion is the maximum length of an email address. This confusion is understandable considering that the IETF even got it wrong in its standards and had to issue an erratum. This Stack Overflow post sums it up very well with the supporting links...

An email address must not exceed 254 characters.

This was accepted by the IETF following submitted erratum. A full diagnosis of any given address is available online. The original version of RFC 3696 described 320 as the maximum length, but John Klensin subsequently accepted that this was an incorrect value, since a Path is defined as

Path = "<" [ A-d-l ":" ] Mailbox ">"

So the Mailbox element (i.e., the email address) has angle brackets around it to form a Path, which means it must have a maximum length of 254 characters to restrict the Path length to 256 characters or fewer.

The maximum length specified in RFC 5321 states:

The maximum total length of a reverse-path or forward-path is 256 characters.

RFC 3696 was corrected here.

People should be aware of the errata against RFC 3696 in particular. Three of the canonical examples are in fact invalid addresses.

Thus in your application, you should enforce a maximum length of 254 characters for email addresses.
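
The arithmetic behind the 254-character limit can be written out as a small Ruby sketch. The method names here are hypothetical and introduced only for illustration. The numbers come from the explanation quoted above.

```ruby
# Derives the limit from the quoted explanation: a Path may be at most
# 256 characters, and the Mailbox inside it is wrapped in '<' and '>'.
def max_email_length
  max_path_length = 256
  angle_brackets  = '<>'.length
  max_path_length - angle_brackets
end

# Hypothetical helper: true when the address fits within the limit.
def email_length_ok?(address)
  address.length <= max_email_length
end

puts max_email_length                          # 254
puts email_length_ok?('Bob.Dobbs@example.com') # true
```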

Valid Email Address Format/Syntax

👉 TL;DR: the rules for the syntax of a valid email address are lengthy and complex and properly and inclusively validating this format is a challenge

Given the size and complexity of the standard for a valid email address, it will not be presented here in this post.

However, for a good summary of the valid format for an email address, see the Email address entry in Wikipedia.

🤦 One interesting call out is that spaces and the special characters " ( ) , : ; < > @ [ \ ] are allowed in the local-part of an email address if they are enclosed in quotes. Thus, per the standard, "Some spaces! And @ sign too!"@some.server.com is a valid email address.

Given the complexity of the syntax for all valid email addresses and the complexity and amount of testing required for covering all cases of complex regular expressions, you should generally avoid writing your own email validation regular expressions.


Email Addresses in Rails Applications

This section presents how to implement the email address standards in Rails and PostgreSQL, particularly in the common case of an application User model with an email attribute.

Database Datatype

In your Rails application, you want to keep the original case of the email addresses, but ignore differences in case when comparing email addresses for uniqueness.

Assuming your Rails application is using PostgreSQL, then most likely the best data type to use for an email column is citext.

This is a case-insensitive text data type that lets you store and preserve the original case-sensitive value while efficiently indexing and comparing it (i.e., for uniqueness) in a case-insensitive manner. Overall this is more correct, efficient, and performant than downcasing email addresses in your application.

However, there are a few things to be aware of when using citext columns in Rails.

  1. Although Rails (ActiveRecord) supports citext, Postgres does not by default enable the Citext extension, so you must enable it in your migration by adding the line

    enable_extension(:citext)
  2. Because citext is a Postgres text field, there is no default maximum length constraint at the database level, so you should add one as part of your migration along with a maximum length validation at the model to enforce the 254-character limit of an email address.

    The maximum length constraint for the database in the migration looks like this...

    t.check_constraint '(length(email) < 255)', name: 'email_length_check'

    And the length validation at the model looks like this...

    validates :email, length: { maximum: 254 }

    🦸 Special thanks and shoutout to @dmcsorley for providing the information on adding the database constraints

👀 For more information on citext fields, especially for email columns, see

👀 For more information on adding check_constraints to your Rails migrations and doing it safely in production, see

Example Migration for Email Column

Here is a Rails 7.0.4.2 example of a Rails migration for a User model with an email address field using citext where the email address must be present, have an index, be case-insensitive unique, and have a maximum length constraint of 254 characters.

The rails generate command to create this User model would look like this...

bundle exec rails generate model User email:citext

After editing to...

  • Enable the Citext extension

    enable_extension(:citext)
  • Ensure presence of a value

    null: false
  • Add an index for performance and to ensure (case-insensitive) uniqueness

    index: { unique: true }
  • Add a constraint to ensure a maximum length of 254 characters

    t.check_constraint '(length(email) < 255)', name: 'email_length_check'

the completed migration file would be...

class CreateUsers < ActiveRecord::Migration[7.0]
  def change
    enable_extension(:citext)
    create_table :users do |t|
      t.citext :email, null: false, index: { unique: true }
      t.check_constraint '(length(email) < 255)', name: 'email_length_check'

      t.timestamps
    end
  end
end

Running this migration would produce a schema that looks something like this...

ActiveRecord::Schema[7.0].define(version: 2023_02_16_195842) do
  # These are extensions that must be enabled in order to support this database
  enable_extension "citext"
  enable_extension "plpgsql"
  ...
  create_table "users", force: :cascade do |t|
    t.citext "email", null: false
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.index ["email"], name: "index_users_on_email", unique: true
    t.check_constraint "length(email::text) < 255", name: "email_length_check"
  end
  ...
end

Example Validations for Email Column

After generating and running the migration for the example User model, the desired validations can be added for...

  • Presence

    presence: true
  • Uniqueness (although case insensitiveness is handled at the database level by the citext data type)

    uniqueness: true
  • Maximum length

    length: { maximum: 254 }
  • Format (which is covered in more detail below)

    format: { with: URI::MailTo::EMAIL_REGEXP, ... }

Validating Email Format

By far, the best validation and verification for an email address is the technique of actually sending an email to that address and requiring a response, usually through a unique link to a controller in the application. This approach is covered very well in the excellent Ruby On Rails Tutorial chapter on Account Activation.

However, even with this approach, some initial format validation of an email address is generally desired and warranted.

The post How to validate an email address in Ruby offers a rather easy, pragmatic, and fairly comprehensive approach using the Rails validates_format_of validation with the URI::MailTo::EMAIL_REGEXP regular expression, which is part of the standard Ruby library. Using this Ruby-provided regular expression is arguably better than "rolling your own" regular expression to account for all valid email addresses. You can also specify a custom message if the validation fails.

validates :email, format: { with: URI::MailTo::EMAIL_REGEXP, message: 'must match URI::MailTo::EMAIL_REGEXP' }

Note that the URI::MailTo::EMAIL_REGEXP will not match (i.e. successfully validate) valid but uncommon email addresses with the enclosed-in-quotes formats such as "Some spaces! And @ sign too!"@some.server.com. But, this may actually be a good thing 😉 by mitigating any potential injection attacks.
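
To see this behavior outside of Rails, here is a minimal sketch that applies the regular expression directly, using only the Ruby standard library. The two sample addresses are the ones used earlier in this post.

```ruby
require 'uri'

common = 'Bob.Dobbs@example.com'
quoted = '"Some spaces! And @ sign too!"@some.server.com'

# The common form matches the Ruby-provided pattern.
puts URI::MailTo::EMAIL_REGEXP.match?(common) # true

# The quoted form is valid per the standard but is not matched.
puts URI::MailTo::EMAIL_REGEXP.match?(quoted) # false
```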

Example User Model

Given these desired validations for the User model example, the app/models/user.rb file would look something like this...

class User < ApplicationRecord
  validates :email, presence: true,
                    uniqueness: true,
                    length: {
                      maximum: 254
                    },
                    format: {
                      with: URI::MailTo::EMAIL_REGEXP,
                      message: 'must match URI::MailTo::EMAIL_REGEXP'
                    }
end

Testing

To make your testing easier, you can use the faker gem for fuzzing your test data, thoughtbot's factory_bot gem for your test data factories and their shoulda matchers gem for your validation assertions.

👉 Note that the examples shown here are using the RSpec test framework but you should be able to use the same gems, matchers, and approaches with the Rails default minitest.

The User Test Data Factory with Fuzzing

Use a factory_bot test data factory for the User model and use faker for fuzzing the valid email address values. Also, define a trait in your :user factory for an email that is longer than the maximum 254 characters.

For example in file spec/factories/users.rb...

FactoryBot.define do
  factory :user do
    email { Faker::Internet.email }

    trait :email_255_chars do
      domain = '@test.com'
      email { "#{Faker::Lorem.characters(number: (255 - domain.length))}#{domain}" }
    end
  end
end
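
The length arithmetic in the trait can be checked in plain Ruby. This sketch is a stand-in for the factory, not a replacement for it: the email_of_length helper is a hypothetical name, and 'a' * n is substituted for Faker::Lorem.characters so that the example has no gem dependencies.

```ruby
# Plain-Ruby stand-in for the :email_255_chars trait. The local-part is
# padded so that local-part plus domain equals the requested length.
def email_of_length(total_length, domain: '@test.com')
  local_part = 'a' * (total_length - domain.length)
  "#{local_part}#{domain}"
end

too_long = email_of_length(255)
puts too_long.length        # 255
puts too_long.length <= 254 # false
```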

Testing Presence

To test the email field presence validation, use the validate_presence_of ActiveModel shoulda matcher...

it { is_expected.to validate_presence_of(:email) }

Testing Maximum Length

Testing the Database Constraint

To test the maximum length check_constraint at the database, you will need to bypass the model validation and verify that the database raises an error when you attempt to save a user record with an email longer than 254 characters. Use the :email_255_chars trait you defined in your :user factory, which builds a 255-character email.

The error that you should see is an ActiveRecord::StatementInvalid exception with a message that contains the PG::CheckViolation exception.

user = build(:user, :email_255_chars)
expect { user.save!(validate: false) }.to raise_error(ActiveRecord::StatementInvalid, /PG::CheckViolation/)

Testing the Model Validation

To test the maximum length validation of the email field, use the validate_length_of ActiveModel shoulda matcher...

it { is_expected.to validate_length_of(:email).is_at_most(254) }

Testing Format

To test the format validations of the email field, you will need to test both some positive and negative cases, in other words, some valid email addresses that should pass the validations and some invalid email addresses that should fail the validations.

Use the allow_value ActiveModel shoulda matcher for testing both the valid and invalid email addresses.

For the valid email address, use Faker::Internet.email to generate a random valid email...

it { is_expected.to allow_value(Faker::Internet.email).for(:email) }

For the invalid email addresses, you will need to create your own. Note that the allow_value shoulda matcher provides the ability to test a list of values...

....
      let(:valid_but_rejected_email) { '"Some spaces! And @ sign too!"@some.server.com' }
      let(:invalid_emails) do
        %w[user@example,com user_at_foo.org user.name@example.
           foo@bar_baz.com foo@bar+baz.com foo@bar..com]
      end
...

it { is_expected.not_to allow_values(*invalid_emails, valid_but_rejected_email).for(:email) }

Testing Case-Insensitive Uniqueness

Although there is a shoulda matcher for testing uniqueness, by default it asserts case-sensitive uniqueness, which does not match the case-insensitive implementation presented here.

Here the factory_bot User factory will be used along with the Ruby downcase and upcase methods. The returned validation error message will also be verified to ensure that the validation is failing for uniqueness. Note that the error message check is done in the same test since errors is populated by the valid? method.

it 'is expected to validate that :email is case-insensitive unique' do
  downcase_email = Faker::Internet.email.downcase
  user = create(:user, email: downcase_email)
  new_user = build(:user, email: user.email.upcase)

  expect(new_user.valid?).to be(false)
  expect(new_user.errors.full_messages).to include('Email has already been taken')
end

Example User Model Spec

Putting together all of the testing of the User model validations in spec/models/user_spec.rb looks something like this...

require 'rails_helper'

RSpec.describe User do
  describe 'database constraints' do
    describe 'email' do
      it 'raises db error when email length greater than 254 characters' do
        user = build(:user, :email_255_chars)
        expect { user.save!(validate: false) }.to raise_error(ActiveRecord::StatementInvalid, /PG::CheckViolation/)
      end
    end
  end

  describe 'validations' do
    describe 'email' do
      let(:valid_but_rejected_email) { '"Some spaces! And @ sign too!"@some.server.com' }
      let(:invalid_emails) do
        %w[user@example,com user_at_foo.org user.name@example.
           foo@bar_baz.com foo@bar+baz.com foo@bar..com]
      end

      it { is_expected.to validate_presence_of(:email) }

      it { is_expected.to validate_length_of(:email).is_at_most(254) }

      it { is_expected.to allow_value(Faker::Internet.email).for(:email) }
      it { is_expected.not_to allow_values(*invalid_emails, valid_but_rejected_email).for(:email) }

      it 'is expected to validate that :email is case-insensitive unique' do
        downcase_email = Faker::Internet.email.downcase
        user = create(:user, email: downcase_email)
        new_user = build(:user, email: user.email.upcase)

        expect(new_user.valid?).to be(false)
        expect(new_user.errors.full_messages).to include('Email has already been taken')
      end
    end
  end
end

@dmcsorley

To enforce the length of the citext field in postgres you can use a check constraint. Ideally add it when you create the table, but if you didn't, you can alter table.

If altering, it should be done in two steps to avoid locking the table for a scan (and wrecking prod).

ALTER TABLE users ADD CONSTRAINT users_email_length CHECK (length(email) < 255) NOT VALID;
ALTER TABLE users VALIDATE CONSTRAINT users_email_length;

Or the equivalent Rails migration. See also Braintree's excellent post on safe postgres operations.

@brianjbayer
Author

@dmcsorley ❤️ Thank you so much for enlightening me on this and those great links especially when dealing with making these changes to an existing production database. I added this information into this post and gave you the credit that you so deserve.