@JAMSUPREME
Last active June 5, 2020 18:54

Feature Toggle Primer

This is a quick primer on feature toggles. We'll go into:

  • What is a feature toggle?
  • Why use them?
  • How does it work?
  • How do I manage them?
  • What are the risks?

Terminology

  • Feature toggle - Put simply: A boolean that you can turn on or off.
  • Feature flag - Another word for "feature toggle"
  • Feature switch - Another word for "feature toggle"
  • A/B testing - This is a "specialized" type of feature toggle. A basic feature toggle is on/off, while A/B testing could be on/off/vA/vB/vC
  • Old toggle - The toggle has been alive for a long time (e.g. 6 months)
  • Stale toggle - The toggle has not been accessed for a long time, or perhaps was never accessed

What is a feature toggle?

Simply put, it's a boolean value somewhere (database, HTTP API, etc.) that determines whether you utilize a feature or not.
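
As a rough illustration, a toggle lookup can be as small as the sketch below. `ToggleStore` and the toggle name `new_checkout` are hypothetical names made up for this example, and the in-memory hash stands in for whatever database or HTTP API actually holds the values.

```ruby
# Hypothetical in-memory toggle store (illustration only).
# A real store would sit in a database or behind an HTTP API.
class ToggleStore
  def initialize(toggles = {})
    @toggles = toggles
  end

  # Toggles that were never defined are treated as OFF.
  def enabled?(name)
    @toggles.fetch(name, false) == true
  end

  # Flip a toggle at runtime, with no redeploy.
  def set(name, value)
    @toggles[name] = value
  end
end

store = ToggleStore.new("new_checkout" => false)
puts store.enabled?("new_checkout") # false
store.set("new_checkout", true)
puts store.enabled?("new_checkout") # true
```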

Why use them?

The big advantage of feature toggles is that you can constantly push your code into production safely. With feature toggles, you don't need to worry (as much) about breaking production because your code is safely behind the toggle.

  • OFF by default
  • Can be turned on for single users to verify in prod
  • Can be rolled back instantly if an issue is found
  • Can be scheduled if it is maintenance-related
  • Enables code to be deployed at any time when safely behind a toggle
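
The first three points can be sketched in a few lines. This is a minimal sketch, not a real library: `UserToggle` and the user IDs are hypothetical, and state lives in memory only to keep the example self-contained.

```ruby
require "set"

# Hypothetical toggle that is OFF by default, can be enabled for
# individual users, and can be rolled back with a single call.
class UserToggle
  def initialize
    @enabled_for_all = false # OFF by default
    @allowed_users = Set.new
  end

  # Turn the feature on for one user, e.g. to verify it in prod.
  def enable_for(user_id)
    @allowed_users.add(user_id)
  end

  def enable_for_everyone
    @enabled_for_all = true
  end

  # Rollback is a state change, not a deploy.
  def disable
    @enabled_for_all = false
    @allowed_users.clear
  end

  def enabled?(user_id)
    @enabled_for_all || @allowed_users.include?(user_id)
  end
end

toggle = UserToggle.new
toggle.enable_for("tester-1")
puts toggle.enabled?("tester-1")      # true
puts toggle.enabled?("someone-else")  # false
toggle.disable
puts toggle.enabled?("tester-1")      # false
```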

How does it work?

To illustrate why this approach is so convenient, we'll compare it to some other strategies.

API

Let's look at how you might have done "versioning" the old way in an API vs. how you might do it with toggles.

With a config

# This style is dependent on re-deploying with updated configs,
# or using some tooling to dynamically modify the configs

# GET /users
def get_users
  if config.use_v1
    old_client.get_users()
  else
    new_client.get_users()
  end
end

With versioning

# This style keeps your old version "safe", 
# while adding effort for clients to migrate

# GET /users?v=1
# GET /v1/users
def get_users_v1
  old_client.get_users()
end

# GET /users?v=2
# GET /v2/users
def get_users_v2
  new_client.get_users()
end

With feature toggles

For the sake of argument, let's say we want to be "safe" and expose a new version, but we don't want it available to the public until we think it is stable.

# GET /users?v=1
# GET /v1/users
def get_users_v1
  old_client.get_users()
end

# GET /users?v=2
# GET /v2/users
def get_users_v2
  if check_toggle.is_v2_enabled?
    new_client.get_users()
  else
    return_404
  end
end

A more interesting scenario

In real life, it's usually more complicated than this. You may end up implementing a new API version while you also upgrade authorization, so you might have something like this:

# GET /users?v=1
# GET /v1/users
def get_users_v1
  # Run the new auth check only while its toggle is on
  do_auth_check if check_toggle.is_auth_enabled?

  old_client.get_users()
end

# GET /users?v=2
# GET /v2/users
def get_users_v2
  # Run the new auth check only while its toggle is on
  do_auth_check if check_toggle.is_auth_enabled?

  if check_toggle.is_v2_enabled?
    new_client.get_users()
  else
    return_404
  end
end

Front-End

With the front end, it's much the same. Your options were historically to either hit an API to check a flag or to look at a config. One thing to keep in mind with the front end is that there is a little more risk of a toggle value getting accidentally cached.

There are also a number of special considerations depending on which framework or libraries you are using. For the sake of simplicity, I'll illustrate this with vanilla JavaScript.

Without toggles

// Using config-style
if(global_config.homeV1){
  renderV1();
} else { 
  renderV2();
}

// Using an API
if(toggleClient.isEnabled("homeV1")){
  renderV1();
} else { 
  renderV2();
}

With toggles

With toggles, the concept is exactly the same for hitting an API.

// Using an API
if(toggleClient.isEnabled("homeV1")){
  renderV1();
} else { 
  renderV2();
}

For those concerned with particulars, toggleClient would do something like this under the hood:

// fetch is asynchronous, so this resolves to a Promise of the flag value
fetch('http://toggles.com/homeV1')
  .then(response => response.json())
  .then(data => data.enabled);

How do I manage them?

Arguably, this is the hardest part of utilizing toggles.

There are a few things to keep in mind:

  • Toggles should have a short lifetime if possible
  • Your dashboard for toggles should flag old toggles and stale toggles
  • You should have monitoring to detect if a toggle is being accessed
  • You should prevent deletion of active toggles
  • You should "soft delete" toggles (optionally doing permanent deletion after a period of time)
  • Version control the management of the toggles (creation & deletion)
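
To make a few of these rules concrete, here is a minimal sketch of a registry that records access times, flags old and stale toggles, refuses to delete toggles that are still being read, and soft deletes the rest. Everything in it is an assumption made for illustration: the `ToggleRegistry` name, the in-memory storage, the 180-day "old" threshold, and the 7-day "stale" threshold. The `now:` parameter exists only so the example can be exercised with fixed times.

```ruby
# Hypothetical registry (illustration only); thresholds are assumptions.
class ToggleRegistry
  Toggle = Struct.new(:name, :enabled, :created_at, :last_accessed_at, :deleted_at)

  DAY = 24 * 60 * 60
  OLD_AFTER = 180 * DAY  # roughly 6 months
  STALE_AFTER = 7 * DAY  # one week without a read

  class ActiveToggleError < StandardError; end

  def initialize
    @toggles = {}
  end

  def create(name, now: Time.now)
    @toggles[name] = Toggle.new(name, false, now, nil, nil)
  end

  # Every read records an access time, which is what makes
  # "is this toggle still in use?" answerable later.
  def enabled?(name, now: Time.now)
    toggle = @toggles[name]
    return false if toggle.nil? || toggle.deleted_at

    toggle.last_accessed_at = now
    toggle.enabled
  end

  def old?(name, now: Time.now)
    now - @toggles.fetch(name).created_at > OLD_AFTER
  end

  # A never-accessed toggle counts as stale once STALE_AFTER has
  # passed since it was created.
  def stale?(name, now: Time.now)
    toggle = @toggles.fetch(name)
    last_seen = toggle.last_accessed_at || toggle.created_at
    now - last_seen > STALE_AFTER
  end

  # Soft delete: mark the toggle instead of removing it, and refuse
  # if it has been read recently.
  def soft_delete(name, now: Time.now)
    raise ActiveToggleError, "#{name} was accessed recently" unless stale?(name, now: now)

    @toggles.fetch(name).deleted_at = now
  end
end
```

A dashboard could call `old?` and `stale?` for every toggle to decide which ones to flag, and a cleanup job could permanently remove toggles whose `deleted_at` is far enough in the past.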

What are the risks?

In my experience, here are some of the common risks with toggles:

  • Deleting live toggles: This is the biggest one. You must implement a way to prevent deleting live toggles, or it will eventually happen. In my experience, if a toggle hasn't been accessed for a week, it is safe to delete.
  • Code/Toggle mismatch: There are a couple of ways this can happen:
    • You rolled some code back to a former version which happens to depend on a toggle that has been deleted. This is bad, because it's likely the toggle should be ON but it will instead be OFF.
    • You deployed some code but didn't create the toggle. This is usually OK because you will just make the toggle and turn it on.
  • Bad default behavior: Similar to the previous problem, if you don't define your "default" behavior for when a toggle is not found or the API is unavailable, you will have a bad time.
  • Old toggles: If you keep a toggle in the codebase after it has been launched and you have no intent of rolling it back, you're just creating technical debt.
  • Stale toggles: Stale toggles will clutter up your dashboard if you don't get rid of them, but otherwise pose minimal "real" risk.
  • No monitoring: If you don't monitor your apps in some way, you might have subtle issues if toggles suddenly stop working. Old features might suddenly come back, or new features suddenly disappear. It's important to monitor the toggle API and clients using it so you know that everything is behaving normally. Monitoring is a good idea no matter what.
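
For the "bad default behavior" risk, one option is to make the fallback explicit in the client. The sketch below is an assumption-heavy illustration: `SafeToggleClient` is a hypothetical name, and `fetcher` stands in for whatever call actually reaches the toggle API (it returns `true`, `false`, or `nil` when the toggle is not found, and may raise when the API is unavailable).

```ruby
# Hypothetical client wrapper (illustration only) that always has a
# defined answer, even when the toggle is missing or the API is down.
class SafeToggleClient
  # fetcher: any callable taking a toggle name.
  # defaults: per-toggle fallback values; anything not listed falls back to OFF.
  def initialize(fetcher, defaults: {})
    @fetcher = fetcher
    @defaults = defaults
  end

  def enabled?(name)
    value = @fetcher.call(name)
    value.nil? ? default_for(name) : value
  rescue StandardError
    # API unavailable or misbehaving: use the declared default
    default_for(name)
  end

  private

  def default_for(name)
    @defaults.fetch(name, false)
  end
end

# Simulate an outage: every lookup raises.
broken_api = ->(_name) { raise IOError, "toggle API unreachable" }
client = SafeToggleClient.new(broken_api, defaults: { "auth_check" => true })
puts client.enabled?("auth_check") # true
puts client.enabled?("home_v2")    # false
```

Declaring a default of ON for a toggle that has already launched also softens the rollback problem described above, since a deleted toggle no longer silently reads as OFF.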

How does it compare with existing risk?

While toggles come with risks, it is important to keep in mind the risks of a system without toggles:

  • Slower delivery: Without toggles, code can't safely go to production until the whole feature is ready to be seen, so changes ship less often.
  • Maintaining and enhancing code is harder: You will be missing a powerful mechanism to evolve your code, and so you might find yourself doing a lot more versioning or doing "big cutovers" that have a lot of code and risk inherent in them.
  • Hard-coding: You may end up with "hard-coded" behavior that requires re-deploys or other maintenance. It's generally easier to go to your feature toggle dashboard to see the state of things than to SSH onto machines and check their configuration.

Vendors

Various paid vendors:

Open source variants

Additional Resources
