JAMSUPREME/README.md

Feature Toggle Primer

This is a quick primer on feature toggles. We'll go into:

What is a feature toggle?
Why use them?
How does it work?
How do I manage them?
What are the risks?

Terminology


Feature toggle - Put simply: A boolean that you can turn on or off.
Feature flag - Another word for "feature toggle"
Feature switch - Another word for "feature toggle"
A/B testing - This is a "specialized" type of feature toggle. A basic feature toggle is on/off, while A/B testing could be on/off/vA/vB/vC
Old toggle - The toggle has been alive for a long time (e.g. 6 months)
Stale toggle - The toggle has not been accessed for a long time, or perhaps was never accessed
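
To make the difference between a basic toggle and an A/B test concrete, here is a minimal Ruby sketch. The names `VARIANTS` and `variant_for` are hypothetical, and hashing the user ID with CRC32 is only one assumed way to give each user a consistent variant; A/B testing tools typically ship their own bucketing logic.

```ruby
require "zlib"

# Hypothetical variant names for an A/B-style toggle.
VARIANTS = %w[vA vB vC].freeze

# Returns "off" when the toggle is disabled. Otherwise picks a variant
# deterministically, so the same user always sees the same one.
def variant_for(user_id, enabled:)
  return "off" unless enabled

  bucket = Zlib.crc32(user_id.to_s) % VARIANTS.length
  VARIANTS[bucket]
end
```

A basic toggle is the special case where the only outcomes are "off" and a single "on" variant.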

What is a feature toggle?

Simply put, it's a boolean value stored somewhere (database, HTTP API, etc.) that determines whether you use a feature or not.

Why use them?

The big advantage of feature toggles is that you can constantly push your code into production safely. With feature toggles, you don't need to worry (as much) about breaking production because your code is safely behind the toggle.

OFF by default
Can be turned on for single users to verify in prod
Can be rolled back instantly if an issue is found
Can be scheduled if it is maintenance-related
Enables code to be deployed at any time when safely behind a toggle
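
The properties above (OFF by default, per-user enablement, instant rollback) can be sketched with a small in-memory store. This is only an illustration under assumptions: `ToggleStore` and its method names are hypothetical, and a real store would live in a database or behind an HTTP API rather than in process memory.

```ruby
require "set"

# A hypothetical in-memory toggle store, for illustration only.
class ToggleStore
  def initialize
    @global = {}    # toggle name => true/false
    @per_user = {}  # toggle name => Set of user IDs
  end

  # Unknown toggles are OFF by default.
  def enabled?(name, user_id: nil)
    return true if @global.fetch(name, false)

    @per_user.fetch(name, Set.new).include?(user_id)
  end

  # Turn the toggle on for a single user, e.g. to verify in prod.
  def enable_for(name, user_id)
    (@per_user[name] ||= Set.new) << user_id
  end

  # Turn the toggle on for everyone.
  def enable(name)
    @global[name] = true
  end

  # Instant rollback: switch off globally and drop per-user overrides.
  def disable(name)
    @global[name] = false
    @per_user.delete(name)
  end
end
```

With this sketch, `ToggleStore.new.enabled?("v2_users")` is false until someone calls `enable` or `enable_for`, and `disable` takes effect on the next check without a deploy.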

How does it work?

To illustrate why this approach is so convenient, we'll compare it to some other strategies.

API

Let's look at how you might have done "versioning" the old way in an API vs. how you might do it with toggles.

With a config

# This style is dependent on re-deploying with updated configs,
# or using some tooling to dynamically modify the configs

# GET /users
def get_users
  if config.use_v1
    old_client.get_users()
  else
    new_client.get_users()
  end
end

With versioning

# This style keeps your old version "safe", 
# while adding effort for clients to migrate

# GET /users?v=1
# GET /v1/users
def get_users_v1
  old_client.get_users()
end

# GET /users?v=2
# GET /v2/users
def get_users_v2
  new_client.get_users()
end

With feature toggles

For the sake of argument, let's say we want to be "safe" and expose a new version, but we don't want it available to the public until we think it is stable.

# GET /users?v=1
# GET /v1/users
def get_users_v1
  old_client.get_users()
end

# GET /users?v=2
# GET /v2/users
def get_users_v2
  if check_toggle.is_v2_enabled?
    new_client.get_users()
  else
    return_404
  end
end

A more interesting scenario

In real life, it's usually more complicated than this. For example, you may end up implementing a new API version while you also upgrade authorization, so you might have something like this:

# GET /users?v=1
# GET /v1/users
def get_users_v1
  # Only run the new auth check while its toggle is on
  do_auth_check if check_toggle.is_auth_enabled?

  old_client.get_users()
end

# GET /users?v=2
# GET /v2/users
def get_users_v2
  # Only run the new auth check while its toggle is on
  do_auth_check if check_toggle.is_auth_enabled?

  if check_toggle.is_v2_enabled?
    new_client.get_users()
  else
    return_404
  end
end

Front-End

With the front end, it's much the same. Historically, your options were to either hit an API to check a flag or to look at a config. One thing to keep in mind is that the front end carries a little more risk of something getting accidentally cached.

There are also a number of special considerations depending on which framework or libraries you are using. For the sake of simplicity, I'll illustrate this with vanilla JavaScript.

Without toggles

// Using config-style
if(global_config.homeV1){
  renderV1();
} else { 
  renderV2();
}

// Using an API
if(toggleClient.isEnabled("homeV1")){
  renderV1();
} else { 
  renderV2();
}

With toggles

With toggles, the concept is exactly the same as hitting an API.

// Using an API
if(toggleClient.isEnabled("homeV1")){
  renderV1();
} else { 
  renderV2();
}

For those concerned with particulars, toggleClient would do something like this under the hood:

fetch('http://toggles.com/homeV1')
  .then(response => response.json())
  .then(data => data.enabled);

How do I manage them?

Arguably, this is the hardest part of utilizing toggles.
There are a few things to keep in mind:

Toggles should have a short lifetime if possible
Your dashboard for toggles should flag old toggles and stale toggles
You should have monitoring to detect if a toggle is being accessed
You should prevent deletion of active toggles
You should "soft delete" toggles (optionally doing permanent deletion after a period of time)
Version control the management of the toggles (creation & deletion)
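
A few of the points above (flagging old and stale toggles, refusing to delete active ones, soft deletion) can be sketched as follows. Everything here is an assumption for illustration: the `Toggle` record, the helper names, and the thresholds, which simply reuse the "6 months" and "one week" figures mentioned elsewhere in this primer.

```ruby
# Hypothetical record for one toggle; timestamps are Time objects or nil.
Toggle = Struct.new(:name, :created_at, :last_accessed_at, :deleted_at,
                    keyword_init: true)

DAY = 24 * 60 * 60
OLD_AFTER = 180 * DAY  # assumed threshold: roughly 6 months
STALE_AFTER = 7 * DAY  # assumed threshold: one week without access

# Old: the toggle has been alive for a long time.
def old?(toggle, now: Time.now)
  now - toggle.created_at > OLD_AFTER
end

# Stale: never accessed, or not accessed for a long time.
def stale?(toggle, now: Time.now)
  toggle.last_accessed_at.nil? || now - toggle.last_accessed_at > STALE_AFTER
end

# Soft delete: refuse while the toggle is still being accessed,
# otherwise only mark it so it can be restored later.
def soft_delete(toggle, now: Time.now)
  raise ArgumentError, "#{toggle.name} is still active" unless stale?(toggle, now: now)

  toggle.deleted_at = now
  toggle
end
```

A dashboard could call `old?` and `stale?` on every toggle to decide what to highlight, and a scheduled job could permanently remove toggles whose `deleted_at` is far enough in the past.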

What are the risks?

In my experience, here are some of the common risks with toggles:

Deleting live toggles: This is the biggest one. You must implement a way to prevent deleting live toggles, or it will eventually happen. In my experience, if a toggle hasn't been accessed for a week, it is safe to delete.
Code/Toggle mismatch: There are a couple of ways this can happen:

You rolled some code back to a former version that depends on a toggle that has since been deleted. This is bad because the toggle should likely be ON, but it will instead be OFF.
You deployed some code but didn't create the toggle. This is usually OK because you can just create the toggle and turn it on.

Bad default behavior: Similar to the previous problem, if you don't define your "default" behavior for when a toggle is not found or the API is unavailable, you will have a bad time.
Old toggles: If you keep a toggle in the codebase after it has been launched and you have no intent of rolling it back, you're just creating technical debt.
Stale toggles: Stale toggles will clutter up your dashboard if you don't get rid of them, but otherwise pose minimal "real" risk.
No monitoring: If you don't monitor your apps in some way, you might have subtle issues if toggles suddenly stop working. Old features might suddenly come back, or new features suddenly disappear. It's important to monitor the toggle API and clients using it so you know that everything is behaving normally. Monitoring is a good idea no matter what.
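
To illustrate the "bad default behavior" risk, here is a minimal sketch of a client that always falls back to a declared default. `SafeToggleClient` and the `fetcher` callable are hypothetical; the sketch assumes the fetcher returns true or false, returns nil for an unknown toggle, and raises when the toggle API is unavailable.

```ruby
# Hypothetical wrapper around any toggle lookup.
class SafeToggleClient
  # fetcher:  assumed callable taking a toggle name (see assumptions above)
  # defaults: explicit fallback per toggle name, e.g. { "launched" => true }
  def initialize(fetcher, defaults: {})
    @fetcher = fetcher
    @defaults = defaults
  end

  def enabled?(name)
    value = @fetcher.call(name)
    value.nil? ? default_for(name) : value
  rescue StandardError
    # Toggle API unavailable: fall back instead of failing the request.
    default_for(name)
  end

  private

  # Use the declared default; OFF when none was declared.
  def default_for(name)
    @defaults.fetch(name, false)
  end
end
```

Declaring a default of ON for a toggle that has already launched is one way to soften the rollback scenario described above, where old code asks for a toggle that no longer exists.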

How does it compare with existing risk?

While toggles come with risks, it is important to keep in mind the risks of a system without toggles:

Slower delivery: Without toggles, unfinished code can't sit safely in production, so you will ship less often.
Maintaining and enhancing code is harder: You will be missing a powerful mechanism to evolve your code, so you might find yourself doing a lot more versioning or doing "big cutovers" that carry a lot of code and inherent risk.
Hard-coding: You may end up with "hard-coded" behavior that requires re-deploys or other maintenance. It's generally easier to go to your feature toggle dashboard to see the state of things than to SSH onto machines to check their configuration.

Vendors

Various paid vendors:

https://www.optimizely.com (free tier)
https://www.split.io/ (free tier)
https://rollout.io/
https://launchdarkly.com/

Open source variants


https://github.com/Unleash/unleash
https://cloud.spring.io/spring-cloud-config/reference/html/ (Spring Cloud config isn't specifically targeted as a feature flag framework, but it can be leveraged as one quite easily, especially if you already use it for configuration)
http://featureflags.io/resources/
etc.

Additional Resources


Optimizely toggle primer - This is another good little primer on feature toggles, with some Optimizely-specific details
Build or buy? - Explanation of why to use the API vs. rolling your own