
@gerhard
Last active April 25, 2022 03:18

In this post, we will cover the new feature flags subsystem, which is part of the upcoming RabbitMQ 3.8.0. Feature flags will allow a rolling cluster upgrade to the next minor version, without requiring all nodes to be stopped before upgrading.

Upgrading from RabbitMQ 3.6.x to 3.7.x

If you had to upgrade a cluster from RabbitMQ 3.6.x to 3.7.x, you probably had to use one of the following solutions:

  • Deploy a new cluster alongside the existing one (a.k.a. blue-green deploy), then migrate data & clients to the new cluster
  • Stop all nodes in the existing cluster, upgrade the last node that was stopped first, then continue upgrading all other nodes, one-by-one

Both solutions were painful because the steps involved were complex. The new feature flags subsystem is meant to keep this pain to a minimum.

The Feature Flags Subsystem

Feature flags are tools to help a RabbitMQ node remain compatible with other RabbitMQ nodes in a cluster, no matter their version.

For example, RabbitMQ 3.8.0 brings quorum queues. To implement them, an internal data structure and a database schema were modified. This impacts the communication with other nodes because the data structure is exchanged between nodes, and the database is replicated to all nodes.

Without the feature flags subsystem, it would be impossible to have a RabbitMQ 3.8.0 node inside a cluster where other nodes are running RabbitMQ 3.7.x. Indeed, the 3.7.x nodes would be unable to understand the data structure or the database schema from a 3.8.0 node. The opposite is also true. That's why RabbitMQ today prevents this from happening by comparing versions and by denying clustering when versions are considered incompatible (the policy considers different minor/major versions to be incompatible).
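To make the version policy concrete, here is a minimal Python sketch of the rule as described above: two nodes may cluster only if their major and minor versions match. The function names are hypothetical and the logic is a simplification for illustration, not RabbitMQ's actual implementation.

```python
def minor_series(version):
    """Return the (major, minor) pair of a version string such as '3.7.12'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def may_cluster_without_feature_flags(version_a, version_b):
    """Simplified policy from the text: a different major or minor version
    is considered incompatible; patch releases are allowed to differ."""
    return minor_series(version_a) == minor_series(version_b)


print(may_cluster_without_feature_flags("3.7.12", "3.7.15"))  # True
print(may_cluster_without_feature_flags("3.7.12", "3.8.0"))   # False
```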

New in RabbitMQ 3.8.0 is the feature flags subsystem: if you upgrade a single node in your 3.7.x cluster to 3.8.0 and restart that node, it will not immediately enable the new data structure or the new database schema, because the feature flags subsystem tells it not to. The subsystem can determine this because RabbitMQ 3.7.x supports no feature flags at all, so new features or behaviors in RabbitMQ 3.8.0 cannot be used before all nodes in the cluster are upgraded.

So after a partial upgrade of your cluster to RabbitMQ 3.8.0, all nodes act as 3.7.x nodes with regard to incompatible features, even the 3.8.0 one. In this situation, quorum queues are unavailable. You need to finish the upgrade of your running cluster by upgrading all nodes. When you are done, you can decide to enable the new feature flags provided by RabbitMQ 3.8.0: one of them enables quorum queues. For now, this is a manual operation (which can be automated outside of RabbitMQ): the idea is that you need to confirm you don't have any other RabbitMQ 3.7.x nodes you plan to (re-)add to your cluster. You do that either from the CLI or from the Management plugin UI.
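As one way to automate the CLI step outside of RabbitMQ, the sketch below wraps the rabbitmqctl enable_feature_flag command in Python. The helper names and the injectable run parameter are assumptions made for illustration; the sketch also assumes rabbitmqctl is on the PATH of a node that belongs to the fully upgraded cluster.

```python
import subprocess


def enable_feature_flag_command(flag_name):
    """Build the rabbitmqctl invocation that enables a single feature flag."""
    return ["rabbitmqctl", "enable_feature_flag", flag_name]


def enable_feature_flag(flag_name, run=subprocess.run):
    """Run the command; `run` is injectable (hypothetical design choice) so
    the helper can be exercised without a broker. With the default runner,
    check=True raises CalledProcessError if rabbitmqctl exits non-zero."""
    return run(enable_feature_flag_command(flag_name), check=True)


if __name__ == "__main__":
    # Only print the command here; call enable_feature_flag("quorum_queue")
    # on a real node once every cluster member runs 3.8.0.
    print(" ".join(enable_feature_flag_command("quorum_queue")))
```

Running rabbitmqctl list_feature_flags beforehand shows which flags the cluster currently knows about and their state.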

Once a new feature flag is enabled, it is impossible to add a RabbitMQ 3.7.x node to that cluster.

Demo with RabbitMQ 3.8.0

Let's go through a complete upgrade of a RabbitMQ 3.7.x cluster. We will take a look at the feature flags in the process.

We have the following 2-node cluster running RabbitMQ 3.7.12:

We now upgrade node A to RabbitMQ 3.8.0 and restart it. Here is what the management overview page looks like after the node is restarted:

We can see the difference in versions in the list of nodes: each node's version is displayed just below its name.

The list of feature flags provided by RabbitMQ 3.8.0 is now available in the management UI on node A:

This page will not exist on node B because it is still running RabbitMQ 3.7.12.

On node A, we see that the quorum_queue feature flag is marked as Unavailable. The reason is that node B (still running RabbitMQ 3.7.12) does not know about the quorum_queue feature flag, so node A is not allowed to use it. This feature flag cannot be enabled until all nodes in the cluster support it.
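The availability rule can be sketched in a few lines of Python: a flag can be enabled only when every node in the cluster knows about it. The function, the node names, and the "disabled"/"enabled" state labels are hypothetical and only model the behavior described here, not RabbitMQ's internal logic.

```python
def flag_state(flag, supported_by_node, enabled_flags):
    """Return the state of `flag` for the whole cluster.

    supported_by_node: dict mapping node name -> set of flag names that
                       node knows about (empty for a 3.7.x node).
    enabled_flags:     set of flags already enabled in the cluster.
    """
    if flag in enabled_flags:
        return "enabled"
    if all(flag in flags for flags in supported_by_node.values()):
        return "disabled"  # every node supports it, so it can be enabled
    return "unavailable"   # at least one node does not know the flag


# Node A is upgraded to 3.8.0, node B still runs 3.7.12 (no flags at all).
cluster = {"node-a": {"quorum_queue"}, "node-b": set()}
print(flag_state("quorum_queue", cluster, set()))  # unavailable

# After node B is upgraded, the flag can be enabled.
cluster["node-b"] = {"quorum_queue"}
print(flag_state("quorum_queue", cluster, set()))  # disabled
```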

For instance, we could try to declare a quorum queue on node A, but it is denied:

After node B is upgraded, feature flags are available and they can be enabled. We proceed and enable quorum_queue by clicking the Enable button:

Now, we can declare a quorum queue:
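A quorum queue can also be declared from a client rather than the management UI. The sketch below assumes a pika-style channel object whose queue_declare method accepts queue, durable and arguments; the helper names and the queue name are hypothetical. Quorum queues are selected with the x-queue-type argument and must be durable.

```python
def quorum_queue_declare_kwargs(name):
    """Keyword arguments for declaring a quorum queue with a pika-style channel."""
    return {
        "queue": name,
        "durable": True,  # quorum queues must be durable
        "arguments": {"x-queue-type": "quorum"},
    }


def declare_quorum_queue(channel, name):
    """Declare the queue on `channel` (assumed to expose queue_declare)."""
    return channel.queue_declare(**quorum_queue_declare_kwargs(name))


print(quorum_queue_declare_kwargs("orders"))
```

If the quorum_queue feature flag is not enabled yet, the broker is expected to refuse this declaration, as in the denied attempt earlier in the demo.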

To Learn More

The Feature Flags subsystem documentation describes in greater detail how it works and what operators and plugin developers should pay attention to.

Note that feature flags are not a guarantee that a cluster shutdown will never be required again for upgrades: the ability to implement a change using a feature flag depends on the nature of the change, and the RabbitMQ team will decide on a case-by-case basis.
