@knadh
Created March 21, 2019 08:02

Running multiple active publishers on a NATS cluster for failover while avoiding message duplication

NATS is an excellent, clustered, full-mesh PubSub messaging system that is highly performant and a cakewalk to set up. Full mesh means every server connects to, and knows about, every other server in the cluster, and clients are told about them too. That is great, but it makes it tricky to keep multiple publishers on hot standby, for high availability of the publishers (not the NATS network), while avoiding duplicate publishes.

This is where --no_advertise comes in handy, provided we're willing to sacrifice the automatic meshing and discovery mechanism. That is acceptable in setups where only a fixed set of NATS servers run in a cluster and their addresses (IPs or hostnames) are known.

--no_advertise

The gnatsd --no_advertise flag stops a NATS server from automatically advertising itself to the mesh. For other nodes to discover a --no_advertise node, the --routes have to be specified explicitly. If there are N servers, there should be N routes.

-sl reload

gnatsd -sl reload=<pid> makes a running NATS server reload its configuration from its config file (-c) without downtime. This can be used to take a server out of the cluster, or add it back, by editing the routes in its config file.
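A minimal Go sketch of triggering that reload from a script, assuming `gnatsd` is on `$PATH` and the target server was started with `-c <file>`. `reloadArgs` and `reload` are hypothetical helper names; the only real interface used is the `gnatsd -sl reload=<pid>` invocation described above.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// reloadArgs builds the arguments that ask a running gnatsd process
// (identified by pid) to re-read its config file without restarting.
func reloadArgs(pid int) []string {
	return []string{"-sl", "reload=" + strconv.Itoa(pid)}
}

// reload runs `gnatsd -sl reload=<pid>` and surfaces gnatsd's own output
// on failure. Assumes the gnatsd binary is on $PATH.
func reload(pid int) error {
	out, err := exec.Command("gnatsd", reloadArgs(pid)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("gnatsd reload of pid %d failed: %v: %s", pid, err, out)
	}
	return nil
}
```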

Premise

  • A cluster of two NATS servers, server0 and server1, with N subscribers listening on the subject test.
  • A live publisher, publisher0, publishing on the subject test.
  • A hot standby publisher, publisher1, which also publishes on the subject test, but whose messages should only take effect in the cluster if publisher0 goes down.

Solution

  • Each publisher gets its own local NATS server (here, dummy-nats0 and dummy-nats1 for publisher0 and publisher1 respectively).
  • The publishers do not publish directly to the upstream cluster, but to their local NATS servers.
  • The primary publisher publisher0's dummy NATS server dummy-nats0 is clustered to the upstream NATS servers (via routes).
  • The backup publisher publisher1's dummy NATS server dummy-nats1 is not clustered to the upstream NATS servers (empty routes).
  • These configurations are specified in local configuration files.
  • When publisher0 goes down or there is a fault (assuming there's a healthcheck mechanism):
    1. Remove the upstream NATS routes from dummy-nats0's configuration and issue a gnatsd -sl reload.
    2. Add the upstream NATS routes to dummy-nats1's configuration and issue a gnatsd -sl reload.

The messages publisher0 had been publishing immediately stop reaching the cluster and make way for publisher1's. Even if publisher0 or dummy-nats0 comes back up, its messages stay self-contained and are not pushed to the cluster, because dummy-nats0 no longer has any routes and --no_advertise prevents automatic discovery and cluster formation. This avoids duplicate messages.

+-----------------------------------------------------------------------------------------------------+
|                                                                                                     |
|                                         N ... subscribers                                           |
|                                                                                                     |
+-----------------------------------------------------------------------------------------------------+
                                           -/ -\                                                  
                                         -/     -\                                                
                                       -/         -\                                              
                                     -/             -\                                            
                                   -/                 -\                                          
            +----------------------------+         +---------------------------------+
            |                            |         |                                 |
            |  NATS server0              |         |   NATS server1                  |
            |                            |         |                                 |
            |  listen         :4222      |         |   listen          :4222         |
            |  cluster-listen :4248      |---------|   cluster-listen  :4248         |
            |  no-advertise              |         |   no-advertise                  |
            |                            |         |                                 |
            |                            |         |   routes          server0:4248  |
            +-------------|--------------+         +---------------------------------+
                          |                                 -----/                    
                          |                           -----/                          
                          |                     -----/                                
                          |               -----/                                      
                          |         -----/                                            
                          |   -----/                                                  
                          |--/                                                        
        +-----------------|--------------+          +--------------------------------+
        |                                |          |                                |
        |   NATS dummy-nats0             |          |   NATS dummy-nats1             |
        |                                |          |                                |
        |   listen         :4222         |          |   listen         :4222         |
        |   cluster-listen :4248         |          |   cluster-listen :4248         |
        |   no-advertise                 |          |   no-advertise                 |
        |                                |          |                                |
        |   routes         server0:4248  |          |   routes         []            |
        |                  server1:4248  |          |                                |
        +----------------|---------------+          +----------------|---------------+
                         |                                           |                
                         |                                           |                
                         |                                           |                
              +----------|----------+                     +----------|----------+     
              |                     |                     |                     |     
              |  publisher0         |                     |  publisher1         |     
              |  subject   test     |                     |  subject   test     |     
              |                     |                     |                     |     
              |                     |                     |                     |     
              +---------------------+                     +---------------------+