SmartStack vs Consul
Sent 5/1/2014
Hey Igor,
Glad you did a write up! I’m one of the authors of Consul. You mention we get some
things wrong about SmartStack, but we would love to get that corrected. The website
is generated from this file:
I would appreciate a PR with the changes; or, if you just let me know, I will gladly
update the page to be correct.
That said, there are some descriptions of Consul that are incorrect in your writeup that
I wanted to reach out to you about. Firstly, the gossip protocol is not used to propagate
all the information. It is actually only used for server membership (initial discovery of the
live servers), and as a failure detector within a datacenter. Instead, almost all the data is
replicated among the servers in a strongly consistent manner using Raft. This means in
effect the Consul servers act exactly like the ZK nodes in SmartStack.
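To make the strong-consistency claim concrete, here is the generic Raft quorum arithmetic (this is standard Raft math, not Consul-specific code):

```python
# Raft commits a write only after a majority (quorum) of servers
# acknowledge it, which is why a Consul server cluster tolerates
# minority failures without losing data -- just like a ZK ensemble.

def quorum(servers: int) -> int:
    """Majority of servers needed to commit a Raft log entry."""
    return servers // 2 + 1

def tolerated_failures(servers: int) -> int:
    """Servers that can fail while the cluster stays available."""
    return servers - quorum(servers)

# e.g. a 3-server cluster needs 2 acks and tolerates 1 failure;
# a 5-server cluster needs 3 acks and tolerates 2 failures.
```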
I’m not sure why the service discovery sentence does not make sense either. Synapse
must be configured a priori, while Consul does not need to be. This couples Synapse with
the services on the box. If I dynamically deploy a new service, or an update to a service
that changes my upstream dependencies, that coupling enforces an ordering: Synapse must
be updated before the new service, or service resolution will fail. Consul does not make that assumption
and can handle dynamic topologies or changing use patterns more easily. If the grammar is
unclear, I’m certainly open to suggestions.
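The ordering constraint can be sketched with a toy example (hypothetical names and a plain dict, not Synapse or Consul code): a resolver configured a priori fails for any upstream it was not told about up front, while a dynamic registry resolves anything registered at runtime.

```python
# Hypothetical sketch of static vs. dynamic service resolution.
# A static map (configured a priori, Synapse-style) must be
# regenerated before a new upstream can be resolved; a dynamic
# registry (Consul-style) picks up services registered at any time.

STATIC_CONFIG = {"web": "10.0.0.1:80"}  # fixed at deploy time

def resolve_static(name):
    # Returns None for any service added after the config was written.
    return STATIC_CONFIG.get(name)

class DynamicRegistry:
    def __init__(self):
        self._services = {}

    def register(self, name, addr):
        # Registration can happen at any time, in any order.
        self._services[name] = addr

    def resolve(self, name):
        return self._services.get(name)

registry = DynamicRegistry()
registry.register("web", "10.0.0.1:80")
registry.register("new-api", "10.0.0.2:9000")  # deployed later, no reconfig

# resolve_static("new-api") stays None until the static config is
# regenerated; registry.resolve("new-api") works immediately.
```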
Also, I think describing the tag discovery as “complex” is a bit overreaching, unless you
mean a simple string equality check is actually that complex. A service is only “just a service”
if it lacks any state, and is never updated. In practice, you may have multiple versions of a
service, or certain nodes may contain state. A service is not always perfectly homogeneous.
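The "string equality check" really is this small, roughly (an illustrative sketch, not Consul's implementation):

```python
# Select service instances matching a tag -- plain string equality
# over a list, not a complex query language.

instances = [
    {"addr": "10.0.0.1", "tags": ["v1"]},
    {"addr": "10.0.0.2", "tags": ["v2"]},
    {"addr": "10.0.0.3", "tags": ["v2", "primary"]},
]

def with_tag(instances, tag):
    """Return only the instances whose tag list contains `tag`."""
    return [i for i in instances if tag in i["tags"]]

# with_tag(instances, "v2") selects the two v2 nodes, letting you
# route to a specific version or role of an otherwise uniform service.
```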
I also agree that in many cases using HAProxy is much more reasonable than relying on
DNS. However, I see no reason why Synapse cannot simply source its information from Consul instead.
With respect to health checking, Nagios is not actually used as part of Consul. There is
no need to worry about Nagios being down or overloaded. Consul uses the same API
as Nagios plugins, which allows that ecosystem to be used, but the Nagios servers and
agents are not actually running. Consul simply fork/execs and runs your check (which
may be a nagios plugin). Nothing very complex. Debugging is as simple as running your
check, or reading the Consul logs. There is also nothing that Consul does to promote
“complex health checks”. It just supports node and service level checks. They are only
as complex as you make them.
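The fork/exec model fits in a few lines. This is a simplified approximation, not Consul's actual agent code; the Nagios exit-code convention it relies on (0 = passing, 1 = warning, anything else = critical) is the real one:

```python
import subprocess
import sys

# Run a check command the way a Nagios-plugin-compatible agent would:
# fork/exec it, then map the exit code to a health status. Debugging
# is as simple as running the same command by hand.

STATUS = {0: "passing", 1: "warning"}  # any other exit code is critical

def run_check(argv):
    """Execute a check and return (status, stdout)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return STATUS.get(result.returncode, "critical"), result.stdout

# A trivial check that always passes:
status, output = run_check([sys.executable, "-c", "print('ok')"])
```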
Lastly, with multi-datacenter, the gossip is not being used to replicate data across the WAN.
In fact, Consul operates exactly the way you suggest the correct approach should work. There is
a local cluster of servers per datacenter. Instead of a global ZK cluster, Consul relies on gossip
to be aware of other datacenters, which is much less brittle than a centralized global ZK cluster.
Consul even promotes (and defaults to) only talking to local services, so no configuration
is required. I’m not sure where the overlap with configuration management is in this case.
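The local-first defaulting can be shown with a toy lookup (a hypothetical structure, not Consul's API): queries resolve against the caller's own datacenter unless another one is named explicitly.

```python
# Toy multi-datacenter catalog: each datacenter holds its own local
# service data; cross-DC queries are possible but must be explicit,
# mirroring how Consul defaults to the local datacenter.

CATALOG = {
    "dc1": {"web": ["10.1.0.1"]},
    "dc2": {"web": ["10.2.0.1"]},
}

def lookup(service, local_dc, dc=None):
    """Resolve a service, defaulting to the caller's own datacenter."""
    return CATALOG[dc or local_dc].get(service, [])

# lookup("web", local_dc="dc1") needs no extra configuration and
# stays within dc1; the remote datacenter is only consulted when
# named explicitly via dc="dc2".
```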
Again, please let me know what I can correct on our write up, as we intend it to be
an honest statement of fact. Thanks.
Best Regards,
Armon Dadgar