etcd vs consul vs ???
- What do Etcd, Consul, and Zookeeper do?
- Service Registration:
- Host, port number, and sometimes authentication credentials, protocols, version
numbers, and/or environment details.
- Service Discovery:
- Ability for client application to query the central registry to learn of service location.
- Consistent and durable general-purpose K/V store across distributed system.
- Some solutions support this better than others.
- Based on Paxos or some derivative (e.g. Raft) algorithm to quickly converge to a consistent state.
- Centralized locking can be based on this K/V store.
- Leader Election:
- Not to be confused with leader election within the quorum of Etcd/Consul nodes. This is an
implementation detail that is transparent to the user. What we are talking about here is leader
election among the services that are registered against Etcd/Consul.
- Etcd tabled their leader election module until the API stabilizes.
- Other non-standard use cases:
- Distributed locking
- Atomic broadcast
- Sequence numbers
- Pointers to data in eventually consistent stores.
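To make the distinction between the two kinds of leader election concrete, here is a minimal sketch of election among registered services built on an atomic "create if absent" operation. The `KVStore` class, its method names, and the `/election/leader` key are hypothetical stand-ins for illustration, not the Etcd or Consul client API.

```python
import threading


class KVStore:
    """In-memory stand-in for a consistent K/V store (hypothetical, not a real client)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        """Atomically create key; return True only for the caller that created it."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def delete_if_equals(self, key, value):
        """Remove key only if it still holds value, so a service can only resign its own leadership."""
        with self._lock:
            if self._data.get(key) == value:
                del self._data[key]
                return True
            return False


def try_become_leader(store, service_id, key="/election/leader"):
    """A service wins the election if it is the first to create the key."""
    return store.put_if_absent(key, service_id)


store = KVStore()
results = {name: try_become_leader(store, name) for name in ["svc-a", "svc-b", "svc-c"]}
leader = store.get("/election/leader")
```

In a real deployment the leader key would presumably also need a TTL or session attached, so that a crashed leader releases the key without an explicit resignation.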
- How do they behave in a distributed system?
- All of the solutions under consideration are primarily CP systems in the CAP context.
That is, they favor consistency over availability. This means that all nodes have a
consistent view of written data but at the expense of availability in the event that
a network partition occurs (e.g. loss of a node).
- Some of these solutions will support "stale reads" in the event of node loss.
- Each solution can work with only one node. It is generally advised that we have one etcd/
consul per VM/physical host. We do not want to have an etcd/consul per container!
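The consistency-over-availability trade-off can be illustrated with majority-quorum arithmetic. This is a sketch assuming Raft-style majority consensus; the function names are made up for illustration.

```python
def quorum_size(cluster_size):
    """Smallest number of nodes that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many nodes can be lost while a majority still exists."""
    return cluster_size - quorum_size(cluster_size)


def can_commit_writes(cluster_size, reachable_nodes):
    """A partition side can accept writes only if it still holds a majority."""
    return reachable_nodes >= quorum_size(cluster_size)


for n in (1, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption a single node works but tolerates no failures, and an even-sized cluster tolerates no more failures than the odd size just below it, which is why odd cluster sizes such as 3 or 5 are the usual choice.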
- Immediate problems that we are trying to solve:
- Get and set dynamic configuration across a distributed system (e.g. things in moc.config.json):
- This is perhaps the most pressing problem that we need to solve.
- SCM tools like Puppet/Ansible are great for managing static configurations but
they are too heavy for dynamic changes.
- Service registration:
- We need to be able to spin up a track and have services make themselves visible
via DNS.
- This would be useful primarily outside of production where we would want to regularly
spin up and destroy tracks.
- That said, we don't have a highly-distributed and elastic architecture so we could get
by without this for a while.
- Service discovery:
- Services must be able to determine which host to talk to for a particular service.
- This may not be as important for production if we have a loadbalancer. In fact, a
loadbalancer would be more transparent to our existing apps as they work at the IP level.
- That said, we don't have a highly-distributed and elastic architecture so we could get
by without this for a while.
- Features that we don't need for now:
- Leader election. Many of our apps are currently not designed to scale horizontally.
However, it should be noted that Consul has the ability to select a leader based on
health checks.
- Problems that these tools are not designed to solve:
- Load-balancing.
- Things that I've explored:
- Etcd:
- Basic info:
- Service registration relies on using a key TTL along with heartbeating from the service
to ensure the key remains available. If a service fails to update the key’s TTL, Etcd
will expire it. If a service becomes unavailable, clients will need to handle the
connection failure and try another service instance.
- There would be a compelling reason to favor Etcd if we ever planned to use CoreOS
but I don't see this happening anytime soon.
- Pros:
- Service discovery involves listing the keys under a directory and then waiting for
changes on the directory. Since the API is HTTP based, the client application keeps a
long-polling connection open with the Etcd cluster.
- Has been around for longer than Consul. 150% more GitHub watches/stars.
- 3 times as many contributors (i.e. more eyes) and forks on GitHub.
- Cons:
- There are claims that the Raft implementation used by Etcd (go-raft) is not quite right (unverified).
- Immature, but by the time its use is under consideration in production, it should
have reached 1.0.
- Serving DNS records from Etcd may require a separate service/process (verify):
- SkyDNS is essentially DNS on top of Etcd
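The TTL-plus-heartbeat registration and directory-listing discovery described for Etcd can be sketched with an in-memory registry. Everything here (`TTLRegistry`, `FakeClock`, the `/services/web/` key layout) is a hypothetical illustration of the mechanism, not the Etcd API; a fake clock is injected so the expiry behaviour is deterministic.

```python
import time


class TTLRegistry:
    """In-memory model of keys that expire unless their owner heartbeats."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        """Register a key that lives for ttl seconds from now."""
        self._entries[key] = (value, self._clock() + ttl)

    def refresh(self, key, ttl):
        """Heartbeat: extend the key's lifetime; return False if it already expired."""
        self._purge()
        if key not in self._entries:
            return False
        value, _ = self._entries[key]
        self._entries[key] = (value, self._clock() + ttl)
        return True

    def list_dir(self, prefix):
        """Discovery: return the live keys under a directory prefix."""
        self._purge()
        return {k: v for k, (v, _) in sorted(self._entries.items()) if k.startswith(prefix)}

    def _purge(self):
        now = self._clock()
        self._entries = {k: e for k, e in self._entries.items() if e[1] > now}


class FakeClock:
    """Manually advanced clock so the example does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
registry = TTLRegistry(clock=clock)
registry.set("/services/web/10.0.0.1:8080", "up", ttl=10)
registry.set("/services/web/10.0.0.2:8080", "up", ttl=10)

clock.advance(8)
registry.refresh("/services/web/10.0.0.1:8080", ttl=10)  # only the first instance heartbeats
clock.advance(5)  # the second instance has now outlived its TTL

live = registry.list_dir("/services/web/")
```

Note that a client can still receive an address whose service died just after its last heartbeat, which is why clients must handle connection failures and try another instance.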
- Consul:
- Pros:
- Has more high-level features like service monitoring.
- There is another project out of Hashicorp that will read/set environment variables
for processes from Consul.
- Better documentation.
- I had an easier time installing and configuring this over Etcd, not that Etcd was
particularly hard. Docs make all the difference.
- Stuff like this makes me want to shed a tear. I commend the KIDS at Hashicorp.
- You can make DNS queries directly against Consul agent! Nice! No need for SkyDNS or Helix
- We can add arbitrary checks! Nice, if we are into that sort of thing.
- Understands the notion of a datacenter. Each cluster is confined to a datacenter but the
cluster is able to communicate with other datacenters/clusters.
- At Skybox, we might use this feature to separate docker tracks, even if they live on the same host.
- It has a rudimentary web UI:
- Cons:
- There are claims that Consul's implementation of Raft is better (unverified).
- Immature. Even younger than Etcd (though there is no reason to believe that there are problems with it).
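As a small illustration of the DNS interface and the datacenter notion mentioned above, this sketch builds the name a client would look up, assuming Consul's `[tag.]<service>.service[.<datacenter>].<domain>` naming scheme with the default `consul` domain. The helper name and the example service, tag, and datacenter values are hypothetical.

```python
def consul_dns_name(service, tag=None, datacenter=None, domain="consul"):
    """Build a service lookup name of the form [tag.]<service>.service[.<dc>].<domain>."""
    labels = []
    if tag:
        labels.append(tag)
    labels += [service, "service"]
    if datacenter:
        labels.append(datacenter)
    labels.append(domain)
    return ".".join(labels)


print(consul_dns_name("web"))
print(consul_dns_name("db", tag="primary", datacenter="dc1"))
```

Such a name could then be resolved against a local agent, for example with `dig @127.0.0.1 -p 8600 web.service.consul`, assuming the agent listens for DNS on its default port.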
- Etcd and Consul similarities:
- HTTP+JSON based API. Curl-able.
- Docker containers can talk directly with Etcd/Consul over the docker0 interface (i.e. default gateway).
- Atomic look-before-you-set:
- Etcd: Compare-and-set by both value and version index.
- Consul: Check-and-set by sequence number (ModifyIndex)
- DNS TTLs can be set to something VERY low.
- Etcd: supports TTL (time-to-live) on both keys and directories, which will be honoured:
if a value has existed beyond its TTL, it expires.
- Consul: By default, serves all DNS results with a 0 TTL value
- Has been tested with Jepsen (tool to simulate network partitions in distributed databases).
- Results were not 100% for either but still generally promising.
- Both work with Confd by Kelsey Hightower.
- A tool that watches Etcd/Consul and modifies config files on disk.
- Long polling for changes:
- Etcd: Easily listen for changes to a prefix via HTTP long-polling.
- Consul: A blocking query against some endpoints will wait for a change to potentially
take place using long polling.
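The atomic look-before-you-set and long-polling items above can be sketched together with an in-memory store in which every write bumps an index. This models only index-based check-and-set (in the spirit of Consul's ModifyIndex, where an expected index of 0 means "create only if absent") and a blocking wait on that index. The `VersionedKV` class and its method names are hypothetical, not either tool's API.

```python
import threading


class VersionedKV:
    """In-memory K/V store with a modify index per key and a blocking wait."""

    def __init__(self):
        self._cond = threading.Condition()
        self._index = 0   # bumped on every successful write
        self._data = {}   # key -> (value, modify_index)

    def get(self, key):
        """Return (value, modify_index), or None if the key does not exist."""
        with self._cond:
            return self._data.get(key)

    def cas(self, key, value, expected_index):
        """Write only if the key's modify index equals expected_index (0 = key must not exist)."""
        with self._cond:
            current = self._data.get(key)
            current_index = current[1] if current else 0
            if current_index != expected_index:
                return False
            self._index += 1
            self._data[key] = (value, self._index)
            self._cond.notify_all()  # wake any blocked watchers
            return True

    def wait_for_change(self, last_seen_index, timeout):
        """Block until the store index passes last_seen_index or timeout elapses; return the current index."""
        with self._cond:
            self._cond.wait_for(lambda: self._index > last_seen_index, timeout=timeout)
            return self._index


kv = VersionedKV()
created = kv.cas("config/max_conns", "100", expected_index=0)  # key is new, so this succeeds
stale = kv.cas("config/max_conns", "200", expected_index=0)    # key now exists, so this is rejected
value, index = kv.get("config/max_conns")
updated = kv.cas("config/max_conns", "200", expected_index=index)

# A watcher blocked at index 2 wakes up when another thread writes.
writer = threading.Timer(0.05, kv.cas, args=("config/max_conns", "300", 2))
writer.start()
new_index = kv.wait_for_change(last_seen_index=2, timeout=2.0)
writer.join()
```

The watcher passes back the last index it saw, so a change that lands between two waits is not missed: the next call returns immediately instead of blocking.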
- Things that I have not explored:
- SkyDNS: Anyone have good input on this one?
- Zookeeper: It seems mature but it would take a lot more work to make it work for us.
- We would have to configure and use it without high-level features.
- Provides only a primitive K/V store.
- Requires that application developers build their own system to provide service discovery.
- Java dependency (and Dan Streit hates Java)
- All clients must maintain active connections to the ZooKeeper servers, and perform keep-alives.
- Zookeeper not recommended for virtual environments? Why? I just read this somewhere.
- Corosync/Pacemaker (not sure if this is a viable solution, actually)
- Redis is not viable! It is an in-memory K/V that does not persist data. Nope.
- Smartstack + Synapse + Nerve from AirBnB (not viable as it only does TCP through HAproxy).
- Ruby dependencies and many moving parts.
- References: (heroku's excellent 12-factor thing).

timhaak commented Nov 23, 2014


I think

You can make DSN queries directly against Consul agent! Nice! No need for SkyDNS or Helix

Should be

You can make DNS queries directly against Consul agent! Nice! No need for SkyDNS or Helix


timhaak commented Nov 23, 2014

Oh, another point: though I agree Redis is not a good solution for this, it actually does persist data, and the persistence can be tuned from paranoid to off.


Thanks for such a detailed investigation!


dreampuf commented Mar 9, 2015

Thank you for sharing your experience.


Nice post, thanks!
Some months ago, I gathered that Etcd behaves better with only a small number of nodes in its cluster (fewer than ten). Consul, with its agent (client or server) strategy, seems able to handle a cluster with many more nodes, which is a pro for generic service discovery/monitoring. At least on paper ;-)


let4be commented Jan 28, 2016

Redis does persist data via RDB (snapshots of the in-memory dataset) and AOF (which can persist every second, risking up to 1 second of data loss, or after each write, which is slow), and with the right settings it can provide reliability similar to SQL databases.


How about changing the filename suffix to md?


Redis is non-volatile, not volatile like you describe. Changes absolutely get written to disk.


rresino commented Sep 5, 2017

All etcd v3 APIs are defined as gRPC services.
Pro: It's faster.
Con: It's only gRPC-compatible, which is not as widely compatible as a REST API.
