geokal/Service Discovery Comparison.md

## Service Discovery Comparison.md

      
What do Etcd, Consul, and Zookeeper do?


Service Registration:

Host, port number, and sometimes authentication credentials, protocols, version
numbers, and/or environment details.


Service Discovery:

Ability for a client application to query the central registry to learn a service's location.


Consistent and durable general-purpose K/V store across a distributed system.

Some solutions support this better than others.
Based on Paxos or some derivative (e.g. Raft) algorithm to quickly converge to a consistent state.
Centralized locking can be based on this K/V store.


Leader Election:

Not to be confused with leader election within the quorum of Etcd/Consul nodes. This is an
implementation detail that is transparent to the user. What we are talking about here is leader
election among the services that are registered against Etcd/Consul.
Etcd tabled their leader election module until the API stabilizes.


Other non-standard use cases:

Distributed locking
Atomic broadcast
Sequence numbers
Pointers to data in eventually consistent stores.


How do they behave in a distributed system?


All of the solutions under consideration are primarily CP systems in the CAP context.
That is, they favor consistency over availability. This means that all nodes have a
consistent view of written data but at the expense of availability in the event that
a network partition occurs (e.g. loss of a node).

Some of these solutions will support "stale reads" in the event of node loss.
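To make the consistency-over-availability trade-off concrete, here is a minimal sketch of the majority arithmetic that Paxos/Raft-style systems rely on. The function names are made up for illustration; the only assumption is that a write needs a strict majority of the voting nodes.

```python
def quorum_size(nodes: int) -> int:
    """Smallest strict majority of a cluster with `nodes` voting members."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} nodes: quorum={quorum_size(n)}, can lose {tolerated_failures(n)}")
```

Note that a 4-node cluster tolerates no more failures than a 3-node one, which is why odd cluster sizes are the usual recommendation.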


Each solution can work with only one node. It is generally advised that we have one Etcd/Consul
per VM/physical host. We do not want to have an Etcd/Consul per container!

Immediate problems that we are trying to solve:


Get and set dynamic configuration across a distributed system (e.g. things in moc.config.json):

This is perhaps the most pressing problem that we need to solve.
SCM tools like Puppet/Ansible are great for managing static configurations but
they are too heavy for dynamic changes.
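As a rough illustration of reading dynamic configuration out of a K/V store, the sketch below decodes the JSON body that Consul returns for a recursive KV listing (Consul base64-encodes the values). The `moc/` prefix and the key names are hypothetical, and the sample body is hand-written rather than captured from a real agent.

```python
import base64
import json


def decode_consul_kv(body: str, prefix: str = "") -> dict:
    """Turn the JSON body of a Consul KV listing into a plain {key: value} dict."""
    config = {}
    for entry in json.loads(body):
        raw = entry.get("Value")
        # Values arrive base64-encoded; a key without a value has Value null.
        value = base64.b64decode(raw).decode("utf-8") if raw is not None else None
        key = entry["Key"]
        if prefix and key.startswith(prefix):
            key = key[len(prefix):]
        config[key] = value
    return config


# Hand-written sample shaped like the body of `GET /v1/kv/moc/?recurse`.
# The keys below are hypothetical.
sample = json.dumps([
    {"Key": "moc/db_host", "Value": base64.b64encode(b"10.0.0.5").decode(), "ModifyIndex": 12},
    {"Key": "moc/log_level", "Value": base64.b64encode(b"debug").decode(), "ModifyIndex": 15},
])

if __name__ == "__main__":
    print(decode_consul_kv(sample, prefix="moc/"))
```

An app could run something like this at startup, or on each change notification, instead of waiting for an SCM run to rewrite a file.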


Service registration:

We need to be able to spin up a track and have services make themselves visible
via DNS.
This would be useful primarily outside of production where we would want to regularly
spin up and destroy tracks.
That said, we don't have a highly-distributed and elastic architecture so we could get
by without this for a while.


Service discovery:

Services must be able to determine which host to talk to for a particular service.
This may not be as important for production if we have a load balancer. In fact, a
load balancer would be more transparent to our existing apps, as they work at the IP level.
That said, we don't have a highly-distributed and elastic architecture so we could get
by without this for a while.


Features that we don't need for now:


Leader election. Many of our apps are currently not designed to scale horizontally.
However, it should be noted that Consul has the ability to select a leader based on
health checks.

Problems that these tools are not designed to solve:


Load-balancing.

Things that I've explored:

Etcd:

Basic info:


Service registration relies on using a key TTL along with heartbeating from the service
to ensure the key remains available. If a service fails to update the key’s TTL, Etcd
will expire it. If a service becomes unavailable, clients will need to handle the
connection failure and try another service instance.
There would be a compelling reason to favor Etcd if we ever planned to use CoreOS
but I don't see this happening anytime soon.
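The TTL-plus-heartbeat registration described above could look roughly like this against the Etcd v2 keys API. This is a sketch under assumptions: the `/services/<service>/<instance>` key layout, the instance names, and the `127.0.0.1:4001` address are all made up for illustration, and the function only builds the request so that nothing is sent by accident.

```python
import urllib.parse
import urllib.request


def build_heartbeat(base_url: str, service: str, instance: str,
                    address: str, ttl: int) -> urllib.request.Request:
    """Build (not send) the PUT that registers or refreshes one instance key."""
    # Hypothetical key layout: /services/<service>/<instance>
    url = f"{base_url}/v2/keys/services/{service}/{instance}"
    # The v2 API takes the value and TTL (in seconds) as form fields.
    body = urllib.parse.urlencode({"value": address, "ttl": ttl}).encode()
    return urllib.request.Request(url, data=body, method="PUT")


req = build_heartbeat("http://127.0.0.1:4001", "web", "web-1", "10.0.0.7:8080", ttl=15)

if __name__ == "__main__":
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```

A service would send this with `urllib.request.urlopen(req)` on a timer comfortably shorter than the TTL (say every 5 seconds for a 15 second TTL); if the service dies, the heartbeats stop and the key expires on its own.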

Pros:


Service discovery involves listing the keys under a directory and then waiting for
changes on the directory. Since the API is HTTP based, the client application keeps a
long-polling connection open with the Etcd cluster.
Has been around for longer than Consul. 150% more GitHub watches/stars.
3 times as many contributors (i.e. more eyes) and forks on GitHub.
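
The list-then-watch discovery flow mentioned in the first point above can be sketched as two small helpers: one that extracts instances from an Etcd v2 directory listing, and one that builds the long-polling URL for the next change. The `/services/web` directory and the sample response are hand-written assumptions; taking the highest `modifiedIndex` from the listing is a simplification for this sketch.

```python
import json


def instances_from_listing(body: str) -> dict:
    """Map each instance key in a v2 directory listing to its value."""
    nodes = json.loads(body)["node"].get("nodes", [])
    return {n["key"]: n["value"] for n in nodes if not n.get("dir")}


def highest_index(body: str) -> int:
    """Largest modifiedIndex among the listed nodes (0 if the directory is empty)."""
    nodes = json.loads(body)["node"].get("nodes", [])
    return max((n["modifiedIndex"] for n in nodes), default=0)


def watch_url(base_url: str, directory: str, seen_index: int) -> str:
    """URL that blocks until something under `directory` changes after seen_index."""
    return (f"{base_url}/v2/keys{directory}"
            f"?wait=true&recursive=true&waitIndex={seen_index + 1}")


# Hand-written sample shaped like the body of `GET /v2/keys/services/web`.
sample = json.dumps({
    "action": "get",
    "node": {"key": "/services/web", "dir": True, "nodes": [
        {"key": "/services/web/web-1", "value": "10.0.0.7:8080",
         "modifiedIndex": 41, "createdIndex": 41},
        {"key": "/services/web/web-2", "value": "10.0.0.8:8080",
         "modifiedIndex": 44, "createdIndex": 44},
    ]},
})

if __name__ == "__main__":
    print(instances_from_listing(sample))
    print(watch_url("http://127.0.0.1:4001", "/services/web", highest_index(sample)))
```

A client would loop: list the directory, connect to the watch URL, and re-list (or apply the returned change) whenever the request returns.
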

Cons:


There are claims that the Raft implementation used by Etcd (go-raft) is not quite right (unverified).
Immature, but by the time its use is under consideration in production, it should
have reached 1.0.
Serving DNS records from Etcd may require a separate service/process (verify):

http://probablyfine.co.uk/2014/03/02/serving-dns-records-from-etcd/
SkyDNS is essentially DNS on top of Etcd


Consul:

### Pros:
  - Has more high-level features like service monitoring.
  - There is another project out of Hashicorp that will read/set environment variables
    for processes from Consul.
    - https://github.com/hashicorp/envconsul
  - Better documentation.
    - I had an easier time installing and configuring this over Etcd, not that Etcd was
      particularly hard. Docs make all the difference.
    - Stuff like this makes me want to shed a tear. I commend the KIDS at Hashicorp.
      - http://www.consul.io/docs/internals/index.html
  - You can make DNS queries directly against Consul agent! Nice! No need for SkyDNS or Helix
  - We can add arbitrary checks! Nice, if we are into that sort of thing.
  - Understands the notion of a datacenter. Each cluster is confined to a datacenter but the
    cluster is able to communicate with other datacenters/clusters.
    - At Skybox, we might use this feature to separate docker tracks, even if they live on the same host.
  - It has a rudimentary web UI:
    - http://demo.consul.io/ui/
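
To give a feel for the service monitoring and checks mentioned above, here is a sketch of registering a service with a TTL check through the local Consul agent's HTTP API. The service name, ID, port, and the `127.0.0.1:8500` agent address are assumptions for illustration; the function only builds the request.

```python
import json
import urllib.request


def build_registration(agent_url: str, name: str, service_id: str,
                       port: int, ttl: str = "15s") -> urllib.request.Request:
    """Build (not send) a service registration carrying a TTL health check."""
    payload = {
        "ID": service_id,
        "Name": name,
        "Port": port,
        # With a TTL check the service itself must report in before the TTL lapses.
        "Check": {"TTL": ttl},
    }
    return urllib.request.Request(
        agent_url + "/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_registration("http://127.0.0.1:8500", "web", "web-1", 8080)

if __name__ == "__main__":
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```

Once registered and passing, the service should be resolvable through the agent's DNS interface under a name of the form `web.service.consul`, which is what removes the need for a separate DNS layer.
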
### Cons:
  - There are claims that Consul's implementation of Raft is better (unverified).
  - Immature. Even younger than Etcd (though there is no reason to believe that there are problems with it).

Etcd and Consul similarities:


HTTP+JSON based API. Curl-able.
Docker containers can talk directly with Etcd/Consul over the docker0 interface (i.e. default gateway).
Atomic look-before-you-set:

Etcd: Compare-and-set by both value and version index.
Consul: Check-and-set by sequence number (ModifyIndex)
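The two look-before-you-set styles differ mainly in how the condition is passed. The sketch below builds the conditional-write URLs for both, assuming the Etcd v2 keys API (`prevIndex` / `prevValue`) and the Consul KV API (`cas`); the key name and the agent addresses in the usage lines are made up.

```python
import urllib.parse


def etcd_cas_url(base_url: str, key: str, prev_index=None, prev_value=None) -> str:
    """URL for an Etcd PUT that only succeeds if the current index/value matches."""
    conditions = {}
    if prev_index is not None:
        conditions["prevIndex"] = prev_index
    if prev_value is not None:
        conditions["prevValue"] = prev_value
    if not conditions:
        raise ValueError("need prev_index and/or prev_value for a conditional write")
    query = urllib.parse.urlencode(conditions)
    return f"{base_url}/v2/keys/{key}?{query}"


def consul_cas_url(base_url: str, key: str, modify_index: int) -> str:
    """URL for a Consul PUT that only succeeds if ModifyIndex is unchanged."""
    query = urllib.parse.urlencode({"cas": modify_index})
    return f"{base_url}/v1/kv/{key}?{query}"


if __name__ == "__main__":
    # Hypothetical key and local addresses.
    print(etcd_cas_url("http://127.0.0.1:4001", "config/feature", prev_index=7))
    print(consul_cas_url("http://127.0.0.1:8500", "config/feature", 12))
```

In both cases the client reads the key first, remembers the index it saw, and retries the read-modify-write cycle if the conditional write is rejected because someone else got there first.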


DNS TTLs can be set to something VERY low.

Etcd: supports TTL (time-to-live) on both keys and directories, and honours it:
a key that has outlived its TTL is expired.
Consul: By default, serves all DNS results with a 0 TTL value.


Has been tested with Jepsen (tool to simulate network partitions in distributed databases).

Results were not 100% for either but still generally promising.
https://news.ycombinator.com/item?id=7884640


Both work with Confd by Kelsey Hightower.

A tool that watches Etcd/Consul and modifies config files on disk.
https://github.com/kelseyhightower/confd


Long polling for changes:

Etcd: Easily listen for changes to a prefix via HTTP long-polling.
Consul: A blocking query against some endpoints will wait for a change to potentially
take place using long polling.
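For the Consul side, a blocking query is an ordinary GET with the last seen index attached; the agent holds the request open until the index moves or the wait time elapses. A minimal sketch, assuming the `index` and `wait` query parameters and the `X-Consul-Index` response header, with a hypothetical `web` service and agent address:

```python
import urllib.parse


def blocking_query_url(base_url: str, path: str, last_index: int, wait: str = "30s") -> str:
    """URL for a query that blocks until the index passes last_index or `wait` elapses."""
    query = urllib.parse.urlencode({"index": last_index, "wait": wait})
    return f"{base_url}{path}?{query}"


def observe(headers: dict, last_index: int):
    """Return (index to use next, whether anything may have changed)."""
    current = int(headers["X-Consul-Index"])
    # A timed-out wait returns the same index, so no change needs handling.
    return current, current != last_index


if __name__ == "__main__":
    print(blocking_query_url("http://127.0.0.1:8500", "/v1/catalog/service/web", 42))
    print(observe({"X-Consul-Index": "42"}, 42))
    print(observe({"X-Consul-Index": "57"}, 42))
```

The "potentially" in the description matters: a returned request does not guarantee a change, so the client compares indexes before doing any work.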


Things that I have not explored:


SkyDNS: Anyone have good input on this one?
Zookeeper: It seems mature but it would take a lot more work to make it work for us.

We would have to configure and use it without high-level features.
Provides only a primitive K/V store.
Requires that application developers build their own system to provide service discovery.
Java dependency (and Dan Streit hates Java)
All clients must maintain active connections to the ZooKeeper servers, and perform keep-alives.
Zookeeper not recommended for virtual environments? Why? I just read this somewhere.


Corosync/Pacemaker (not sure if this is a viable solution, actually)
Redis is not viable! It is primarily an in-memory K/V store: persistence is optional, and it does not provide the consensus-based consistency we want here. Nope.
SmartStack (Synapse + Nerve) from Airbnb (not viable as it only does TCP through HAProxy).

Ruby dependencies and many moving parts.


References:


http://www.hashicorp.com/blog/twelve-factor-consul.html (Heroku's excellent 12-factor thing).
http://12factor.net/
http://www.consul.io/intro/vs/index.html
http://www.consul.io/docs/internals/index.html
https://news.ycombinator.com/item?id=7604787
https://news.ycombinator.com/item?id=7623317
https://news.ycombinator.com/item?id=7884640
http://www.activestate.com/blog/2014/03/brandon-philips-explains-etcd
http://jpmens.net/2013/10/24/a-key-value-store-for-shared-configuration-etcd-confd/
http://igor.moomers.org/smartstack-vs-consul/
http://jasonwilder.com/blog/2014/02/04/service-discovery-in-the-cloud/
http://nerds.airbnb.com/smartstack-service-discovery-cloud/