@jonathan-kosgei
Last active January 16, 2022 00:10

I have been an aggressive Kubernetes evangelist over the last few years. It has been the hammer with which I have approached almost all my deployments, the one tool I have mentioned (shoved down clients' throats) in almost all my communications with clients, and my go-to choice when I was mocking up my first startup (saharacluster.com).

A few weeks ago Docker 1.13 was released and I was tasked with replicating a client's Kubernetes deployment on Swarm, more specifically testing running compose on Swarm.

And it was a dream!

All our apps were already dockerised, and all I had to do was make a few modifications to an existing compose file that I had used for testing prior to said deployment on Kubernetes.

And, with the ease with which I was able to expose our endpoints, manage volumes, handle networking, and deploy and tear down the setup, I in all honesty see no reason not to use Swarm. There is no mission-critical, or even incredibly convenient, really-nice-to-have feature in Kubernetes that I'm going to miss; except perhaps the Kube admin dashboard and Heapster, but even those have ready replacements: Weave Scope for the dashboard (admittedly not as pretty), and I could easily set up my own ELK stack to monitor my containers.

The moment it dawned on me how simple Swarm was, was when I realised that all I had to do to expose an nginx service publicly was to publish the ports in my compose file. It hit me again when I attempted to create a number of replicas for the nginx service, fully expecting to run into the Kubernetes error Pod Deploy - Failed to fit in any node - PodFitsHostPorts that has frustrated me before. But no. It worked! It just worked! And to boot, Docker intelligently load-balanced the requests from all my nodes (nginx was accessible from every node IP in the Swarm on ports 80/443) to the various containers in the service.

Anyone who has used Kubernetes on any long-term, large-scale project knows what a pain this is. If you're not on AWS or GCE and can't create a LoadBalancer service, where Kubernetes will provision an elastic IP address for your service (which you have to pay for), and you're not okay with accessing your service on a weird random port in the default 30000-32767 NodePort range, then you have to deal with the fickle beast that is Kubernetes Ingresses. To illustrate how many steps it takes to come close to replicating what I achieved on Swarm with three lines in my compose file, you'd have to do the following on Kubernetes:

  1. Create the ingress controller

  2. Work on your Kubernetes yaml spec and define an ingress resource, jumping around between the various documentation sources online

  3. Go through a bit of trial and error to get to the point where you can create your ingress without error

  4. Realise that to use paths, i.e. example.com/path, you have to create a "path" directory in the friggin /usr/share/nginx/html in the nginx container

Full docs here Kubernetes Ingresses
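For reference, the ingress resource produced by those steps might look something like this. This is a minimal sketch, assuming an ingress controller is already running in the cluster; the names and host are illustrative, and the API version shown is the extensions/v1beta1 group from that era:

```yaml
# Hypothetical minimal Ingress; assumes a Service named "nginx" already exists
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nginx-ingress        # illustrative name
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /path      # and yes, the content must actually exist at that path
            backend:
              serviceName: nginx
              servicePort: 80
```

Even with this in hand, you still need the controller deployed and the backing Service and Deployment defined separately.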

In short, exposing services to the outside world in Kubernetes is a pain! With Docker, however, all it takes is:

services:
  nginx:
    ports:
      - "80:80"
      - "443:443"
    ...

And it works! It just works!

How to use volumes in Swarm

version: '3'
volumes:
  poc:
services:
  redis:
    volumes:
      - poc:/redis

How to use volumes in Kubernetes :(

  1. Create the pv sample

  2. Create the pvc sample

  3. Create the deployment specifying your pvc sadness sample
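For comparison, the three Kubernetes steps above translate into something like the following. This is a hedged sketch: the names, the 1Gi size, and the hostPath backend are all illustrative, and a real cluster would typically use a proper storage backend instead of hostPath:

```yaml
# 1. The PersistentVolume (pv)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: poc-pv
spec:
  capacity:
    storage: 1Gi
  accessModes: ["ReadWriteOnce"]
  hostPath:
    path: /data/poc              # illustrative; real clusters use a real backend
---
# 2. The PersistentVolumeClaim (pvc)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: poc-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
# 3. The Deployment specifying your pvc
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels: {app: redis}
  template:
    metadata:
      labels: {app: redis}
    spec:
      containers:
        - name: redis
          image: redis
          volumeMounts:
            - name: poc
              mountPath: /redis  # same mount point as the two-line Swarm version
      volumes:
        - name: poc
          persistentVolumeClaim:
            claimName: poc-pvc
```

Three objects and roughly forty lines, versus the handful of compose lines above.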

Configs

I am incredibly appreciative of how easily I can eyeball my entire deployment in Swarm: ports, volumes, services, dependencies, images, etc., as all the config is in one docker-compose.yml file of reasonable length, as opposed to the countless files covering everything from PVs, PVCs, deployments, and statefulsets in Kubernetes. You can define everything in one file in Kubernetes, but it won't do much for readability.

[Think how many lines you need to pore through to get to the container image you're using]

Deploying and cleaning up

You have to run kubectl create -f more than once. You know it. I know it. (Unless of course you put everything into one file and trade your readability for convenience).

With compose on Swarm, however, all you have to do is: docker stack deploy --compose-file=docker-compose.yml <stack-name>

To clean up, you could delete your app's entire namespace in Kubernetes, but what if that's not what you want? Or you didn't have the foresight to deploy your app in a separate namespace? You'd again have to run a number of kubectl delete -f commands to delete everything. You could run kubectl delete -f on a single directory with all your app's kube config files in it. You could do that. Or you could use Swarm and be able to run docker stack rm <stack-name> and have a life.

BONUS:

Pure Docker goodness

It is beyond satisfying to use Docker and only Docker. To spin up a fresh VM, install Docker and only Docker, and be able to do everything you need. I have spent countless hours writing Salt files and Ansible playbooks to automate installing Kubernetes. You could use kargo, or kops, but all I have to do to start a Swarm cluster is install Docker and run docker swarm init. What more could anyone want!

What Kubernetes could do:

Humans should not have to write/read config files. If there were a way I could easily deploy a Kubernetes cluster (hint: make up-to-date repositories for your software available from the distros' repos) and not have to write any configs, that would be great.

@rajcheval

Docker Swarm is easy to get started with. Trouble starts when you are performance-testing a solution that leverages the routing mesh and networking. We were running into issues related to the routing mesh; these issues appear in 15-minute load tests with 50 concurrent users. Upgrade to the latest version, Docker 17.03, as it resolved many issues for us.

@HusseinMorsy

I suggest having a look at the changelog of the swarm project and the Releases page on GitHub. There has been no update for 4 months.
In contrast, Kubernetes is an ongoing project; nearly every week Kubernetes has a new release.

@ssboisen

ssboisen commented Jun 1, 2017

@HusseinMorsy That's docker swarm, not docker swarm mode. Someone at docker should have thought twice about naming the two similar but different products almost identically.

@bprashanth

bprashanth commented Jun 9, 2017

FYI, this

services:
  nginx:
    ports:
      - "80:80"
      - "443:443"
    ...

is possible on kube too, eg: https://github.com/kubernetes/kubernetes/blob/master/examples/https-nginx/nginx-app.yaml#L10

Ingress solves a slightly different problem. Just pointing out that you don't need to use it if you aren't interested in, say, a cross platform abstraction to help with the nginx.conf -> cloud CDN aspect ratio, or federating clusters once you land in the clouds.
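For context, one way Kubernetes can publish a container port straight on the node, roughly mirroring a published Swarm port, is hostPort. This is a hedged sketch (not necessarily identical to the linked example, and the names are illustrative):

```yaml
# Direct port publishing on Kubernetes, no Ingress involved (sketch)
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80
          hostPort: 80       # binds port 80 on the node, like a published Swarm port
        - containerPort: 443
          hostPort: 443
```

The trade-off is the PodFitsHostPorts scheduling constraint mentioned earlier: only one such pod fits per node per port.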

How to use volumes in Kubernetes :(
Create the pv sample
Create the pvc sample
Create the deployment specifying your pvc sadness sample

You don't have to create either if you bring up the cluster with a volume provisioner, the provisioner will just stamp out volumes/claims for you based on the specified storage-class (eg: https://github.com/kubernetes/contrib/blob/master/pets/zookeeper/zookeeper.yaml#L114).
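With a provisioner and a storage class in place, the claim alone suffices. A hedged sketch (the class name is illustrative; it depends on what your provisioner serves):

```yaml
# The provisioner stamps out a matching PersistentVolume for this claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: poc
spec:
  storageClassName: standard   # illustrative storage class name
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```

That collapses the pv + pvc steps down to a single object referenced from the deployment.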

@killcity

killcity commented Aug 4, 2017

I'm on the Swarm side as of now. That doesn't mean I won't switch to k8s at some point. Here's why: I'm surprised more people haven't embraced macvlan. Fortunately, macvlan in Swarm was finally released in Docker 17.06. No unnecessary load balancers (DNS with health checks via Consul is faster), no extra L3 hops increasing latency. Everything is just routable and "there". Each container has its own real MAC and IP. For latency-sensitive stacks, this is the way to go. You can also still use your own service-discovery tools (i.e. Consul) and not have to deal with the extra bits of internal DNS and built-in service discovery that only work within the unroutable overlay networks, etc. I know macvlan support is apparently available for k8s, but I haven't heard of many people leveraging it. Not to mention, there aren't 20 containers running to support all this, half of them doing who knows what, as you'd get with k8s.

I feel like k8s is the new Openstack. Everyone is told they need to run it, but no one really knows why. Swarm took me 5 minutes to get running. Try that with k8s, especially with kubeadm, tectonic, rancher, etc.

@Justin-DynamicD

Your last post here interests me. At my company we just dumped k8s for Swarm, almost entirely due to the complexity issues you've outlined. k8s is just so hard to stand up, with so many parts, that we found ourselves using Tectonic and Rancher to do that lifting, only to realize we didn't really understand the platform when it came time to troubleshoot, and if said tool had an issue (looking at you, Rancher) we were hammered.

However, I'm very interested in reading more about macvlan and Consul and how you are using them to supplement Swarm. While swarm mode is dirt simple to set up (docker swarm init .... done. Seriously, that easy), we are exploring broader service-discovery tools and well... sounds like you may have forged that path already.

Any links you are willing to provide?

@industrialsynthfreak

industrialsynthfreak commented Sep 8, 2017

Hi, not so experienced swarm user here.

Well, my three current problems with swarm mode:

  • it's still in active development: features emerge, docs change, bugs appear. For example, there's still no start-delay argument in the compose file healthcheck config; the "depends_on" feature doesn't work in swarm mode, and "stack deploy" won't tell you about that: you need to read the docs carefully
  • service auto-scaling? (no easy way here)
  • occasional connection loss between manager and worker nodes (I still haven't figured out why this happens)

So, this may be a nice technology, but one should use Swarm carefully when going into prod.
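On the healthcheck start-delay point: compose file format 3.4 (shipped with Docker 17.09) later added a start_period field for exactly that. A sketch, with an illustrative service and check command:

```yaml
version: '3.4'
services:
  web:
    image: nginx                 # illustrative service
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 40s          # grace period before failures count (format 3.4+)
```

Failures during start_period don't count toward retries, which covers the slow-starting-container case described above.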

@saada

saada commented Sep 9, 2017

@industrialsynthfreak ... for autoscaling you can try Orbiter

@cloudbow

I think you should not leave. Kubernetes has introduced Kompose - https://kubernetes.io/docs/tools/kompose/user-guide/#warning-about-deployment-configs

@sasavilic

sasavilic commented Dec 8, 2017

I wish you the best of luck! We have been running several (dev/test/prod) Swarm clusters for 8 months, and during that time our clusters fell apart multiple times. In the end you get an error like "dispatcher is down" and you don't have any idea what is going on. The only solution was to restart all nodes, one after another, hoping that things would fix themselves. If that didn't work, we had to remove all nodes from the swarm, rebuild the swarm, and redeploy our software. We also found a few bugs in Swarm (i.e. moby/moby#33685) that are still not fixed.

We also used nginx as a reverse proxy, plus an additional watch script to regenerate the nginx config when something is deployed to or removed from the cluster. But since nginx will not start if it can't resolve the DNS name of an upstream, our script had to make sure the name of a service was resolvable before adding it to nginx. Yet a few times I observed that my reverse proxies on different nodes would have different configurations. I first thought it was an error in the script, but further manual inspection (jumping into the container and executing DNS resolve commands) showed that DNS resolution on some nodes didn't work properly. Solution: restart all nodes, one after another, two times (restarting once didn't help).

Reporting such bugs is a nightmare for us and a nightmare for the Docker Swarm maintainers: we don't know how to reproduce them. And even for bugs that are clearly reproducible, like the one I mentioned above, there has been no fix for months.

After we switched to k8s, I haven't seen any cluster failure so far. The thing that I really like about k8s is that it follows the simple UNIX philosophy: do one thing and do it right. I have a clear understanding of what the API server does, what kube-proxy is for, the kubelet, etcd, etc. And if something is wrong, I have an idea of where to look. With Docker Swarm, when something goes wrong you don't have any clue whatsoever what is going on. The only solution was to restart the Docker daemon itself, but that meant that all running instances on that node had to be restarted or even moved to other nodes.

@BretFisher

BretFisher commented Feb 15, 2018

I'm a Swarm fan (really just a fan of whatever works for you). Tips for an easier time with Swarm in 2018:

  • Use 17.12 or latest stable (and try to keep it updated to latest stable). Since Swarm is key to Docker's Enterprise Edition, they are keen to fix issues, but only if you keep up with current versions. They don't backport major fixes to Edge monthly versions beyond that month.
  • Many of the issues my clients and students had in 2017 were overlay networking related and resolved by 17.12 (a bit in 17.06, then 17.09, etc.). Rolling updates are now possible without packet loss.
  • It's best not to run app workloads on managers in production. Keeping managers free to orchestrate and schedule the Swarm is a good thing. Make them smaller dedicated instances and use placement constraints to keep app workloads off of them. This helps prevent a runaway batch job or backup from accidentally consuming all CPU, memory, or networking on a manager and causing a manager outage.
  • Always use an odd number of managers to keep Raft HA.
  • Don't use AWS T2 instances, which are capped on CPU cycles. Raft consensus requires traffic every second to ensure health and consensus. T2s can sometimes get in the way of that with their slow/poor networking and limited CPU bursts. This would show up as various issues with Raft leader election and manager connection. Since switching systems to something like M3, those problems don't seem to reappear. I also wouldn't recommend anything with less than Gb networking ("High" in AWS terms).
  • If you don't want overlay networking, you can still use the bridge or macvlan drivers in Swarm that others have mentioned... it's just not the default choice, so many don't realize the flexibility.
  • Great checklist of Docker production enterprise concerns (good for both Docker CE and Docker EE).
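The placement-constraint tip above can be expressed in a compose file like this (a sketch; the service and image names are illustrative):

```yaml
version: '3'
services:
  app:
    image: myapp                      # illustrative image
    deploy:
      placement:
        constraints:
          - node.role == worker       # keep app workloads off the managers
```

With this, docker stack deploy will only schedule the service's tasks onto worker nodes, leaving the managers free to run Raft and the scheduler.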

@flaviostutz

It was a dream when Google open-sourced Kubernetes, but since the Docker guys started Swarm, it has been clear that they were creating something with usability, simplicity, and power in mind. Recently I used OpenShift in a project, but realized that Swarm delivers almost all of its features with much less configuration and management effort.

I've been collecting tools for managing a Swarm Cluster on the following repository:
https://github.com/flaviostutz/docker-swarm-cluster

Feel free to contribute your experience there!

@micw

micw commented Nov 30, 2018

Kubernetes supports direct port exposure as well as direct volume allocation, exactly as docker-compose / Swarm does. The reason for having services, ingresses, PVs and PVCs is manageability.
E.g. if your pod binds to host port 80, you cannot do a rolling update without downtime: you first need to release the port before you bind a new instance to it. That's why you have services: the service always points to a healthy instance of your pod. The ingress exposes the service. If you don't need it, you can simply expose the port and use a different upgrade strategy.
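A minimal sketch of that indirection (names and labels illustrative): the Service keeps a stable endpoint while the pods behind it are replaced during a rolling update:

```yaml
# The Service forwards only to pods that match the selector and are healthy,
# so old and new pods can coexist during a rolling update with no port conflict
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort        # or put an Ingress / LoadBalancer in front
  selector:
    app: nginx          # matches whichever healthy pods carry this label
  ports:
    - port: 80
      targetPort: 80
```

Nothing ever binds the host port directly, which is what makes the zero-downtime rollout possible.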

@trajano

trajano commented Jun 16, 2020

Personally I am a Swarm guy, but I had to comment on your last statement here.

What Kubernetes could do:
Humans should not have to write/read config files. If there was a way I could easily deploy a Kubernetes cluster (Hint: Make up-to-date repositories for your software available from the distros repos), and not have to write any configs, that would be great.

Terraform + EKS on AWS, or managed K8s from Azure. You can set up a production-grade (self-updating) cluster with fewer headaches than deploying your own instances and installing K8s yourself. For local development, you can use minikube for a scaled-down version of your production setup.

However, if you like Swarm, you can look at my Terraform module https://registry.terraform.io/modules/trajano/swarm-aws/docker which sets up Docker Swarm on AWS; it is self-managed, and hence theoretically cheaper than managed K8s servers.
