Q & A from Traefik Online Meetup: Kubernetes, Ingress and Traefik Usage at CERN

Question: What is your Kubernetes upgrade strategy? Do you have Dev and Staging areas as well?

Answer: We recommend that our users upgrade by deploying new clusters and redirecting traffic gradually to the new resources, moving capacity from one cluster to the other along the way. This requires users to set up external LB instances for their services. For cases where in-place upgrades are required, we recommend prod/staging clusters - usually staging takes ~10% of the traffic/requests. We use Grafana for visualization, on top of Prometheus.
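
As an illustration of the gradual traffic shift (CERN's own LBaaS configuration is internal and not shown here), a minimal sketch of a 90/10 split using Traefik v2's weighted round robin in a file-provider dynamic configuration; the service names and URLs are hypothetical:

```yaml
# Hypothetical Traefik v2 dynamic configuration: shift traffic gradually
# from the old cluster to the new one by adjusting the weights.
http:
  services:
    myapp:
      weighted:
        services:
          - name: cluster-old
            weight: 90   # decrease as capacity moves to the new cluster
          - name: cluster-new
            weight: 10   # increase gradually
    cluster-old:
      loadBalancer:
        servers:
          - url: "http://old-cluster-ingress.example.ch"
    cluster-new:
      loadBalancer:
        servers:
          - url: "http://new-cluster-ingress.example.ch"
```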

Question: What are you using for Prometheus aggregation? Is that Thanos/Cortex?

Answer: We have an internal solution where a central Prometheus instance queries each cluster that is subscribed to central metric collection. The usual long-term aggregation is done with 1h granularity (vs. 10min for the in-cluster metrics).
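
A common way to implement this pattern is Prometheus federation, where the central instance scrapes each cluster's /federate endpoint; CERN's aggregation is an internal solution, so the targets and match rules below are hypothetical:

```yaml
# Sketch of a central Prometheus scraping per-cluster instances via the
# standard /federate endpoint; downsampling to 1h granularity would
# happen on top of this (CERN's exact mechanism is internal).
scrape_configs:
  - job_name: 'cluster-federation'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"job:.*"}'   # pull only pre-aggregated recording rules
    static_configs:
      - targets:
          - 'prometheus.cluster-a.example.ch:9090'
          - 'prometheus.cluster-b.example.ch:9090'
```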

Question: Do you have any CI/CD tools?

Answer: We promote GitOps, which is something people were already used to in the past with other config management tools. We have people using Flux with Helm and support that solution, but a couple of users rely on Argo CD. In other cases, the tooling uses a push model, keeping the credentials in GitLab CI and pushing on change.
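
For the Flux-with-Helm path, a minimal sketch of the pattern using the Flux v2 HelmRelease CRD (the chart and repository names are hypothetical, and this is one possible shape, not CERN's exact setup):

```yaml
# Hypothetical Flux v2 HelmRelease: Flux watches git, and the Helm
# controller keeps the release in sync with the declared chart version.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: my-app
  namespace: default
spec:
  interval: 5m
  chart:
    spec:
      chart: my-app
      version: "1.2.3"
      sourceRef:
        kind: HelmRepository
        name: my-charts
        namespace: flux-system
  values:
    replicaCount: 2
```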

Question: How do you push the locally hosted cluster logs to a centralized infrastructure?

Answer: We rely on Fluentd and publish using the HTTP plugin via an HTTP gateway maintained by our monitoring team. This allows us to push data to multiple backends with a single configuration - in our case that means Elasticsearch and HDFS.
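
A sketch of that pattern, with the Fluentd configuration carried in a Kubernetes ConfigMap; the gateway URL and the log tag are hypothetical, and the fan-out to Elasticsearch and HDFS happens behind the gateway:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    # Forward all container logs to the central HTTP gateway using the
    # built-in out_http plugin; the gateway fans out to the real backends.
    <match kubernetes.**>
      @type http
      endpoint http://log-gateway.example.ch/collect
      <format>
        @type json
      </format>
      <buffer>
        flush_interval 10s
      </buffer>
    </match>
```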

Question: How do you set up HTTPS ingress using cert-manager and the Traefik CRD?

Answer: We rely on Traefik's built-in ACME integration to generate the certs.
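
A minimal sketch of that integration in Traefik v2's static configuration; the resolver name, e-mail, and storage path are placeholders:

```yaml
# Traefik v2 static configuration: built-in ACME (Let's Encrypt) resolver.
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"
certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@example.ch     # placeholder
      storage: /data/acme.json    # where issued certs are persisted
      httpChallenge:
        entryPoint: web
```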

Question: Have you considered something like MetalLB for your ingress controllers?

Answer: No.

Question: How do you manage the network isolation between your different projects at CERN? And how does Traefik access your projects across the clusters/VMs?

Answer: CERN has a particular networking setup with a flat network where almost all instances get a publicly routable IP address. This makes the setup easier for Traefik, but isolation is limited within the datacenter. To access the services across clusters, we rely on our internal LBaaS service.

Question: Are your nodes with the ingress role doing ONLY ingress? How big are those nodes? Are they just network-optimized to forward traffic along?

Answer: We do not have dedicated nodes for Ingress. The size is variable and depends on the use case. For services or applications that are network bound, we deploy them with host networking, bypassing the pod network namespace for improved performance.
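
The host-networking part of that answer maps to a single field in the pod spec; a sketch with an illustrative DaemonSet (the image and names are placeholders):

```yaml
# Network-bound workload sharing the node's network namespace: traffic
# skips the pod network entirely, at the cost of port conflicts per node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: traefik-ingress
spec:
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      hostNetwork: true                   # bypass the pod network namespace
      dnsPolicy: ClusterFirstWithHostNet  # keep cluster DNS resolution
      containers:
        - name: traefik
          image: traefik:v2.4
          ports:
            - containerPort: 80
```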

Question: Who manages the DNS?

Answer: Our network team manages the DNS, and we use an API provided by them to update the device information. There is no support for users to manage their own subdomains today.

Question: Do you have a use case for stateful workloads running in Kubernetes, like a database with a headless service and TCP-based ingress? And how does Traefik support these services?

Answer: We have a few cases where people are running databases on Kubernetes - MongoDB, Cassandra. For those cases we need TCP Ingress and rely on nginx-ingress, as TCP support in Traefik was not available at the time.
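
In ingress-nginx, raw TCP exposure is configured through a ConfigMap passed via the controller's --tcp-services-configmap flag, mapping an external port to a namespace/service:port target; the service names below are hypothetical. (Traefik has since added native TCP routing in v2, via the IngressRouteTCP CRD.)

```yaml
# Hypothetical ingress-nginx TCP services ConfigMap: each key is a port
# the controller listens on, each value a namespace/service:port target.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "27017": "databases/mongodb:27017"
  "9042": "databases/cassandra:9042"
```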

Question: How do you manage and segregate the clusters and related resources like Ingress when there are thousands of them? Is it divided among a group of people? (multi-tenancy)

Answer: We offer a “cluster-as-a-service” model to our users, and in most cases multi-tenancy is achieved by splitting services across different clusters. We’re starting to have cases where, for different reasons (mostly improved resource usage), we use namespaces, resource quotas, and OIDC to integrate with our internal auth/authz service.
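
The namespace-based variant typically pairs a namespace per tenant with a ResourceQuota; a minimal sketch with hypothetical names and numbers:

```yaml
# Hypothetical per-tenant quota: caps aggregate resource requests and
# object counts inside the tenant's namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    pods: "100"
```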

Question: What database is in use for the Kubernetes clusters?

Answer: We rely on etcd.
