Service Mesh in Kubernetes: it’s not that easy
In this talk, presenters will share lessons from several years of experience running Envoy in production at scale. They will explore practical techniques for triaging issues in a service mesh, along with the intuition behind them. The presenters will cover a broad range of topics including traffic capture, issues specific to gRPC, health checks, and techniques useful during incident mitigation. The talk will end with a deep dive into Envoy stats and their use in resolving issues.
Kubernetes enables a faster, more reactive infrastructure. As Lyft migrated its applications to Kubernetes, assumptions baked into the networking layer were tested. This talk will provide a deep dive into how Lyft used Envoy’s xDS protocol to design its own flexible service mesh and handle new challenges from a multi-cluster architecture, such as:
- Integrating with Kubernetes
- Routing across multiple Kubernetes clusters
- Handling Deployments
- Rapid scale-in and scale-out
- Service Discovery
- Active/Passive Health Checking
- Readiness in the service mesh
This talk will also go over changes that were made in the Envoy codebase to make this work.
Unlike last year’s KubeCon 2018 talk, Evolving Legacy Systems into Kubernetes at Lyft: A Hybrid Environment, this talk will not cover the migration of legacy VMs to Kubernetes. It is an advanced talk that goes into the technical details of Envoy and of building a multi-cluster service mesh with Kubernetes.