David Aronchick
Machine learning is, at its core, just statistics.
The statistics can be really, really complicated though.
- non-linear groups
- multi-dimensional models
Machine learning is a way to solve problems without explicitly knowing how to create the solution.
Machine learning is hard - even for Google.
DIY machine learning is super common
- setup from scratch
- migrating between environments is super hard
Hosted ML also exists
- works immediately
- but it becomes bespoke quickly
- vendor lock-in :(
Value of k8s is in the extension model
- no need to fork code
Need a "cloud native ML"
- composability
- portability
- scalability
Composability
- every business is different
- ability to swap components is key
Portability
- "multi-cloud is the reality"
- 81% (!) of enterprises are multi-cloud, avg 5 (!) cloud platforms
- Dev is an environment and it's usually totally different from stage/prod
Scalability
- More resources
- More humans
- More problems
Kubeflow (pr. Kûb Flô)
- Simplifies ML on top of k8s (re: 3 criteria above)
Demo!
- Sentiment analysis
- Demo was a video, which is good - but the video didn't work, which is bad.
- The live demo gods are angry today!
- Apparently supposed to be a terminal workflow of deploying Kubeflow
- Kubeflow configs and sets up tensor2tensor (handy)
- "tpu" makes it all faster?
Contact
- "kubeflow" on all the things
Anthony Seure (eng at Algolia)
~2 years using k8s in prod
(One of the two projectors broke. oh no! not his fault.)
"everything at google runs in a continer" - joe beda, 2014
introduction to k8s
- software, config, tools
- architecture
- config:
  - describe services
  - define resources (avail/min/max) - see the sketch below
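(Not a slide, just for context: with the Go client types from `k8s.io/api`, "define resources" looks roughly like this - the names and values are made up.)

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Requests = what the scheduler guarantees; Limits = the hard cap.
	c := corev1.Container{
		Name:  "api",             // hypothetical service
		Image: "example/api:1.0", // hypothetical image
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("100m"),
				corev1.ResourceMemory: resource.MustParse("128Mi"),
			},
			Limits: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("500m"),
				corev1.ResourceMemory: resource.MustParse("512Mi"),
			},
		},
	}
	fmt.Printf("%+v\n", c.Resources)
}
```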
future?
- DC/OS
- HashiCorp Nomad
Algolia (the service) runs on bare metal, highly optimised. 1200+ servers in 70+ DCs
All user-facing stuff (website, dashboard, blog, docs, status page, etc) is in VMs w/ buckets
Backend services (logs, analytics, usage pipeline, monitoring) on VMs & k8s
k8s at Algolia two years ago:
- single-machine monolith
- analytics served from ES
in the meantime, tried:
- SaaS solutions
- just couldn't deal with the volume OR way too expensive
- google cloud dataflow
- also way too expensive
now:
- migrated(ing) to k8s
- live in both google and aws at the same time. works but network fees are expensive.
infra (google):
- k8s = GKE
- nodes = GCE
- Ingress = GLB
- Docker reg = GCR
testing:
- staging is done on-demand against real-world load (!)
logging, monitoring, alerting:
- currently: stackdriver and wavefront
- considering Datadog
- shout out!
Should you use k8s?
- That's a real question. Not everybody needs it. It's a lot of work.
- Still need to pay attention to infra & classic ops concerns.
- New and exciting things that will break!
Cloud vs on-prem?
- If you're 100% k8s native, portability is possible
- Watch out for vendor extensions
- "If you can afford it, prefer IaaS providers"
Learning curve is steep
- Can you afford to invest the time and resources? again: do you really need?
- share knowledge early and often
- on-board people from other teams as early as possible
deployment
- no miracle solutions: Terraform, Skaffold, Google Cloud Deployment Manager, etc.
- all tools are painful. pick one early and deal with it.
logging
- centralise from day one
- pick a good tool, you're gonna need it, esp. if you're on-prem
testing
- load and end-to-end testing
- blue/green
k8s scales, but does your app?
- watch out for static deps, threading issues, etc
@horgix (WeScale)
Buzzwords!
- tracing
- service mesh
- serverless
micro-services:
- monoliths are dead, micro-services are the way forward
orchestration
- allocate resources to jobs
- reschedule jobs in case of failure
- bring API-centric infra
observability
- monitoring is a sub-set
logs: recording events, easy to grep
metrics: data combined from measuring events; can identify trends and context
tracing: recording events w/ causal ordering; identifies cause across services
- see also: APM
- solutions: datadog, new relic, others
service mesh: routing with intelligence
- linkerd, conduit, istio, etc
serverless: Functions as a Service
- 5 years ago: run this infra-as-code that installs my app
- now: run my container
- serverless: run my code
advantages of serverless
- only pay for usage
- pretty easy to deploy
- "nano-services" ?
- OpenFaaS, OpenWhisk, Kubeless, and the cloud providers (handler sketch below)
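To make "run my code" concrete, a sketch (my addition, not from the talk): OpenFaaS's classic Go template expects nothing but a handler function, roughly like this.

```go
package function

import "fmt"

// Handle is the entire deployable unit in OpenFaaS's classic Go template:
// the platform provides the HTTP plumbing, you provide this function.
func Handle(req []byte) string {
	return fmt.Sprintf("you said: %q", string(req))
}
```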
Ihor Dvoretskyi @idvoretsky - Dev Adv CNCF
Rise of micro-services correlates with rise of containerisation
Docker is fundamentally attractive to application developers because Docker behaves like an app, not an operating system (hmmm)
Docker is great but it doesn't scale by itself
Kubernetes started at Google and has since been "donated" to the CNCF (under the Linux Foundation)
Kubernetes "graduated" from CNCF earlier in 2018
More than 20 "platinum" companies supporting CNCF; more than 50 companies contributing code
CNCF "trail map": L.cncf.io
Many local meetups; the one here in Paris is one of the largest / most vital.
Liz Rice @lizrice
Containers from scratch
Dive into what `docker run <image>` actually does at a code level
- live coding in Go
Within the app, if called with arg `run`, this must actually do something.
- i.e. fork/exec a new process
In order to prepare the call for containerisation, we request new namespaces (starting with UTS, "Unix Timesharing System") via `syscall.SysProcAttr`.
- 🎉 containers!
Use `/proc/self/exe` to make the current process self-referential
- This way the app can set up the environment (namespaces) then call itself within that env (sketch below)
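A rough reconstruction of that pattern (from memory, Linux-only, not her exact code): `run` re-executes the binary via `/proc/self/exe` inside new namespaces, and the `child` invocation does the real work.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	switch os.Args[1] {
	case "run":
		run()
	case "child":
		child()
	default:
		panic("expected run or child")
	}
}

// run re-invokes this same binary as "child" inside new UTS/PID/mount
// namespaces - this is where SysProcAttr comes in.
func run() {
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	must(cmd.Run())
}

// child now lives inside the new namespaces and execs the requested command.
func child() {
	fmt.Printf("running %v as pid %d\n", os.Args[2:], os.Getpid())
	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	must(cmd.Run())
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}
```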
`/proc/` has numbered subdirs for each running process
- Within these dirs there's a tonne of FS-accessible information (in the form of files) about the running process
`ps` uses this to generate output
`syscall.Chroot` to, well, chroot.
- Must `chdir` afterwards or sadness will prevail (snippet below).
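Continuing the `child()` sketch above; the rootfs path is hypothetical (any extracted image filesystem works):

```go
// Inside child(), before exec'ing the command:
must(syscall.Chroot("/home/liz/rootfs")) // hypothetical extracted image FS
must(syscall.Chdir("/"))                 // without this, cwd still escapes the chroot
```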
So `docker run` is effectively chrooting somewhere then forking self-referentially within that chroot.
- Ok yeah it's more than that but that's the basics.
Mounting `proc` within the chroot allows access to `/proc` contextually (snippet below).
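Also in `child()`, after the chroot (again a sketch):

```go
// A fresh proc mount so ps and friends see the processes
// of this PID namespace rather than the host's.
must(syscall.Mount("proc", "/proc", "proc", 0, ""))
defer syscall.Unmount("/proc", 0) // tidy up when the child exits
```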
Summary: Namespaces
- what you can see
- created with syscalls
Summary: CGroups
- what you can use
- filesystem interface
`/sys/fs/cgroup/`: cgroups!
`/sys/fs/cgroup/docker/<id>`: cgroup for that container
CGroups can be used to control the number of processes that can be run within the container
- Useful to prevent fork-bombing.
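A sketch of that step, assuming a cgroup-v1 host with the `pids` controller (needs `os` and `strconv` on top of the imports above; the cgroup name is made up):

```go
// Called from child(): cap the container at 20 processes (cgroup v1).
func cg() {
	pids := "/sys/fs/cgroup/pids/liz" // hypothetical cgroup name
	must(os.MkdirAll(pids, 0755))
	must(os.WriteFile(pids+"/pids.max", []byte("20"), 0700))
	// Writing our PID into cgroup.procs moves this process (and its
	// children - including any fork bomb) into the capped cgroup.
	must(os.WriteFile(pids+"/cgroup.procs", []byte(strconv.Itoa(os.Getpid())), 0700))
}
```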
Damien Lespiau (weave works)
Open with a WeaveWorks biz & product pitch. blah.
Context: micro-services on k8s
- traffic management is a problem that increases exponentially with scale
Why load balance at L7 instead of L4?
aside: what does the CLI tool `hey` do? (An HTTP load generator, it turns out.)
L4 load-balancing is trickier in the world of HTTP pipelining. HTTP/2 request multiplexing (gRPC) makes this worse.
Goals of load balancing:
- distribute the load fairly
- affinity
- locality
- circuit-breaking
L4 vs L7
- L4: connection-level
- L4: affinity stops at IP/port
- L7: request-level
- potentially better load distribution
- way more affinity criteria
- "passive circuit breaking"
"Sidecar Proxy" is a thing
- Damien had a hard time explaining it though
L7 proxies:
- language agnostic
- simple clients
- proxy in the data path - this might also be a disadvantage
Client-side proxies:
- need lib for each language
- no extra hop in the data path though
- full control over the desired behaviour
Look-aside load-balancing:
- "I've never seen a product implement this."
- gRPC
"reverse proxy with consistent hashing"
Hash the endpoints, plot them on a circle. Hash incoming requests onto the same circle. Each request goes to the first endpoint found moving clockwise from it.
- Bounded loads to prevent saturation: `upperBound = c * averageLoad`, with `c > 1`
- If a given endpoint is saturated, pass to the next one clockwise (toy sketch below).
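A toy Go reconstruction of the idea (mine, not the speaker's code): endpoints and requests hash onto the same ring; a request walks clockwise and skips any endpoint already at the bound.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"sort"
)

type ring struct {
	points    []uint32          // sorted hash points on the circle
	endpoints map[uint32]string // point -> endpoint
	load      map[string]int    // in-flight requests per endpoint
	c         float64           // bound factor, c > 1
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(c float64, names ...string) *ring {
	r := &ring{endpoints: map[uint32]string{}, load: map[string]int{}, c: c}
	for _, n := range names {
		p := hashKey(n)
		r.points = append(r.points, p)
		r.endpoints[p] = n
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// pick walks clockwise from the request's hash, skipping endpoints whose
// load would exceed ceil(c * averageLoad) - the bounded-loads rule.
func (r *ring) pick(key string) string {
	total := 1 // count the incoming request in the average
	for _, l := range r.load {
		total += l
	}
	bound := math.Ceil(r.c * float64(total) / float64(len(r.points)))

	h := hashKey(key)
	i := sort.Search(len(r.points), func(j int) bool { return r.points[j] >= h })
	for n := 0; n < len(r.points); n++ {
		ep := r.endpoints[r.points[(i+n)%len(r.points)]]
		if float64(r.load[ep]+1) <= bound {
			r.load[ep]++ // caller decrements when the request completes
			return ep
		}
	}
	return "" // unreachable while c > 1
}

func main() {
	r := newRing(1.25, "10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80")
	for _, req := range []string{"user-1", "user-2", "user-1", "user-3"} {
		fmt.Println(req, "->", r.pick(req))
	}
}
```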
Circuit breaking
- relies on the k8s `readinessProbe` (app-side sketch below)
- …
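The app side of that is just a health endpoint for the probe to hit; a minimal sketch (my addition): return non-200 and Kubernetes pulls the pod out of the Service's endpoints.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

var healthy atomic.Bool // flip to false when a dependency breaks

func main() {
	healthy.Store(true)
	// The kubelet hits this path on the schedule set in the pod's
	// readinessProbe; a non-200 answer takes the pod out of the
	// Service's endpoints until it recovers.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if healthy.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	http.ListenAndServe(":8080", nil) // port is illustrative
}
```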