@dPacc · Created September 20, 2021
ML Web Service vs Streaming

Why choose Streaming Endpoints over Web Services?

Web services are great when you don't have many requests to serve at the same time. They are also a popular form of deployment because they are the easiest and fastest solution: all you have to do is spin up a Flask service and you have a fully functioning model behind an HTTP endpoint.

But for most large-scale enterprises and consumer technologies, a Web Service deployment won't suffice. The problems begin when you have tons of data coming in every minute. Over time the service will start failing under load and sending errors back. If you're lucky and your model is layered on a Kubernetes architecture, you can keep scaling up the number of pods, but this eats up far more resources and will likely require you to implement a retry mechanism on the client side. On top of that, you'll also need to store the inputs and outputs of each model, which means adding yet another layer of architecture. In terms of efficiency and scalability, Web Services fall short and often end up spiraling into more and more errors.
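To make the Web Service pattern concrete, here is a minimal Flask sketch; the model file name, route, and request shape are illustrative assumptions, not something specified in this gist.

```python
# Minimal Web Service deployment sketch (assumed names: model.pkl, /predict).
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load a pre-trained, pickle-serialized model once at startup (hypothetical path).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [[1.0, 2.0, ...], ...]}
    features = request.get_json()["features"]
    predictions = model.predict(features)
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Every request here is synchronous: if the pod is saturated, callers see timeouts or 5xx errors, which is exactly the failure mode described above.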

This is where Streaming Endpoints shine. By streaming your data to your model, you can handle roughly 30% more requests per pod. In addition, **you barely need to write error-handling code at all**: a message that fails is still sitting in its Kafka topic, so it can simply be reprocessed instead of being lost. You can also scale up the number of pods that listen to your Kafka input topic, and all the inputs and outputs are stored in Kafka topics, so you can always come back to them later and understand what happened.
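For contrast, here is a minimal sketch of the Streaming Endpoint pattern using kafka-python; the topic names, consumer group, broker address, and message shape are all assumptions for illustration.

```python
# Streaming Endpoint sketch: consume inputs from one Kafka topic,
# publish predictions to another (assumed topics: model-inputs / model-outputs).
import json
import pickle

from kafka import KafkaConsumer, KafkaProducer

with open("model.pkl", "rb") as f:  # hypothetical model path
    model = pickle.load(f)

consumer = KafkaConsumer(
    "model-inputs",                      # assumed input topic
    bootstrap_servers="localhost:9092",  # assumed broker address
    group_id="model-pods",               # add pods to this group to scale out
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

# Inputs stay in the input topic and outputs are appended to their own topic,
# so any message can be replayed or audited later.
for message in consumer:
    features = message.value["features"]
    prediction = model.predict([features])[0]
    producer.send("model-outputs", {"features": features, "prediction": float(prediction)})
```

Because the consumer group tracks offsets, a pod that crashes simply leaves its uncommitted messages behind; they are redelivered to another pod in the group, which is why no separate retry layer is needed.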

Streaming Endpoints are a great solution for consumer technologies and large-scale applications. This architecture is ideal for recommender systems and event-based predictions that require high throughput, low latency, and fault tolerance.
