@dPacc · Created September 20, 2021
ML Web Service vs Streaming

Why choose Streaming Endpoints over Web Services?

Web services are great when you don't have many requests to serve at the same time. They are also a popular form of deployment because they are the easiest and fastest solution: all you have to do is spin up a Flask service and you have a fully functioning model behind an HTTP endpoint.

But for most large-scale enterprises and consumer technologies, a Web Service deployment won't suffice. The problems begin when you have tons of data coming in every minute. Over time the service will start failing under load and sending errors back. If you're lucky and your model is layered on a Kubernetes architecture, you can keep scaling up the number of pods, but this eats up far more resources and will likely require you to implement a retry mechanism on the client side. On top of that, you'll also need to store the inputs and outputs of each model, which means adding yet another layer of architecture. In terms of efficiency and scalability, Web Services fall short and often end up spiraling into more and more errors.
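To make the Web Service pattern concrete, here is a minimal Flask sketch; the model file name, route, and request shape are illustrative assumptions, not something specified in this gist.

```python
# Minimal Web Service deployment sketch (assumed names: model.pkl, /predict).
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load a pre-trained, pickle-serialized model once at startup (hypothetical path).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [[1.0, 2.0, ...], ...]}
    features = request.get_json()["features"]
    predictions = model.predict(features)
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Every request here is synchronous: if the pod is saturated, callers see timeouts or 5xx errors, which is exactly the failure mode described above.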

This is where Streaming Endpoints shine. By streaming your data to your model, you can handle roughly 30% more requests per pod. In addition, **you barely need to write error-handling code at all**: a message that fails is still sitting in its Kafka topic, so it can simply be reprocessed instead of being lost. You can also scale up the number of pods that listen to your Kafka input topic, and all the inputs and outputs are stored in Kafka topics, so you can always come back to them later and understand what happened.
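For contrast, here is a minimal sketch of the Streaming Endpoint pattern using kafka-python; the topic names, consumer group, broker address, and message shape are all assumptions for illustration.

```python
# Streaming Endpoint sketch: consume inputs from one Kafka topic,
# publish predictions to another (assumed topics: model-inputs / model-outputs).
import json
import pickle

from kafka import KafkaConsumer, KafkaProducer

with open("model.pkl", "rb") as f:  # hypothetical model path
    model = pickle.load(f)

consumer = KafkaConsumer(
    "model-inputs",                      # assumed input topic
    bootstrap_servers="localhost:9092",  # assumed broker address
    group_id="model-pods",               # add pods to this group to scale out
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

# Inputs stay in the input topic and outputs are appended to their own topic,
# so any message can be replayed or audited later.
for message in consumer:
    features = message.value["features"]
    prediction = model.predict([features])[0]
    producer.send("model-outputs", {"features": features, "prediction": float(prediction)})
```

Because the consumer group tracks offsets, a pod that crashes simply leaves its uncommitted messages behind; they are redelivered to another pod in the group, which is why no separate retry layer is needed.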

Streaming Endpoints are a great solution for consumer technologies and large-scale applications. This architecture is ideal for recommender systems and event-based predictions that require high throughput, low latency, and fault tolerance.
