@brunowdev
Last active November 27, 2020 22:29

Hi Cecilia, about your questions:

1 - By an endpoint, I mean something like an "API" that you could put behind an API Gateway to invoke your model. The expected behavior of such an API is that you invoke it and get the result back (this will be the output). So your Lambda could take the endpoint response and save it to S3, put it in a queue or topic, or even return it to the API Gateway (assuming you have an HTTP API that invokes this endpoint).

About batch predictions: you will usually use them for asynchronous inference. For example, I have a project to predict dropouts. After all the classes have ended, we trigger an asynchronous batch prediction job that gets all the students' data and calls the inference model. Since we have thousands of student profiles, the model is called with batches of 1,000 profiles at a time, and the results are stored in an S3 bucket. This triggers the remaining pipeline steps, in which the new scores are saved to our database.
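A rough sketch of how that batch job gets kicked off (I'm assuming SageMaker Batch Transform via boto3 here; the job name, model name, buckets, and instance type are placeholders, not our real values):

```python
import time
import boto3

sagemaker = boto3.client("sagemaker")

# Start an asynchronous batch prediction job over all the student profiles in S3.
sagemaker.create_transform_job(
    TransformJobName=f"dropout-predictions-{int(time.time())}",
    ModelName="dropout-model",                       # placeholder model name
    BatchStrategy="MultiRecord",                     # pack several profiles per request
    MaxPayloadInMB=6,
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/students/input/",   # placeholder input prefix
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",
    },
    TransformOutput={"S3OutputPath": "s3://my-bucket/students/output/"},  # results land here
    TransformResources={"InstanceType": "ml.m5.large", "InstanceCount": 1},
)
```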

For this project, this strategy is more cost-effective since we don't need an endpoint (which, in reality, is an EC2 instance running a container with your model) sitting there 24x7.

Another case we have here is a regression model for real estate valuation. In this project, we have an application where the user fills in some information about the property they are selling and clicks a button to get the price prediction. So, in this case, we do need an endpoint running 24x7. The Lambda returns the JSON response and the application displays it (here, we don't need to store anything in S3).
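For that synchronous case, the Lambda behind the API Gateway is roughly this (a minimal sketch; the endpoint name and payload shape are assumptions, not the real project values):

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def handler(event, context):
    # API Gateway proxy integration delivers the request body as a JSON string
    payload = event["body"]

    # Invoke the always-on SageMaker endpoint with the property data
    response = runtime.invoke_endpoint(
        EndpointName="property-price-endpoint",   # placeholder endpoint name
        ContentType="application/json",
        Body=payload,
    )
    prediction = json.loads(response["Body"].read())

    # Return the JSON straight to the application -- nothing is written to S3
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"predicted_price": prediction}),
    }
```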

2 - Your understanding of batch prediction is correct. About the security part: well, you need to secure the components involved in your project. For example, you will need to trigger the prediction job (here, we usually use an S3 event). So your application will trigger "some event" or call a Lambda directly that starts the batch job. An interesting point is that the job will be isolated in your VPC, so the security level will depend on how you manage the IAM roles: for example, who can view the buckets with the input/output data, the CloudWatch logs, or even trigger the batch prediction job.
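For example, the trigger can be a small Lambda wired to an S3 "ObjectCreated" notification; it reads the bucket/key from the event and starts the job (again just a sketch, with placeholder names throughout):

```python
import time
import boto3

sagemaker = boto3.client("sagemaker")

def handler(event, context):
    # S3 "ObjectCreated" notification: find out which input file was just uploaded
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Start the batch prediction job against that file
    sagemaker.create_transform_job(
        TransformJobName=f"predictions-{int(time.time())}",
        ModelName="my-model",                      # placeholder model name
        TransformInput={
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": f"s3://{bucket}/{key}",
                }
            },
            "ContentType": "text/csv",
            "SplitType": "Line",
        },
        TransformOutput={"S3OutputPath": "s3://my-output-bucket/predictions/"},  # placeholder
        TransformResources={"InstanceType": "ml.m5.large", "InstanceCount": 1},
    )
```

The execution role of a Lambda like this only needs permission to start the transform job and to read/write the buckets involved, which is where the IAM part comes in.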

If you have an endpoint running 24x7 with a Lambda in front of it, you will usually protect your inference API with API Gateway authentication/authorization.

If you use a batch prediction job, you can view it like a Jenkins/GitLab job that does the work and is then terminated. So the security concerns are really about who can view, trigger, or access the input/output data.

Here, we have some projects in which we use SageMaker just for training the model. Since we have a Kubernetes cluster, we realized that it is more cost-effective to have a container for which we define the model version on S3; on each deployment, we load the model (a tar file with a serialized scikit-learn model). We have 2-3 replicas of a simple Python application that receives a payload from a web application, runs the inference, and returns the result.
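Very roughly, that application looks like this (I'm assuming Flask and joblib here; the bucket, key, and file names are placeholders, not our actual setup):

```python
import tarfile

import boto3
import joblib
from flask import Flask, jsonify, request

MODEL_BUCKET = "my-models-bucket"                  # placeholder bucket
MODEL_KEY = "price-model/model-v3.tar.gz"          # model version chosen per deployment

def load_model():
    # Download the tar file from S3 and load the serialized scikit-learn model
    boto3.client("s3").download_file(MODEL_BUCKET, MODEL_KEY, "/tmp/model.tar.gz")
    with tarfile.open("/tmp/model.tar.gz") as tar:
        tar.extractall("/tmp/model")
    return joblib.load("/tmp/model/model.joblib")  # file name inside the tar is an assumption

app = Flask(__name__)
model = load_model()

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # hypothetical payload shape
    prediction = model.predict([features])[0]
    return jsonify({"prediction": float(prediction)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```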

Since the models are stored in S3, you can download and use them however you need.

About Step Functions, it isn't clear to me where they would help you if you have an endpoint. You could emulate the batch prediction behavior with a Step Function that checks something and then terminates the endpoint, but in that case it would be simpler to just use a batch prediction job.

I hope I have answered your questions. Let me know if not. English is not my first language. :)
