Deploy Dolly v2.0 to SageMaker

@timesler
Created April 21, 2023 23:03
@ulisseshen

Thanks for the example. We have deployed many conversational models on SageMaker. The challenge is that, deployed this way, the endpoint does not stream the response, so longer responses often time out.

You could try another conversational pattern for your server/client, such as a WebSocket.
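
As an alternative to a separate WebSocket layer, the SageMaker runtime also exposes response streaming through `invoke_endpoint_with_response_stream`. Here is a minimal client-side sketch, assuming the serving container supports response streaming; the endpoint name and payload shape are illustrative assumptions:

```python
# Minimal sketch: stream a SageMaker endpoint's response chunk by chunk.
# Assumes an endpoint named "dolly-v2" (hypothetical) whose container
# supports response streaming.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint_with_response_stream(
    EndpointName="dolly-v2",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": "Explain what Dolly v2.0 is."}),
)

# The response body is an event stream; each event carries a chunk of
# the generated text, so the client can render output as it arrives
# instead of waiting (and possibly timing out) on the full completion.
for event in response["Body"]:
    chunk = event.get("PayloadPart", {}).get("Bytes")
    if chunk:
        print(chunk.decode("utf-8"), end="", flush=True)
```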

@IChr1 commented May 30, 2023

Has anyone used an inference config with the code above so that the model can also return embeddings?
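
One possible approach, sketched below: override the Hugging Face inference toolkit's default handlers with a custom `code/inference.py` that returns mean-pooled hidden states as embeddings. The pooling strategy and request format are assumptions, not part of the original gist:

```python
# Minimal sketch of a custom code/inference.py for the Hugging Face
# inference toolkit: model_fn/predict_fn are overridden so the endpoint
# returns embeddings (mean-pooled hidden states) instead of generated
# text. The pooling choice and {"inputs": ...} request shape are
# illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer


def model_fn(model_dir):
    # model_dir is the unpacked model.tar.gz on the endpoint
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    texts = data["inputs"]
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq, dim)
    # Mean-pool over non-padding tokens to get one vector per input
    mask = batch["attention_mask"].unsqueeze(-1)
    embeddings = (hidden * mask).sum(1) / mask.sum(1)
    return {"embeddings": embeddings.tolist()}
```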

@ybm11 commented Jul 24, 2023

Thanks for sharing, this is helping me a lot in trying to figure this topic out.
One question: why is there a mismatch between the transformers version in the requirements.txt file and in the SageMaker model creation command? What is the difference, and why would they be different?
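
For context on how the two versions interact: in the SageMaker Python SDK, the `transformers_version` argument selects the pre-built Deep Learning Container image, while a requirements.txt bundled in model.tar.gz is pip-installed on top of that image when the container starts, so the two can legitimately differ (e.g. pinning a newer transformers than the image ships with). A minimal sketch, with illustrative versions, S3 path, and instance type:

```python
# Minimal sketch of deploying with the SageMaker Python SDK.
# transformers_version/pytorch_version pick the pre-built container
# image; any requirements.txt inside model.tar.gz is installed with pip
# at endpoint startup, on top of the image. The S3 path, versions, and
# instance type below are illustrative assumptions.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

model = HuggingFaceModel(
    model_data="s3://my-bucket/dolly-v2/model.tar.gz",  # hypothetical path
    role=role,
    transformers_version="4.26",  # selects the base DLC image
    pytorch_version="1.13",
    py_version="py39",
)

# A requirements.txt bundled in model.tar.gz (e.g. pinning a newer
# transformers) is installed on top of the image at startup.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```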
