Testing out mistralai/Mistral-7B-Instruct-v0.2 through vLLM and documenting the very basics of making an API request. Runs through Docker. Requires the NVIDIA Container Toolkit.
Last active
January 6, 2024 21:17
Mistral w/ vLLM. Run w/ an RTX 3090
# https://docs.mistral.ai/self-deployment/vllm/
# Replace the placeholder with your own token; it is quoted so the shell
# does not treat the angle brackets as redirection.
export HF_TOKEN="<Huggingface Token>"
docker run --gpus all \
    -e HF_TOKEN="$HF_TOKEN" -p 8000:8000 \
    ghcr.io/mistralai/mistral-src/vllm:latest \
    --host 0.0.0.0 \
    --model mistralai/Mistral-7B-Instruct-v0.2
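The container needs some time to download and load the weights before it accepts requests. A minimal sketch of a readiness check follows, assuming the server exposes the OpenAI-style `/v1/models` route on the port mapped above. The helper names `list_models` and `wait_for_server`, and the retry counts, are illustrative and not part of vLLM.

```python
import json
import time
import urllib.request


def list_models(base_url="http://localhost:8000", opener=urllib.request.urlopen):
    """Return the model ids reported by the server's /v1/models route."""
    with opener(f"{base_url}/v1/models", timeout=5) as resp:
        body = json.load(resp)
    return [entry["id"] for entry in body.get("data", [])]


def wait_for_server(base_url="http://localhost:8000", retries=60, delay=5.0,
                    opener=urllib.request.urlopen):
    """Poll /v1/models until it answers; raise TimeoutError after `retries` tries."""
    for _ in range(retries):
        try:
            return list_models(base_url, opener)
        except OSError:
            # Connection refused/reset and HTTP errors are all OSError subclasses.
            time.sleep(delay)
    raise TimeoutError(f"no response from {base_url} after {retries} attempts")
```

Calling `wait_for_server()` once the container is starting should eventually return a list of model ids; if the assumption about the route holds, that list would include the name passed to `--model`.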
from pathlib import Path

import requests

# Read the lyrics to interpret from a local text file.
lyrics = Path("lyrics.txt").read_text()
message = f"""Interpret the core message of these lyrics:
{lyrics}
"""
data = {
    "messages": [{"role": "user", "content": message}],
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
}
# Note: Neither the vLLM nor the Mistral docs mention that you need the `v1` in the URL.
# You'll see a lot of {"detail": "Not Found"} responses without it.
r = requests.post("http://localhost:8000/v1/chat/completions", json=data).json()
print(r["choices"][0]["message"]["content"])
# Example output:
# The core message of these lyrics appears to be about the speaker's experience of being in a relationship with someone who has hurt or confused them, and their struggle to decide whether to continue investing their emotions in the relationship or to let go and move on. The speaker expresses their desire for the other person to be open and expressive in their feelings, as they have been trying to be more forgiving and patient. However, they also acknowledge that they have been getting better at letting go of things and not getting too attached, particularly if the other person is not reciprocating their feelings or behavior is inconsistent. The speaker ultimately expresses their reluctance to leave the relationship but also their determination to protect themselves from unnecessary pain. Overall, the lyrics suggest a complex emotional landscape of love, hurt, and ambivalence.
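The request above leaves every sampling parameter at the server default, and the logs show that this resulted in `max_tokens=32522`. A hedged variant that caps the reply length and fails loudly on a non-chat response (such as the `{"detail": "Not Found"}` body) might look like the sketch below. `max_tokens` and `temperature` are standard OpenAI chat completion fields; the helper names and the value 512 are arbitrary choices for illustration.

```python
MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
URL = "http://localhost:8000/v1/chat/completions"


def build_payload(prompt, model=MODEL, max_tokens=512, temperature=0.7):
    """Build a single-turn chat request body with explicit sampling settings."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_reply(body):
    """Pull the assistant text out of a chat completion response."""
    if "choices" not in body:
        raise RuntimeError(f"unexpected response: {body}")
    return body["choices"][0]["message"]["content"]


def chat(prompt, url=URL, **sampling):
    """Send one user message and return the model's reply."""
    import requests  # imported here so the helpers above work without it

    resp = requests.post(url, json=build_payload(prompt, **sampling), timeout=120)
    resp.raise_for_status()  # turns a 404 from a missing /v1 into an exception
    return extract_reply(resp.json())
```

With the server running, `print(chat(message, max_tokens=256))` would be the equivalent of the last two lines of the script.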
INFO: 172.17.0.1:45074 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 01-06 20:33:24 async_llm_engine.py:379] Received request cmpl-0493039beb574618be780e6235452a46: prompt: '<s>[INST] Interpret the core message of these lyrics.\n\nLyrics:\nGot so hung up\nOn something you said\nI should’ve guessed \nthat you would mess \nwith my head\n\nYou got up \nand I stayed in bed\nI was about to say something\nSaid nothing instead \n\nGetting good at letting things go\nBut you’re somebody I want to know\nSo if you love me than let it show\nCuz I’ve been getting good at letting things go\n\nI’ve got a feeling\nYou could prove me wrong\nA feeling that I haven’t felt in so long\nI can be patient\nI can play along\nForgive as fast I forget you, \nSo don’t make me have to move on\n\nGetting good at letting things go\nBut you’re somebody I want to know\nSo if you love me than let it show\nCuz I’ve been getting good at letting things go\n\nDon’t want to leave so I’m letting you know\nThat I’ve been getting good at letting things go done [/INST]', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], ignore_eos=False, max_tokens=32522, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt token ids: [1, 1, 733, 16289, 28793, 4287, 5520, 272, 6421, 2928, 302, 1167, 22583, 28723, 13, 13, 28758, 19591, 28747, 13, 28777, 322, 579, 7342, 582, 13, 2486, 1545, 368, 773, 13, 28737, 1023, 28809, 333, 26415, 28705, 13, 6087, 368, 682, 4687, 28705, 13, 3415, 586, 1335, 13, 13, 1976, 1433, 582, 28705, 13, 391, 315, 10452, 297, 2855, 13, 28737, 403, 684, 298, 1315, 1545, 13, 28735, 3439, 2511, 3519, 28705, 13, 13, 1458, 1157, 1179, 438, 12815, 1722, 576, 13, 2438, 368, 28809, 267, 12421, 315, 947, 298, 873, 13, 5142, 513, 368, 2016, 528, 821, 1346, 378, 1347, 13, 28743, 3533, 315, 28809, 333, 750, 2719, 
1179, 438, 12815, 1722, 576, 13, 13, 28737, 28809, 333, 1433, 264, 4622, 13, 1976, 829, 7674, 528, 3544, 13, 28741, 4622, 369, 315, 6253, 28809, 28707, 2770, 297, 579, 1043, 13, 28737, 541, 347, 7749, 13, 28737, 541, 1156, 2267, 13, 28765, 1909, 495, 390, 4102, 315, 7120, 368, 28725, 28705, 13, 5142, 949, 28809, 28707, 1038, 528, 506, 298, 2318, 356, 13, 13, 1458, 1157, 1179, 438, 12815, 1722, 576, 13, 2438, 368, 28809, 267, 12421, 315, 947, 298, 873, 13, 5142, 513, 368, 2016, 528, 821, 1346, 378, 1347, 13, 28743, 3533, 315, 28809, 333, 750, 2719, 1179, 438, 12815, 1722, 576, 13, 13, 6017, 28809, 28707, 947, 298, 3530, 579, 315, 28809, 28719, 12815, 368, 873, 13, 3840, 315, 28809, 333, 750, 2719, 1179, 438, 12815, 1722, 576, 2203, 733, 28748, 16289, 28793].
INFO 01-06 20:33:24 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 1.1%, CPU KV cache usage: 0.0%
INFO 01-06 20:33:27 async_llm_engine.py:111] Finished request cmpl-0493039beb574618be780e6235452a46.
INFO: 172.17.0.1:55126 - "POST /v1/chat/completions HTTP/1.1" 200 OK
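Two details in the logged request can be read off directly. The prompt shows the single user message wrapped as `<s>[INST] ... [/INST]`, and the logged `max_tokens=32522` equals 32768 minus the 246 prompt token ids listed. The sketch below reproduces both. It assumes a 32768-token context window for this model and assumes the server fills in the remaining context as the default `max_tokens`; neither is stated in the logs, and the function and constant names are illustrative.

```python
def format_single_turn(user_message: str) -> str:
    """Wrap one user message the way the logged prompt appears."""
    return f"<s>[INST] {user_message} [/INST]"


# Assumed context window for Mistral-7B-Instruct-v0.2 (not shown in the logs).
ASSUMED_CONTEXT_WINDOW = 32768
# Value taken from the SamplingParams in the log line above.
LOGGED_MAX_TOKENS = 32522

# If the default is "whatever context is left", this is the prompt length.
implied_prompt_tokens = ASSUMED_CONTEXT_WINDOW - LOGGED_MAX_TOKENS

print(format_single_turn("Interpret the core message of these lyrics."))
print(implied_prompt_tokens)
```

In practice the server applies the chat template itself, so `format_single_turn` is only useful for reading the logs or for calling a raw completions route.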
Lyrics:
Got so hung up
On something you said
I should’ve guessed
that you would mess
with my head

You got up
and I stayed in bed
I was about to say something
Said nothing instead

Getting good at letting things go
But you’re somebody I want to know
So if you love me than let it show
Cuz I’ve been getting good at letting things go

I’ve got a feeling
You could prove me wrong
A feeling that I haven’t felt in so long
I can be patient
I can play along
Forgive as fast I forget you,
So don’t make me have to move on

Getting good at letting things go
But you’re somebody I want to know
So if you love me than let it show
Cuz I’ve been getting good at letting things go

Don’t want to leave so I’m letting you know
That I’ve been getting good at letting things go done
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:09:00.0  On |                  N/A |
|  0%   27C    P8              24W / 420W |  17696MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1423      G   /usr/lib/xorg/Xorg                          161MiB |
|    0   N/A  N/A      1689      G   /usr/bin/gnome-shell                         46MiB |
|    0   N/A  N/A     12701      G   /usr/lib/firefox/firefox                      0MiB |
|    0   N/A  N/A     16374      C   python3                                   17306MiB |
+---------------------------------------------------------------------------------------+
Logs from longer request: