Mistral w/ vLLM. Run on an RTX 3090

Testing out mistralai/Mistral-7B-Instruct-v0.2 through vLLM and documenting the very basics of making an API request. Runs through Docker. Requires the NVIDIA Container Toolkit.

# https://docs.mistral.ai/self-deployment/vllm/
export HF_TOKEN=<Huggingface Token>
docker run --gpus all \
-e HF_TOKEN=$HF_TOKEN -p 8000:8000 \
ghcr.io/mistralai/mistral-src/vllm:latest \
--host 0.0.0.0 \
--model mistralai/Mistral-7B-Instruct-v0.2
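
The model weights take a while to download and load, so it can help to confirm the server is actually ready before sending a chat request. A minimal sketch, assuming vLLM's OpenAI-compatible server exposes the usual /v1/models endpoint on the same port (the retry loop and timing are just illustrative):

import time
import requests

# Poll the OpenAI-compatible models endpoint until the server responds.
for attempt in range(30):
    try:
        r = requests.get("http://localhost:8000/v1/models", timeout=5)
        if r.ok:
            # Should list mistralai/Mistral-7B-Instruct-v0.2 once loading finishes.
            print(r.json())
            break
    except requests.ConnectionError:
        pass
    time.sleep(10)
else:
    raise RuntimeError("Server never became ready")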
from pathlib import Path
import requests
lyrics = Path("lyrics.txt").read_text()
message = f"""Interpret the core message of these lyrics:
{lyrics}
"""
data = {"messages" : [{"role" : "user", "content" : message}], "model": "mistralai/Mistral-7B-Instruct-v0.2"}
# Note: Neither the vLLM docs nor the Mistral docs mention that you need the `/v1` in the URL.
# You'll see a lot of {"detail": "Not Found"} responses without it.
r = requests.post("http://localhost:8000/v1/chat/completions", json=data).json()
print(r['choices'][0]['message']['content'])
# The core message of these lyrics appears to be about the speaker's experience of being in a relationship with someone who has hurt or confused them, and their struggle to decide whether to continue investing their emotions in the relationship or to let go and move on. The speaker expresses their desire for the other person to be open and expressive in their feelings, as they have been trying to be more forgiving and patient. However, they also acknowledge that they have been getting better at letting go of things and not getting too attached, particularly if the other person is not reciprocating their feelings or behavior is inconsistent. The speaker ultimately expresses their reluctance to leave the relationship but also their determination to protect themselves from unnecessary pain. Overall, the lyrics suggest a complex emotional landscape of love, hurt, and ambivalence.
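
The same request also works through the OpenAI Python client, since the vLLM server speaks the OpenAI chat completions API. A sketch, assuming the `openai` package (v1+) is installed; the API key is required by the client but isn't checked by this local server:

from pathlib import Path
from openai import OpenAI

# Point the client at the local vLLM server instead of api.openai.com.
# The api_key value is arbitrary here (assumption: no key enforcement on this server).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

lyrics = Path("lyrics.txt").read_text()
response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[
        {"role": "user", "content": f"Interpret the core message of these lyrics:\n{lyrics}"}
    ],
)
print(response.choices[0].message.content)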
INFO: 172.17.0.1:45074 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 01-06 20:33:24 async_llm_engine.py:379] Received request cmpl-0493039beb574618be780e6235452a46: prompt: '<s>[INST] Interpret the core message of these lyrics.\n\nLyrics:\nGot so hung up\nOn something you said\nI should’ve guessed \nthat you would mess \nwith my head\n\nYou got up \nand I stayed in bed\nI was about to say something\nSaid nothing instead \n\nGetting good at letting things go\nBut you’re somebody I want to know\nSo if you love me than let it show\nCuz I’ve been getting good at letting things go\n\nI’ve got a feeling\nYou could prove me wrong\nA feeling that I haven’t felt in so long\nI can be patient\nI can play along\nForgive as fast I forget you, \nSo don’t make me have to move on\n\nGetting good at letting things go\nBut you’re somebody I want to know\nSo if you love me than let it show\nCuz I’ve been getting good at letting things go\n\nDon’t want to leave so I’m letting you know\nThat I’ve been getting good at letting things go done [/INST]', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], ignore_eos=False, max_tokens=32522, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt token ids: [1, 1, 733, 16289, 28793, 4287, 5520, 272, 6421, 2928, 302, 1167, 22583, 28723, 13, 13, 28758, 19591, 28747, 13, 28777, 322, 579, 7342, 582, 13, 2486, 1545, 368, 773, 13, 28737, 1023, 28809, 333, 26415, 28705, 13, 6087, 368, 682, 4687, 28705, 13, 3415, 586, 1335, 13, 13, 1976, 1433, 582, 28705, 13, 391, 315, 10452, 297, 2855, 13, 28737, 403, 684, 298, 1315, 1545, 13, 28735, 3439, 2511, 3519, 28705, 13, 13, 1458, 1157, 1179, 438, 12815, 1722, 576, 13, 2438, 368, 28809, 267, 12421, 315, 947, 298, 873, 13, 5142, 513, 368, 2016, 528, 821, 1346, 378, 1347, 13, 28743, 3533, 315, 28809, 333, 750, 2719, 1179, 438, 12815, 1722, 576, 13, 13, 28737, 28809, 333, 1433, 264, 4622, 13, 1976, 829, 7674, 528, 3544, 13, 28741, 4622, 369, 315, 6253, 28809, 28707, 2770, 297, 579, 1043, 13, 28737, 541, 347, 7749, 13, 28737, 541, 1156, 2267, 13, 28765, 1909, 495, 390, 4102, 315, 7120, 368, 28725, 28705, 13, 5142, 949, 28809, 28707, 1038, 528, 506, 298, 2318, 356, 13, 13, 1458, 1157, 1179, 438, 12815, 1722, 576, 13, 2438, 368, 28809, 267, 12421, 315, 947, 298, 873, 13, 5142, 513, 368, 2016, 528, 821, 1346, 378, 1347, 13, 28743, 3533, 315, 28809, 333, 750, 2719, 1179, 438, 12815, 1722, 576, 13, 13, 6017, 28809, 28707, 947, 298, 3530, 579, 315, 28809, 28719, 12815, 368, 873, 13, 3840, 315, 28809, 333, 750, 2719, 1179, 438, 12815, 1722, 576, 2203, 733, 28748, 16289, 28793].
INFO 01-06 20:33:24 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 1.1%, CPU KV cache usage: 0.0%
INFO 01-06 20:33:27 async_llm_engine.py:111] Finished request cmpl-0493039beb574618be780e6235452a46.
INFO: 172.17.0.1:55126 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Lyrics:
Got so hung up
On something you said
I should’ve guessed
that you would mess
with my head
You got up
and I stayed in bed
I was about to say something
Said nothing instead
Getting good at letting things go
But you’re somebody I want to know
So if you love me than let it show
Cuz I’ve been getting good at letting things go
I’ve got a feeling
You could prove me wrong
A feeling that I haven’t felt in so long
I can be patient
I can play along
Forgive as fast I forget you,
So don’t make me have to move on
Getting good at letting things go
But you’re somebody I want to know
So if you love me than let it show
Cuz I’ve been getting good at letting things go
Don’t want to leave so I’m letting you know
That I’ve been getting good at letting things go done
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:09:00.0 On | N/A |
| 0% 27C P8 24W / 420W | 17696MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1423 G /usr/lib/xorg/Xorg 161MiB |
| 0 N/A N/A 1689 G /usr/bin/gnome-shell 46MiB |
| 0 N/A N/A 12701 G /usr/lib/firefox/firefox 0MiB |
| 0 N/A N/A 16374 C python3 17306MiB |
+---------------------------------------------------------------------------------------+

Logs from longer request:

INFO 01-06 21:17:01 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 7.2%, CPU KV cache usage: 0.0%
INFO 01-06 21:17:06 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 49.9 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 8.4%, CPU KV cache usage: 0.0%
INFO 01-06 21:17:11 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 49.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 9.5%, CPU KV cache usage: 0.0%
INFO 01-06 21:17:16 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 49.6 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 10.5%, CPU KV cache usage: 0.0%
INFO 01-06 21:17:21 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 49.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 11.7%, CPU KV cache usage: 0.0%
INFO 01-06 21:17:26 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 48.9 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 12.7%, CPU KV cache usage: 0.0%
INFO 01-06 21:17:29 async_llm_engine.py:111] Finished request cmpl-fb97369e399f40e1a77058aff71b829e.
