Mengtao (Martin) Yuan (iseeyuan)

(vllm) $ vllm serve Qwen/Qwen2.5-7B --runner pooling --convert reward
INFO 09-28 09:34:42 [__init__.py:216] Automatically detected platform cuda.
WARNING 09-28 09:34:46 [api_server.py:1191] Torch Profiler is enabled in the API server. This should ONLY be used for local development!
(APIServer pid=2359097) INFO 09-28 09:34:46 [api_server.py:1822] vLLM API server version 0.10.2rc3.dev325+g1c3ffdbec
(APIServer pid=2359097) INFO 09-28 09:34:46 [utils.py:328] non-default args: {'model_tag': 'Qwen/Qwen2.5-7B', 'model': 'Qwen/Qwen2.5-7B', 'runner': 'pooling', 'convert': 'reward'}
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 686/686 [00:00<00:00, 6.04MB/s]
(APIServer pid=2359097) INFO 09-28 09:34:54 [model.py:551] Resolved architecture: Qwen2ForCausalLM
(APIServer pid=2359097) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=2359097) INFO 09-28 09:34:54 [model.py:1581] Using max model len 131072
(APIServer pid=2359097) IN
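
For reference, once the server above is running, the converted reward model can be queried over HTTP. The sketch below assumes the default port 8000 and vLLM's /pooling endpoint for pooling-runner models; the prompt text and the response fields accessed are illustrative, so check the server's actual response shape.

```python
# Minimal sketch: query the Qwen2.5-7B reward model served with
# `vllm serve ... --runner pooling --convert reward`.
# Assumes the default port (8000) and the /pooling endpoint of
# vLLM's OpenAI-compatible server.
import requests

response = requests.post(
    "http://localhost:8000/pooling",
    json={
        "model": "Qwen/Qwen2.5-7B",
        "input": "The quick brown fox jumps over the lazy dog.",  # example prompt
    },
)
response.raise_for_status()

# The pooling response carries the per-request pooled (reward) output;
# print the first entry to inspect its structure.
print(response.json()["data"][0])
```
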
=================================== FAILURES ===================================
___________________ TestConv1d.test_qs8_conv1d_batchnorm_seq ___________________
[gw6] darwin -- Python 3.11.11 /Users/ec2-user/runner/_work/_temp/conda_environment_13076876599/bin/python
Traceback (most recent call last):
File "/Users/ec2-user/runner/_work/_temp/conda_environment_13076876599/lib/python3.11/unittest/case.py", line 57, in testPartExecutor
yield
File "/Users/ec2-user/runner/_work/_temp/conda_environment_13076876599/lib/python3.11/unittest/case.py", line 623, in run
self._callTestMethod(testMethod)
File "/Users/ec2-user/runner/_work/_temp/conda_environment_13076876599/lib/python3.11/unittest/case.py", line 579, in _callTestMethod
if method() is not None: