Mengtao (Martin) Yuan (iseeyuan)

(vllm) $ vllm serve Qwen/Qwen2.5-7B --runner pooling --convert reward
INFO 09-28 09:34:42 [__init__.py:216] Automatically detected platform cuda.
WARNING 09-28 09:34:46 [api_server.py:1191] Torch Profiler is enabled in the API server. This should ONLY be used for local development!
(APIServer pid=2359097) INFO 09-28 09:34:46 [api_server.py:1822] vLLM API server version 0.10.2rc3.dev325+g1c3ffdbec
(APIServer pid=2359097) INFO 09-28 09:34:46 [utils.py:328] non-default args: {'model_tag': 'Qwen/Qwen2.5-7B', 'model': 'Qwen/Qwen2.5-7B', 'runner': 'pooling', 'convert': 'reward'}
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 686/686 [00:00<00:00, 6.04MB/s]
(APIServer pid=2359097) INFO 09-28 09:34:54 [model.py:551] Resolved architecture: Qwen2ForCausalLM
(APIServer pid=2359097) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=2359097) INFO 09-28 09:34:54 [model.py:1581] Using max model len 131072
(APIServer pid=2359097) IN
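
For reference, once the server above is running, the converted reward model can be queried over HTTP. The sketch below assumes the default port 8000 and vLLM's /pooling endpoint for pooling-runner models; the prompt text and the response fields accessed are illustrative, so check the server's actual response shape.

```python
# Minimal sketch: query the Qwen2.5-7B reward model served with
# `vllm serve ... --runner pooling --convert reward`.
# Assumes the default port (8000) and the /pooling endpoint of
# vLLM's OpenAI-compatible server.
import requests

response = requests.post(
    "http://localhost:8000/pooling",
    json={
        "model": "Qwen/Qwen2.5-7B",
        "input": "The quick brown fox jumps over the lazy dog.",  # example prompt
    },
)
response.raise_for_status()

# The pooling response carries the per-request pooled (reward) output;
# print the first entry to inspect its structure.
print(response.json()["data"][0])
```
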
=================================== FAILURES ===================================
___________________ TestConv1d.test_qs8_conv1d_batchnorm_seq ___________________
[gw6] darwin -- Python 3.11.11 /Users/ec2-user/runner/_work/_temp/conda_environment_13076876599/bin/python
Traceback (most recent call last):
File "/Users/ec2-user/runner/_work/_temp/conda_environment_13076876599/lib/python3.11/unittest/case.py", line 57, in testPartExecutor
yield
File "/Users/ec2-user/runner/_work/_temp/conda_environment_13076876599/lib/python3.11/unittest/case.py", line 623, in run
self._callTestMethod(testMethod)
File "/Users/ec2-user/runner/_work/_temp/conda_environment_13076876599/lib/python3.11/unittest/case.py", line 579, in _callTestMethod
if method() is not None: