A100, qps=16:
{"model_name": "llava-hf", "engine_args": {}, "benchmark_args": {"verbose": false, "backend": "VLLMChat", "results_filename": "metrics.jsonl", "port": 8000, "random_prompt_lens_mean": null, "random_prompt_lens_range": null, "distribution": "uniform", "qps": 16.0, "concurrency": 10000, "model": "llava-hf/llava-1.5-7b-hf", "warmup": false, "skip_wait_for_ready": true, "repeat": 1, "log_latencies": false, "fail_on_response_failure": false, "variable_response_lens_mean": null, "variable_response_lens_range": null, "variable_response_lens_distribution": "uniform", "num_requests": 1000, "prompts_filename": "./filtered-prompts.json", "gen_random_prompts": false, "allow_variable_generation_length": false, "fixed_max_tokens": 128, "print_generation_lens_and_exit": false, "name": ""}, "backend": "VLLMChat", "input_len": null, "output_len": null, "tp": "NA", "dur_s": 107.19906306266785, "tokens_per_s": 6543.2568154998535, "qps": 9.32843974033063, "successful_responses": 1000, "prompt_token_count": 597170, "respo
(base) root@70b3d47b1c72:/ray# pytest -s python/ray/tune/tests/test_cluster.py
Test session starts (platform: linux, Python 3.7.9, pytest 7.0.1, pytest-sugar 0.9.5)
rootdir: /ray/python
plugins: anyio-3.6.1, asyncio-0.16.0, docker-tools-3.1.3, forked-1.4.0, lazy-fixture-0.6.3, rerunfailures-10.2, shutil-1.7.0, sugar-0.9.5, timeout-2.1.0, virtualenv-1.7.0, remotedata-0.3.2, typeguard-2.13.3
collecting ... 2022-08-05 10:19:12,277 INFO worker.py:1312 -- Connecting to existing Ray cluster at address: 172.18.0.3:64540...
2022-08-05 10:19:12,282 INFO worker.py:1487 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265.
2022-08-05 10:19:12,299 INFO cluster_utils.py:162 -- RayContext(dashboard_url='127.0.0.1:8265', python_version='3.7.9', ray_version='2.0.0rc0', ray_commit='{{RAY_COMMIT_SHA}}', address_info={'node_ip_address': '172.18.0.3', 'raylet_ip_address': '172.18.0.3', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-08-05_10-19-10_016654_12497/sockets/plasma_store', 'ray
(base) ray@ip-172-31-79-189:~/e2e-tests$ vi a.py
(base) ray@ip-172-31-79-189:~/e2e-tests$ python a.py
2022-04-20 13:51:42,108 INFO main.py:985 -- [RayXGBoost] Created 4 new actors (4 total actors). Waiting until actors are ready for training.
2022-04-20 13:51:45,235 INFO main.py:1030 -- [RayXGBoost] Starting XGBoost training.
(_RemoteRayXGBoostActor pid=310, ip=172.31.75.107) [13:51:45] task [xgboost.ray]:140243459005904 got new rank 1
(_RemoteRayXGBoostActor pid=343, ip=172.31.68.8) [13:51:45] task [xgboost.ray]:140569271499600 got new rank 0
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) [13:51:45] task [xgboost.ray]:139676711361104 got new rank 3
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) [13:51:45] task [xgboost.ray]:140698607811280 got new rank 2
(_RemoteRayXGBoostActor pid=312, ip=172.31.86.190) [13:51:46] DEBUG: ../src/tree/updater_gpu_hist.cu:819: [GPU Hist]: Configure
(_RemoteRayXGBoostActor pid=311, ip=172.31.82.234) [13:51:46] DEBUG: ../src/tree/updater_gpu_hist.cu:819: [GPU Hist]: Conf
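The logs above are consistent with a 4-actor GPU run of xgboost_ray. A minimal sketch that would produce similar rank and GPU-hist output; the dataset path and label column are hypothetical:

import pandas as pd
from xgboost_ray import RayDMatrix, RayParams, train

df = pd.read_csv("train.csv")            # hypothetical dataset
dtrain = RayDMatrix(df, label="target")  # hypothetical label column

bst = train(
    {"objective": "binary:logistic", "tree_method": "gpu_hist"},
    dtrain,
    num_boost_round=10,
    ray_params=RayParams(num_actors=4, gpus_per_actor=1),
)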
== Status ==
Current time: 2022-04-19 14:41:36 (running for 00:02:26.83)
Memory usage on this node: 7.4/62.0 GiB
Using HyperBand: num_stopped=0 total_brackets=1
Round #0:
Bracket(Max Size (n)=27, Milestone (r)=1, completed=0.0%): {RUNNING: 4}
Resources requested: 24.0/32 CPUs, 0/0 GPUs, 0.0/126.46 GiB heap, 0.0/55.15 GiB objects
Result logdir: /home/ray/ray_results/bohb_test
Number of trials: 4/4 (4 RUNNING)
+----------------------+----------+------------------+--------------+-------------+-----------+-----------+
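The bohb_test logdir and single HyperBand bracket above point at a BOHB run. A minimal sketch of a Ray 1.x-era setup that would produce a similar status block; the trainable and search space are hypothetical, and 6 CPUs per trial is inferred from the 24.0/32 CPUs shown for 4 running trials:

from ray import tune
from ray.tune.schedulers import HyperBandForBOHB
from ray.tune.suggest.bohb import TuneBOHB

def trainable(config):  # hypothetical objective
    for step in range(100):
        tune.report(loss=config["lr"] * step)

tune.run(
    trainable,
    name="bohb_test",
    config={"lr": tune.uniform(1e-4, 1e-1)},
    search_alg=TuneBOHB(metric="loss", mode="min"),
    scheduler=HyperBandForBOHB(metric="loss", mode="min"),
    num_samples=4,
    resources_per_trial={"cpu": 6},
)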
import math
import torch
import torch.nn as nn
import transformers
import random
import os
import numpy as np
import time
import pandas as pd
import functools

_enabled = True
_timing = dict()

def enable_record_timing():
    global _enabled
    _enabled = True
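# A sketch of the companion decorator these globals suggest (the gist cuts
# off after enable_record_timing); the name and behavior are assumptions,
# not the original code.
def record_timing(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not _enabled:
            return func(*args, **kwargs)
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            # Accumulate wall-clock time per function name.
            _timing[func.__name__] = _timing.get(func.__name__, 0.0) + (time.time() - start)
    return wrapper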
import ray
from ray.util.placement_group import placement_group, remove_placement_group
import time

NUM_TRIALS = 100

@ray.remote
class Actor:
    def __init__(self, *args):
        if len(args) > 0:
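            # (The gist is truncated here; storing the args is a minimal,
            # assumed body, not the original code.)
            self.args = args

# The placement-group imports and NUM_TRIALS above suggest a create/destroy
# benchmark loop; the loop below is an assumption, not the original gist.
# Actor.options(placement_group=...) follows the Ray 1.x-era API.
ray.init()
for _ in range(NUM_TRIALS):
    start = time.time()
    pg = placement_group([{"CPU": 1}])
    ray.get(pg.ready())
    actor = Actor.options(placement_group=pg).remote()
    ray.kill(actor)
    remove_placement_group(pg)
    print(f"trial took {time.time() - start:.3f}s")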
ubuntu@ip-172-31-50-62:~/ray/release/util$ ./pip_download_test.sh
conda exists
Start downloading Ray version 1.8.0.post1 of commit e94e67b9badd542b7b99efe412e7019983260ad0
Requirement already satisfied: pip in /home/ubuntu/anaconda3/lib/python3.8/site-packages (21.0.1)
Collecting pip
Downloading pip-21.3.1-py3-none-any.whl (1.7 MB)
|████████████████████████████████| 1.7 MB 5.9 MB/s
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 21.0.1
(releases_1.8.0) xwjiang@xw ~/ray_1.8.0/ray/release/util (releases/1.8.0.post1) $ ./pip_download_test.sh
conda exists
Start downloading Ray version 1.8.0.post1 of commit e94e67b9badd542b7b99efe412e7019983260ad0
Requirement already satisfied: pip in /Users/xwjiang/anaconda3/envs/releases_1.8.0/lib/python3.9/site-packages (21.3.1)
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /Users/xwjiang/anaconda3/envs/1.8.0.post1-3.6-env
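Both runs download a release wheel pinned to a specific commit. A hypothetical reconstruction of that step in Python; the S3 URL pattern follows Ray's public wheel layout, but treat it as an assumption rather than the script's actual contents:

import subprocess

VERSION = "1.8.0.post1"
COMMIT = "e94e67b9badd542b7b99efe412e7019983260ad0"
PY = "cp38-cp38"  # assumed Python ABI tag
url = (f"https://s3-us-west-2.amazonaws.com/ray-wheels/releases/"
       f"{VERSION}/{COMMIT}/ray-{VERSION}-{PY}-manylinux2014_x86_64.whl")
subprocess.check_call(["pip", "install", "-U", url])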
from collections import Counter, deque, defaultdict
from typing import Mapping

import ray
from ray.tune.utils.placement_groups import PlacementGroupFactory

class TrialRunner:
    def __init__(self, trial_executor, max_resource_requests: int):
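        # (The gist is truncated here. The bookkeeping below is a sketch
        # suggested by the imports above, not the original implementation.)
        self._trial_executor = trial_executor
        self._max_resource_requests = max_resource_requests
        self._pending_trials = deque()             # trials waiting on resources
        self._resource_requests = Counter()        # PlacementGroupFactory -> in-flight count
        self._trials_by_state = defaultdict(list)  # state name -> list of trials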