@msaroufim
Created Dec 10, 2021

Creating a cluster

ray up -y ray_cluster.yaml
pip install torchx[dev]
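The `ray_cluster.yaml` file itself is not included in this gist. For reference, a minimal Ray autoscaler config of the kind `ray up` expects might look like the sketch below; the cluster name, region, and worker counts are placeholder assumptions, not the values actually used here.

```yaml
# Hypothetical ray_cluster.yaml -- the real file is not shown in the gist.
# All values below are placeholder assumptions for illustration.
cluster_name: torchx-demo
min_workers: 1
max_workers: 2
provider:
  type: aws
  region: us-west-2
auth:
  ssh_user: ubuntu
```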

Launching a job

(ray) ubuntu@ip-172-31-51-124:~/torchx$ torchx run -s ray -cfg dashboard_address=34.209.89.185:20002,working_dir=aivanou_test utils.binary --entrypoint ray_simple.py
torchx 2021-12-10 19:00:25 INFO     Uploading package gcs://_ray_pkg_9a445caa07d08cad.zip.
torchx 2021-12-10 19:00:25 INFO     Creating a file package for local directory '/tmp/tmp95frh9vx'.
ray://torchx/34.209.89.185:20002-raysubmit_aKvezN3NyA2mqZeW
torchx 2021-12-10 19:00:25 INFO     Launched app: ray://torchx/34.209.89.185:20002-raysubmit_aKvezN3NyA2mqZeW
Status is PENDING
torchx 2021-12-10 19:00:25 INFO     AppStatus:
  msg: <NONE>
  num_restarts: -1
  roles:
  - replicas:
    - hostname: ''
      id: 0
      role: ray
      state: !!python/object/apply:torchx.specs.api.AppState
      - 2
      structured_error_msg: <NONE>
    role: ray
  state: PENDING (2)
  structured_error_msg: <NONE>
  ui_url: null

torchx 2021-12-10 19:00:25 INFO     Job URL: None
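The contents of `ray_simple.py` are not shown in the gist. Judging from the `(CommandActor pid=700) hello` line in the log output further down, a script consistent with that output could be as simple as:

```python
# Hypothetical ray_simple.py -- the actual script is not shown in this gist.
# The "(CommandActor pid=700) hello" log line suggests it just prints "hello";
# TorchX's Ray scheduler runs the entrypoint inside a Ray actor (CommandActor),
# so the script itself does not need to call ray.init().
print("hello")
```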

Describing a job

(ray) ubuntu@ip-172-31-51-124:~/torchx$ torchx describe ray://torchx/34.209.89.185:20002-raysubmit_aKvezN3NyA2mqZeW
Status is SUCCEEDED
{ 'metadata': {},
  'name': '34.209.89.185:20002-raysubmit_aKvezN3NyA2mqZeW',
  'roles': [ { 'args': [],
               'base_image': None,
               'entrypoint': '<MISSING>',
               'env': {},
               'image': '',
               'max_retries': 0,
               'metadata': {},
               'name': 'ray',
               'num_replicas': 1,
               'port_map': {},
               'resource': { 'capabilities': {},
                             'cpu': -1,
                             'gpu': -1,
                             'memMB': -1},

Getting logs

(ray) ubuntu@ip-172-31-51-124:~/torchx$ torchx log ray://torchx/34.209.89.185:20002-raysubmit_aKvezN3NyA2mqZeW
Status is SUCCEEDED
Status is SUCCEEDED
Status is SUCCEEDED
ray/0 2021-12-10 11:00:27,816   INFO worker.py:843 -- Connecting to existing Ray cluster at address: 10.10.6.185:6379
ray/0 (CommandActor pid=700) hello
ray/0 