@stephanie-wang
stephanie-wang / test_f_g.py
Created January 12, 2024 22:55
Custom resources workaround for nested tasks
import ray
import time

@ray.remote(num_cpus=0)
class Counter:
    def __init__(self):
        self.num_f = 0
        self.num_g = 0
stephanie-wang / image_loader_microbenchmark.py
Created September 14, 2023 19:03
Data preprocessing for image classification microbenchmark
"""
Runs single-node microbenchmarks for reading and preprocessing an image dataset
from local disk and (for supported data loaders) cloud storage. Pass a --data-root,
--parquet-data-root, --tf-data-root, and/or --mosaic-data-root pointing to the
dataset directory. Throughputs are written to `output.csv` in total images/s.
"""
import ray
import torch
import torchvision
import os
stephanie-wang / mapreduce_generator.py
Created June 15, 2022 16:58
map reduce with generators
import ray

@ray.remote
def map(start, end, boundaries):
    vals = list(range(start, end))
    partitions = []
    prev_bound = 0
    for next_bound in boundaries:
stephanie-wang / sort.py
Last active March 17, 2022 23:50
Sort benchmark for datasets
import ray
import pandas as pd
import numpy as np
import time
import builtins
from typing import Any, Generic, List, Callable, Union, Tuple, Iterable
import os
import psutil
import resource
stephanie-wang / out
Created March 16, 2022 17:23
threaded_actors_stress_test output
(MemoryMonitorActor pid=1328) 148 10.8GiB /home/ray/anaconda3/lib/python3.7/site-packages/ray/core/src/ray/gcs/gcs_server --log_dir=/tmp/ray/s
(MemoryMonitorActor pid=1328) 1252 1.24GiB python stress_tests/test_threaded_actors.py --test-runtime 3600 --kill-interval_s 60
(MemoryMonitorActor pid=1328) 173 0.87GiB /home/ray/anaconda3/bin/python -u /home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/dash
(MemoryMonitorActor pid=1328) 250 0.12GiB /home/ray/anaconda3/bin/python -u /home/ray/anaconda3/lib/python3.7/site-packages/ray/dashboard/agen
(MemoryMonitorActor pid=1328) 64 0.09GiB /home/ray/anaconda3/bin/python /home/ray/anaconda3/bin/anyscale session web_terminal_server --deploy
(MemoryMonitorActor pid=1328) 384 0.09GiB /home/ray/anaconda3/bin/python /home/ray/anaconda3/bin/anyscale session auth_start
(MemoryMonitorActor pid=1328) 1328 0.06GiB ray::MemoryMonitorActor.run()
(MemoryMonitorActor pid=132
stephanie-wang / memory.py
Created October 7, 2021 01:22
Monitoring memory usage with /proc
import numpy as np
import ray
import os
@ray.remote
def f():
    return np.random.rand(1000_000_000 // 8)
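The preview stops before the monitoring code itself. As a minimal sketch of one way to read a process's memory usage from /proc, assuming Linux (where `/proc/<pid>/status` reports the resident set size on a `VmRSS:` line in kB): the helper names `parse_vm_rss_kib` and `read_rss_kib` are hypothetical and are not taken from the gist.

```python
import os
from typing import Optional


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in kB, as reported by the kernel), or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def read_rss_kib(pid: Optional[int] = None) -> Optional[int]:
    """Read a process's resident set size from /proc; None if unavailable."""
    target = pid if pid is not None else os.getpid()
    try:
        with open(f"/proc/{target}/status") as f:
            return parse_vm_rss_kib(f.read())
    except OSError:
        # /proc is Linux-specific; other platforms fall through to None.
        return None
```

Calling `read_rss_kib()` before and after materializing a large array is one way to see how much memory a call such as `f()` adds to the current process.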
stephanie-wang / output.csv
Last active May 28, 2021 15:21
Sorting on Dask
num_nodes,nbytes,npartitions,dask_tasks,dask_nprocs,dask_nthreads,dask_memlimit,duration
1,1000000000,100,False,0,0,0,12.28133487701416
1,1000000000,100,False,0,0,0,11.294680833816528
1,1000000000,100,False,0,0,0,11.143301963806152
1,1000000000,100,False,0,0,0,10.956552743911743
1,1000000000,100,False,0,0,0,11.068711757659912
1,1000000000,100,False,0,0,0,11.079143285751343
1,10000000000,100,False,0,0,0,114.72856569290161
1,20000000000,100,False,0,0,0,258.343745470047
1,100000000000,100,False,0,0,0,1911.8010439872742
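To compare runs of different sizes, it can help to convert each row to a throughput. The sketch below does that for three rows copied from the results above. It assumes `duration` is in seconds, which the file does not state; the function names `throughput_mb_per_s` and `summarize` are hypothetical.

```python
import csv
import io
from typing import List, Tuple

# Three rows copied from the results above, written as CSV.
ROWS = """num_nodes,nbytes,npartitions,dask_tasks,dask_nprocs,dask_nthreads,dask_memlimit,duration
1,1000000000,100,False,0,0,0,12.28133487701416
1,10000000000,100,False,0,0,0,114.72856569290161
1,100000000000,100,False,0,0,0,1911.8010439872742
"""


def throughput_mb_per_s(nbytes: int, duration_s: float) -> float:
    """Bytes sorted per second, in MB/s (1 MB = 10**6 bytes)."""
    return nbytes / duration_s / 1e6


def summarize(csv_text: str) -> List[Tuple[int, float]]:
    """Return (nbytes, throughput in MB/s) for each row of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        # Assumption: the duration column is measured in seconds.
        (int(row["nbytes"]), throughput_mb_per_s(int(row["nbytes"]), float(row["duration"])))
        for row in reader
    ]


for nbytes, mbps in summarize(ROWS):
    print(f"{nbytes:>13} bytes: {mbps:.1f} MB/s")
```

Under that assumption, the 100 GB run has a lower throughput than the 1 GB and 10 GB runs, so the slowdown at the largest size is more than proportional to the data volume.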
stephanie-wang / sort.py
Created March 18, 2021 18:19
Distributed sort on Ray
import ray
import numpy as np
@ray.remote
def map(data, npartitions):
    outputs = [list() for _ in range(npartitions)]
    for row in data:
        outputs[int(row * npartitions)].append(row)
    return tuple(sorted(output) for output in outputs)
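The map task above places each row in a bucket by value range and returns every bucket already sorted. The preview ends before any reduce step, so the following is only a sketch of how the buckets could be combined. It runs in a single process without Ray and assumes every value lies in [0, 1); the names `map_partition`, `reduce_partition`, and `sort_blocks` are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def map_partition(data: Sequence[float], npartitions: int) -> Tuple[List[float], ...]:
    """Same bucketing as the map task: bucket i holds rows in [i/n, (i+1)/n)."""
    outputs: List[List[float]] = [[] for _ in range(npartitions)]
    for row in data:
        outputs[int(row * npartitions)].append(row)
    return tuple(sorted(output) for output in outputs)


def reduce_partition(*sorted_runs: List[float]) -> List[float]:
    """Merge the already-sorted runs that the map steps produced for one bucket."""
    return list(heapq.merge(*sorted_runs))


def sort_blocks(blocks: Sequence[Sequence[float]], npartitions: int) -> List[float]:
    """Map every block, then reduce bucket by bucket and concatenate in order."""
    map_outputs = [map_partition(block, npartitions) for block in blocks]
    result: List[float] = []
    for i in range(npartitions):
        # Bucket i only contains values below those in bucket i + 1,
        # so concatenating the merged buckets yields a globally sorted list.
        result.extend(reduce_partition(*(out[i] for out in map_outputs)))
    return result
```

Because the buckets cover disjoint value ranges, each reduce is independent of the others, which is what would let the merges run as separate parallel tasks in a distributed version.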
stephanie-wang / dask_on_ray.py
Created February 16, 2021 19:34
Data processing support in Ray
import ray
from ray.util.dask import ray_dask_get
import dask
import dask.dataframe as dd
import pandas as pd
import numpy as np
dask.config.set(scheduler=ray_dask_get) # Sets Ray as the default backend.