{
"cells": [
{
"metadata": {},
"id": "360887a4",
"cell_type": "markdown",
"source": "## GPUs on Coiled using Afar\n\n[Afar](https://github.com/eriknw/afar) allows you to run code on remote Dask clusters. In particular, you can run GPUs computations on Coiled from your laptop. You don't need a GPU locally to do so, Afar will take care of that. \n\n### Software environment\n\nTo run computations that uses CuPy, CuDF, Dask-CuDF and numba-cuda, we recommend you build your software environment on coiled starting from a rapids image, that has a version of distributed > 2021.08.1 (Due to this [issue](https://github.com/dask/distributed/issues/5224) fixed on this [PR](https://github.com/dask/distributed/pull/5236), and you will only need to install as extra, Afar >= 0.6.1\n\nYou will need to locally match this environment with the corresponding matching versions of python, dask, distributed and afar, as well as installing coiled. \n\nIf you are using conda, you can install your local environment with a file that looks like the following. \n\n```yaml\nname: gpu-afar\nchannels:\n - conda-forge\ndependencies:\n - python=3.8\n - dask=2021.09.1\n - afar=0.6.1\n - ipywidgets\n - coiled\n```\nExcept you'll be matching the versions of what's installed in coiled's software environment. \n\nFor the examples below we use the following coiled's software environment:"
},
{
"metadata": {
"trusted": false
},
"id": "060bc92c",
"cell_type": "code",
"source": "import coiled",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "b596779c",
"cell_type": "code",
"source": "#run this only first to build, comment afterwards\ncoiled.create_software_environment(\n name=\"gpu-afar\", \n container=\"rapidsai/rapidsai-nightly:21.10-cuda11.2-runtime-ubuntu20.04-py3.8\",\n conda_env_name=\"rapids\",\n conda={\n \"channels\": [\"conda-forge\"], \n \"dependencies\": [\"afar=0.6.1\"],\n }\n)",
"execution_count": null,
"outputs": []
},
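{
"metadata": {},
"id": "f0a1b2c3",
"cell_type": "markdown",
"source": "Optionally, you can confirm that the build is registered by listing the software environments in your Coiled account. This check is an addition to the original walkthrough and assumes your version of the `coiled` client provides `coiled.list_software_environments()`."
},
{
"metadata": {
"trusted": false
},
"id": "d4e5f6a7",
"cell_type": "code",
"source": "# Optional check (assumes coiled.list_software_environments() is available in\n# your coiled client version): list the software environments in your account\ncoiled.list_software_environments()",
"execution_count": null,
"outputs": []
},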
{
"metadata": {},
"id": "cc0b05d6",
"cell_type": "markdown",
"source": "### Cluster configuration and Client\n\nThis cluster by default will request `n_workers=4` if you desire to request more you can pass this argument to the Cluster constructor. \n\nNote: On AWS `worker_gpu=1` is the only valid option. When creating the following cluster this will allocate [Amazon EC2 G4dn Instances](https://aws.amazon.com/ec2/instance-types/g4/)"
},
{
"metadata": {
"trusted": false
},
"id": "f86d7f00",
"cell_type": "code",
"source": "cluster = coiled.Cluster(\n scheduler_cpu=2,\n scheduler_memory=\"4 GiB\",\n worker_cpu=4,\n worker_memory=\"16 GiB\",\n worker_gpu=1, #on aws 1 is the only valid option\n software=\"gpu-afar\",\n worker_class=\"dask_cuda.CUDAWorker\",\n)",
"execution_count": null,
"outputs": []
},
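{
"metadata": {},
"id": "1a2b3c4d",
"cell_type": "markdown",
"source": "If you later decide you need more than the default four workers, you can also resize the running cluster. The cell below is an added sketch and assumes `coiled.Cluster` exposes the standard Dask `cluster.scale()` method; passing `n_workers` to the constructor, as mentioned above, works as well."
},
{
"metadata": {
"trusted": false
},
"id": "5e6f7a8b",
"cell_type": "code",
"source": "# Optional: resize the running cluster to 8 GPU workers in total\n# (assumes coiled.Cluster supports the standard Dask cluster.scale() method)\ncluster.scale(8)",
"execution_count": null,
"outputs": []
},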
{
"metadata": {
"trusted": false
},
"id": "a2b85ca3",
"cell_type": "code",
"source": "from dask.distributed import Client\nimport afar",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "db479672",
"cell_type": "code",
"source": "client = Client(cluster)\nclient",
"execution_count": null,
"outputs": []
},
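{
"metadata": {},
"id": "9c0d1e2f",
"cell_type": "markdown",
"source": "As noted above, the local environment should match the cluster's versions of python, dask, distributed and afar. A quick sanity check (added here as an optional step) is `client.get_versions(check=True)`, which raises an error if the client, scheduler, and workers report mismatched versions."
},
{
"metadata": {
"trusted": false
},
"id": "3f4a5b6c",
"cell_type": "code",
"source": "# Optional sanity check: compare package versions on the client, scheduler, and\n# workers; with check=True this raises if any relevant versions are mismatched\nclient.get_versions(check=True)",
"execution_count": null,
"outputs": []
},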
{
"metadata": {},
"id": "635f8b18",
"cell_type": "markdown",
"source": "#### Toy example with cudf"
},
{
"metadata": {
"trusted": false
},
"id": "3ccfd2b2",
"cell_type": "code",
"source": "with afar.run, remotely:\n import cudf # gpu libraries imports happen within afar context manager\n x = str(cudf)\n\nx.result()",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"id": "e331affd",
"cell_type": "markdown",
"source": "Afar also has a magic that you can use in jupyter notebooks. You will first load the extension and then use as a cell or inline magic. "
},
{
"metadata": {},
"id": "34bdd7f3",
"cell_type": "markdown",
"source": "#### CuPy example"
},
{
"metadata": {
"trusted": false
},
"id": "e783ec6b",
"cell_type": "code",
"source": "%load_ext afar",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "87554314",
"cell_type": "code",
"source": "%%afar\nimport cupy as cp\nx_gpu = cp.array([1, 2, 3])\nl2_gpu = cp.linalg.norm(x_gpu)\nl2_cpu = cp.asnumpy(l2_gpu) ## we return numpy like variables",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"id": "2e3421e1",
"cell_type": "markdown",
"source": "Notice that locally we don't have any GPU, to be able to inspect the result we need to convert it to something that the CPU understands. Therefore the in the code above we convert have `l2_cpu` to be a `numpy` like object."
},
{
"metadata": {
"trusted": false
},
"id": "97c23612",
"cell_type": "code",
"source": "l2_cpu.result()",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"id": "22143c7f",
"cell_type": "markdown",
"source": "#### Dask-CuDF example"
},
{
"metadata": {
"trusted": false
},
"id": "c7590f0f",
"cell_type": "code",
"source": "with afar.run, remotely:\n import dask_cudf\n df = dask_cudf.read_csv(\n \"s3://nyc-tlc/trip data/yellow_tripdata_2019-*.csv\",\n parse_dates=[\"tpep_pickup_datetime\", \"tpep_dropoff_datetime\"],\n storage_options={\"anon\": True},\n assume_missing=True).persist()\n\n res = df.groupby(\"passenger_count\").tip_amount.mean().compute().to_pandas() # convert from cuDF to pandas\n\nres.result()",
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"id": "78043bc0",
"cell_type": "markdown",
"source": "#### Example using numba-cuda\n\n[Random number generator from numba-docs](https://numba.pydata.org/numba-doc/latest/cuda/random.html#example)"
},
{
"metadata": {
"trusted": false
},
"id": "44d259d0",
"cell_type": "code",
"source": "with afar.run, remotely:\n from numba import cuda\n from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32\n import numpy as np\n\n @cuda.jit\n def compute_pi(rng_states, iterations, out):\n \"\"\"Find the maximum value in values and store in result[0]\"\"\"\n thread_id = cuda.grid(1)\n\n # Compute pi by drawing random (x, y) points and finding what\n # fraction lie inside a unit circle\n inside = 0\n for i in range(iterations):\n x = xoroshiro128p_uniform_float32(rng_states, thread_id)\n y = xoroshiro128p_uniform_float32(rng_states, thread_id)\n if x**2 + y**2 <= 1.0:\n inside += 1\n\n out[thread_id] = 4.0 * inside / iterations\n\n threads_per_block = 64\n blocks = 24\n rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)\n out = np.zeros(threads_per_block * blocks, dtype=np.float32)\n\n compute_pi[blocks, threads_per_block](rng_states, 10000, out)\n test = out.mean()\n",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "79581d1d-5080-4746-90e5-415d04291430",
"cell_type": "code",
"source": "test.result()",
"execution_count": null,
"outputs": []
}
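,
{
"metadata": {},
"id": "7d8e9f0a",
"cell_type": "markdown",
"source": "When you are done, you can release the cloud resources by closing the client and the cluster (cleanup step added here; `Client.close()` and `Cluster.close()` are the standard Dask shutdown calls)."
},
{
"metadata": {
"trusted": false
},
"id": "b1c2d3e4",
"cell_type": "code",
"source": "# Clean up: close the client and shut down the Coiled cluster\nclient.close()\ncluster.close()",
"execution_count": null,
"outputs": []
}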
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3 (ipykernel)",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.8.12",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"gist": {
"id": "",
"data": {
"description": "coiled-afar-gpus.ipynb",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}