@brews
Last active October 31, 2023 16:22
Argo Workflow demo to launch a Kubernetes Dask distributed job
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dask-test-
spec:
  entrypoint: dask
  activeDeadlineSeconds: 1800  # Safety first, kids!
  templates:
    - name: dask
      script:
        image: daskdev/dask:2.30.0
        env:
          - name: EXTRA_PIP_PACKAGES
            value: "dask-kubernetes dask[distributed]"
        command: ["python"]
        source: |
          # Hack to get the dask container's entrypoint to fire; otherwise the
          # container skips the EXTRA_PIP_PACKAGES install.
          import subprocess
          subprocess.run(
              ["bash", "/usr/bin/prepare.sh"], stdout=subprocess.PIPE, universal_newlines=True
          )

          # ...And now for something completely different...
          import dask.array as da
          from dask.distributed import Client
          from dask_kubernetes import KubeCluster, make_pod_spec

          # Pod template for the dask workers this script will spawn.
          pod_spec = make_pod_spec(
              image="daskdev/dask:latest",
              memory_limit="500Mi",
              memory_request="500Mi",
              cpu_limit=1,
              cpu_request=1,
              env={"EXTRA_PIP_PACKAGES": "dask[distributed]"},
          )
          cluster = KubeCluster(pod_spec)
          cluster.scale(10)  # Specify the number of workers explicitly.
          client = Client(cluster)

          array = da.ones((1000, 1000, 1000))
          print(f"OUR ANSWER IS: {array.mean().compute()} (...it should be 1.0 ...)")
        resources:
          limits:
            memory: 500Mi
            cpu: 500m
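To try the demo, save the manifest as, say, dask-test.yaml and submit it with the Argo CLI: argo submit --watch dask-test.yaml. The script's output, including the printed answer, can then be tailed with argo logs @latest --follow (the @latest shorthand assumes a reasonably recent Argo CLI). If you'd rather not pick a worker count up front, the explicit cluster.scale(10) could be swapped for KubeCluster's adaptive mode, e.g. cluster.adapt(minimum=1, maximum=10).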
@brews (Author) commented Mar 8, 2022

@emileten Yeah, and if you're working on a fresh, unconfigured cluster you also need to ensure the ServiceAccount has permissions like those at https://argoproj.github.io/argo-workflows/workflow-rbac/, in addition to the ones you pointed to in your link above.
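For anyone wiring that up, here is a minimal sketch of what the Role and RoleBinding might look like, combining the executor rule from the workflow-rbac page with the pod permissions the in-script KubeCluster needs to spawn workers. The names are hypothetical, and the exact resources and verbs depend on your Argo and dask-kubernetes versions:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dask-argo-executor  # hypothetical name
rules:
  # Lets the Argo executor report step results (see the workflow-rbac docs above).
  - apiGroups: ["argoproj.io"]
    resources: ["workflowtaskresults"]
    verbs: ["create", "patch"]
  # Lets the in-cluster dask-kubernetes KubeCluster create and manage worker pods.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dask-argo-executor-binding  # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dask-argo-executor
subjects:
  - kind: ServiceAccount
    name: default  # or whichever ServiceAccount the Workflow runs as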

@RamonGal commented Aug 4, 2023

Thank you for this, it helped me on so many levels.
