@goodgravy
Last active September 16, 2022 14:09
Repro for problems decorating and importing functions in Dask
# Scheduler spec
apiVersion: v1
kind: Pod
metadata:
  name: prefect-scheduler-debug
  namespace: prefect
spec:
  restartPolicy: Never
  containers:
    - image: ghcr.io/dask/dask:latest
      imagePullPolicy: IfNotPresent
      name: dask-scheduler
      env:
        - name: DISABLE_JUPYTER
          value: "true"
        - name: EXTRA_PIP_PACKAGES
          value: bokeh>=2.1.1
      resources:
        limits:
          cpu: "1"
          memory: 2G
        requests:
          cpu: "1"
          memory: 2G

# Pod to issue commands from
apiVersion: v1
kind: Pod
metadata:
  name: dask-debugger
  namespace: prefect
spec:
  restartPolicy: Never
  containers:
    - image: ghcr.io/dask/dask:latest
      imagePullPolicy: IfNotPresent
      name: dask-debugger
      env:
        - name: EXTRA_PIP_PACKAGES
          value: dask_kubernetes
      command: ["sleep", "3600"]
      resources:
        limits:
          cpu: "1"
          memory: 2G
        requests:
          cpu: "1"
          memory: 2G

# Worker spec
apiVersion: v1
kind: Pod
metadata:
  name: prefect-worker-debug
  namespace: prefect
spec:
  restartPolicy: Never
  containers:
    - image: ghcr.io/dask/dask:latest
      imagePullPolicy: IfNotPresent
      name: dask-worker
      args: [dask-worker]
      env:
        - name: DISABLE_JUPYTER
          value: "true"
        - name: EXTRA_PIP_PACKAGES
          value: bokeh>=2.1.1
      resources:
        limits:
          cpu: "1"
          memory: 2G
        requests:
          cpu: "1"
          memory: 2G

# debug.py
from functools import lru_cache
from dask_kubernetes import KubeCluster
from dask.distributed import Client
from operation import expensive_operation

cluster = KubeCluster(
    pod_template="debug-worker.yaml",
    scheduler_pod_template="debug-scheduler.yaml",
    n_workers=1,
)
client = Client(cluster)

# @lru_cache
# def expensive_operation():
#     return 1

## Problem #1:
# If I define expensive_operation in this file (the commented-out block above, instead of
# importing it from operation) and decorate it with @lru_cache, I get the following error:
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/opt/conda/lib/python3.8/site-packages/distributed/client.py", line 277, in result
#     raise exc.with_traceback(tb)
#   File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 73, in loads
#     return pickle.loads(x)
# AttributeError: Can't get attribute 'expensive_operation' on <module '__mp_main__' from '/opt/conda/bin/dask-worker'>

## Problem #2:
# If I import expensive_operation from operation.py (as this script currently does), I get the following error:
# Traceback (most recent call last):
#   File "debug.py", line 29, in <module>
#     print(x.result())
#   File "/opt/conda/lib/python3.8/site-packages/distributed/client.py", line 277, in result
#     raise exc.with_traceback(tb)
#   File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 73, in loads
#     return pickle.loads(x)
# ModuleNotFoundError: No module named 'operation'

def inc(x):
    return x + expensive_operation()


x = client.submit(inc, 10)
print(x)
print(x.result())
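
Problem #1 seems to come down to how the lru_cache wrapper is pickled. As far as I can tell, distributed falls back to cloudpickle for objects defined in __main__, and cloudpickle ships plain __main__ functions such as inc by value; but the object returned by lru_cache is not a plain function, so it is still pickled by reference (a module name plus an attribute name, no code). The worker then looks that name up in its own main module, __mp_main__, which never defined it. The sketch below reproduces the same lookup failure with the standard-library pickle module only; the module name scratch_main and the del step that mimics the worker are hypothetical stand-ins, not part of the repro.

```python
import pickle
import sys
import types

# Hypothetical module "scratch_main" standing in for the client's __main__ (debug.py).
client_main = types.ModuleType("scratch_main")
sys.modules["scratch_main"] = client_main
exec(
    "import functools\n"
    "@functools.lru_cache\n"
    "def expensive_operation():\n"
    "    return 1\n",
    client_main.__dict__,
)

# The lru_cache wrapper reduces to its qualified name, so pickle records only a
# reference ("scratch_main", "expensive_operation"), not the function body.
payload = pickle.dumps(client_main.expensive_operation)

# Simulate the worker side: the module exists, but nothing in it defined
# expensive_operation (like __mp_main__ inside the dask-worker process).
del client_main.expensive_operation

try:
    pickle.loads(payload)
except AttributeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__, error)
```

The payload holds just those two names plus pickle opcodes, which is the same kind of by-name lookup that fails in distributed/protocol/pickle.py in the Problem #1 traceback.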

# operation.py
def expensive_operation():
    return 1
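
Problem #2 looks like the same mechanism from the other side: expensive_operation lives in an importable module, and by default both pickle and cloudpickle serialize functions from importable modules by reference, as operation.expensive_operation. Each worker therefore has to import operation itself, and only the client pod has operation.py, hence the ModuleNotFoundError. A minimal sketch with the standard-library pickle module, assuming a throwaway in-memory module named operation in place of the real file, and assuming no real module of that name is importable where it runs:

```python
import pickle
import sys
import types

# In-memory stand-in for operation.py, which only exists on the client.
operation = types.ModuleType("operation")
exec("def expensive_operation():\n    return 1\n", operation.__dict__)
sys.modules["operation"] = operation

# A plain module-level function is pickled by reference: the payload names the
# module "operation" and the attribute "expensive_operation", nothing more.
payload = pickle.dumps(operation.expensive_operation)

# Simulate a worker that never received operation.py: drop the module, so the
# unpickler has to import "operation" from sys.path, which fails.
del sys.modules["operation"]

try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    error = exc
else:
    error = None

print(type(error).__name__, error)
```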