@brews
Last active February 12, 2022 00:38
Simple Jupyter notebook with a cell tagged "parameters" for running with papermill. Includes Argo Workflows that run the notebooks on a JupyterHub container image.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: papermill-test-
spec:
  entrypoint: main
  templates:
  - name: main
    inputs:
      parameters:
      - name: in-zarr
        value: "gs://fakebucketname/data.zarr"
      artifacts:
      - name: input-notebook
        path: /src
        git:
          repo: https://gist.github.com/68d2c0cc2b6a55fcf0bb3a8d8c7ba209.git
    outputs:
      artifacts:
      - name: output-notebook
        path: /tmp/papermill_test_out.ipynb
    container:
      image: gcr.io/rhg-project-1/notebook:latest
      command: [papermill]
      args:
      - "/src/papermill_test.ipynb"
      - "/tmp/papermill_test_out.ipynb"
      - "-p"
      - "inzarr"
      - "{{ inputs.parameters.in-zarr }}"
      - "--kernel"
      - "python3"
      resources:
        requests:
          memory: 2Gi
          cpu: "1000m"
        limits:
          memory: 2Gi
          cpu: "2000m"
    activeDeadlineSeconds: 900
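The `args` list above hands the workflow parameter `in-zarr` to papermill as the notebook variable `inzarr` (the hyphen has to go, since `in-zarr` is not a valid Python name). The Python sketch below assembles the same command line so the mapping is easy to see. `render` and `papermill_args` are hypothetical helpers, and `render` is only a simplified stand-in for Argo's substitution of `{{ inputs.parameters.NAME }}` tags, not Argo's actual code.

```python
import re

# Hypothetical, simplified stand-in for Argo's "{{ inputs.parameters.NAME }}"
# substitution (not Argo's implementation). Whitespace inside the braces is
# tolerated here because the workflow above writes the tag with spaces.
_TAG = re.compile(r"\{\{\s*inputs\.parameters\.([A-Za-z0-9_-]+)\s*\}\}")


def render(arg: str, params: dict[str, str]) -> str:
    """Replace each inputs.parameters tag in `arg`; unknown names raise KeyError."""
    return _TAG.sub(lambda m: params[m.group(1)], arg)


def papermill_args(src: str, dst: str, notebook_params: dict[str, str],
                   kernel: str = "python3") -> list[str]:
    """Build `papermill SRC DST -p NAME VALUE ... --kernel KERNEL` as an argv list."""
    argv = ["papermill", src, dst]
    for name, value in notebook_params.items():
        argv += ["-p", name, value]  # one -p NAME VALUE triple per parameter
    argv += ["--kernel", kernel]
    return argv


# Workflow-level name (with hyphen) -> notebook variable name (no hyphen).
workflow_params = {"in-zarr": "gs://fakebucketname/data.zarr"}
argv = papermill_args(
    "/src/papermill_test.ipynb",
    "/tmp/papermill_test_out.ipynb",
    {"inzarr": render("{{ inputs.parameters.in-zarr }}", workflow_params)},
)
print(argv)
```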
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: papermilldask-test-
spec:
  entrypoint: main
  templates:
  - name: main
    inputs:
      parameters:
      - name: in-zarr
        value: "gs://fakebucketname/data.zarr"
      artifacts:
      - name: input-notebook
        path: /src
        git:
          repo: https://gist.github.com/68d2c0cc2b6a55fcf0bb3a8d8c7ba209.git
    outputs:
      artifacts:
      - name: output-notebook
        path: /tmp/papermilldask_test_out.ipynb
    container:
      image: gcr.io/rhg-project-1/notebook:latest
      env:
      - name: DASK_GATEWAY__ADDRESS
        value: "http://traefik-dask-gateway.dask-gateway.svc.cluster.local"
      - name: DASK_GATEWAY__AUTH__TYPE
        value: basic
      command: [papermill]
      args:
      - "/src/papermilldask_test.ipynb"
      - "/tmp/papermilldask_test_out.ipynb"
      - "-p"
      - "inzarr"
      - "{{ inputs.parameters.in-zarr }}"
      - "--kernel"
      - "python3"
      securityContext:
        runAsUser: 1000
      resources:
        requests:
          memory: 2Gi
          cpu: "1000m"
        limits:
          memory: 2Gi
          cpu: "2000m"
    activeDeadlineSeconds: 900
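This variant differs mainly in its `env` block. The notebook calls `Gateway()` with no arguments, so Dask Gateway falls back to Dask configuration for its address and auth type, and Dask maps `DASK_`-prefixed environment variables onto config keys, using `__` to mark nesting. These two variables therefore become `gateway.address` and `gateway.auth.type`. The stdlib-only sketch below illustrates that convention; `collect_dask_env` is a hypothetical name, the `PATH` entry is an invented unrelated variable, and the real logic lives in `dask.config.collect_env`.

```python
import ast


def collect_dask_env(environ: dict[str, str], prefix: str = "DASK_") -> dict:
    """Simplified sketch of Dask's env-var convention (hypothetical helper):
    keep only prefixed names, strip the prefix, lowercase, split on "__"
    into nested keys, and parse values as Python literals when possible."""
    config: dict = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variables are ignored
        keys = name[len(prefix):].lower().split("__")
        try:
            value = ast.literal_eval(raw)  # e.g. "2" -> 2, "True" -> True
        except (SyntaxError, ValueError):
            value = raw  # plain strings such as URLs or "basic" stay as-is
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


env = {
    "DASK_GATEWAY__ADDRESS": "http://traefik-dask-gateway.dask-gateway.svc.cluster.local",
    "DASK_GATEWAY__AUTH__TYPE": "basic",
    "PATH": "/usr/bin",  # hypothetical unrelated variable, filtered out
}
cfg = collect_dask_env(env)
print(cfg)
```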
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dask_gateway import Gateway\n",
    "import dask.array as da\n",
    "import xarray as xr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "inzarr = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gateway = Gateway()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Only a cell's final top-level expression is displayed, so print explicitly.\n",
    "print(gateway.list_clusters())\n",
    "\n",
    "with gateway.new_cluster() as cluster:\n",
    "    cluster.scale(2)\n",
    "\n",
    "    client = cluster.get_client()\n",
    "    print(client)\n",
    "\n",
    "    array = da.ones((1000, 1000, 1000))\n",
    "    print(f\"OUR ANSWER IS: {array.mean().compute()} (...it should be 1.0 ...)\")\n",
    "\n",
    "    print(gateway.list_clusters())\n",
    "\n",
    "# The cluster is shut down when the with-block exits.\n",
    "print(gateway.list_clusters())\n",
    "\n",
    "print(\"done\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The libraries in this old image don't accept gs:// URLs in open_zarr,\n",
    "# so wrap the path in a gcsfs mapper first.\n",
    "import gcsfs\n",
    "fs = gcsfs.GCSFileSystem()\n",
    "inzarr = fs.get_mapper(inzarr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = xr.open_zarr(inzarr)\n",
    "print(ds)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
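When papermill runs this notebook, it finds the cell tagged `parameters` (here `inzarr = None`) and inserts a new cell tagged `injected-parameters` right after it, so the injected assignment overrides the default before the `gcsfs` and `xr.open_zarr` cells run. The sketch below imitates that step on the notebook's JSON using only the standard library. `inject_parameters` is a hypothetical helper, not papermill's implementation, and the sample notebook is a trimmed-down copy of the one above; papermill itself does more, such as translating values for the kernel's language.

```python
import copy
import json


def inject_parameters(nb: dict, params: dict) -> dict:
    """Return a copy of `nb` with a code cell tagged "injected-parameters"
    inserted after the first cell tagged "parameters" (hypothetical helper)."""
    nb = copy.deepcopy(nb)  # leave the caller's notebook untouched
    new_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        # repr() turns each value into a Python literal, e.g. a quoted string.
        "source": [f"{name} = {value!r}\n" for name, value in params.items()],
    }
    cells = nb["cells"]
    for i, cell in enumerate(cells):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            cells.insert(i + 1, new_cell)
            break
    else:
        # Like papermill, fall back to the top of the notebook if nothing is tagged.
        cells.insert(0, new_cell)
    return nb


nb = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": ["import xarray as xr"]},
        {"cell_type": "code", "metadata": {"tags": ["parameters"]}, "source": ["inzarr = None"]},
        {"cell_type": "code", "metadata": {}, "source": ["ds = xr.open_zarr(inzarr)"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 2,
}
out = inject_parameters(nb, {"inzarr": "gs://fakebucketname/data.zarr"})
print(json.dumps(out["cells"][2], indent=1))
```

Because the injected cell comes after the tagged one, the notebook's own `inzarr = None` still runs, which is what lets the notebook execute interactively with a harmless default.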