@zonca
Created October 26, 2023 15:47
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: dask in /opt/conda/lib/python3.11/site-packages (2023.9.3)\n",
"Collecting s3fs\n",
" Obtaining dependency information for s3fs from https://files.pythonhosted.org/packages/36/93/8aed66523d90361211a02dc0435855cc1ef357978decc2b05c8291fc515f/s3fs-2023.9.2-py3-none-any.whl.metadata\n",
" Using cached s3fs-2023.9.2-py3-none-any.whl.metadata (1.6 kB)\n",
"Collecting xarray\n",
" Obtaining dependency information for xarray from https://files.pythonhosted.org/packages/fd/82/268ff8e9e15fd55b085ba3e8a632a50add5228ed221bc05117ad6dee2841/xarray-2023.10.0-py3-none-any.whl.metadata\n",
" Using cached xarray-2023.10.0-py3-none-any.whl.metadata (10 kB)\n",
"Collecting zarr==2.16.1\n",
" Obtaining dependency information for zarr==2.16.1 from https://files.pythonhosted.org/packages/ba/55/0f5ec28561a1698ac5c11edc5724f8c6d48d01baecf740ffd62107d95e7f/zarr-2.16.1-py3-none-any.whl.metadata\n",
" Using cached zarr-2.16.1-py3-none-any.whl.metadata (5.8 kB)\n",
"Collecting asciitree (from zarr==2.16.1)\n",
" Using cached asciitree-0.3.3-py3-none-any.whl\n",
"Requirement already satisfied: numpy!=1.21.0,>=1.20 in /opt/conda/lib/python3.11/site-packages (from zarr==2.16.1) (1.24.4)\n",
"Collecting fasteners (from zarr==2.16.1)\n",
" Obtaining dependency information for fasteners from https://files.pythonhosted.org/packages/61/bf/fd60001b3abc5222d8eaa4a204cd8c0ae78e75adc688f33ce4bf25b7fafa/fasteners-0.19-py3-none-any.whl.metadata\n",
" Using cached fasteners-0.19-py3-none-any.whl.metadata (4.9 kB)\n",
"Collecting numcodecs>=0.10.0 (from zarr==2.16.1)\n",
" Obtaining dependency information for numcodecs>=0.10.0 from https://files.pythonhosted.org/packages/14/e6/8f9d4a498a06f11a06297f0b02af9968844d2e40ee79d372ccee33595285/numcodecs-0.12.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
" Using cached numcodecs-0.12.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.8 kB)\n",
"Requirement already satisfied: click>=8.0 in /opt/conda/lib/python3.11/site-packages (from dask) (8.1.7)\n",
"Requirement already satisfied: cloudpickle>=1.5.0 in /opt/conda/lib/python3.11/site-packages (from dask) (2.2.1)\n",
"Requirement already satisfied: fsspec>=2021.09.0 in /opt/conda/lib/python3.11/site-packages (from dask) (2023.9.2)\n",
"Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from dask) (23.2)\n",
"Requirement already satisfied: partd>=1.2.0 in /opt/conda/lib/python3.11/site-packages (from dask) (1.4.1)\n",
"Requirement already satisfied: pyyaml>=5.3.1 in /opt/conda/lib/python3.11/site-packages (from dask) (6.0.1)\n",
"Requirement already satisfied: toolz>=0.10.0 in /opt/conda/lib/python3.11/site-packages (from dask) (0.12.0)\n",
"Requirement already satisfied: importlib-metadata>=4.13.0 in /opt/conda/lib/python3.11/site-packages (from dask) (6.8.0)\n",
"Collecting aiobotocore~=2.5.4 (from s3fs)\n",
" Obtaining dependency information for aiobotocore~=2.5.4 from https://files.pythonhosted.org/packages/20/00/01780c5fa93e3feb6d776ac8c7bd05dbe9290165636c13edcbdde6853537/aiobotocore-2.5.4-py3-none-any.whl.metadata\n",
" Using cached aiobotocore-2.5.4-py3-none-any.whl.metadata (19 kB)\n",
"Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /opt/conda/lib/python3.11/site-packages (from s3fs) (3.8.6)\n",
"Requirement already satisfied: pandas>=1.4 in /opt/conda/lib/python3.11/site-packages (from xarray) (2.1.1)\n",
"Collecting botocore<1.31.18,>=1.31.17 (from aiobotocore~=2.5.4->s3fs)\n",
" Obtaining dependency information for botocore<1.31.18,>=1.31.17 from https://files.pythonhosted.org/packages/3d/e5/32a88f5a95e3d43c2e3ed86fc1ffdb715547a04f95a51d00e1185af63b0c/botocore-1.31.17-py3-none-any.whl.metadata\n",
" Using cached botocore-1.31.17-py3-none-any.whl.metadata (5.9 kB)\n",
"Collecting wrapt<2.0.0,>=1.10.10 (from aiobotocore~=2.5.4->s3fs)\n",
" Using cached wrapt-1.15.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (78 kB)\n",
"Collecting aioitertools<1.0.0,>=0.5.1 (from aiobotocore~=2.5.4->s3fs)\n",
" Using cached aioitertools-0.11.0-py3-none-any.whl (23 kB)\n",
"Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (23.1.0)\n",
"Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (3.2.0)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (6.0.4)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (4.0.3)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (1.9.2)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (1.4.0)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.11/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (1.3.1)\n",
"Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.11/site-packages (from importlib-metadata>=4.13.0->dask) (3.17.0)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.11/site-packages (from pandas>=1.4->xarray) (2.8.2)\n",
"Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas>=1.4->xarray) (2023.3.post1)\n",
"Requirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.11/site-packages (from pandas>=1.4->xarray) (2023.3)\n",
"Requirement already satisfied: locket in /opt/conda/lib/python3.11/site-packages (from partd>=1.2.0->dask) (1.0.0)\n",
"Collecting jmespath<2.0.0,>=0.7.1 (from botocore<1.31.18,>=1.31.17->aiobotocore~=2.5.4->s3fs)\n",
" Using cached jmespath-1.0.1-py3-none-any.whl (20 kB)\n",
"Collecting urllib3<1.27,>=1.25.4 (from botocore<1.31.18,>=1.31.17->aiobotocore~=2.5.4->s3fs)\n",
" Obtaining dependency information for urllib3<1.27,>=1.25.4 from https://files.pythonhosted.org/packages/b0/53/aa91e163dcfd1e5b82d8a890ecf13314e3e149c05270cc644581f77f17fd/urllib3-1.26.18-py2.py3-none-any.whl.metadata\n",
" Using cached urllib3-1.26.18-py2.py3-none-any.whl.metadata (48 kB)\n",
"Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas>=1.4->xarray) (1.16.0)\n",
"Requirement already satisfied: idna>=2.0 in /opt/conda/lib/python3.11/site-packages (from yarl<2.0,>=1.0->aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (3.4)\n",
"Using cached zarr-2.16.1-py3-none-any.whl (206 kB)\n",
"Using cached s3fs-2023.9.2-py3-none-any.whl (28 kB)\n",
"Using cached xarray-2023.10.0-py3-none-any.whl (1.1 MB)\n",
"Using cached aiobotocore-2.5.4-py3-none-any.whl (73 kB)\n",
"Using cached numcodecs-0.12.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.9 MB)\n",
"Using cached fasteners-0.19-py3-none-any.whl (18 kB)\n",
"Using cached botocore-1.31.17-py3-none-any.whl (11.1 MB)\n",
"Using cached urllib3-1.26.18-py2.py3-none-any.whl (143 kB)\n",
"Installing collected packages: asciitree, wrapt, urllib3, numcodecs, jmespath, fasteners, aioitertools, zarr, botocore, xarray, aiobotocore, s3fs\n",
" Attempting uninstall: urllib3\n",
" Found existing installation: urllib3 2.0.5\n",
" Uninstalling urllib3-2.0.5:\n",
" Successfully uninstalled urllib3-2.0.5\n",
"Successfully installed aiobotocore-2.5.4 aioitertools-0.11.0 asciitree-0.3.3 botocore-1.31.17 fasteners-0.19 jmespath-1.0.1 numcodecs-0.12.1 s3fs-2023.9.2 urllib3-1.26.18 wrapt-1.15.0 xarray-2023.10.0 zarr-2.16.1\n"
]
}
],
"source": [
"!pip install dask s3fs xarray zarr==2.16.1\n",
"# zarr is pinned to 2.16.1 because of an issue between s3fs and zarr 2.16.2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from dask import distributed\n",
"import dask.array as da\n",
"import s3fs\n",
"import os"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"user = os.environ[\"JUPYTERHUB_USER\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connect to the object store and list the contents of a folder"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# see https://www.zonca.dev/posts/2022-04-04-zarr_jetstream2 for creating credentials for your own allocation\n",
"\n",
"KEY = os.environ[\"OS_S3_KEY\"]\n",
"SECRET = os.environ[\"OS_S3_SECRET\"]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['gateways23/zarrtest-julianpistorius', 'gateways23/zarrtest-lemaiw', 'gateways23/zarrtest-zo', 'gateways23/zarrtest-zonca', 'gateways23/zarrtest-zoncaoverleafbot']\n"
]
}
],
"source": [
"fs = s3fs.S3FileSystem(\n",
"    key=KEY,\n",
"    secret=SECRET,\n",
"    use_ssl=True,\n",
"    client_kwargs=dict(\n",
"        endpoint_url=\"https://js2.jetstream-cloud.org:8001/\",\n",
"        region_name=\"RegionOne\",\n",
"    ),\n",
")\n",
"\n",
"print(fs.ls(\"gateways23\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a data store compatible with xarray"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"dataset_path = f\"gateways23/zarrtest-{user}\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# remove any dataset left over from a previous run\n",
"try:\n",
"    fs.rm(dataset_path, recursive=True)\n",
"except FileNotFoundError:\n",
"    pass"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"store = s3fs.S3Map(dataset_path, s3=fs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an empty Zarr array"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"import zarr\n",
"\n",
"z = zarr.empty(\n",
" shape=(1000, 1000), chunks=(100, 100), dtype=\"f4\", store=store, compression=None\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a random array in Dask"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"x = da.random.random(size=z.shape, chunks=z.chunks).astype(z.dtype)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Store it in Zarr"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"x.store(z, lock=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Check the contents of the store"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['gateways23/zarrtest-zo/.zarray',\n",
" 'gateways23/zarrtest-zo/0.0',\n",
" 'gateways23/zarrtest-zo/0.1',\n",
" 'gateways23/zarrtest-zo/0.2',\n",
" 'gateways23/zarrtest-zo/0.3']"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fs.ls(dataset_path)[:5]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
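
The final listing in the notebook shows a `.zarray` metadata object followed by keys such as `0.0` and `0.1`. As a rough sketch of where those names come from, the snippet below enumerates the chunk keys one would expect for the notebook's `shape=(1000, 1000)` and `chunks=(100, 100)`, assuming the Zarr v2 default `.` separator between chunk indices. The helper `expected_chunk_keys` is hypothetical and written only for illustration; it does not touch the object store.

```python
import itertools
import math


def expected_chunk_keys(shape, chunks, separator="."):
    """Hypothetical helper: list the chunk keys for a chunked array.

    Assumes Zarr v2 naming, where each chunk is stored under its grid
    indices joined by `separator` (for example "3.7").
    """
    # Number of chunks along each dimension, rounding up for partial chunks.
    grid = [math.ceil(size / chunk) for size, chunk in zip(shape, chunks)]
    return [
        separator.join(str(i) for i in index)
        for index in itertools.product(*(range(n) for n in grid))
    ]


# Same shape and chunking as the array created in the notebook.
keys = expected_chunk_keys((1000, 1000), (100, 100))
print(len(keys))   # 100 chunks in a 10 x 10 grid
print(keys[:4])    # ['0.0', '0.1', '0.2', '0.3']

# With dtype "f4" and compression=None, each full chunk holds
# 100 * 100 values of 4 bytes each.
print(100 * 100 * 4)  # 40000 bytes per chunk
```

Together with `.zarray`, this suggests about 101 objects under `dataset_path` once `x.store(z, lock=False)` has finished, which could be compared against `len(fs.ls(dataset_path))`.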