constantinpape/a_tutorial_for_python_and_ome_zarr.ipynb

## a_tutorial_for_python_and_ome_zarr.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d7cd9d4f",
   "metadata": {},
   "source": [
    "# python & ome.zarr\n",
    "\n",
    "[NGFF](https://ngff.openmicroscopy.org/latest/) is a community driven effort to develop standardized data formats for microscopy that are performant and cloud ready. The primary format is `ome.zarr`, which is based on the popular [zarr](https://github.com/zarr-developers/zarr-python) n-dimensonial array data format.\n",
    "The development effort is ongoing, but the specification for n-dimensional image data is already present (version 0.3 is under development at the time of writing this notebook) and ready for use.\n",
    "For more information check out [this preprint](https://www.biorxiv.org/content/10.1101/2021.03.31.437929v4).\n",
    "\n",
    "Two python packages exist to make life easier for python developers working with the ome.zarr data format:\n",
    "- https://github.com/ome/ome-zarr-py for reading, writing and other functionality\n",
    "- https://github.com/ome/napari-ome-zarr for displaying ome.zarr files in napari.\n",
    "\n",
    "In this notebook, we will use them to create data in the ome.zarr format, read it back into memory, display it with napari and explore additional functionality.\n",
    "\n",
    "NOTE: both this tutorial and the ome-zarr library are WIP and some of the features used in the tutorial may only work on the current main branch and not in the pip package yet."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "595c4fa2",
   "metadata": {},
   "source": [
    "## Installation\n",
    "\n",
    "### Via pip\n",
    "\n",
    "You can install both packages with pip:\n",
    "```\n",
    "$ pip install ome-zarr\n",
    "$ pip install napari-ome-zarr\n",
    "```\n",
    "\n",
    "### Setting up conda envs\n",
    "\n",
    "You can find two conda environment files in this gist: `environment.yaml` for a standard environment and `devel_env.yaml` for a development environment.\n",
    "\n",
    "Check out [the miniconda documentation](https://docs.conda.io/en/latest/miniconda.html) if you need to install conda first.\n",
    "\n",
    "The default environment can be set up via\n",
    "```\n",
    "$ conda env create -f environment.yaml\n",
    "```\n",
    "and then activated via\n",
    "```\n",
    "$ conda activate ome-zarr-py\n",
    "```\n",
    "\n",
    "To set up the development environment, first run\n",
    "```\n",
    "$ conda env create -f devel_env.yaml\n",
    "```\n",
    "then activate it via\n",
    "```\n",
    "$ conda activate ome-zarr-py-dev\n",
    "```\n",
    "and clone the two repositories https://github.com/ome/ome-zarr-py, https://github.com/ome/napari-ome-zarr. Install them by running \n",
    "```\n",
    "$ pip install --no-deps -e .\n",
    "```\n",
    "inside both top-level folders. At the time of writing support for spec v0.3 is still under development, but can be used experimentally by using these two PRs instead of the main branches: https://github.com/ome/ome-zarr-py/pull/89, https://github.com/ome/napari-ome-zarr/pull/8."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0fae27f",
   "metadata": {},
   "source": [
    "## Writing ome.zarr data\n",
    "\n",
    "Write an example image, represented in memory by a numpy array, to an ome.zarr file on disc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b236d2d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# the ome_zarr imports we require\n",
    "import ome_zarr.reader\n",
    "import ome_zarr.scale\n",
    "import ome_zarr.writer\n",
    "# additional imports\n",
    "import zarr\n",
    "from skimage.data import astronaut  # example data\n",
    "\n",
    "image = astronaut().transpose((2, 1, 0))  # transpose to channel first\n",
    "# check the image shape, it should be cyx\n",
    "print(\"Image shape:\", image.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c670b17c",
   "metadata": {},
   "outputs": [],
   "source": [
    "to_5d = True  # convert the data to normalized 5d axes?\n",
    "\n",
    "ngff_version = ome_zarr.format.CurrentFormat().version\n",
    "print(\"Using ngff format version\", ngff_version)\n",
    "\n",
    "# versions 0.1 and 0.2 only support normalized 5d\n",
    "if int(ngff_version.split('.')[1]) <= 2:\n",
    "    to_5d = True\n",
    "\n",
    "if to_5d:\n",
    "    axes = (\"t\", \"c\", \"z\", \"y\", \"x\")\n",
    "    print(\"Convert data to tczyx\")\n",
    "    image = image[None, :, None]  # insert singleton t and z axis\n",
    "    print(\"New shape:\", image.shape)\n",
    "else:\n",
    "    axes = (\"c\", \"y\", \"x\")\n",
    "    print(\"Keep data as cyx\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7276222d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# create multiscale image pyramid (mip) using the scaler class\n",
    "scaler = ome_zarr.scale.Scaler()\n",
    "# TODO support <5d\n",
    "# TODO how do we control downscaling levels and options?\n",
    "mip = scaler.local_mean(image)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76c74037",
   "metadata": {},
   "outputs": [],
   "source": [
    "file_path = \"my-first.ome.zarr\"  # where to save the ome.zarr file\n",
    "# create a zarr handle in write mode\n",
    "# WARNING: this will delete everything in 'file_path'.\n",
    "# if you don't want this, use mode=\"a\" instead\n",
    "loc = ome_zarr.io.parse_url(file_path, mode=\"w\")  # FIXME 'w' does not truncate!\n",
    "\n",
    "# create a zarr root level group at the file path\n",
    "group = zarr.group(loc.store)\n",
    "\n",
    "# write the actual data\n",
    "ome_zarr.writer.write_multiscale(mip, group)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "265dded9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# convince yourself that the data is there\n",
    "import os\n",
    "print(os.listdir(file_path))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c1ff8a4",
   "metadata": {},
   "source": [
    "## Reading ome.zarr data\n",
    "\n",
    "Read back the example data from the ome.zarr file into memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66bfa9bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "loc = ome_zarr.io.parse_url(file_path, mode=\"r\")  # open the file in read mode\n",
    "# this will return a reader object, which enables access to the indvidual resolution levels \n",
    "zarr_reader = ome_zarr.reader.Reader(loc).zarr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c39b61b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO is there a way to list the available resolution arrays?\n",
    "# the 'load' functionality returns the specified resolution data as a dask array\n",
    "res0 = zarr_reader.load(\"0\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c664f034",
   "metadata": {},
   "outputs": [],
   "source": [
    "# the dask array can be used for lazy computation, or converted to numpy via .compute()\n",
    "# for more information on dask arrays check out https://docs.dask.org/en/latest/array.html\n",
    "full_image_npy = res0.compute()\n",
    "print(full_image_npy.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ddfb0a12",
   "metadata": {},
   "outputs": [],
   "source": [
    "# data slices can be used to select parts of the image.\n",
    "# these will also be returned as dask arrays\n",
    "sub_image = res0[:, 0, :, :256, :256]\n",
    "sub_image_npy = sub_image.compute()\n",
    "print(sub_image_npy.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77e570a6",
   "metadata": {},
   "source": [
    "## Using napari with ome.zarr\n",
    "\n",
    "Use the napari plugin installed with `napari-ome-zarr` to open ome.zarr files directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ad8b8bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# this is how we can open the file we just wrote in napari\n",
    "import napari\n",
    "viewer = napari.Viewer()\n",
    "viewer.open(file_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f007a7e",
   "metadata": {},
   "source": [
    "## More functionality\n",
    "\n",
    "The `ome_zarr` library provides additional functionality, like command line tools for inspecting and downloading ome.zarr data from s3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61848489",
   "metadata": {},
   "outputs": [],
   "source": [
    "# inspect ome.zarr data on s3\n",
    "!ome_zarr info 'https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001240.zarr/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d695bc0b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# download the same data from s3\n",
    "!ome_zarr download 'https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001240.zarr/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad91f268",
   "metadata": {},
   "outputs": [],
   "source": [
    "# we can also open the ome.zarr file directly in napari via the command line\n",
    "!napari '6001240.zarr'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5a7cf80",
   "metadata": {},
   "outputs": [],
   "source": [
    "# or we can pass the s3 address to open it on demand\n",
    "!napari 'https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001240.zarr/'"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3.bkp"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
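
The notebook above inserts singleton `t` and `z` axes with `image[None, :, None]` to reach the normalized `tczyx` layout. As a minimal sketch of the same idea for other axis subsets, the snippet below computes the 5d target shape from the axis names. The helper `to_5d_shape` and the constant `CANONICAL_AXES` are hypothetical names introduced for illustration and are not part of `ome_zarr`; the sketch assumes the input axes are already ordered like `tczyx`.

```python
# Hypothetical helper, not part of ome_zarr: compute the 5d (t, c, z, y, x)
# shape for an array whose axes are a subset of "tczyx" in that order.
CANONICAL_AXES = ("t", "c", "z", "y", "x")


def to_5d_shape(axes, shape):
    """Return the tczyx shape, using size 1 for every axis that is missing."""
    if len(axes) != len(shape):
        raise ValueError("axes and shape must have the same length")
    unknown = [ax for ax in axes if ax not in CANONICAL_AXES]
    if unknown:
        raise ValueError(f"unsupported axes: {unknown}")
    # the input axes must already follow the canonical order; otherwise a
    # plain reshape would scramble the data and a transpose is needed first
    order = [CANONICAL_AXES.index(ax) for ax in axes]
    if order != sorted(set(order)):
        raise ValueError("axes must be unique and ordered like tczyx")
    sizes = dict(zip(axes, shape))
    return tuple(sizes.get(ax, 1) for ax in CANONICAL_AXES)


# the astronaut image from the notebook is (3, 512, 512) in cyx order
print(to_5d_shape(("c", "y", "x"), (3, 512, 512)))  # (1, 3, 1, 512, 512)
```

With numpy, `image.reshape(to_5d_shape(axes, image.shape))` should give the same array as the `image[None, :, None]` indexing used in the notebook for `cyx` input.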

## dev_env.yaml
name: ome-zarr-py-dev
channels:
    - conda-forge
dependencies:
    - jupyter
    - napari
    - pip
    - requests
    - s3fs
    - scikit-image
    - xarray
    - zarr >= 2.4.0

## environment.yaml
name: ome-zarr-py
channels:
    - conda-forge
dependencies:
    - jupyter
    - napari
    - pip
    - requests
    - opencv  # get rid of opencv dependencies when PR is merged
    - py-opencv
    - s3fs
    - scikit-image
    - xarray
    - zarr >= 2.4.0
    - pip:
        - ome-zarr
        - napari-ome-zarr
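
After creating and activating one of the environments above, it can be useful to confirm that the modules imported by the notebook are actually available. The snippet below is a minimal sketch using only the standard library; the helper `missing_modules` and the tuple `NOTEBOOK_MODULES` are hypothetical names introduced here, and the module list is taken from the notebook's import statements (`skimage` is the import name of `scikit-image`).

```python
import importlib.util

# top-level modules imported by the notebook cells
NOTEBOOK_MODULES = ("ome_zarr", "zarr", "skimage", "napari")


def missing_modules(names=NOTEBOOK_MODULES):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All notebook imports are available.")
```

This only checks that the modules can be located; it does not verify versions or that the `napari-ome-zarr` plugin is registered with napari.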