@tinaok
Created November 17, 2023 09:30
variable_chunking_kerchunk_netcdf.ipynb
{
"cells": [
{
"cell_type": "markdown",
"id": "09fea2b1-64dd-4ffb-a1ff-e33e0ac809d8",
"metadata": {},
"source": [
"# Demo: 'optimized' variable chunk sizes used by an MPI-parallelised Earth System Model\n",
"\n",
"\n",
"Earth system models that are MPI parallelized decompose the domain and distribute tasks among multiple workers. In some of these models, each MPI task periodically exchanges data and writes its computed results to disk.\n",
"\n",
"Uneven distribution of computation tasks among MPI workers can lead to slower overall computation. When the number of tasks cannot be evenly divided among the MPI workers, it is common for the MPI code to distribute the remaining tasks among all workers. This ensures that the worker with the highest workload and the worker with the lowest workload differ by only one task.\n",
"\n",
"Earth system models generate very large outputs relative to the computation time required, because this research area focuses on analyzing time-dependent 3D geospatial material movement. Each model seeks a balance between time, 3D accuracy, increasing precision, and data volume.\n",
"\n",
"When MPI is used to decompose a computational task, the domain is often divided along two dimensions (`x` and `y`), and sometimes along just one. Consequently, the simulation output may also be decomposed along two dimensions (`x` and `y`). In the context of zarr and dask, this optimized 'decomposed' size is 'variable chunk-sized data'.\n",
"Enabling zarr to handle 'variable chunk-sized data' would let these model outputs be loaded into Xarray through kerchunk by reading the chunks as they are, without first rewriting them as 'non-variable chunk-sized' data, which the zarr spec forces us to do today.\n",
"\n",
"In my use case [(related to this work)](https://doi.org/10.5194/egusphere-egu23-15509), 544 MPI processes write decomposed NetCDF outputs simultaneously. The output is decomposed only in the `y`-direction, which has a dimension size of 6540, so 12 workers have a chunk size of 13 and the remaining 532 workers have a chunk size of 12. The full 4D dataset for 1 year is 15 TB, and we have several different simulation sets, each covering less than 10 years.\n",
"\n",
"To illustrate the problem with a simple example, below I use xarray's tutorial dataset. Let's take only the first 14 longitude points as our computing domain and assume we have 3 MPI processes. The domain is decomposed into sizes 5, 5, and 4, so the MPI processes create output files named `lon0.nc`, `lon1.nc`, and `lon2.nc`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5760d888-4a05-4963-b222-c454199a2191",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/1c/q1jqr0h541n720bvcqb_0rsm001mmz/T/ipykernel_74815/2749165109.py:5: SerializationWarning: saving variable air with floating point data as an integer dtype without any _FillValue to use for NaNs\n",
" ds.isel(lon=slice(0,5)).to_netcdf('lon0.nc')\n",
"/var/folders/1c/q1jqr0h541n720bvcqb_0rsm001mmz/T/ipykernel_74815/2749165109.py:6: SerializationWarning: saving variable air with floating point data as an integer dtype without any _FillValue to use for NaNs\n",
" ds.isel(lon=slice(5,10)).to_netcdf('lon1.nc')\n",
"/var/folders/1c/q1jqr0h541n720bvcqb_0rsm001mmz/T/ipykernel_74815/2749165109.py:7: SerializationWarning: saving variable air with floating point data as an integer dtype without any _FillValue to use for NaNs\n",
" ds.isel(lon=slice(10,14)).to_netcdf('lon2.nc')\n"
]
}
],
"source": [
"import xarray as xr\n",
"ds = xr.tutorial.load_dataset('air_temperature')\n",
"\n",
"\n",
"ds.isel(lon=slice(0,5)).to_netcdf('lon0.nc')\n",
"ds.isel(lon=slice(5,10)).to_netcdf('lon1.nc')\n",
"ds.isel(lon=slice(10,14)).to_netcdf('lon2.nc')\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cbd76d3e-cba1-4c68-804d-c088878c3a7c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['/Users/todaka/python/git/monitor-sedna/notebook/lon1.nc',\n",
" '/Users/todaka/python/git/monitor-sedna/notebook/lon0.nc',\n",
" '/Users/todaka/python/git/monitor-sedna/notebook/lon2.nc']"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import glob\n",
"\n",
"dir_url = \"/Users/todaka/python/git/monitor-sedna/notebook\"\n",
"file_pattern = \"/lon*.nc\"\n",
"file_paths = glob.glob(dir_url + file_pattern)\n",
"file_paths"
]
},
{
"cell_type": "markdown",
"id": "600237e3-bb2a-4c92-8042-53eea52e24e3",
"metadata": {},
"source": [
"## Because of zarr's uniform chunk size limitation, kerchunk can combine only the first 2 files."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e783c356-7374-4ee2-9dd6-da707c6466ad",
"metadata": {},
"outputs": [],
"source": [
"file_paths=file_paths[0:2]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "10efeaac-8ae7-42ec-8512-e93e2a8b372c",
"metadata": {},
"outputs": [],
"source": [
"import fsspec\n",
"from kerchunk.hdf import SingleHdf5ToZarr\n",
"def translate_dask(file):\n",
"    url = \"file://\" + file\n",
"    with fsspec.open(url) as inf:\n",
"        h5chunks = SingleHdf5ToZarr(inf, url, inline_threshold=100)\n",
"        return h5chunks.translate()\n",
"result=[translate_dask(file) for file in file_paths]\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "2754aaf6-2359-4eb6-997f-17eb4e738ea9",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.Dataset>\n",
"Dimensions: (time: 2920, lat: 25, lon: 10)\n",
"Coordinates:\n",
" * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0\n",
" * lon (lon) float32 200.0 202.5 205.0 207.5 ... 215.0 217.5 220.0 222.5\n",
" * time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00\n",
"Data variables:\n",
" air (time, lat, lon) float32 dask.array<chunksize=(2920, 25, 5), meta=np.ndarray>\n",
"Attributes:\n",
" Conventions: COARDS\n",
" description: Data is from NMC initialized reanalysis\\n(4x/day). These a...\n",
" platform: Model\n",
" references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...\n",
" title: 4x daily NMC reanalysis (1948)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from kerchunk.combine import MultiZarrToZarr\n",
"\n",
"mzz = MultiZarrToZarr(\n",
"    result,\n",
"    concat_dims=[\"lon\"],\n",
")\n",
"a = mzz.translate()\n",
"xr.open_dataset(a,engine='kerchunk',chunks={})"
]
},
{
"cell_type": "markdown",
"id": "2786170e-01c7-4a2d-9d61-fbf2421bee55",
"metadata": {},
"source": [
"## Here I try the approach proposed by @ivirshup in https://gist.github.com/ivirshup/9ba2b570d541ff1393990f632bc7a6ea"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aceab7e9-4f5e-4af7-a18e-0a37919e88d3",
"metadata": {},
"outputs": [],
"source": [
"!pip install git+https://github.com/ivirshup/kerchunk.git@concat-varchunks\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cd14b3d6-9ed8-4ab2-b802-9b2de3b31fb3",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import numpy as np\n",
"import zarr\n",
"\n",
"from kerchunk.zarr import single_zarr\n",
"from kerchunk.combine import merge_vars, concatenate_arrays\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "4d2c0fe4-f0fd-4b00-af00-6c456583dc8a",
"metadata": {},
"outputs": [],
"source": [
"def concatenate_zarr_csr_arrays(arrays: list[zarr.Group]) -> dict:\n",
"    # Group metadata\n",
"    #print(arrays)\n",
"    shapes = [array.attrs[\"shape\"] for array in arrays]\n",
"    merged_shape = [sum(s[0] for s in shapes), shapes[0][1]]\n",
"    metadata = {\"encoding-type\": \"csr_matrix\", \"encoding-version\": \"0.1.0\", \"shape\": merged_shape}\n",
"\n",
"    # Concatenating indices & data\n",
"    references = [single_zarr(array.store.dir_path(array.path)) for array in arrays]\n",
"    data_refs = concatenate_arrays(references, path=\"data\")\n",
"    indices_refs = concatenate_arrays(references, path=\"indices\")\n",
"\n",
"    # Concatenating indptr in memory\n",
"    indptrs = [array[\"indptr\"][:] for array in arrays]\n",
"\n",
"    new_indptrs = []\n",
"    pos = indptrs[0][-1]\n",
"    for indptr in indptrs[1:]:\n",
"        new_indptr = indptr[1:]\n",
"        new_indptr += pos\n",
"        new_indptrs.append(new_indptr)\n",
"        pos = new_indptr[-1]\n",
"    new_indptr = np.concatenate([indptrs[0]] + new_indptrs)\n",
"    new_indptr_zarr = zarr.array(new_indptr)\n",
"\n",
"    indptr_refs = {\n",
"        \"version\": 1,\n",
"        \"refs\": {f\"indptr/{k}\": v for k, v in new_indptr_zarr.store.items()}\n",
"    }\n",
"\n",
"    # Merging into group\n",
"    merged = merge_vars([data_refs, indices_refs, indptr_refs])\n",
"    # Setting .zattrs\n",
"    merged[\"refs\"][\".zattrs\"] = json.dumps(metadata)\n",
"\n",
"    return merged\n"
]
},
{
"cell_type": "markdown",
"id": "6471e793-3c83-486a-9111-5be860794fee",
"metadata": {},
"source": [
"### I'm sure I'm not using the example correctly. What should I do?"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "61457510-bde9-47be-9908-e18587271b0e",
"metadata": {},
"outputs": [
{
"ename": "AttributeError",
"evalue": "'dict' object has no attribute 'attrs'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[10], line 10\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m h5chunks\u001b[38;5;241m.\u001b[39mtranslate()\n\u001b[1;32m 8\u001b[0m result\u001b[38;5;241m=\u001b[39m[translate(file) \u001b[38;5;28;01mfor\u001b[39;00m file \u001b[38;5;129;01min\u001b[39;00m file_paths]\n\u001b[0;32m---> 10\u001b[0m \u001b[43mconcatenate_zarr_csr_arrays\u001b[49m\u001b[43m(\u001b[49m\u001b[43mresult\u001b[49m\u001b[43m)\u001b[49m\n",
"Cell \u001b[0;32mIn[9], line 4\u001b[0m, in \u001b[0;36mconcatenate_zarr_csr_arrays\u001b[0;34m(arrays)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mconcatenate_zarr_csr_arrays\u001b[39m(arrays: \u001b[38;5;28mlist\u001b[39m[zarr\u001b[38;5;241m.\u001b[39mGroup]) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28mdict\u001b[39m:\n\u001b[1;32m 2\u001b[0m \u001b[38;5;66;03m# Group metadata\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;66;03m#print(arrays)\u001b[39;00m\n\u001b[0;32m----> 4\u001b[0m shapes \u001b[38;5;241m=\u001b[39m [array\u001b[38;5;241m.\u001b[39mattrs[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mshape\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;28;01mfor\u001b[39;00m array \u001b[38;5;129;01min\u001b[39;00m arrays]\n\u001b[1;32m 5\u001b[0m merged_shape \u001b[38;5;241m=\u001b[39m [\u001b[38;5;28msum\u001b[39m(s[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01mfor\u001b[39;00m s \u001b[38;5;129;01min\u001b[39;00m shapes), shapes[\u001b[38;5;241m0\u001b[39m][\u001b[38;5;241m1\u001b[39m]]\n\u001b[1;32m 6\u001b[0m metadata \u001b[38;5;241m=\u001b[39m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mencoding-type\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcsr_matrix\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mencoding-version\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m0.1.0\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mshape\u001b[39m\u001b[38;5;124m\"\u001b[39m: merged_shape}\n",
"Cell \u001b[0;32mIn[9], line 4\u001b[0m, in \u001b[0;36m<listcomp>\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mconcatenate_zarr_csr_arrays\u001b[39m(arrays: \u001b[38;5;28mlist\u001b[39m[zarr\u001b[38;5;241m.\u001b[39mGroup]) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28mdict\u001b[39m:\n\u001b[1;32m 2\u001b[0m \u001b[38;5;66;03m# Group metadata\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;66;03m#print(arrays)\u001b[39;00m\n\u001b[0;32m----> 4\u001b[0m shapes \u001b[38;5;241m=\u001b[39m [\u001b[43marray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mattrs\u001b[49m[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mshape\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;28;01mfor\u001b[39;00m array \u001b[38;5;129;01min\u001b[39;00m arrays]\n\u001b[1;32m 5\u001b[0m merged_shape \u001b[38;5;241m=\u001b[39m [\u001b[38;5;28msum\u001b[39m(s[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01mfor\u001b[39;00m s \u001b[38;5;129;01min\u001b[39;00m shapes), shapes[\u001b[38;5;241m0\u001b[39m][\u001b[38;5;241m1\u001b[39m]]\n\u001b[1;32m 6\u001b[0m metadata \u001b[38;5;241m=\u001b[39m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mencoding-type\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcsr_matrix\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mencoding-version\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m0.1.0\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mshape\u001b[39m\u001b[38;5;124m\"\u001b[39m: merged_shape}\n",
"\u001b[0;31mAttributeError\u001b[0m: 'dict' object has no attribute 'attrs'"
]
}
],
"source": [
"import fsspec\n",
"from kerchunk.hdf import SingleHdf5ToZarr\n",
"def translate(file):\n",
"    url = \"file://\" + file\n",
"    with fsspec.open(url) as inf:\n",
"        h5chunks = SingleHdf5ToZarr(inf, url, inline_threshold=100)\n",
"        return h5chunks.translate()\n",
"result=[translate(file) for file in file_paths]\n",
"\n",
"concatenate_zarr_csr_arrays(result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b9861b5-7ccd-4094-962d-c6cea8f3ed20",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}