Created
March 29, 2022 15:26
cdo and compressed netcdf files from xarray
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "db55a025-cab3-4c77-8298-b648d6222b39", | |
"metadata": {}, | |
"source": [ | |
"# `cdo` with compressed data" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "937d9e10-acb0-480e-9f64-7c108f383479", | |
"metadata": {}, | |
"source": [ | |
"> I want to compress `netcdf` files down to their real information content using `bitinformation.jl`, as showcased [in this gist](https://gist.github.com/aaronspring/383dbbfe31baa4618c5b0dbef4f6d574), i.e. to `bitround` and `compress` existing `netcdf` `MPI-M` output and replace the previous files with smaller new ones, while still being able to work with all tools: `xarray`, `cdo`, ... \n", | |
"\n", | |
"> However, when I compress the rounded data with the standard `xarray` `zlib` encoding, these files are very slow in `cdo`, whereas they remain quite fast in `xarray`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"id": "de099b6d-c713-4c20-9bac-9722e5d3bacf", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import xarray as xr\n", | |
"import numpy as np" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"id": "1147f4f5-1052-44b9-9015-101afc6d55bd", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"path = \"/work/ik1017/CMIP6/data/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/Omon/fgco2/gn/v20190710/fgco2_Omon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_201001-201412.nc\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"id": "3437fd71-ebbf-4333-8aad-17ac7547c6ec", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\u001b[0;1m File format\u001b[0m : NetCDF4 classic zip\n", | |
"\u001b[0;1m -1 : Institut Source T Steptype Levels Num Points Num Dtype : Parameter ID\u001b[0m\n", | |
" 1 : \u001b[34mMPIMET MPI-ESM1.2-LR v instant \u001b[0m\u001b[32m 1 \u001b[0m 1 \u001b[32m 56320 \u001b[0m 1 \u001b[34m F32z \u001b[0m: -1 \n", | |
"\u001b[0;1m Grid coordinates\u001b[0m :\n", | |
" 1 : \u001b[34mcurvilinear \u001b[0m : \u001b[32mpoints=56320 (256x220)\u001b[0m\n", | |
" longitude : 0.007175368 to 359.996 degrees_east\n", | |
" latitude : -83.96551 to 89.7266 degrees_north\n", | |
" available : cellbounds\n", | |
"\u001b[34m mapping\u001b[0m : \u001b[32mProjection\n", | |
"\u001b[0m i : 0 to 255 by 1 1\n", | |
" j : 0 to 219 by 1 1\n", | |
"\u001b[0;1m Vertical coordinates\u001b[0m :\n", | |
" 1 : \u001b[34mdepth_below_sea \u001b[0m :\u001b[32m levels=1 scalar\u001b[0m\n", | |
" depth : 0 m\n", | |
"\u001b[0;1m Time coordinate\u001b[0m : \u001b[32m60 steps\n", | |
"\u001b[0m RefTime = 1850-01-01 00:00:00 Units = days Calendar = proleptic_gregorian Bounds = true\n", | |
" YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss\n", | |
"\u001b[35m 2010-01-16 12:00:00 2010-02-15 00:00:00 2010-03-16 12:00:00 2010-04-16 00:00:00\n", | |
" 2010-05-16 12:00:00 2010-06-16 00:00:00 2010-07-16 12:00:00 2010-08-16 12:00:00\n", | |
" 2010-09-16 00:00:00 2010-10-16 12:00:00 2010-11-16 00:00:00 2010-12-16 12:00:00\n", | |
" 2011-01-16 12:00:00 2011-02-15 00:00:00 2011-03-16 12:00:00 2011-04-16 00:00:00\n", | |
" 2011-05-16 12:00:00 2011-06-16 00:00:00 2011-07-16 12:00:00 2011-08-16 12:00:00\n", | |
" 2011-09-16 00:00:00 2011-10-16 12:00:00 2011-11-16 00:00:00 2011-12-16 12:00:00\n", | |
" 2012-01-16 12:00:00 2012-02-15 12:00:00 2012-03-16 12:00:00 2012-04-16 00:00:00\n", | |
" 2012-05-16 12:00:00 2012-06-16 00:00:00 2012-07-16 12:00:00 2012-08-16 12:00:00\n", | |
" 2012-09-16 00:00:00 2012-10-16 12:00:00 2012-11-16 00:00:00 2012-12-16 12:00:00\n", | |
" 2013-01-16 12:00:00 2013-02-15 00:00:00 2013-03-16 12:00:00 2013-04-16 00:00:00\n", | |
" 2013-05-16 12:00:00 2013-06-16 00:00:00 2013-07-16 12:00:00 2013-08-16 12:00:00\n", | |
" 2013-09-16 00:00:00 2013-10-16 12:00:00 2013-11-16 00:00:00 2013-12-16 12:00:00\n", | |
" 2014-01-16 12:00:00 2014-02-15 00:00:00 2014-03-16 12:00:00 2014-04-16 00:00:00\n", | |
" 2014-05-16 12:00:00 2014-06-16 00:00:00 2014-07-16 12:00:00 2014-08-16 12:00:00\n", | |
" 2014-09-16 00:00:00 2014-10-16 12:00:00 2014-11-16 00:00:00 2014-12-16 12:00:00\u001b[0m\n", | |
"\u001b[32mcdo sinfo: \u001b[0mProcessed 1 variable over 60 timesteps [0.05s 103MB].\n" | |
] | |
} | |
], | |
"source": [ | |
"!cdo sinfo /work/ik1017/CMIP6/data/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/Omon/fgco2/gn/v20190710/fgco2_Omon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_201001-201412.nc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"id": "f2dbaee8-8e85-4ea4-8326-8bfb350f0af3", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"9.4M\t/work/ik1017/CMIP6/data/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/Omon/fgco2/gn/v20190710/fgco2_Omon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_201001-201412.nc\n" | |
] | |
} | |
], | |
"source": [ | |
"!du -hs /work/ik1017/CMIP6/data/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/Omon/fgco2/gn/v20190710/fgco2_Omon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_201001-201412.nc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"id": "b89e3226-ea9b-4339-87c4-e21b5ceaaf4d", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"v='fgco2'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"id": "ee356597-8f84-40ae-aded-239f4a46727a", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"ori = xr.open_dataset(path)[v]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "2297d559-9dfd-47f8-adac-8b9132274531", | |
"metadata": {}, | |
"source": [ | |
"# save to disk" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 56, | |
"id": "586f3acd-43d8-4426-8f13-c9752c47f4e3", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\u001b[0;1m File format\u001b[0m : NetCDF4 classic zip\n", | |
"\u001b[0;1m -1 : Institut Source T Steptype Levels Num Points Num Dtype : Parameter ID\u001b[0m\n", | |
" 1 : \u001b[34mMPIMET MPI-ESM1.2-LR v instant \u001b[0m\u001b[32m 1 \u001b[0m 1 \u001b[32m 56320 \u001b[0m 1 \u001b[34m F32z \u001b[0m: -1 \n", | |
"\u001b[0;1m Grid coordinates\u001b[0m :\n", | |
" 1 : \u001b[34mcurvilinear \u001b[0m : \u001b[32mpoints=56320 (256x220)\u001b[0m\n", | |
" longitude : 0.007175368 to 359.996 degrees_east\n", | |
" latitude : -83.96551 to 89.7266 degrees_north\n", | |
" available : cellbounds\n", | |
"\u001b[34m mapping\u001b[0m : \u001b[32mProjection\n", | |
"\u001b[0m i : 0 to 255 by 1 1\n", | |
" j : 0 to 219 by 1 1\n", | |
"\u001b[0;1m Vertical coordinates\u001b[0m :\n", | |
" 1 : \u001b[34mdepth_below_sea \u001b[0m :\u001b[32m levels=1 scalar\u001b[0m\n", | |
" depth : 0 m\n", | |
"\u001b[0;1m Time coordinate\u001b[0m : \u001b[32m60 steps\n", | |
"\u001b[0m RefTime = 1850-01-01 00:00:00 Units = days Calendar = proleptic_gregorian Bounds = true\n", | |
" YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss\n", | |
"\u001b[35m 2010-01-16 12:00:00 2010-02-15 00:00:00 2010-03-16 12:00:00 2010-04-16 00:00:00\n", | |
" 2010-05-16 12:00:00 2010-06-16 00:00:00 2010-07-16 12:00:00 2010-08-16 12:00:00\n", | |
" 2010-09-16 00:00:00 2010-10-16 12:00:00 2010-11-16 00:00:00 2010-12-16 12:00:00\n", | |
" 2011-01-16 12:00:00 2011-02-15 00:00:00 2011-03-16 12:00:00 2011-04-16 00:00:00\n", | |
" 2011-05-16 12:00:00 2011-06-16 00:00:00 2011-07-16 12:00:00 2011-08-16 12:00:00\n", | |
" 2011-09-16 00:00:00 2011-10-16 12:00:00 2011-11-16 00:00:00 2011-12-16 12:00:00\n", | |
" 2012-01-16 12:00:00 2012-02-15 12:00:00 2012-03-16 12:00:00 2012-04-16 00:00:00\n", | |
" 2012-05-16 12:00:00 2012-06-16 00:00:00 2012-07-16 12:00:00 2012-08-16 12:00:00\n", | |
" 2012-09-16 00:00:00 2012-10-16 12:00:00 2012-11-16 00:00:00 2012-12-16 12:00:00\n", | |
" 2013-01-16 12:00:00 2013-02-15 00:00:00 2013-03-16 12:00:00 2013-04-16 00:00:00\n", | |
" 2013-05-16 12:00:00 2013-06-16 00:00:00 2013-07-16 12:00:00 2013-08-16 12:00:00\n", | |
" 2013-09-16 00:00:00 2013-10-16 12:00:00 2013-11-16 00:00:00 2013-12-16 12:00:00\n", | |
" 2014-01-16 12:00:00 2014-02-15 00:00:00 2014-03-16 12:00:00 2014-04-16 00:00:00\n", | |
" 2014-05-16 12:00:00 2014-06-16 00:00:00 2014-07-16 12:00:00 2014-08-16 12:00:00\n", | |
" 2014-09-16 00:00:00 2014-10-16 12:00:00 2014-11-16 00:00:00 2014-12-16 12:00:00\u001b[0m\n", | |
"\u001b[32mcdo sinfo: \u001b[0mProcessed 1 variable over 60 timesteps [0.04s 213MB].\n" | |
] | |
} | |
], | |
"source": [ | |
"# I assume this file was created with `cdo -z szip` (note that `sinfo` reports the format as `NetCDF4 classic zip`)\n", | |
"!cdo sinfo /work/ik1017/CMIP6/data/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/Omon/fgco2/gn/v20190710/fgco2_Omon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_201001-201412.nc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"id": "4d405d1e-5c42-447e-982b-262ced8a756b", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"!cp /work/ik1017/CMIP6/data/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/Omon/fgco2/gn/v20190710/fgco2_Omon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_201001-201412.nc test2/ori_szip.nc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"id": "3475b95c-fbc6-4e6e-9e10-b8e95f34cb08", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(Frozen({'time': 60, 'j': 220, 'i': 256}), dtype('float32'))" | |
] | |
}, | |
"execution_count": 32, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"ori.sizes, ori.dtype" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"id": "c41612e7-6f32-4254-b856-bc5183986dbc", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 274 ms, sys: 35 ms, total: 309 ms\n", | |
"Wall time: 325 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%time ori.to_dataset().to_netcdf(\"/work/mh0727/m300524/bitinformation/test2/ori_nc4.nc\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 35, | |
"id": "76c11c79-4f1e-49b9-97cc-aaae253139cf", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"encoding={v:{'zlib':True, 'shuffle':True, 'complevel':9}}" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"id": "d605dd15-5c2b-4e3f-bed1-812211512113", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 3.12 s, sys: 7 ms, total: 3.13 s\n", | |
"Wall time: 3.12 s\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time \n", | |
"ori.to_dataset().to_netcdf(\"/work/mh0727/m300524/bitinformation/test2/ori_compressed.nc\",\n", | |
" encoding=encoding, unlimited_dims=[], format='NETCDF4')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "5d711bd6-ab61-400c-a9ec-f31d32261f6d", | |
"metadata": {}, | |
"source": [ | |
"## another implementation of bitround done by `ncks`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "33ebb576-0c49-4d3f-ad0c-c903efe6bf62", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"!ncks --baa=8 --ppc fgco2=5 ori_nc4.nc /work/mh0727/m300524/bitinformation/test2/round5_ncks.nc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 73, | |
"id": "9543f4e2-a3ef-4a6f-998b-1c1c2a3b5330", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"7.7M\t/work/mh0727/m300524/bitinformation/test2/ori_compressed.nc\n", | |
"8.1M\t/work/mh0727/m300524/bitinformation/test2/ori_nc4.nc\n", | |
"9.4M\t/work/mh0727/m300524/bitinformation/test2/ori_szip.nc\n", | |
"3.6M\t/work/mh0727/m300524/bitinformation/test2/round5_ncks.nc\n" | |
] | |
} | |
], | |
"source": [ | |
"!du -hs /work/mh0727/m300524/bitinformation/test2/*" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "0e4c7eca-a0b1-4fae-b390-96589e5b758a", | |
"metadata": {}, | |
"source": [ | |
"> So not much disk storage is gained by this compression, whereas in [this gist](https://gist.github.com/aaronspring/383dbbfe31baa4618c5b0dbef4f6d574) I gain up to a factor of 10." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "3952f822-760a-4c37-acab-0d856c4c417e", | |
"metadata": {}, | |
"source": [ | |
"# working with compressed files" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "c2f74340-f2e1-4c62-8580-62756c759260", | |
"metadata": {}, | |
"source": [ | |
"## `cdo` seems slow\n", | |
"\n", | |
"- Which decompression methods does `cdo` support natively?\n", | |
"\n", | |
"Context: \"File size can be misleading though: NetCDF and GRIB2 have very effective compression algorisms built-in (zip-compressed nc4, aec/szip compressed grb2). The downside is that in both cases decompression is slow. Especially with large horizontal fields the time for decompressing supersedes the saved read-in time compared to uncompressed data. These compressions are essentially made for saving storage space, but not for extensive work with the data.\" https://code.mpimet.mpg.de/projects/cdo/wiki/Tutorial#Tips-and-tricks-for-high-resolution-data" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 74, | |
"id": "91e5acfc-1fa6-43ae-93a0-e97181d9eb97", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"/sw/rhel6-x64/cdo/cdo-1.9.10-magicsxx-gcc64/bin/cdo\n" | |
] | |
} | |
], | |
"source": [ | |
"!which cdo" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 69, | |
"id": "af4c00e2-6c39-48ac-a765-67d25db97fd1", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\u001b[32mcdo(1) fldmean: \u001b[0mProcess started\n", | |
"Warning (cdfScanVarAttr): NetCDF: Variable not found - time_bnds\n", | |
"Warning (cdfScanVarAttr): NetCDF: Variable not found - vertices_latitude\n", | |
"Warning (cdfScanVarAttr): NetCDF: Variable not found - vertices_longitude\n", | |
"cdo timmean: 1%\u001b[33mcdo(1) fldmean (Warning): \u001b[0mGrid cell bounds not available, using constant grid cell area weights!\n", | |
" 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9 910\u001b[32mcdo(1) fldmean: \u001b[0mProcessed 3379200 values from 1 variable over 60 timesteps.\n", | |
"\u001b[32mcdo timmean: \u001b[0mProcessed 60 values from 1 variable over 60 timesteps [0.13s 264MB].\n" | |
] | |
} | |
], | |
"source": [ | |
"!cdo -timmean -fldmean /work/mh0727/m300524/bitinformation/test2/ori_nc4.nc dummy.nc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 70, | |
"id": "7d607930-7056-4003-bc4b-0b23b2bc77bc", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\u001b[32mcdo(1) fldmean: \u001b[0mProcess started\n", | |
"cdo timmean: 1%cdo(1) fldmean: 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 910 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9 9100%\u001b[32mcdo(1) fldmean: \u001b[0mProcessed 3379200 values from 1 variable over 60 timesteps.\n", | |
"\u001b[32mcdo timmean: \u001b[0mProcessed 60 values from 1 variable over 60 timesteps [0.17s 264MB].\n" | |
] | |
} | |
], | |
"source": [ | |
"!cdo -timmean -fldmean /work/mh0727/m300524/bitinformation/test2/ori_szip.nc dummy.nc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 71, | |
"id": "44e77648-bef0-4c1d-9b50-edfd1057c6bb", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\u001b[32mcdo(1) fldmean: \u001b[0mProcess started\n", | |
"Warning (cdfScanVarAttr): NetCDF: Variable not found - time_bnds\n", | |
"Warning (cdfScanVarAttr): NetCDF: Variable not found - vertices_latitude\n", | |
"Warning (cdfScanVarAttr): NetCDF: Variable not found - vertices_longitude\n", | |
"cdo timmean: 1%\u001b[33mcdo(1) fldmean (Warning): \u001b[0mGrid cell bounds not available, using constant grid cell area weights!\n", | |
" 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9 910\u001b[32mcdo(1) fldmean: \u001b[0mProcessed 3379200 values from 1 variable over 60 timesteps.\n", | |
"\u001b[32mcdo timmean: \u001b[0mProcessed 60 values from 1 variable over 60 timesteps [1.55s 264MB].\n" | |
] | |
} | |
], | |
"source": [ | |
"!cdo -timmean -fldmean /work/mh0727/m300524/bitinformation/test2/ori_compressed.nc dummy.nc" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "260560c3-156b-4d8a-8d98-2a12d6e9cacb", | |
"metadata": {}, | |
"source": [ | |
"> `zlib`-compressed files from `xarray` are very slow in `cdo`: 1.55 s here, versus 0.13 s for `ori_nc4.nc` and 0.17 s for `ori_szip.nc`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 72, | |
"id": "be6d4fff-36e0-4a1e-942c-245215063848", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\u001b[32mcdo(1) fldmean: \u001b[0mProcess started\n", | |
"Warning (cdfScanVarAttr): NetCDF: Variable not found - vertices_latitude\n", | |
"Warning (cdfScanVarAttr): NetCDF: Variable not found - vertices_longitude\n", | |
"Warning (cdfScanVarAttr): NetCDF: Variable not found - time_bnds\n", | |
"cdo timmean: 1%\u001b[33mcdo(1) fldmean (Warning): \u001b[0mGrid cell bounds not available, using constant grid cell area weights!\n", | |
" 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9 910\u001b[32mcdo(1) fldmean: \u001b[0mProcessed 3379200 values from 1 variable over 60 timesteps.\n", | |
"\u001b[32mcdo timmean: \u001b[0mProcessed 60 values from 1 variable over 60 timesteps [0.12s 264MB].\n" | |
] | |
} | |
], | |
"source": [ | |
"!cdo -timmean -fldmean /work/mh0727/m300524/bitinformation/test2/round5_ncks.nc dummy.nc" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b1468b71-43fc-4c8f-a13c-0be2522a1f0c", | |
"metadata": {}, | |
"source": [ | |
"> Apparently compression by `ncks` doesn't harm `cdo`'s I/O (0.12 s)." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "7df1fb32-8274-487e-9347-23430c06db87", | |
"metadata": {}, | |
"source": [ | |
"## `xarray`\n", | |
"\n", | |
"- is fast for all four files (roughly 86 to 124 ms wall time)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 65, | |
"id": "61efe600-2a54-4ada-83ef-4eed184fcfce", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 95 ms, sys: 17 ms, total: 112 ms\n", | |
"Wall time: 108 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%time _ = xr.open_dataset(\"/work/mh0727/m300524/bitinformation/test2/ori_nc4.nc\").mean().compute()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 66, | |
"id": "68b0fc94-4680-4f0b-9001-e38131d74233", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 107 ms, sys: 19 ms, total: 126 ms\n", | |
"Wall time: 124 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%time _ = xr.open_dataset(\"/work/mh0727/m300524/bitinformation/test2/ori_szip.nc\").mean().compute()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 67, | |
"id": "7215f972-7634-407a-9537-0bccc6c30ef7", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 79 ms, sys: 13 ms, total: 92 ms\n", | |
"Wall time: 89.5 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%time _ = xr.open_dataset(\"/work/mh0727/m300524/bitinformation/test2/ori_compressed.nc\").mean().compute()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 68, | |
"id": "bc7ec227-b706-4c78-bd43-f1cc8b820c56", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 79 ms, sys: 9 ms, total: 88 ms\n", | |
"Wall time: 85.9 ms\n" | |
] | |
} | |
], | |
"source": [ | |
"%time _ = xr.open_dataset(\"/work/mh0727/m300524/bitinformation/test2/round5_ncks.nc\").mean().compute()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "1494f4f5-0fec-4080-bd11-d91352e00adc", | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python [conda env:mistral]", | |
"language": "python", | |
"name": "conda-env-mistral-py" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.8.6" | |
}, | |
"toc-autonumbering": true, | |
"toc-showcode": false | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
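The notebook above relies on bitrounding followed by lossless compression, but never shows the rounding step itself. The following is a minimal sketch of that idea in pure Python. It is not the code used by `bitinformation.jl` or `ncks`; the helper names `bitround_float32` and `compressed_size`, the synthetic field, and the choice of 5 kept mantissa bits are all assumptions made for illustration only.

```python
import math
import random
import struct
import zlib


def bitround_float32(values, keepbits):
    """Round float32 values to `keepbits` explicit mantissa bits (round half to even).

    Hypothetical helper for illustration; finite, non-extreme inputs are assumed.
    """
    if not 0 <= keepbits < 23:
        raise ValueError("keepbits must be in the range 0..22 for float32")
    shift = 23 - keepbits                      # number of mantissa bits to discard
    mask = (0xFFFFFFFF >> shift) << shift      # keeps sign, exponent and leading mantissa bits
    n = len(values)
    # Reinterpret each float32 as an unsigned 32-bit integer.
    bits = struct.unpack(f"<{n}I", struct.pack(f"<{n}f", *values))
    rounded = []
    for u in bits:
        # Add just under half of the last kept bit, plus the tie-breaking bit,
        # then zero the discarded bits.
        u += ((1 << (shift - 1)) - 1) + ((u >> shift) & 1)
        rounded.append(u & mask)
    return list(struct.unpack(f"<{n}f", struct.pack(f"<{n}I", *rounded)))


def compressed_size(values):
    """Size in bytes of the float32 representation after zlib level 9."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, 9))


# Synthetic, noisy field standing in for real model output (assumption).
rng = random.Random(0)
field = [1.0 + 0.5 * math.sin(i / 50.0) + rng.gauss(0.0, 0.01) for i in range(20000)]
# Convert to float32 precision first so both versions are compared fairly.
field32 = list(struct.unpack(f"<{len(field)}f", struct.pack(f"<{len(field)}f", *field)))

rounded = bitround_float32(field32, keepbits=5)

print("compressed bytes, original:", compressed_size(field32))
print("compressed bytes, rounded: ", compressed_size(rounded))
```

Zeroing the trailing mantissa bits leaves long runs of identical bytes, which is what lets `zlib` shrink the rounded data further than the original. The price is a bounded relative error per value, at most 2^-(keepbits+1) for this rounding scheme.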