@tinaok
Last active October 1, 2019 14:27
automatic chunk size
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import dask.array as da\n",
"import dask"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dask.array<standard_normal, shape=(20834, 384, 320), dtype=float64, chunksize=(380, 192, 320)>"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dask.config.set({\"array.chunk-size\": '370MB'})\n",
"da.random.RandomState(0).standard_normal((20834, 384, 320), chunks='auto') "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"186777600"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"380*192*320*8"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When specifying chunk size as 370MB in dask, dask array creates chunksize 187MB."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dask.array<standard_normal, shape=(20834, 384, 320), dtype=float64, chunksize=(386, 384, 320)>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dask.config.set({\"array.chunk-size\": '380MB'})\n",
"da.random.RandomState(0).standard_normal((20834, 384, 320), chunks='auto') "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"379453440"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"386*384*320*8"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When specifying chunk size as 380MB in dask, dask array creates chunksize 380MB."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Is this a bug? or due to this specific array shape? What can we do to have chunk_size as we desired with automatic chunking scheme?"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dask.array<standard_normal, shape=(20834, 384, 320), dtype=float64, chunksize=(155, 128, 80)>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dask.config.set({\"array.chunk-size\": '30MB'})\n",
"da.random.RandomState(0).standard_normal((20834, 384, 320), chunks='auto') "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"12697600"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"155*128* 80*8"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When specifying chunk size as 30MB, same problem as specifying 370MB, dask create only 13MB of chunk."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "pangeobench",
"language": "python",
"name": "pangeobench"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
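A possible workaround, not part of the original notebook: the "array.chunk-size" setting appears to act as an upper limit for the chunks='auto' heuristic, which also seems to prefer chunk shapes that divide the axes evenly, so the chunks it picks can land well below the configured size. Passing an explicit chunk tuple sidesteps that rounding. The sketch below assumes the (20834, 384, 320) float64 array from the notebook and splits only along axis 0; the helper arithmetic is illustrative, not Dask's own auto-chunking code.

import dask.array as da

# Sketch: choose how many rows fit in one ~370MB chunk, keeping the
# other two axes whole. Values are assumptions for illustration.
shape = (20834, 384, 320)
itemsize = 8                 # float64
target_bytes = 370e6         # desired chunk size in bytes

rows_per_chunk = int(target_bytes // (shape[1] * shape[2] * itemsize))  # ~376 rows

x = da.random.RandomState(0).standard_normal(
    shape, chunks=(rows_per_chunk, shape[1], shape[2]))

print(x.chunksize)                                                  # (376, 384, 320)
print(x.chunksize[0] * shape[1] * shape[2] * itemsize / 1e6, "MB")  # ~369.6 MB per chunk

With explicit chunks the per-chunk size is exact (apart from the final, smaller chunk along axis 0), whereas chunks='auto' is free to round down to a shape it considers better balanced across the array.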