Last active
July 30, 2018 06:31
-
-
Save ccarouge/2783f2eb39b8db2b24f0bf6592176ad2 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's say you have 2 datasets coming from different sources but representing the same quantity. You'd like to merge those datasets into a single one via a mean, unfortunately both datasets have missing data at different times and places. \n", | |
"Accordingly, we want the merged dataset to follow these rules:\n", | |
" - if both original datasets have data, we take the mean of both\n", | |
" - if one dataset only has data, we take this data\n", | |
" - if the data is missing in both original datasets, we keep a missing data\n", | |
" \n", | |
"The strategy using xarray is to open each dataset in an DataArray, concatenate both arrays on a new dimension and then average along this dimension." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import xarray as xr\n", | |
"import numpy as np" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"First define 2 arrays of same dimensions with missing data at different places" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"aa = xr.DataArray([[0,1,2],[3,4,np.nan]],dims=('x','y'))\n", | |
"bb = xr.DataArray([[5,np.nan,6],[np.nan,7,np.nan]],dims=('x','y'))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.DataArray (x: 2, y: 3)>\n", | |
"array([[ 0., 1., 2.],\n", | |
" [ 3., 4., nan]])\n", | |
"Dimensions without coordinates: x, y" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"aa" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.DataArray (x: 2, y: 3)>\n", | |
"array([[ 5., nan, 6.],\n", | |
" [nan, 7., nan]])\n", | |
"Dimensions without coordinates: x, y" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"bb" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now, if we simply sum the arrays together, we do not get what we want. The missing value take precedence. That is, if any of the array has a missing value, the sum is missing. So summing and dividing by the number of arrays won't work" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.DataArray (x: 2, y: 3)>\n", | |
"array([[ 5., nan, 8.],\n", | |
" [nan, 11., nan]])\n", | |
"Dimensions without coordinates: x, y" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"aa+bb" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"At the opposite, if we can do a mean, it will work as then the missing value is ignored (mean(1,nan) = 1). For this, we need to \"merge\" the arrays into a single array. For this we'll use the `xarray.concat()` method.\n", | |
"\n", | |
"Concatenate the arrays along a new dimension we'll call z" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"cc = xr.concat((aa,bb),'z')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.DataArray (z: 2, x: 2, y: 3)>\n", | |
"array([[[ 0., 1., 2.],\n", | |
" [ 3., 4., nan]],\n", | |
"\n", | |
" [[ 5., nan, 6.],\n", | |
" [nan, 7., nan]]])\n", | |
"Dimensions without coordinates: z, x, y" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"cc" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As you see above the concatenation allows us to have the 2 arrays aligned together in a new array. Now we take advantage of the fact xarray handles missing data correctly. That is, a mean will not count missing data." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.DataArray (x: 2, y: 3)>\n", | |
"array([[2.5, 1. , 4. ],\n", | |
" [3. , 5.5, nan]])\n", | |
"Dimensions without coordinates: x, y" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"cc.mean(dim='z')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Usually you would find these last 2 operations combined as you don't need to store the results of the `concat` operation." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<xarray.DataArray (x: 2, y: 3)>\n", | |
"array([[2.5, 1. , 4. ],\n", | |
" [3. , 5.5, nan]])\n", | |
"Dimensions without coordinates: x, y" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"xr.concat((aa,bb),'z').mean(dim='z')" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python [conda env:analysis3-18.04]", | |
"language": "python", | |
"name": "conda-env-analysis3-18.04-py" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment