Skip to content

Instantly share code, notes, and snippets.

@ccarouge
Last active July 30, 2018 06:31
Show Gist options
  • Save ccarouge/2783f2eb39b8db2b24f0bf6592176ad2 to your computer and use it in GitHub Desktop.
Save ccarouge/2783f2eb39b8db2b24f0bf6592176ad2 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's say you have 2 datasets coming from different sources but representing the same quantity. You'd like to merge those datasets into a single one via a mean, unfortunately both datasets have missing data at different times and places. \n",
"Accordingly, we want the merged dataset to follow these rules:\n",
" - if both original datasets have data, we take the mean of both\n",
" - if one dataset only has data, we take this data\n",
" - if the data is missing in both original datasets, we keep a missing data\n",
" \n",
"The strategy using xarray is to open each dataset in an DataArray, concatenate both arrays on a new dimension and then average along this dimension."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import xarray as xr\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First define 2 arrays of same dimensions with missing data at different places"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"aa = xr.DataArray([[0,1,2],[3,4,np.nan]],dims=('x','y'))\n",
"bb = xr.DataArray([[5,np.nan,6],[np.nan,7,np.nan]],dims=('x','y'))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray (x: 2, y: 3)>\n",
"array([[ 0., 1., 2.],\n",
" [ 3., 4., nan]])\n",
"Dimensions without coordinates: x, y"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aa"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray (x: 2, y: 3)>\n",
"array([[ 5., nan, 6.],\n",
" [nan, 7., nan]])\n",
"Dimensions without coordinates: x, y"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, if we simply sum the arrays together, we do not get what we want. The missing value take precedence. That is, if any of the array has a missing value, the sum is missing. So summing and dividing by the number of arrays won't work"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray (x: 2, y: 3)>\n",
"array([[ 5., nan, 8.],\n",
" [nan, 11., nan]])\n",
"Dimensions without coordinates: x, y"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aa+bb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At the opposite, if we can do a mean, it will work as then the missing value is ignored (mean(1,nan) = 1). For this, we need to \"merge\" the arrays into a single array. For this we'll use the `xarray.concat()` method.\n",
"\n",
"Concatenate the arrays along a new dimension we'll call z"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"cc = xr.concat((aa,bb),'z')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray (z: 2, x: 2, y: 3)>\n",
"array([[[ 0., 1., 2.],\n",
" [ 3., 4., nan]],\n",
"\n",
" [[ 5., nan, 6.],\n",
" [nan, 7., nan]]])\n",
"Dimensions without coordinates: z, x, y"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you see above the concatenation allows us to have the 2 arrays aligned together in a new array. Now we take advantage of the fact xarray handles missing data correctly. That is, a mean will not count missing data."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray (x: 2, y: 3)>\n",
"array([[2.5, 1. , 4. ],\n",
" [3. , 5.5, nan]])\n",
"Dimensions without coordinates: x, y"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cc.mean(dim='z')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Usually you would find these last 2 operations combined as you don't need to store the results of the `concat` operation."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<xarray.DataArray (x: 2, y: 3)>\n",
"array([[2.5, 1. , 4. ],\n",
" [3. , 5.5, nan]])\n",
"Dimensions without coordinates: x, y"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xr.concat((aa,bb),'z').mean(dim='z')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:analysis3-18.04]",
"language": "python",
"name": "conda-env-analysis3-18.04-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment