pletchm/Xarray Example Cheatsheet.ipynb

## Xarray Example Cheatsheet.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cheatsheet Outline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Common Xarray-related Imports\n",
    "2. Creating Xarray DataArrays\n",
    "3. Converting from Xarray to Pandas\n",
    "4. Converting from Pandas to Xarray\n",
    "5. Reading and writing Xarrays to netCDF files\n",
    "6. Slicing and dicing data\n",
    "7. Changing values\n",
    "8. Data Reduction\n",
    "9. Vectorized operations\n",
    "10. Changing and adding coordinates/Expanding or broadcasting dimensions\n",
    "11. Datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Common Xarray-related Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import xarray as xr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Creating Xarray DataArrays\n",
    "There are two kinds of data structures in Xarray: DataArrays and Datasets. We'll start with DataArrays, because a Dataset is essentially just a collection of DataArrays. Datasets are also less commonly needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "da = xr.DataArray(\n",
    "    data=np.random.random([2, 3, 11, 100]),\n",
    "    dims=[\"sex_id\", \"age_group_id\", \"year_id\", \"draw\"],\n",
    "    coords={\n",
    "        \"sex_id\": [1, 2],\n",
    "        \"age_group_id\": [11, 12, 13],\n",
    "        \"year_id\": range(1990, 2000+1),\n",
    "        \"draw\": range(100),\n",
    "        },\n",
    "    name=\"fake_thing\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
       "array([[[[0.996779, ..., 0.193241],\n",
       "         ...,\n",
       "         [0.50371 , ..., 0.654273]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.227811, ..., 0.912856],\n",
       "         ...,\n",
       "         [0.24642 , ..., 0.581184]]],\n",
       "\n",
       "\n",
       "       [[[0.072334, ..., 0.684663],\n",
       "         ...,\n",
       "         [0.628984, ..., 0.358811]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.491698, ..., 0.876439],\n",
       "         ...,\n",
       "         [0.829525, ..., 0.611719]]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * draw          (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "da"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Converting from Xarray to Pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age_group_id</th>\n",
       "      <th>sex_id</th>\n",
       "      <th>year_id</th>\n",
       "      <th>draw_0</th>\n",
       "      <th>draw_1</th>\n",
       "      <th>draw_2</th>\n",
       "      <th>draw_3</th>\n",
       "      <th>draw_4</th>\n",
       "      <th>draw_5</th>\n",
       "      <th>draw_6</th>\n",
       "      <th>...</th>\n",
       "      <th>draw_90</th>\n",
       "      <th>draw_91</th>\n",
       "      <th>draw_92</th>\n",
       "      <th>draw_93</th>\n",
       "      <th>draw_94</th>\n",
       "      <th>draw_95</th>\n",
       "      <th>draw_96</th>\n",
       "      <th>draw_97</th>\n",
       "      <th>draw_98</th>\n",
       "      <th>draw_99</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>1990</td>\n",
       "      <td>0.996779</td>\n",
       "      <td>0.761632</td>\n",
       "      <td>0.350849</td>\n",
       "      <td>0.750393</td>\n",
       "      <td>0.433888</td>\n",
       "      <td>0.764425</td>\n",
       "      <td>0.122375</td>\n",
       "      <td>...</td>\n",
       "      <td>0.225819</td>\n",
       "      <td>0.836557</td>\n",
       "      <td>0.885162</td>\n",
       "      <td>0.222884</td>\n",
       "      <td>0.641429</td>\n",
       "      <td>0.393851</td>\n",
       "      <td>0.381577</td>\n",
       "      <td>0.294711</td>\n",
       "      <td>0.650573</td>\n",
       "      <td>0.193241</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>1991</td>\n",
       "      <td>0.786725</td>\n",
       "      <td>0.014690</td>\n",
       "      <td>0.184935</td>\n",
       "      <td>0.269309</td>\n",
       "      <td>0.493112</td>\n",
       "      <td>0.365666</td>\n",
       "      <td>0.573797</td>\n",
       "      <td>...</td>\n",
       "      <td>0.196058</td>\n",
       "      <td>0.190651</td>\n",
       "      <td>0.266525</td>\n",
       "      <td>0.453888</td>\n",
       "      <td>0.333859</td>\n",
       "      <td>0.377547</td>\n",
       "      <td>0.304548</td>\n",
       "      <td>0.035076</td>\n",
       "      <td>0.905141</td>\n",
       "      <td>0.262088</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>1992</td>\n",
       "      <td>0.851437</td>\n",
       "      <td>0.367362</td>\n",
       "      <td>0.736778</td>\n",
       "      <td>0.500674</td>\n",
       "      <td>0.885498</td>\n",
       "      <td>0.350236</td>\n",
       "      <td>0.837336</td>\n",
       "      <td>...</td>\n",
       "      <td>0.526494</td>\n",
       "      <td>0.398270</td>\n",
       "      <td>0.609992</td>\n",
       "      <td>0.480893</td>\n",
       "      <td>0.261509</td>\n",
       "      <td>0.537468</td>\n",
       "      <td>0.326550</td>\n",
       "      <td>0.393128</td>\n",
       "      <td>0.236991</td>\n",
       "      <td>0.239981</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>1993</td>\n",
       "      <td>0.431025</td>\n",
       "      <td>0.786596</td>\n",
       "      <td>0.385705</td>\n",
       "      <td>0.140987</td>\n",
       "      <td>0.742205</td>\n",
       "      <td>0.380742</td>\n",
       "      <td>0.247266</td>\n",
       "      <td>...</td>\n",
       "      <td>0.811025</td>\n",
       "      <td>0.964106</td>\n",
       "      <td>0.484327</td>\n",
       "      <td>0.387248</td>\n",
       "      <td>0.862704</td>\n",
       "      <td>0.320871</td>\n",
       "      <td>0.288251</td>\n",
       "      <td>0.752603</td>\n",
       "      <td>0.482269</td>\n",
       "      <td>0.423913</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>1994</td>\n",
       "      <td>0.715613</td>\n",
       "      <td>0.937093</td>\n",
       "      <td>0.276558</td>\n",
       "      <td>0.155267</td>\n",
       "      <td>0.892415</td>\n",
       "      <td>0.782576</td>\n",
       "      <td>0.620654</td>\n",
       "      <td>...</td>\n",
       "      <td>0.038820</td>\n",
       "      <td>0.025020</td>\n",
       "      <td>0.422900</td>\n",
       "      <td>0.139842</td>\n",
       "      <td>0.229250</td>\n",
       "      <td>0.092306</td>\n",
       "      <td>0.262763</td>\n",
       "      <td>0.009972</td>\n",
       "      <td>0.457518</td>\n",
       "      <td>0.653466</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 103 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   age_group_id  sex_id  year_id    draw_0    draw_1    draw_2    draw_3  \\\n",
       "0            11       1     1990  0.996779  0.761632  0.350849  0.750393   \n",
       "1            11       1     1991  0.786725  0.014690  0.184935  0.269309   \n",
       "2            11       1     1992  0.851437  0.367362  0.736778  0.500674   \n",
       "3            11       1     1993  0.431025  0.786596  0.385705  0.140987   \n",
       "4            11       1     1994  0.715613  0.937093  0.276558  0.155267   \n",
       "\n",
       "     draw_4    draw_5    draw_6  ...   draw_90   draw_91   draw_92   draw_93  \\\n",
       "0  0.433888  0.764425  0.122375  ...  0.225819  0.836557  0.885162  0.222884   \n",
       "1  0.493112  0.365666  0.573797  ...  0.196058  0.190651  0.266525  0.453888   \n",
       "2  0.885498  0.350236  0.837336  ...  0.526494  0.398270  0.609992  0.480893   \n",
       "3  0.742205  0.380742  0.247266  ...  0.811025  0.964106  0.484327  0.387248   \n",
       "4  0.892415  0.782576  0.620654  ...  0.038820  0.025020  0.422900  0.139842   \n",
       "\n",
       "    draw_94   draw_95   draw_96   draw_97   draw_98   draw_99  \n",
       "0  0.641429  0.393851  0.381577  0.294711  0.650573  0.193241  \n",
       "1  0.333859  0.377547  0.304548  0.035076  0.905141  0.262088  \n",
       "2  0.261509  0.537468  0.326550  0.393128  0.236991  0.239981  \n",
       "3  0.862704  0.320871  0.288251  0.752603  0.482269  0.423913  \n",
       "4  0.229250  0.092306  0.262763  0.009972  0.457518  0.653466  \n",
       "\n",
       "[5 rows x 103 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
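   ],
   "source": [
    "# Label each draw as a column name (draw_0 ... draw_99), split the draw\n",
    "# dimension into separate Dataset variables, then flatten to a wide DataFrame.\n",
    "draw_cols = [\"draw_{}\".format(i) for i in range(100)]\n",
    "da_draw_dim = da.assign_coords(draw=draw_cols)\n",
    "ds = da_draw_dim.to_dataset(dim=\"draw\")\n",
    "df = ds.to_dataframe().reset_index()\n",
    "df.head()"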
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
       "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
       "         0.484231, 0.543931, 0.512703, 0.495674, 0.490514],\n",
       "        [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
       "         0.482273, 0.471792, 0.508302, 0.46291 , 0.481167],\n",
       "        [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
       "         0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n",
       "\n",
       "       [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
       "         0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n",
       "        [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
       "         0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n",
       "        [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
       "         0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "da.mean(\"draw\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sex_id</th>\n",
       "      <th>age_group_id</th>\n",
       "      <th>year_id</th>\n",
       "      <th>mean</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>11</td>\n",
       "      <td>1990</td>\n",
       "      <td>0.515046</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>11</td>\n",
       "      <td>1991</td>\n",
       "      <td>0.482924</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>11</td>\n",
       "      <td>1992</td>\n",
       "      <td>0.519905</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>11</td>\n",
       "      <td>1993</td>\n",
       "      <td>0.461850</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>11</td>\n",
       "      <td>1994</td>\n",
       "      <td>0.465647</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   sex_id  age_group_id  year_id      mean\n",
       "0       1            11     1990  0.515046\n",
       "1       1            11     1991  0.482924\n",
       "2       1            11     1992  0.519905\n",
       "3       1            11     1993  0.461850\n",
       "4       1            11     1994  0.465647"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_df = da.mean(\"draw\").rename(\"mean\").to_dataframe().reset_index()\n",
    "mean_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Converting from Pandas to Xarray"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'mean' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
       "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
       "         0.484231, 0.543931, 0.512703, 0.495674, 0.490514],\n",
       "        [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
       "         0.482273, 0.471792, 0.508302, 0.46291 , 0.481167],\n",
       "        [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
       "         0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n",
       "\n",
       "       [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
       "         0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n",
       "        [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
       "         0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n",
       "        [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
       "         0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da = mean_df.set_index([\"sex_id\", \"age_group_id\", \"year_id\"]).to_xarray()[\"mean\"]\n",
    "mean_da"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age_group_id</th>\n",
       "      <th>sex_id</th>\n",
       "      <th>year_id</th>\n",
       "      <th>draw_0</th>\n",
       "      <th>draw_1</th>\n",
       "      <th>draw_2</th>\n",
       "      <th>draw_3</th>\n",
       "      <th>draw_4</th>\n",
       "      <th>draw_5</th>\n",
       "      <th>draw_6</th>\n",
       "      <th>...</th>\n",
       "      <th>draw_90</th>\n",
       "      <th>draw_91</th>\n",
       "      <th>draw_92</th>\n",
       "      <th>draw_93</th>\n",
       "      <th>draw_94</th>\n",
       "      <th>draw_95</th>\n",
       "      <th>draw_96</th>\n",
       "      <th>draw_97</th>\n",
       "      <th>draw_98</th>\n",
       "      <th>draw_99</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>1990</td>\n",
       "      <td>0.996779</td>\n",
       "      <td>0.761632</td>\n",
       "      <td>0.350849</td>\n",
       "      <td>0.750393</td>\n",
       "      <td>0.433888</td>\n",
       "      <td>0.764425</td>\n",
       "      <td>0.122375</td>\n",
       "      <td>...</td>\n",
       "      <td>0.225819</td>\n",
       "      <td>0.836557</td>\n",
       "      <td>0.885162</td>\n",
       "      <td>0.222884</td>\n",
       "      <td>0.641429</td>\n",
       "      <td>0.393851</td>\n",
       "      <td>0.381577</td>\n",
       "      <td>0.294711</td>\n",
       "      <td>0.650573</td>\n",
       "      <td>0.193241</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>1991</td>\n",
       "      <td>0.786725</td>\n",
       "      <td>0.014690</td>\n",
       "      <td>0.184935</td>\n",
       "      <td>0.269309</td>\n",
       "      <td>0.493112</td>\n",
       "      <td>0.365666</td>\n",
       "      <td>0.573797</td>\n",
       "      <td>...</td>\n",
       "      <td>0.196058</td>\n",
       "      <td>0.190651</td>\n",
       "      <td>0.266525</td>\n",
       "      <td>0.453888</td>\n",
       "      <td>0.333859</td>\n",
       "      <td>0.377547</td>\n",
       "      <td>0.304548</td>\n",
       "      <td>0.035076</td>\n",
       "      <td>0.905141</td>\n",
       "      <td>0.262088</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>1992</td>\n",
       "      <td>0.851437</td>\n",
       "      <td>0.367362</td>\n",
       "      <td>0.736778</td>\n",
       "      <td>0.500674</td>\n",
       "      <td>0.885498</td>\n",
       "      <td>0.350236</td>\n",
       "      <td>0.837336</td>\n",
       "      <td>...</td>\n",
       "      <td>0.526494</td>\n",
       "      <td>0.398270</td>\n",
       "      <td>0.609992</td>\n",
       "      <td>0.480893</td>\n",
       "      <td>0.261509</td>\n",
       "      <td>0.537468</td>\n",
       "      <td>0.326550</td>\n",
       "      <td>0.393128</td>\n",
       "      <td>0.236991</td>\n",
       "      <td>0.239981</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>1993</td>\n",
       "      <td>0.431025</td>\n",
       "      <td>0.786596</td>\n",
       "      <td>0.385705</td>\n",
       "      <td>0.140987</td>\n",
       "      <td>0.742205</td>\n",
       "      <td>0.380742</td>\n",
       "      <td>0.247266</td>\n",
       "      <td>...</td>\n",
       "      <td>0.811025</td>\n",
       "      <td>0.964106</td>\n",
       "      <td>0.484327</td>\n",
       "      <td>0.387248</td>\n",
       "      <td>0.862704</td>\n",
       "      <td>0.320871</td>\n",
       "      <td>0.288251</td>\n",
       "      <td>0.752603</td>\n",
       "      <td>0.482269</td>\n",
       "      <td>0.423913</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>1994</td>\n",
       "      <td>0.715613</td>\n",
       "      <td>0.937093</td>\n",
       "      <td>0.276558</td>\n",
       "      <td>0.155267</td>\n",
       "      <td>0.892415</td>\n",
       "      <td>0.782576</td>\n",
       "      <td>0.620654</td>\n",
       "      <td>...</td>\n",
       "      <td>0.038820</td>\n",
       "      <td>0.025020</td>\n",
       "      <td>0.422900</td>\n",
       "      <td>0.139842</td>\n",
       "      <td>0.229250</td>\n",
       "      <td>0.092306</td>\n",
       "      <td>0.262763</td>\n",
       "      <td>0.009972</td>\n",
       "      <td>0.457518</td>\n",
       "      <td>0.653466</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 103 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   age_group_id  sex_id  year_id    draw_0    draw_1    draw_2    draw_3  \\\n",
       "0            11       1     1990  0.996779  0.761632  0.350849  0.750393   \n",
       "1            11       1     1991  0.786725  0.014690  0.184935  0.269309   \n",
       "2            11       1     1992  0.851437  0.367362  0.736778  0.500674   \n",
       "3            11       1     1993  0.431025  0.786596  0.385705  0.140987   \n",
       "4            11       1     1994  0.715613  0.937093  0.276558  0.155267   \n",
       "\n",
       "     draw_4    draw_5    draw_6  ...   draw_90   draw_91   draw_92   draw_93  \\\n",
       "0  0.433888  0.764425  0.122375  ...  0.225819  0.836557  0.885162  0.222884   \n",
       "1  0.493112  0.365666  0.573797  ...  0.196058  0.190651  0.266525  0.453888   \n",
       "2  0.885498  0.350236  0.837336  ...  0.526494  0.398270  0.609992  0.480893   \n",
       "3  0.742205  0.380742  0.247266  ...  0.811025  0.964106  0.484327  0.387248   \n",
       "4  0.892415  0.782576  0.620654  ...  0.038820  0.025020  0.422900  0.139842   \n",
       "\n",
       "    draw_94   draw_95   draw_96   draw_97   draw_98   draw_99  \n",
       "0  0.641429  0.393851  0.381577  0.294711  0.650573  0.193241  \n",
       "1  0.333859  0.377547  0.304548  0.035076  0.905141  0.262088  \n",
       "2  0.261509  0.537468  0.326550  0.393128  0.236991  0.239981  \n",
       "3  0.862704  0.320871  0.288251  0.752603  0.482269  0.423913  \n",
       "4  0.229250  0.092306  0.262763  0.009972  0.457518  0.653466  \n",
       "\n",
       "[5 rows x 103 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
       "array([[[[0.996779, ..., 0.193241],\n",
       "         ...,\n",
       "         [0.50371 , ..., 0.654273]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.227811, ..., 0.912856],\n",
       "         ...,\n",
       "         [0.24642 , ..., 0.581184]]],\n",
       "\n",
       "\n",
       "       [[[0.072334, ..., 0.684663],\n",
       "         ...,\n",
       "         [0.628984, ..., 0.358811]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.491698, ..., 0.876439],\n",
       "         ...,\n",
       "         [0.829525, ..., 0.611719]]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * draw          (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
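   ],
   "source": [
    "# Melt the draw_* columns back into one long \"draw\" index level, then rebuild the DataArray.\n",
    "back_to_da = pd.wide_to_long(\n",
    "    df, stubnames=\"draw_\", i=[\"sex_id\", \"age_group_id\", \"year_id\"], j=\"draw\"\n",
    ").to_xarray()[\"draw_\"].rename(\"fake_thing\")\n",
    "back_to_da"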
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "back_to_da.identical(da)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5. Reading and writing Xarrays to netCDF files\n",
    "netCDF files can also store metadata (attributes) alongside the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "da.attrs[\"metric\"] = \"rate\"\n",
    "da.attrs[\"author\"] = \"Me\"\n",
    "da.to_netcdf(\"data.nc\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
       "array([[[[0.996779, ..., 0.193241],\n",
       "         ...,\n",
       "         [0.50371 , ..., 0.654273]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.227811, ..., 0.912856],\n",
       "         ...,\n",
       "         [0.24642 , ..., 0.581184]]],\n",
       "\n",
       "\n",
       "       [[[0.072334, ..., 0.684663],\n",
       "         ...,\n",
       "         [0.628984, ..., 0.358811]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.491698, ..., 0.876439],\n",
       "         ...,\n",
       "         [0.829525, ..., 0.611719]]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * draw          (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
       "Attributes:\n",
       "    metric:   rate\n",
       "    author:   Me"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "da"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And of course, we use Unix/Linux-based systems here, so file extensions are really just for us humans to quickly determine\n",
    "what file type a file claims to be. From the computer's perspective, however, the extension carries no restriction, rule, or utility: the cells below write and read netCDF data under arbitrary extensions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "da.to_netcdf(\"data.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "da.to_netcdf(\"data.nc_martin\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
       "array([[[[0.996779, ..., 0.193241],\n",
       "         ...,\n",
       "         [0.50371 , ..., 0.654273]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.227811, ..., 0.912856],\n",
       "         ...,\n",
       "         [0.24642 , ..., 0.581184]]],\n",
       "\n",
       "\n",
       "       [[[0.072334, ..., 0.684663],\n",
       "         ...,\n",
       "         [0.628984, ..., 0.358811]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.491698, ..., 0.876439],\n",
       "         ...,\n",
       "         [0.829525, ..., 0.611719]]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * draw          (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
       "Attributes:\n",
       "    metric:   rate\n",
       "    author:   Me"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "read_da1 = xr.open_dataarray(\"data.nc\")\n",
    "read_da1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
       "array([[[[0.996779, ..., 0.193241],\n",
       "         ...,\n",
       "         [0.50371 , ..., 0.654273]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.227811, ..., 0.912856],\n",
       "         ...,\n",
       "         [0.24642 , ..., 0.581184]]],\n",
       "\n",
       "\n",
       "       [[[0.072334, ..., 0.684663],\n",
       "         ...,\n",
       "         [0.628984, ..., 0.358811]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.491698, ..., 0.876439],\n",
       "         ...,\n",
       "         [0.829525, ..., 0.611719]]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * draw          (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
       "Attributes:\n",
       "    metric:   rate\n",
       "    author:   Me"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "read_da2 = xr.open_dataarray(\"data.csv\")\n",
    "read_da2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
       "array([[[[0.996779, ..., 0.193241],\n",
       "         ...,\n",
       "         [0.50371 , ..., 0.654273]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.227811, ..., 0.912856],\n",
       "         ...,\n",
       "         [0.24642 , ..., 0.581184]]],\n",
       "\n",
       "\n",
       "       [[[0.072334, ..., 0.684663],\n",
       "         ...,\n",
       "         [0.628984, ..., 0.358811]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.491698, ..., 0.876439],\n",
       "         ...,\n",
       "         [0.829525, ..., 0.611719]]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * draw          (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
       "Attributes:\n",
       "    metric:   rate\n",
       "    author:   Me"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "read_da3 = xr.open_dataarray(\"data.nc_martin\")\n",
    "read_da3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I mention this for two reasons:\n",
    "\n",
    "1) If you're used to saving ``.csv`` files or something else, be careful and get in the habit of saving netCDF files with the ``.nc`` file extension.\n",
    "2) Sometimes we do save netCDFs with file extensions other than a plain ``.nc``. For example, in our risk attributable pipeline we save files partitioned over cause, draw, and year, so we decided it was useful to name files in the following format: ``{acause}.ncdraw:range(100, 200)year_id:2017)``"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 6. Slicing and dicing data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "mean_da = da.mean(\"draw\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 2)>\n",
       "array([[[0.438642, 0.484231]],\n",
       "\n",
       "       [[0.482908, 0.469676]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11\n",
       "  * year_id       (year_id) int64 1995 1996"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.sel(year_id=[1995, 1996], age_group_id=[11])"
   ]
  },
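  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Point (scalar) coordinates versus single-coordinate dimensions: selecting with a one-element list keeps the dimension with length 1, selecting with a scalar turns it into a scalar coordinate, and adding ``drop=True`` removes that scalar coordinate."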
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 1, age_group_id: 1, year_id: 11)>\n",
       "array([[[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
       "         0.469676, 0.505307, 0.528676, 0.498105, 0.496416]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 2\n",
       "  * age_group_id  (age_group_id) int64 11\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.sel(sex_id=[2], age_group_id=[11])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (year_id: 11)>\n",
       "array([0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908, 0.469676,\n",
       "       0.505307, 0.528676, 0.498105, 0.496416])\n",
       "Coordinates:\n",
       "    sex_id        int64 2\n",
       "    age_group_id  int64 11\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.sel(sex_id=2, age_group_id=11)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (year_id: 11)>\n",
       "array([0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908, 0.469676,\n",
       "       0.505307, 0.528676, 0.498105, 0.496416])\n",
       "Coordinates:\n",
       "  * year_id  (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.sel(sex_id=2, age_group_id=11, drop=True)"
   ]
  },
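  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Warning:** you'll want to avoid the following. Combining a list selection with a scalar selection and ``drop=True`` in a single ``.sel`` call also drops the ``year_id`` coordinate labels here, as the output shows."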
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, year_id: 2)>\n",
       "array([[0.438642, 0.484231],\n",
       "       [0.482908, 0.469676]])\n",
       "Coordinates:\n",
       "  * sex_id   (sex_id) int64 1 2\n",
       "Dimensions without coordinates: year_id"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.sel(year_id=[1995, 1996], age_group_id=11, drop=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead do one of these:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, year_id: 2)>\n",
       "array([[0.438642, 0.484231],\n",
       "       [0.482908, 0.469676]])\n",
       "Coordinates:\n",
       "  * sex_id   (sex_id) int64 1 2\n",
       "  * year_id  (year_id) int64 1995 1996"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.sel(year_id=[1995, 1996], age_group_id=11).drop(\"age_group_id\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, year_id: 2)>\n",
       "array([[0.438642, 0.484231],\n",
       "       [0.482908, 0.469676]])\n",
       "Coordinates:\n",
       "  * sex_id   (sex_id) int64 1 2\n",
       "  * year_id  (year_id) int64 1995 1996"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.sel(year_id=[1995, 1996]).sel(age_group_id=11, drop=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of slicing to specific coords, you can also exclude specific coords."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n",
       "array([[[0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
       "         0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n",
       "\n",
       "       [[0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
       "         0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.drop([11, 12], \"age_group_id\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 7. Changing values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another way to slice data is with the `.loc[]` indexer, but I recommend only using it when you're actually\n",
    "_changing_ values for a given slice of the data. Assigning through `.loc[]` modifies the original array in place,\n",
    "whereas the result of `.sel` is a new object that can't be assigned to. Don't rely on a slice being an independent\n",
    "copy, though: depending on the indexer, it may still point at the original data, so call `.copy()` first if you need one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "mean_da_cp = mean_da.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n",
       "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
       "         0.484231, 0.543931, 0.512703, 0.495674, 0.490514]],\n",
       "\n",
       "       [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
       "         0.469676, 0.505307, 0.528676, 0.498105, 0.496416]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da_cp.sel(age_group_id=[11])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next operation can't be done, because Python doesn't allow assigning to the result of a function call:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "can't assign to function call (<ipython-input-29-815bf6d51d25>, line 1)",
     "output_type": "error",
     "traceback": [
      "\u001b[0;36m  File \u001b[0;32m\"<ipython-input-29-815bf6d51d25>\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m    mean_da_cp.sel(age_group_id=[11]) = mean_da_cp.sel(age_group_id=[11]) + 100\u001b[0m\n\u001b[0m                                                                               ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m can't assign to function call\n"
     ]
    }
   ],
   "source": [
    "mean_da_cp.sel(age_group_id=[11]) = mean_da_cp.sel(age_group_id=[11]) + 100"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using ``.loc``"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "mean_da_cp.loc[dict(age_group_id=[11])] = mean_da_cp.loc[dict(age_group_id=[11])] + 200"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n",
       "array([[[200.515046, 200.482924, 200.519905, 200.46185 , 200.465647,\n",
       "         200.438642, 200.484231, 200.543931, 200.512703, 200.495674,\n",
       "         200.490514]],\n",
       "\n",
       "       [[200.480634, 200.501324, 200.48249 , 200.489984, 200.434139,\n",
       "         200.482908, 200.469676, 200.505307, 200.528676, 200.498105,\n",
       "         200.496416]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da_cp.sel(age_group_id=[11])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 8. Data Reduction\n",
    "\n",
    "Taking the sum, mean, quantile, or product over one or more dimensions.\n",
    "\n",
    "Also related operations that work along a dimension without collapsing it, such as diff, cumprod, and cumsum."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Summing over dimensions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
       "array([[[51.504646, 48.292353, 51.99048 , 46.18502 , 46.564747, 43.864225,\n",
       "         48.423078, 54.393145, 51.270341, 49.567356, 49.051368],\n",
       "        [47.937805, 48.381889, 53.578497, 55.738359, 49.353499, 50.416803,\n",
       "         48.22734 , 47.17919 , 50.830201, 46.291025, 48.116702],\n",
       "        [50.16666 , 52.051829, 52.148661, 51.15924 , 55.499361, 52.081173,\n",
       "         51.564503, 57.55673 , 48.786959, 47.284275, 54.801283]],\n",
       "\n",
       "       [[48.063404, 50.13239 , 48.248955, 48.998371, 43.413857, 48.290827,\n",
       "         46.967581, 50.530699, 52.867631, 49.810484, 49.641557],\n",
       "        [47.680474, 53.340312, 53.079849, 54.45405 , 51.181648, 48.059735,\n",
       "         49.768662, 48.72965 , 47.350284, 47.70728 , 45.04015 ],\n",
       "        [50.742754, 50.526813, 47.339263, 50.463871, 49.240083, 47.83956 ,\n",
       "         45.524688, 45.940104, 55.508125, 50.005635, 52.937063]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "da.sum(\"draw\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Taking means"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (age_group_id: 3, year_id: 11)>\n",
       "array([[0.49784 , 0.492124, 0.501197, 0.475917, 0.449893, 0.460775, 0.476953,\n",
       "        0.524619, 0.52069 , 0.496889, 0.493465],\n",
       "       [0.478091, 0.508611, 0.533292, 0.550962, 0.502676, 0.492383, 0.48998 ,\n",
       "        0.479544, 0.490902, 0.469992, 0.465784],\n",
       "       [0.504547, 0.512893, 0.49744 , 0.508116, 0.523697, 0.499604, 0.485446,\n",
       "        0.517484, 0.521475, 0.48645 , 0.538692]])\n",
       "Coordinates:\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "da.mean([\"draw\", \"sex_id\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' ()>\n",
       "array(3289.684552)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "da.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Taking quantiles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (quantile: 2, sex_id: 2, age_group_id: 3, year_id: 11)>\n",
       "array([[[[0.946091, 0.947006, 0.989351, 0.938826, 0.957233, 0.961054,\n",
       "          0.962361, 0.9847  , 0.991939, 0.992401, 0.931928],\n",
       "         [0.96409 , 0.989095, 0.971059, 0.932266, 0.992066, 0.995731,\n",
       "          0.961511, 0.95729 , 0.929298, 0.971658, 0.961813],\n",
       "         [0.985317, 0.95646 , 0.971972, 0.984797, 0.965313, 0.9662  ,\n",
       "          0.967894, 0.966558, 0.983227, 0.975196, 0.969915]],\n",
       "\n",
       "        [[0.968146, 0.976099, 0.977238, 0.909349, 0.948866, 0.948609,\n",
       "          0.951318, 0.960428, 0.949223, 0.983426, 0.994721],\n",
       "         [0.953117, 0.944793, 0.99642 , 0.971521, 0.951413, 0.926127,\n",
       "          0.945339, 0.983831, 0.976795, 0.989499, 0.965297],\n",
       "         [0.978185, 0.975026, 0.967635, 0.991123, 0.945305, 0.965985,\n",
       "          0.937724, 0.966592, 0.944686, 0.947914, 0.969859]]],\n",
       "\n",
       "\n",
       "       [[[0.035937, 0.032809, 0.036074, 0.054938, 0.020817, 0.016693,\n",
       "          0.010495, 0.052086, 0.032586, 0.063762, 0.043832],\n",
       "         [0.040247, 0.042332, 0.038767, 0.022039, 0.02663 , 0.073381,\n",
       "          0.016937, 0.019657, 0.063141, 0.015542, 0.033979],\n",
       "         [0.019053, 0.023236, 0.041205, 0.024052, 0.025952, 0.046415,\n",
       "          0.013963, 0.082439, 0.039138, 0.028172, 0.087198]],\n",
       "\n",
       "        [[0.022962, 0.039559, 0.045023, 0.056532, 0.038189, 0.013773,\n",
       "          0.024106, 0.024428, 0.078114, 0.020636, 0.034975],\n",
       "         [0.053146, 0.026986, 0.03948 , 0.03409 , 0.055288, 0.030514,\n",
       "          0.04947 , 0.028957, 0.035929, 0.0153  , 0.021199],\n",
       "         [0.040169, 0.051678, 0.023015, 0.046181, 0.02122 , 0.017725,\n",
       "          0.034586, 0.014844, 0.019773, 0.030668, 0.030164]]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * quantile      (quantile) float64 0.975 0.025"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "da.quantile([0.975, 0.025], dim=\"draw\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cumulative sums and products"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (year_id: 11)>\n",
       "array([0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642, 0.484231,\n",
       "       0.543931, 0.512703, 0.495674, 0.490514])\n",
       "Coordinates:\n",
       "  * year_id  (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_slice_da = mean_da.sel(sex_id=1, age_group_id=11, drop=True)\n",
    "data_slice_da"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (year_id: 11)>\n",
       "array([0.515046, 0.99797 , 1.517875, 1.979725, 2.445372, 2.884015, 3.368245,\n",
       "       3.912177, 4.42488 , 4.920554, 5.411068])\n",
       "Coordinates:\n",
       "  * year_id  (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_slice_da.cumsum(\"year_id\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (year_id: 11)>\n",
       "array([5.150465e-01, 2.487281e-01, 1.293149e-01, 5.972412e-02, 2.781038e-02,\n",
       "       1.219881e-02, 5.907039e-03, 3.213024e-03, 1.647328e-03, 8.165372e-04,\n",
       "       4.005227e-04])\n",
       "Coordinates:\n",
       "  * year_id  (year_id) int64 1990 1991 1992 1993 1994 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_slice_da.cumprod(\"year_id\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 9. Vectorized operations\n",
    "\n",
    "Adding, multiplying, dividing two or more arrays"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Xarray lines up all of the dimensions and coordinates for you when performing arithmetic between two or more arrays.\n",
    "Of course, if you line things up yourself beforehand, computation will be faster (we can discuss that more later)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's okay if they don't share dimensions; data will automatically broadcast across dimensions that one operand lacks. E.g.,\n",
    "below, the right operand doesn't have a draw dimension, so the value of each of its slices is applied to every draw of the corresponding slice from the left operand.\n",
    "\n",
    "That is,\n",
    "\n",
    "``mean_da.sel(age_group_id=11, sex_id=2, year_id=1996)`` is added to\n",
    "``da.sel(age_group_id=11, sex_id=2, year_id=1996, draw=i)`` for every ``i`` from 0 to 99."
   ]
  },
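  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the same broadcasting rule on a tiny array, every draw receives the draw-free value for the same ``year_id``. The names ``small_da`` and ``small_mean_da`` and the toy values are made up for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import xarray as xr\n",
    "\n",
    "# Hypothetical toy data: two years, three draws.\n",
    "small_da = xr.DataArray(\n",
    "    [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],\n",
    "    dims=[\"year_id\", \"draw\"],\n",
    "    coords={\"year_id\": [1995, 1996], \"draw\": [0, 1, 2]},\n",
    ")\n",
    "small_mean_da = small_da.mean(\"draw\")  # has no draw dimension\n",
    "\n",
    "# Each year's mean is added to every draw of that year.\n",
    "small_da + small_mean_da"
   ]
  },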
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "result_da = da + mean_da  # try other operators (-, *, /) too!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (draw: 3)>\n",
       "array([0.956272, 0.969894, 1.22502 ])\n",
       "Coordinates:\n",
       "    sex_id        int64 2\n",
       "    age_group_id  int64 11\n",
       "    year_id       int64 1996\n",
       "  * draw          (draw) int64 0 1 2"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result_da.sel(age_group_id=11, sex_id=2, year_id=1996, draw=[0, 1, 2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (draw: 3)>\n",
       "array([0.486596, 0.500218, 0.755344])\n",
       "Coordinates:\n",
       "    sex_id        int64 2\n",
       "    age_group_id  int64 11\n",
       "    year_id       int64 1996\n",
       "  * draw          (draw) int64 0 1 2\n",
       "Attributes:\n",
       "    metric:   rate\n",
       "    author:   Me"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "da.sel(age_group_id=11, sex_id=2, year_id=1996, draw=[0, 1, 2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' ()>\n",
       "array(0.469676)\n",
       "Coordinates:\n",
       "    sex_id        int64 2\n",
       "    age_group_id  int64 11\n",
       "    year_id       int64 1996"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.sel(age_group_id=11, sex_id=2, year_id=1996)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Xarray also defaults to taking the intersection of the coordinates from each of the operands"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 1, year_id: 11)>\n",
       "array([[[1.003333, 1.041037, 1.042973, 1.023185, 1.109987, 1.041623,\n",
       "         1.03129 , 1.151135, 0.975739, 0.945686, 1.096026]],\n",
       "\n",
       "       [[1.014855, 1.010536, 0.946785, 1.009277, 0.984802, 0.956791,\n",
       "         0.910494, 0.918802, 1.110162, 1.000113, 1.058741]]])\n",
       "Coordinates:\n",
       "  * age_group_id  (age_group_id) int64 13\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "3 * mean_da - mean_da.sel(age_group_id=[13])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are ways around the above if you want the union of the two. Of course the output will have NaNs where the operands\n",
    "don't line up (i.e. where they don't have the same coordinates of a dimension they share)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
       "array([[[     nan,      nan,      nan,      nan,      nan,      nan,\n",
       "              nan,      nan,      nan,      nan,      nan],\n",
       "        [     nan,      nan,      nan,      nan,      nan,      nan,\n",
       "              nan,      nan,      nan,      nan,      nan],\n",
       "        [1.003333, 1.041037, 1.042973, 1.023185, 1.109987, 1.041623,\n",
       "         1.03129 , 1.151135, 0.975739, 0.945686, 1.096026]],\n",
       "\n",
       "       [[     nan,      nan,      nan,      nan,      nan,      nan,\n",
       "              nan,      nan,      nan,      nan,      nan],\n",
       "        [     nan,      nan,      nan,      nan,      nan,      nan,\n",
       "              nan,      nan,      nan,      nan,      nan],\n",
       "        [1.014855, 1.010536, 0.946785, 1.009277, 0.984802, 0.956791,\n",
       "         0.910494, 0.918802, 1.110162, 1.000113, 1.058741]]])\n",
       "Coordinates:\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with xr.set_options(arithmetic_join=\"outer\"):\n",
    "    result = 3 * mean_da - mean_da.sel(age_group_id=[13])\n",
    "result"
   ]
  },
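  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another way to get the union, sketched here assuming you'd rather align the operands explicitly than change the option, is ``xr.align`` with ``join=\"outer\"``. The names ``left_da`` and ``right_da`` and the toy values are made up for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import xarray as xr\n",
    "\n",
    "# Hypothetical toy operands that only share age_group_id 13.\n",
    "left_da = xr.DataArray(\n",
    "    [1.0, 2.0, 3.0], dims=[\"age_group_id\"], coords={\"age_group_id\": [11, 12, 13]}\n",
    ")\n",
    "right_da = xr.DataArray(\n",
    "    [10.0], dims=[\"age_group_id\"], coords={\"age_group_id\": [13]}\n",
    ")\n",
    "\n",
    "# Reindex both operands onto the union of their coordinates; missing entries become NaN.\n",
    "left_aligned, right_aligned = xr.align(left_da, right_da, join=\"outer\")\n",
    "left_aligned - right_aligned"
   ]
  },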
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I want to point out another subtle feature of xarray that I used just above: applying a scalar to the data. Even this simple task is quite a bit more work in pandas, because you don't want to add 1000 to your metadata as well as your data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
       "array([[[1000.515046, 1000.482924, 1000.519905, 1000.46185 , 1000.465647,\n",
       "         1000.438642, 1000.484231, 1000.543931, 1000.512703, 1000.495674,\n",
       "         1000.490514],\n",
       "        [1000.479378, 1000.483819, 1000.535785, 1000.557384, 1000.493535,\n",
       "         1000.504168, 1000.482273, 1000.471792, 1000.508302, 1000.46291 ,\n",
       "         1000.481167],\n",
       "        [1000.501667, 1000.520518, 1000.521487, 1000.511592, 1000.554994,\n",
       "         1000.520812, 1000.515645, 1000.575567, 1000.48787 , 1000.472843,\n",
       "         1000.548013]],\n",
       "\n",
       "       [[1000.480634, 1000.501324, 1000.48249 , 1000.489984, 1000.434139,\n",
       "         1000.482908, 1000.469676, 1000.505307, 1000.528676, 1000.498105,\n",
       "         1000.496416],\n",
       "        [1000.476805, 1000.533403, 1000.530798, 1000.54454 , 1000.511816,\n",
       "         1000.480597, 1000.497687, 1000.487296, 1000.473503, 1000.477073,\n",
       "         1000.450402],\n",
       "        [1000.507428, 1000.505268, 1000.473393, 1000.504639, 1000.492401,\n",
       "         1000.478396, 1000.455247, 1000.459401, 1000.555081, 1000.500056,\n",
       "         1000.529371]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da + 1000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 10. Changing and adding coordinates/Expanding or broadcasting dimensions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "broadcasted_da = mean_da.expand_dims(draw=range(5), location_id=[102, 6])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (draw: 5, location_id: 2)>\n",
       "array([[0.482908, 0.482908],\n",
       "       [0.482908, 0.482908],\n",
       "       [0.482908, 0.482908],\n",
       "       [0.482908, 0.482908],\n",
       "       [0.482908, 0.482908]])\n",
       "Coordinates:\n",
       "  * draw          (draw) int64 0 1 2 3 4\n",
       "  * location_id   (location_id) int64 102 6\n",
       "    sex_id        int64 2\n",
       "    age_group_id  int64 11\n",
       "    year_id       int64 1995"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "broadcasted_da.sel(age_group_id=11, sex_id=2, year_id=1995)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I'm trying to limit use of our internal FHS code, but we do have a really nice, fast tool for one task: expanding an existing dimension to include new coordinates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fbd_core.etl import expand_dimensions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the difference between broadcasting and expanding:\n",
    "* **Broadcasting** here means altering the array to include dimensions it didn't previously have, where each coordinate on a given new dimension points to an identical slice.\n",
    "* **Expanding** here means altering the array to include coordinates it didn't previously have, but on a dimension that already existed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 5, year_id: 11, scenario: 3)>\n",
       "array([[[[0.515046, ..., 0.515046],\n",
       "         ...,\n",
       "         [0.490514, ..., 0.490514]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[     nan, ...,      nan],\n",
       "         ...,\n",
       "         [     nan, ...,      nan]]],\n",
       "\n",
       "\n",
       "       [[[0.480634, ..., 0.480634],\n",
       "         ...,\n",
       "         [0.496416, ..., 0.496416]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[     nan, ...,      nan],\n",
       "         ...,\n",
       "         [     nan, ...,      nan]]]])\n",
       "Coordinates:\n",
       "  * age_group_id  (age_group_id) int64 11 12 13 14 15\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * scenario      (scenario) int64 0 1 -1"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "expand_dimensions(mean_da, age_group_id=[14, 15], scenario=[0, 1, -1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 5, year_id: 11, scenario: 3)>\n",
       "array([[[[5.150465e-01, ..., 5.150465e-01],\n",
       "         ...,\n",
       "         [4.905137e-01, ..., 4.905137e-01]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[9.999000e+03, ..., 9.999000e+03],\n",
       "         ...,\n",
       "         [9.999000e+03, ..., 9.999000e+03]]],\n",
       "\n",
       "\n",
       "       [[[4.806340e-01, ..., 4.806340e-01],\n",
       "         ...,\n",
       "         [4.964156e-01, ..., 4.964156e-01]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[9.999000e+03, ..., 9.999000e+03],\n",
       "         ...,\n",
       "         [9.999000e+03, ..., 9.999000e+03]]]])\n",
       "Coordinates:\n",
       "  * age_group_id  (age_group_id) int64 11 12 13 14 15\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * scenario      (scenario) int64 0 1 -1"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "expand_dimensions(mean_da, age_group_id=[14, 15], scenario=[0, 1, -1], fill_value=9999)"
   ]
  },
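  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `expand_dimensions` function used above is not part of xarray itself. As a rough sketch, and assuming a version of xarray where `expand_dims` accepts coordinate values and `reindex` accepts `fill_value`, the same two ideas can be expressed with built-in methods. `small_da` is a made-up array used only for illustration. Note that `expand_dims` puts the new dimension first rather than last."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import xarray as xr\n",
    "\n",
    "# Hypothetical toy array, used only for this illustration\n",
    "small_da = xr.DataArray(\n",
    "    data=np.arange(6.0).reshape(2, 3),\n",
    "    dims=[\"sex_id\", \"age_group_id\"],\n",
    "    coords={\"sex_id\": [1, 2], \"age_group_id\": [11, 12, 13]},\n",
    "    )\n",
    "\n",
    "# Broadcasting: add a new \"scenario\" dimension; every scenario holds an identical slice\n",
    "broadcasted = small_da.expand_dims({\"scenario\": [0, 1, -1]})\n",
    "\n",
    "# Expanding: add coordinates 14 and 15 to the existing \"age_group_id\" dimension,\n",
    "# filling the new cells with 9999 instead of NaN\n",
    "expanded = small_da.reindex({\"age_group_id\": [11, 12, 13, 14, 15]}, fill_value=9999)\n",
    "\n",
    "broadcasted.dims, expanded.dims"
   ]
  },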
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Changing coordinate or dimension labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "renamed_da = mean_da.rename({\"sex_id\": \"sex_name\", \"age_group_id\": \"age_group_name\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_name: 2, age_group_name: 3, year_id: 11)>\n",
       "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
       "         0.484231, 0.543931, 0.512703, 0.495674, 0.490514],\n",
       "        [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
       "         0.482273, 0.471792, 0.508302, 0.46291 , 0.481167],\n",
       "        [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
       "         0.515645, 0.575567, 0.48787 , 0.472843, 0.548013]],\n",
       "\n",
       "       [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
       "         0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n",
       "        [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
       "         0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n",
       "        [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
       "         0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n",
       "Coordinates:\n",
       "  * sex_name        (sex_name) <U6 'Male' 'Female'\n",
       "  * age_group_name  (age_group_name) int64 11 12 13\n",
       "  * year_id         (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "renamed_da.assign_coords(sex_name=[\"Male\", \"Female\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Concatenating multiple DataArrays into one DataArray"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "future_da = xr.DataArray(\n",
    "    data=np.random.random([2, 3, 4]),\n",
    "    dims=[\"sex_id\", \"age_group_id\", \"year_id\"],\n",
    "    coords={\n",
    "        \"sex_id\": [1, 2],\n",
    "        \"age_group_id\": [11, 12, 13],\n",
    "        \"year_id\": range(2001, 2005),\n",
    "        },\n",
    "    name=\"fake_thing\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n",
       "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
       "         0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n",
       "         0.001631, 0.115671, 0.44467 ],\n",
       "        [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
       "         0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696  ,\n",
       "         0.813101, 0.781387, 0.394234],\n",
       "        [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
       "         0.515645, 0.575567, 0.48787 , 0.472843, 0.548013, 0.811391,\n",
       "         0.491746, 0.697288, 0.206781]],\n",
       "\n",
       "       [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
       "         0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n",
       "         0.925087, 0.867205, 0.221227],\n",
       "        [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
       "         0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n",
       "         0.63895 , 0.727136, 0.085397],\n",
       "        [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
       "         0.455247, 0.459401, 0.555081, 0.500056, 0.529371, 0.610459,\n",
       "         0.217386, 0.575926, 0.349022]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xr.concat([mean_da, future_da], dim=\"year_id\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n",
       "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
       "         0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n",
       "         0.001631, 0.115671, 0.44467 ],\n",
       "        [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
       "         0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696  ,\n",
       "         0.813101, 0.781387, 0.394234],\n",
       "        [     nan,      nan,      nan,      nan,      nan,      nan,\n",
       "              nan,      nan,      nan,      nan,      nan, 0.811391,\n",
       "         0.491746, 0.697288, 0.206781]],\n",
       "\n",
       "       [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
       "         0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n",
       "         0.925087, 0.867205, 0.221227],\n",
       "        [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
       "         0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n",
       "         0.63895 , 0.727136, 0.085397],\n",
       "        [     nan,      nan,      nan,      nan,      nan,      nan,\n",
       "              nan,      nan,      nan,      nan,      nan, 0.610459,\n",
       "         0.217386, 0.575926, 0.349022]]])\n",
       "Coordinates:\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xr.concat([mean_da.sel(age_group_id=[11, 12]), future_da], dim=\"year_id\")"
   ]
  },
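  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the second `xr.concat` call above, `age_group_id` 13 is missing from the first array, so the result is padded with NaN. Here is a minimal sketch of the alternative, assuming an xarray version recent enough for `xr.concat` to accept a `join` argument. The toy arrays `early` and `late` are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import xarray as xr\n",
    "\n",
    "# Hypothetical toy arrays; `late` has an extra age group (13)\n",
    "early = xr.DataArray(\n",
    "    data=np.ones([2, 2]),\n",
    "    dims=[\"age_group_id\", \"year_id\"],\n",
    "    coords={\"age_group_id\": [11, 12], \"year_id\": [1999, 2000]},\n",
    "    )\n",
    "late = xr.DataArray(\n",
    "    data=np.zeros([3, 2]),\n",
    "    dims=[\"age_group_id\", \"year_id\"],\n",
    "    coords={\"age_group_id\": [11, 12, 13], \"year_id\": [2001, 2002]},\n",
    "    )\n",
    "\n",
    "# join=\"outer\" keeps the union of age groups and pads the gaps with NaN\n",
    "padded = xr.concat([early, late], dim=\"year_id\", join=\"outer\")\n",
    "\n",
    "# join=\"inner\" keeps only the age groups present in both arrays\n",
    "trimmed = xr.concat([early, late], dim=\"year_id\", join=\"inner\")\n",
    "\n",
    "padded.sizes, trimmed.sizes"
   ]
  },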
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary_da = xr.concat([\n",
    "    da.mean(\"draw\").assign_coords(summary_val=\"mean\"),\n",
    "    da.quantile([0.975, 0.025], \"draw\").rename({\"quantile\": \"summary_val\"}).assign_coords(summary_val=[\"upper\", \"lower\"])\n",
    "    ], dim=\"summary_val\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 11, summary_val: 3)>\n",
       "array([[[[0.515046, 0.946091, 0.035937],\n",
       "         [0.482924, 0.947006, 0.032809],\n",
       "         [0.519905, 0.989351, 0.036074],\n",
       "         [0.46185 , 0.938826, 0.054938],\n",
       "         [0.465647, 0.957233, 0.020817],\n",
       "         [0.438642, 0.961054, 0.016693],\n",
       "         [0.484231, 0.962361, 0.010495],\n",
       "         [0.543931, 0.9847  , 0.052086],\n",
       "         [0.512703, 0.991939, 0.032586],\n",
       "         [0.495674, 0.992401, 0.063762],\n",
       "         [0.490514, 0.931928, 0.043832]],\n",
       "\n",
       "        [[0.479378, 0.96409 , 0.040247],\n",
       "         [0.483819, 0.989095, 0.042332],\n",
       "         [0.535785, 0.971059, 0.038767],\n",
       "         [0.557384, 0.932266, 0.022039],\n",
       "         [0.493535, 0.992066, 0.02663 ],\n",
       "         [0.504168, 0.995731, 0.073381],\n",
       "         [0.482273, 0.961511, 0.016937],\n",
       "         [0.471792, 0.95729 , 0.019657],\n",
       "         [0.508302, 0.929298, 0.063141],\n",
       "         [0.46291 , 0.971658, 0.015542],\n",
       "         [0.481167, 0.961813, 0.033979]],\n",
       "\n",
       "        [[0.501667, 0.985317, 0.019053],\n",
       "         [0.520518, 0.95646 , 0.023236],\n",
       "         [0.521487, 0.971972, 0.041205],\n",
       "         [0.511592, 0.984797, 0.024052],\n",
       "         [0.554994, 0.965313, 0.025952],\n",
       "         [0.520812, 0.9662  , 0.046415],\n",
       "         [0.515645, 0.967894, 0.013963],\n",
       "         [0.575567, 0.966558, 0.082439],\n",
       "         [0.48787 , 0.983227, 0.039138],\n",
       "         [0.472843, 0.975196, 0.028172],\n",
       "         [0.548013, 0.969915, 0.087198]]],\n",
       "\n",
       "\n",
       "       [[[0.480634, 0.968146, 0.022962],\n",
       "         [0.501324, 0.976099, 0.039559],\n",
       "         [0.48249 , 0.977238, 0.045023],\n",
       "         [0.489984, 0.909349, 0.056532],\n",
       "         [0.434139, 0.948866, 0.038189],\n",
       "         [0.482908, 0.948609, 0.013773],\n",
       "         [0.469676, 0.951318, 0.024106],\n",
       "         [0.505307, 0.960428, 0.024428],\n",
       "         [0.528676, 0.949223, 0.078114],\n",
       "         [0.498105, 0.983426, 0.020636],\n",
       "         [0.496416, 0.994721, 0.034975]],\n",
       "\n",
       "        [[0.476805, 0.953117, 0.053146],\n",
       "         [0.533403, 0.944793, 0.026986],\n",
       "         [0.530798, 0.99642 , 0.03948 ],\n",
       "         [0.54454 , 0.971521, 0.03409 ],\n",
       "         [0.511816, 0.951413, 0.055288],\n",
       "         [0.480597, 0.926127, 0.030514],\n",
       "         [0.497687, 0.945339, 0.04947 ],\n",
       "         [0.487296, 0.983831, 0.028957],\n",
       "         [0.473503, 0.976795, 0.035929],\n",
       "         [0.477073, 0.989499, 0.0153  ],\n",
       "         [0.450402, 0.965297, 0.021199]],\n",
       "\n",
       "        [[0.507428, 0.978185, 0.040169],\n",
       "         [0.505268, 0.975026, 0.051678],\n",
       "         [0.473393, 0.967635, 0.023015],\n",
       "         [0.504639, 0.991123, 0.046181],\n",
       "         [0.492401, 0.945305, 0.02122 ],\n",
       "         [0.478396, 0.965985, 0.017725],\n",
       "         [0.455247, 0.937724, 0.034586],\n",
       "         [0.459401, 0.966592, 0.014844],\n",
       "         [0.555081, 0.944686, 0.019773],\n",
       "         [0.500056, 0.947914, 0.030668],\n",
       "         [0.529371, 0.969859, 0.030164]]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * summary_val   (summary_val) <U5 'mean' 'upper' 'lower'"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "summary_da"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "``combine_first`` is another tool, which is similar but has a nice additional feature: if the array you're appending\n",
    "has data for coords that already exist in the original array, then only the original array's data will be used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n",
       "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
       "         0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n",
       "         0.001631, 0.115671, 0.44467 ],\n",
       "        [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
       "         0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696  ,\n",
       "         0.813101, 0.781387, 0.394234],\n",
       "        [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
       "         0.515645, 0.575567, 0.48787 , 0.472843, 0.548013, 0.811391,\n",
       "         0.491746, 0.697288, 0.206781]],\n",
       "\n",
       "       [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
       "         0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n",
       "         0.925087, 0.867205, 0.221227],\n",
       "        [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
       "         0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n",
       "         0.63895 , 0.727136, 0.085397],\n",
       "        [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
       "         0.455247, 0.459401, 0.555081, 0.500056, 0.529371, 0.610459,\n",
       "         0.217386, 0.575926, 0.349022]]])\n",
       "Coordinates:\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.combine_first(future_da)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 6)>\n",
       "array([[[-5.000000e+00, -5.000000e+00,  9.976917e-01,  1.631312e-03,\n",
       "          1.156705e-01,  4.446702e-01],\n",
       "        [-5.000000e+00, -5.000000e+00,  8.696002e-01,  8.131013e-01,\n",
       "          7.813870e-01,  3.942344e-01],\n",
       "        [-5.000000e+00, -5.000000e+00,  8.113905e-01,  4.917461e-01,\n",
       "          6.972879e-01,  2.067813e-01]],\n",
       "\n",
       "       [[-5.000000e+00, -5.000000e+00,  2.771139e-01,  9.250868e-01,\n",
       "          8.672051e-01,  2.212266e-01],\n",
       "        [-5.000000e+00, -5.000000e+00,  5.082614e-01,  6.389501e-01,\n",
       "          7.271363e-01,  8.539651e-02],\n",
       "        [-5.000000e+00, -5.000000e+00,  6.104589e-01,  2.173860e-01,\n",
       "          5.759261e-01,  3.490221e-01]]])\n",
       "Coordinates:\n",
       "  * year_id       (year_id) int64 1999 2000 2001 2002 2003 2004\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "future_with_past = expand_dimensions(future_da, year_id=[1999, 2000], fill_value=-5)\n",
    "future_with_past"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'fake_thing' (sex_id: 2, age_group_id: 3, year_id: 15)>\n",
       "array([[[0.515046, 0.482924, 0.519905, 0.46185 , 0.465647, 0.438642,\n",
       "         0.484231, 0.543931, 0.512703, 0.495674, 0.490514, 0.997692,\n",
       "         0.001631, 0.115671, 0.44467 ],\n",
       "        [0.479378, 0.483819, 0.535785, 0.557384, 0.493535, 0.504168,\n",
       "         0.482273, 0.471792, 0.508302, 0.46291 , 0.481167, 0.8696  ,\n",
       "         0.813101, 0.781387, 0.394234],\n",
       "        [0.501667, 0.520518, 0.521487, 0.511592, 0.554994, 0.520812,\n",
       "         0.515645, 0.575567, 0.48787 , 0.472843, 0.548013, 0.811391,\n",
       "         0.491746, 0.697288, 0.206781]],\n",
       "\n",
       "       [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
       "         0.469676, 0.505307, 0.528676, 0.498105, 0.496416, 0.277114,\n",
       "         0.925087, 0.867205, 0.221227],\n",
       "        [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
       "         0.497687, 0.487296, 0.473503, 0.477073, 0.450402, 0.508261,\n",
       "         0.63895 , 0.727136, 0.085397],\n",
       "        [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
       "         0.455247, 0.459401, 0.555081, 0.500056, 0.529371, 0.610459,\n",
       "         0.217386, 0.575926, 0.349022]]])\n",
       "Coordinates:\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 2001 2002 2003 2004\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mean_da.combine_first(future_with_past)"
   ]
  },
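  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To isolate the precedence rule, here is a small sketch on made-up one-dimensional arrays (`first` and `second` are hypothetical names). As far as I understand `combine_first`, values from the calling array win wherever they are non-null, and the other array only fills in coordinates that are absent or NaN in the caller."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import xarray as xr\n",
    "\n",
    "# Hypothetical toy arrays; the year 2000 is NaN in `first`\n",
    "first = xr.DataArray(\n",
    "    data=[1.0, 2.0, np.nan],\n",
    "    dims=[\"year_id\"],\n",
    "    coords={\"year_id\": [1998, 1999, 2000]},\n",
    "    )\n",
    "second = xr.DataArray(\n",
    "    data=[-5.0, -5.0, 10.0],\n",
    "    dims=[\"year_id\"],\n",
    "    coords={\"year_id\": [1999, 2000, 2001]},\n",
    "    )\n",
    "\n",
    "# 1999 keeps the value from `first`; 2000 (NaN in `first`) and\n",
    "# 2001 (absent from `first`) are taken from `second`\n",
    "first.combine_first(second)"
   ]
  },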
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 11. Datasets\n",
    "\n",
    "What they are, when they're useful, and how to make them.\n",
    "\n",
    "Datasets, which are collections of named DataArrays, are useful when several related variables need to live in one structure."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A typical use case is when you want to keep several related data variables in one data structure even though they have inconsistent dimensions. For example, you want to store SDI and mortality in a dataset together, but mortality has age-group and sex dimensions, while SDI does not. They do, however, share year and location as dimensions."
   ]
  },
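  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of that use case, with made-up location IDs and random values standing in for real SDI and mortality data. The names `sdi`, `mortality`, and `example_ds` are hypothetical. Each variable keeps its own dimensions, and the shared ones (`location_id`, `year_id`) appear only once in the Dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import xarray as xr\n",
    "\n",
    "# Hypothetical coordinates and random values, for illustration only\n",
    "example_coords = {\n",
    "    \"location_id\": [101, 102],\n",
    "    \"year_id\": [1990, 1991],\n",
    "    \"sex_id\": [1, 2],\n",
    "    \"age_group_id\": [11, 12, 13],\n",
    "    }\n",
    "\n",
    "# SDI varies only by location and year\n",
    "sdi = xr.DataArray(\n",
    "    data=np.random.random([2, 2]),\n",
    "    dims=[\"location_id\", \"year_id\"],\n",
    "    coords={dim: example_coords[dim] for dim in [\"location_id\", \"year_id\"]},\n",
    "    )\n",
    "\n",
    "# Mortality additionally varies by sex and age group\n",
    "mortality = xr.DataArray(\n",
    "    data=np.random.random([2, 2, 2, 3]),\n",
    "    dims=[\"location_id\", \"year_id\", \"sex_id\", \"age_group_id\"],\n",
    "    coords=example_coords,\n",
    "    )\n",
    "\n",
    "# The dict keys become the data variable names\n",
    "example_ds = xr.Dataset({\"sdi\": sdi, \"mortality\": mortality})\n",
    "example_ds"
   ]
  },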
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Merging two or more DataArrays into one Dataset. Note that the DataArrays have to be named."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = xr.merge([mean_da.rename(\"mean\"), da.rename(\"draws\")])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:       (age_group_id: 3, sex_id: 2, year_id: 11)\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "Data variables:\n",
       "    draw_0        (sex_id, age_group_id, year_id) float64 0.9968 ... 0.8295\n",
       "    draw_1        (sex_id, age_group_id, year_id) float64 0.7616 ... 0.01835\n",
       "    draw_2        (sex_id, age_group_id, year_id) float64 0.3508 ... 0.6153\n",
       "    draw_3        (sex_id, age_group_id, year_id) float64 0.7504 ... 0.6143\n",
       "    draw_4        (sex_id, age_group_id, year_id) float64 0.4339 ... 0.4588\n",
       "    draw_5        (sex_id, age_group_id, year_id) float64 0.7644 ... 0.995\n",
       "    draw_6        (sex_id, age_group_id, year_id) float64 0.1224 ... 0.0544\n",
       "    draw_7        (sex_id, age_group_id, year_id) float64 0.5721 ... 0.6756\n",
       "    draw_8        (sex_id, age_group_id, year_id) float64 0.6604 ... 0.04322\n",
       "    draw_9        (sex_id, age_group_id, year_id) float64 0.7618 ... 0.6469\n",
       "    draw_10       (sex_id, age_group_id, year_id) float64 0.319 ... 0.9751\n",
       "    draw_11       (sex_id, age_group_id, year_id) float64 0.524 ... 0.7128\n",
       "    draw_12       (sex_id, age_group_id, year_id) float64 0.6512 ... 0.3474\n",
       "    draw_13       (sex_id, age_group_id, year_id) float64 0.562 0.619 ... 0.5368\n",
       "    draw_14       (sex_id, age_group_id, year_id) float64 0.2113 ... 0.5242\n",
       "    draw_15       (sex_id, age_group_id, year_id) float64 0.4212 ... 0.7537\n",
       "    draw_16       (sex_id, age_group_id, year_id) float64 0.9361 ... 0.6975\n",
       "    draw_17       (sex_id, age_group_id, year_id) float64 0.2602 ... 0.4517\n",
       "    draw_18       (sex_id, age_group_id, year_id) float64 0.9366 ... 0.5681\n",
       "    draw_19       (sex_id, age_group_id, year_id) float64 0.2167 ... 0.6634\n",
       "    draw_20       (sex_id, age_group_id, year_id) float64 0.4639 ... 0.1377\n",
       "    draw_21       (sex_id, age_group_id, year_id) float64 0.8358 ... 0.9574\n",
       "    draw_22       (sex_id, age_group_id, year_id) float64 0.8781 ... 0.8257\n",
       "    draw_23       (sex_id, age_group_id, year_id) float64 0.646 ... 0.7914\n",
       "    draw_24       (sex_id, age_group_id, year_id) float64 0.7519 ... 0.9622\n",
       "    draw_25       (sex_id, age_group_id, year_id) float64 0.1052 ... 0.411\n",
       "    draw_26       (sex_id, age_group_id, year_id) float64 0.453 ... 0.6014\n",
       "    draw_27       (sex_id, age_group_id, year_id) float64 0.97 0.3091 ... 0.6086\n",
       "    draw_28       (sex_id, age_group_id, year_id) float64 0.4998 ... 0.4429\n",
       "    draw_29       (sex_id, age_group_id, year_id) float64 0.05933 ... 0.06421\n",
       "    draw_30       (sex_id, age_group_id, year_id) float64 0.4804 ... 0.06488\n",
       "    draw_31       (sex_id, age_group_id, year_id) float64 0.8592 ... 0.8677\n",
       "    draw_32       (sex_id, age_group_id, year_id) float64 0.7863 ... 0.3912\n",
       "    draw_33       (sex_id, age_group_id, year_id) float64 0.8053 ... 0.6144\n",
       "    draw_34       (sex_id, age_group_id, year_id) float64 0.4348 ... 0.4366\n",
       "    draw_35       (sex_id, age_group_id, year_id) float64 0.06214 ... 0.9133\n",
       "    draw_36       (sex_id, age_group_id, year_id) float64 0.2246 ... 0.3688\n",
       "    draw_37       (sex_id, age_group_id, year_id) float64 0.8678 ... 0.09394\n",
       "    draw_38       (sex_id, age_group_id, year_id) float64 0.07461 ... 0.3763\n",
       "    draw_39       (sex_id, age_group_id, year_id) float64 0.07531 ... 0.3003\n",
       "    draw_40       (sex_id, age_group_id, year_id) float64 0.693 ... 0.9037\n",
       "    draw_41       (sex_id, age_group_id, year_id) float64 0.1258 ... 0.0871\n",
       "    draw_42       (sex_id, age_group_id, year_id) float64 0.8201 ... 0.1401\n",
       "    draw_43       (sex_id, age_group_id, year_id) float64 0.0862 ... 0.09171\n",
       "    draw_44       (sex_id, age_group_id, year_id) float64 0.0325 ... 0.8637\n",
       "    draw_45       (sex_id, age_group_id, year_id) float64 0.891 ... 0.7595\n",
       "    draw_46       (sex_id, age_group_id, year_id) float64 0.7296 ... 0.4517\n",
       "    draw_47       (sex_id, age_group_id, year_id) float64 0.2217 ... 0.8892\n",
       "    draw_48       (sex_id, age_group_id, year_id) float64 0.9126 ... 0.7594\n",
       "    draw_49       (sex_id, age_group_id, year_id) float64 0.2288 ... 0.07483\n",
       "    draw_50       (sex_id, age_group_id, year_id) float64 0.04443 ... 0.04533\n",
       "    draw_51       (sex_id, age_group_id, year_id) float64 0.9039 ... 0.2432\n",
       "    draw_52       (sex_id, age_group_id, year_id) float64 0.7614 ... 0.3947\n",
       "    draw_53       (sex_id, age_group_id, year_id) float64 0.4993 ... 0.6609\n",
       "    draw_54       (sex_id, age_group_id, year_id) float64 0.2617 ... 0.5335\n",
       "    draw_55       (sex_id, age_group_id, year_id) float64 0.4381 ... 0.233\n",
       "    draw_56       (sex_id, age_group_id, year_id) float64 0.8289 ... 0.3777\n",
       "    draw_57       (sex_id, age_group_id, year_id) float64 0.891 0.321 ... 0.5784\n",
       "    draw_58       (sex_id, age_group_id, year_id) float64 0.4284 ... 0.9219\n",
       "    draw_59       (sex_id, age_group_id, year_id) float64 0.8465 ... 0.3912\n",
       "    draw_60       (sex_id, age_group_id, year_id) float64 0.355 ... 0.9802\n",
       "    draw_61       (sex_id, age_group_id, year_id) float64 0.02395 ... 0.8892\n",
       "    draw_62       (sex_id, age_group_id, year_id) float64 0.2704 ... 0.5783\n",
       "    draw_63       (sex_id, age_group_id, year_id) float64 0.5877 ... 0.8256\n",
       "    draw_64       (sex_id, age_group_id, year_id) float64 0.8006 ... 0.3911\n",
       "    draw_65       (sex_id, age_group_id, year_id) float64 0.9209 ... 0.964\n",
       "    draw_66       (sex_id, age_group_id, year_id) float64 0.8785 ... 0.1937\n",
       "    draw_67       (sex_id, age_group_id, year_id) float64 0.6067 ... 0.9608\n",
       "    draw_68       (sex_id, age_group_id, year_id) float64 0.03497 ... 0.3151\n",
       "    draw_69       (sex_id, age_group_id, year_id) float64 0.5407 ... 0.8746\n",
       "    draw_70       (sex_id, age_group_id, year_id) float64 0.2832 ... 0.5551\n",
       "    draw_71       (sex_id, age_group_id, year_id) float64 0.2203 ... 0.117\n",
       "    draw_72       (sex_id, age_group_id, year_id) float64 0.7725 ... 0.6808\n",
       "    draw_73       (sex_id, age_group_id, year_id) float64 0.6947 ... 0.1008\n",
       "    draw_74       (sex_id, age_group_id, year_id) float64 0.6666 ... 0.6387\n",
       "    draw_75       (sex_id, age_group_id, year_id) float64 0.5149 ... 0.005526\n",
       "    draw_76       (sex_id, age_group_id, year_id) float64 0.9547 ... 0.7345\n",
       "    draw_77       (sex_id, age_group_id, year_id) float64 0.3364 ... 0.005635\n",
       "    draw_78       (sex_id, age_group_id, year_id) float64 0.572 ... 0.8498\n",
       "    draw_79       (sex_id, age_group_id, year_id) float64 0.4541 ... 0.646\n",
       "    draw_80       (sex_id, age_group_id, year_id) float64 0.2224 ... 0.9263\n",
       "    draw_81       (sex_id, age_group_id, year_id) float64 0.1243 ... 0.4103\n",
       "    draw_82       (sex_id, age_group_id, year_id) float64 0.7503 ... 0.1586\n",
       "    draw_83       (sex_id, age_group_id, year_id) float64 0.1514 ... 0.8157\n",
       "    draw_84       (sex_id, age_group_id, year_id) float64 0.7995 ... 0.05216\n",
       "    draw_85       (sex_id, age_group_id, year_id) float64 0.4792 ... 0.424\n",
       "    draw_86       (sex_id, age_group_id, year_id) float64 0.09267 ... 0.6082\n",
       "    draw_87       (sex_id, age_group_id, year_id) float64 0.5104 ... 0.6742\n",
       "    draw_88       (sex_id, age_group_id, year_id) float64 0.9314 ... 0.3405\n",
       "    draw_89       (sex_id, age_group_id, year_id) float64 0.037 ... 0.8594\n",
       "    draw_90       (sex_id, age_group_id, year_id) float64 0.2258 ... 0.835\n",
       "    draw_91       (sex_id, age_group_id, year_id) float64 0.8366 ... 0.4034\n",
       "    draw_92       (sex_id, age_group_id, year_id) float64 0.8852 ... 0.8596\n",
       "    draw_93       (sex_id, age_group_id, year_id) float64 0.2229 ... 0.9087\n",
       "    draw_94       (sex_id, age_group_id, year_id) float64 0.6414 ... 0.6106\n",
       "    draw_95       (sex_id, age_group_id, year_id) float64 0.3939 ... 0.2564\n",
       "    draw_96       (sex_id, age_group_id, year_id) float64 0.3816 ... 0.3988\n",
       "    draw_97       (sex_id, age_group_id, year_id) float64 0.2947 ... 0.2535\n",
       "    draw_98       (sex_id, age_group_id, year_id) float64 0.6506 ... 0.4103\n",
       "    draw_99       (sex_id, age_group_id, year_id) float64 0.1932 ... 0.6117"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is similar to merging in pandas, except that it defaults to an outer merge, which takes the union of the arrays' coordinates.\n",
    "Alternatively, you can take an inner merge, i.e. the intersection of the arrays' coordinates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:       (age_group_id: 3, draw: 100, sex_id: 1, year_id: 11)\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * draw          (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
       "Data variables:\n",
       "    mean          (sex_id, age_group_id, year_id) float64 0.4806 ... 0.5294\n",
       "    draws         (sex_id, age_group_id, year_id, draw) float64 0.07233 ... 0.6117"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "only_females_ds = xr.merge([mean_da.rename(\"mean\").sel(sex_id=[2]), da.rename(\"draws\")], join=\"inner\")\n",
    "only_females_ds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:       (age_group_id: 3, draw: 100, sex_id: 2, year_id: 11)\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * draw          (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
       "Data variables:\n",
       "    mean          (sex_id, age_group_id, year_id) float64 nan nan ... 0.5294\n",
       "    draws         (sex_id, age_group_id, year_id, draw) float64 0.9968 ... 0.6117"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with_nans_da = xr.merge([mean_da.rename(\"mean\").sel(sex_id=[2]), da.rename(\"draws\")], join=\"outer\")\n",
    "with_nans_da"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'mean' (sex_id: 2, age_group_id: 3, year_id: 11)>\n",
       "array([[[     nan,      nan,      nan,      nan,      nan,      nan,\n",
       "              nan,      nan,      nan,      nan,      nan],\n",
       "        [     nan,      nan,      nan,      nan,      nan,      nan,\n",
       "              nan,      nan,      nan,      nan,      nan],\n",
       "        [     nan,      nan,      nan,      nan,      nan,      nan,\n",
       "              nan,      nan,      nan,      nan,      nan]],\n",
       "\n",
       "       [[0.480634, 0.501324, 0.48249 , 0.489984, 0.434139, 0.482908,\n",
       "         0.469676, 0.505307, 0.528676, 0.498105, 0.496416],\n",
       "        [0.476805, 0.533403, 0.530798, 0.54454 , 0.511816, 0.480597,\n",
       "         0.497687, 0.487296, 0.473503, 0.477073, 0.450402],\n",
       "        [0.507428, 0.505268, 0.473393, 0.504639, 0.492401, 0.478396,\n",
       "         0.455247, 0.459401, 0.555081, 0.500056, 0.529371]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with_nans_da[\"mean\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'draws' (sex_id: 2, age_group_id: 3, year_id: 11, draw: 100)>\n",
       "array([[[[0.996779, ..., 0.193241],\n",
       "         ...,\n",
       "         [0.50371 , ..., 0.654273]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.227811, ..., 0.912856],\n",
       "         ...,\n",
       "         [0.24642 , ..., 0.581184]]],\n",
       "\n",
       "\n",
       "       [[[0.072334, ..., 0.684663],\n",
       "         ...,\n",
       "         [0.628984, ..., 0.358811]],\n",
       "\n",
       "        ...,\n",
       "\n",
       "        [[0.491698, ..., 0.876439],\n",
       "         ...,\n",
       "         [0.829525, ..., 0.611719]]]])\n",
       "Coordinates:\n",
       "  * sex_id        (sex_id) int64 1 2\n",
       "  * age_group_id  (age_group_id) int64 11 12 13\n",
       "  * year_id       (year_id) int64 1990 1991 1992 1993 ... 1997 1998 1999 2000\n",
       "  * draw          (draw) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99\n",
       "Attributes:\n",
       "    metric:   rate\n",
       "    author:   Me"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with_nans_da[\"draws\"]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}