aaronspring/require_all_on.ipynb

## require_all_on.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Conda debug"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!conda info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!which jupyter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!which jupyter-lab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!jupyter-troubleshoot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### get resources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of CPUs: 48, number of threads: 8, number of workers: 6\n"
     ]
    }
   ],
   "source": [
    "from dask.distributed import Client\n",
    "import multiprocessing\n",
    "ncpu = multiprocessing.cpu_count()\n",
    "threads = 8\n",
    "nworker = ncpu//threads\n",
    "print(f'Number of CPUs: {ncpu}, number of threads: {threads}, number of workers: {nworker}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table style=\"border: 2px solid white;\">\n",
       "<tr>\n",
       "<td style=\"vertical-align: top; border: 0px solid white\">\n",
       "<h3 style=\"text-align: left;\">Client</h3>\n",
       "<ul style=\"text-align: left; list-style: none; margin: 0; padding: 0;\">\n",
       "  <li><b>Scheduler: </b>inproc://136.172.50.56/13728/1</li>\n",
       "  <li><b>Dashboard: </b><a href='http://localhost:8888/proxy/8787/status' target='_blank'>http://localhost:8888/proxy/8787/status</a>\n",
       "</ul>\n",
       "</td>\n",
       "<td style=\"vertical-align: top; border: 0px solid white\">\n",
       "<h3 style=\"text-align: left;\">Cluster</h3>\n",
       "<ul style=\"text-align: left; list-style:none; margin: 0; padding: 0;\">\n",
       "  <li><b>Workers: </b>6</li>\n",
       "  <li><b>Cores: </b>48</li>\n",
       "  <li><b>Memory: </b>16.11 GB</li>\n",
       "</ul>\n",
       "</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<Client: 'inproc://136.172.50.56/13728/1' processes=6 threads=48, memory=16.11 GB>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client = Client(processes=False, threads_per_worker=threads, n_workers=nworker, memory_limit='256GB')\n",
    "client"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to use the `dask labextension dashboard`, please install your own conda and then intake-esm."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Intake to load CMIP data\n",
    "\n",
    "### Using intake-esm on mistral\n",
    "\n",
    "- install intake-esm: https://intake-esm.readthedocs.io/en/latest/installation.html\n",
    "- use the already built catalogs in `/home/mpim/m300524/intake-esm-datastore/catalogs` or from `https://github.com/NCAR/intake-esm-datastore/`; this skips the long catalog-building process of running `/home/mpim/m300524/intake-esm-datastore/builders/*.ipynb`\n",
    "\n",
    "Available catalogs:\n",
    "- CMIP6: `/home/mpim/m300524/intake-esm-datastore/catalogs/mistral-cmip6.json`\n",
    "- CMIP5: `/home/mpim/m300524/intake-esm-datastore/catalogs/mistral-cmip5.json`\n",
    "- MPI Grand Ensemble: `/home/mpim/m300524/intake-esm-datastore/catalogs/mistral-MPI-GE.json`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "aws-cesm1-le.csv.gz  mistral-cmip5.csv.gz   mistral-miklip.json\n",
      "glade-cmip5.csv.gz   mistral-cmip5.json     mistral-MPI-GE.csv.gz\n",
      "glade-cmip5.json     mistral-cmip6.csv.gz   mistral-MPI-GE.json\n",
      "glade-cmip6.csv.gz   mistral-cmip6.json     pangeo-cmip6.json\n",
      "glade-cmip6.json     mistral-miklip.csv.gz\n"
     ]
    }
   ],
   "source": [
    "# available catalogs: combination of .json and .csv\n",
    "!ls /home/mpim/m300524/intake-esm-datastore/catalogs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "You may see some warnings about tornado and bokeh below; I will try to fix this. They do not appear with the default `-p shared`, but then we often have too little memory for the analysis. For now, I therefore recommend using `-p compute`.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import intake\n",
    "import xarray as xr\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import pprint\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "#import warnings\n",
    "#warnings.simplefilter(\"ignore\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2020.3.16'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# should be >= 2019.12.13\n",
    "import intake_esm\n",
    "intake_esm.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CMIP6"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# json file contains the rules for concat and merge as well as the location for the catalog .csv file\n",
    "col_url = \"/home/mpim/m300524/intake-esm-datastore/catalogs/mistral-cmip6.json\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!cat /home/mpim/m300524/intake-esm-datastore/catalogs/mistral-cmip6.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>activity_id</th>\n",
       "      <th>institution_id</th>\n",
       "      <th>source_id</th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>member_id</th>\n",
       "      <th>table_id</th>\n",
       "      <th>variable_id</th>\n",
       "      <th>grid_label</th>\n",
       "      <th>dcpp_init_year</th>\n",
       "      <th>version</th>\n",
       "      <th>time_range</th>\n",
       "      <th>path</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AerChemMIP</td>\n",
       "      <td>HAMMOZ-Consortium</td>\n",
       "      <td>MPI-ESM-1-2-HAM</td>\n",
       "      <td>ssp370-lowNTCF</td>\n",
       "      <td>r1i1p1f1</td>\n",
       "      <td>Lmon</td>\n",
       "      <td>npp</td>\n",
       "      <td>gn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>v20190627</td>\n",
       "      <td>203501-205412</td>\n",
       "      <td>/work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AerChemMIP</td>\n",
       "      <td>HAMMOZ-Consortium</td>\n",
       "      <td>MPI-ESM-1-2-HAM</td>\n",
       "      <td>ssp370-lowNTCF</td>\n",
       "      <td>r1i1p1f1</td>\n",
       "      <td>Lmon</td>\n",
       "      <td>npp</td>\n",
       "      <td>gn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>v20190627</td>\n",
       "      <td>201501-203412</td>\n",
       "      <td>/work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AerChemMIP</td>\n",
       "      <td>HAMMOZ-Consortium</td>\n",
       "      <td>MPI-ESM-1-2-HAM</td>\n",
       "      <td>ssp370-lowNTCF</td>\n",
       "      <td>r1i1p1f1</td>\n",
       "      <td>Lmon</td>\n",
       "      <td>npp</td>\n",
       "      <td>gn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>v20190627</td>\n",
       "      <td>205501-205512</td>\n",
       "      <td>/work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>AerChemMIP</td>\n",
       "      <td>HAMMOZ-Consortium</td>\n",
       "      <td>MPI-ESM-1-2-HAM</td>\n",
       "      <td>ssp370-lowNTCF</td>\n",
       "      <td>r1i1p1f1</td>\n",
       "      <td>Lmon</td>\n",
       "      <td>tsl</td>\n",
       "      <td>gn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>v20190627</td>\n",
       "      <td>205501-205512</td>\n",
       "      <td>/work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AerChemMIP</td>\n",
       "      <td>HAMMOZ-Consortium</td>\n",
       "      <td>MPI-ESM-1-2-HAM</td>\n",
       "      <td>ssp370-lowNTCF</td>\n",
       "      <td>r1i1p1f1</td>\n",
       "      <td>Lmon</td>\n",
       "      <td>tsl</td>\n",
       "      <td>gn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>v20190627</td>\n",
       "      <td>201501-203412</td>\n",
       "      <td>/work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  activity_id     institution_id        source_id   experiment_id member_id  \\\n",
       "0  AerChemMIP  HAMMOZ-Consortium  MPI-ESM-1-2-HAM  ssp370-lowNTCF  r1i1p1f1   \n",
       "1  AerChemMIP  HAMMOZ-Consortium  MPI-ESM-1-2-HAM  ssp370-lowNTCF  r1i1p1f1   \n",
       "2  AerChemMIP  HAMMOZ-Consortium  MPI-ESM-1-2-HAM  ssp370-lowNTCF  r1i1p1f1   \n",
       "3  AerChemMIP  HAMMOZ-Consortium  MPI-ESM-1-2-HAM  ssp370-lowNTCF  r1i1p1f1   \n",
       "4  AerChemMIP  HAMMOZ-Consortium  MPI-ESM-1-2-HAM  ssp370-lowNTCF  r1i1p1f1   \n",
       "\n",
       "  table_id variable_id grid_label  dcpp_init_year    version     time_range  \\\n",
       "0     Lmon         npp         gn             NaN  v20190627  203501-205412   \n",
       "1     Lmon         npp         gn             NaN  v20190627  201501-203412   \n",
       "2     Lmon         npp         gn             NaN  v20190627  205501-205512   \n",
       "3     Lmon         tsl         gn             NaN  v20190627  205501-205512   \n",
       "4     Lmon         tsl         gn             NaN  v20190627  201501-203412   \n",
       "\n",
       "                                                path  \n",
       "0  /work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...  \n",
       "1  /work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...  \n",
       "2  /work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...  \n",
       "3  /work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...  \n",
       "4  /work/ik1017/CMIP6/data/CMIP6/AerChemMIP/HAMMO...  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col = intake.open_esm_datastore(col_url)\n",
    "\n",
    "# col.df is a pandas.DataFrame: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame\n",
    "col.df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## many experiments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>activity_id</th>\n",
       "      <th>institution_id</th>\n",
       "      <th>source_id</th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>member_id</th>\n",
       "      <th>table_id</th>\n",
       "      <th>variable_id</th>\n",
       "      <th>grid_label</th>\n",
       "      <th>dcpp_init_year</th>\n",
       "      <th>version</th>\n",
       "      <th>time_range</th>\n",
       "      <th>path</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CMIP</td>\n",
       "      <td>NCAR</td>\n",
       "      <td>CESM2</td>\n",
       "      <td>piControl</td>\n",
       "      <td>r1i1p1f1</td>\n",
       "      <td>Amon</td>\n",
       "      <td>tas</td>\n",
       "      <td>gn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>v20190320</td>\n",
       "      <td>110001-120012</td>\n",
       "      <td>/work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CMIP</td>\n",
       "      <td>NCAR</td>\n",
       "      <td>CESM2</td>\n",
       "      <td>piControl</td>\n",
       "      <td>r1i1p1f1</td>\n",
       "      <td>Amon</td>\n",
       "      <td>tas</td>\n",
       "      <td>gn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>v20190320</td>\n",
       "      <td>010001-019912</td>\n",
       "      <td>/work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CMIP</td>\n",
       "      <td>NCAR</td>\n",
       "      <td>CESM2</td>\n",
       "      <td>piControl</td>\n",
       "      <td>r1i1p1f1</td>\n",
       "      <td>Amon</td>\n",
       "      <td>tas</td>\n",
       "      <td>gn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>v20190320</td>\n",
       "      <td>070001-079912</td>\n",
       "      <td>/work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>CMIP</td>\n",
       "      <td>NCAR</td>\n",
       "      <td>CESM2</td>\n",
       "      <td>piControl</td>\n",
       "      <td>r1i1p1f1</td>\n",
       "      <td>Amon</td>\n",
       "      <td>tas</td>\n",
       "      <td>gn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>v20190320</td>\n",
       "      <td>080001-089912</td>\n",
       "      <td>/work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>CMIP</td>\n",
       "      <td>NCAR</td>\n",
       "      <td>CESM2</td>\n",
       "      <td>piControl</td>\n",
       "      <td>r1i1p1f1</td>\n",
       "      <td>Amon</td>\n",
       "      <td>tas</td>\n",
       "      <td>gn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>v20190320</td>\n",
       "      <td>060001-069912</td>\n",
       "      <td>/work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  activity_id institution_id source_id experiment_id member_id table_id  \\\n",
       "0        CMIP           NCAR     CESM2     piControl  r1i1p1f1     Amon   \n",
       "1        CMIP           NCAR     CESM2     piControl  r1i1p1f1     Amon   \n",
       "2        CMIP           NCAR     CESM2     piControl  r1i1p1f1     Amon   \n",
       "3        CMIP           NCAR     CESM2     piControl  r1i1p1f1     Amon   \n",
       "4        CMIP           NCAR     CESM2     piControl  r1i1p1f1     Amon   \n",
       "\n",
       "  variable_id grid_label  dcpp_init_year    version     time_range  \\\n",
       "0         tas         gn             NaN  v20190320  110001-120012   \n",
       "1         tas         gn             NaN  v20190320  010001-019912   \n",
       "2         tas         gn             NaN  v20190320  070001-079912   \n",
       "3         tas         gn             NaN  v20190320  080001-089912   \n",
       "4         tas         gn             NaN  v20190320  060001-069912   \n",
       "\n",
       "                                                path  \n",
       "0  /work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...  \n",
       "1  /work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...  \n",
       "2  /work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...  \n",
       "3  /work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...  \n",
       "4  /work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...  "
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = dict(experiment_id=['historical','piControl'],\n",
    "             source_id='CESM2',\n",
    "             variable_id='tas', table_id='Amon'\n",
    "            )\n",
    "cat = col.search(**query)\n",
    "cat.df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['piControl', 'historical'], dtype=object)"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# both requested experiments are present in the search result\n",
    "cat.df.experiment_id.unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>activity_id</th>\n",
       "      <th>institution_id</th>\n",
       "      <th>source_id</th>\n",
       "      <th>experiment_id</th>\n",
       "      <th>member_id</th>\n",
       "      <th>table_id</th>\n",
       "      <th>variable_id</th>\n",
       "      <th>grid_label</th>\n",
       "      <th>dcpp_init_year</th>\n",
       "      <th>version</th>\n",
       "      <th>time_range</th>\n",
       "      <th>path</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [activity_id, institution_id, source_id, experiment_id, member_id, table_id, variable_id, grid_label, dcpp_init_year, version, time_range, path]\n",
       "Index: []"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# but the same query returns an empty result once require_all_on='source_id' is added\n",
    "query = dict(experiment_id=['historical','piControl'], source_id='CESM2',\n",
    "             require_all_on='source_id',\n",
    "             variable_id='tas', table_id='Amon'\n",
    "            )\n",
    "cat = col.search(**query)\n",
    "cat.df.head()"
   ]
  },
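  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For comparison, the next cell is a minimal pandas sketch of what I would expect `require_all_on='source_id'` to do: keep only the `source_id` groups that contain every requested `experiment_id`. This is my reading of the intended behaviour, not intake-esm's actual implementation. The toy table, the model name `OtherModel` and the variable names `toy`, `wanted`, `has_all` and `kept` are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# hypothetical toy catalog for illustration, not rows from the mistral catalog\n",
    "toy = pd.DataFrame({\n",
    "    'source_id': ['CESM2', 'CESM2', 'OtherModel'],\n",
    "    'experiment_id': ['historical', 'piControl', 'historical'],\n",
    "})\n",
    "wanted = {'historical', 'piControl'}\n",
    "\n",
    "# for each source_id, check whether all requested experiments are present\n",
    "has_all = toy.groupby('source_id')['experiment_id'].agg(lambda s: wanted.issubset(set(s)))\n",
    "\n",
    "# keep only the rows of source_ids that have every requested experiment\n",
    "kept = toy[toy['source_id'].map(has_all)]\n",
    "kept"
   ]
  },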
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2020.3.16'"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "intake_esm.__version__"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "intake-esm",
   "language": "python",
   "name": "intake-esm"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
	" <td>r1i1p1f1</td>\n",
	" <td>Amon</td>\n",
	" <td>tas</td>\n",
	" <td>gn</td>\n",
	" <td>NaN</td>\n",
	" <td>v20190320</td>\n",
	" <td>060001-069912</td>\n",
	" <td>/work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/...</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" activity_id institution_id source_id experiment_id member_id table_id \\\n",
	"0 CMIP NCAR CESM2 piControl r1i1p1f1 Amon \n",
	"1 CMIP NCAR CESM2 piControl r1i1p1f1 Amon \n",
	"2 CMIP NCAR CESM2 piControl r1i1p1f1 Amon \n",
	"3 CMIP NCAR CESM2 piControl r1i1p1f1 Amon \n",
	"4 CMIP NCAR CESM2 piControl r1i1p1f1 Amon \n",
	"\n",
	" variable_id grid_label dcpp_init_year version time_range \\\n",
	"0 tas gn NaN v20190320 110001-120012 \n",
	"1 tas gn NaN v20190320 010001-019912 \n",
	"2 tas gn NaN v20190320 070001-079912 \n",
	"3 tas gn NaN v20190320 080001-089912 \n",
	"4 tas gn NaN v20190320 060001-069912 \n",
	"\n",
	" path \n",
	"0 /work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/... \n",
	"1 /work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/... \n",
	"2 /work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/... \n",
	"3 /work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/... \n",
	"4 /work/ik1017/CMIP6/data/CMIP6/CMIP/NCAR/CESM2/... "
	]
	},
	"execution_count": 78,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"query = dict(experiment_id=['historical','piControl'],\n",
	" source_id='CESM2',\n",
	" variable_id='tas', table_id='Amon'\n",
	" )\n",
	"cat = col.search(**query)\n",
	"cat.df.head()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 79,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array(['piControl', 'historical'], dtype=object)"
	]
	},
	"execution_count": 79,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Both requested experiments are present in the search results\n",
	"cat.df.experiment_id.unique()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "code",
	"execution_count": 81,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>activity_id</th>\n",
	" <th>institution_id</th>\n",
	" <th>source_id</th>\n",
	" <th>experiment_id</th>\n",
	" <th>member_id</th>\n",
	" <th>table_id</th>\n",
	" <th>variable_id</th>\n",
	" <th>grid_label</th>\n",
	" <th>dcpp_init_year</th>\n",
	" <th>version</th>\n",
	" <th>time_range</th>\n",
	" <th>path</th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	"Empty DataFrame\n",
	"Columns: [activity_id, institution_id, source_id, experiment_id, member_id, table_id, variable_id, grid_label, dcpp_init_year, version, time_range, path]\n",
	"Index: []"
	]
	},
	"execution_count": 81,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# But adding require_all_on='source_id' returns an empty DataFrame,\n",
	"# although CESM2 has both experiments (see the previous search)\n",
	"query = dict(experiment_id=['historical','piControl'], source_id='CESM2',\n",
	" require_all_on='source_id',\n",
	" variable_id='tas', table_id='Amon'\n",
	" )\n",
	"cat = col.search(**query)\n",
	"cat.df.head()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": []
	},
	{
	"cell_type": "code",
	"execution_count": 82,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"'2020.3.16'"
	]
	},
	"execution_count": 82,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# intake-esm version used to reproduce the behaviour above\n",
	"intake_esm.__version__"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "intake-esm",
	"language": "python",
	"name": "intake-esm"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.6.10"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 4
	}