NelsonMinar/Pandas-MultiIndex-slicing-demo.ipynb

## Pandas-MultiIndex-slicing-demo.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A demonstration of simple uses of MultiIndex\n",
    "\n",
    "Pandas Dataframes generally have an \"index\", one column of a dataset that gives the name for each row. It works like a primary key in a database table. But Pandas also supports a [MultiIndex](http://pandas.pydata.org/pandas-docs/stable/advanced.html), in which the index for a row is some composite key of several columns. It's quite confusing at first, here's a simple demo of creating a multi-indexed DataFrame and then querying subsets with various syntax."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas, io"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an unindexed DataFrame from a CSV file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Fruit</th>\n",
       "      <th>Color</th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Apple</td>\n",
       "      <td>Red</td>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Apple</td>\n",
       "      <td>Green</td>\n",
       "      <td>9</td>\n",
       "      <td>$0.99</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Pear</td>\n",
       "      <td>Red</td>\n",
       "      <td>25</td>\n",
       "      <td>$2.59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Pear</td>\n",
       "      <td>Green</td>\n",
       "      <td>26</td>\n",
       "      <td>$2.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Lime</td>\n",
       "      <td>Green</td>\n",
       "      <td>99</td>\n",
       "      <td>$0.39</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Fruit  Color  Count  Price\n",
       "0  Apple    Red      3  $1.29\n",
       "1  Apple  Green      9  $0.99\n",
       "2   Pear    Red     25  $2.59\n",
       "3   Pear  Green     26  $2.79\n",
       "4   Lime  Green     99  $0.39"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = io.StringIO('''Fruit,Color,Count,Price\n",
    "Apple,Red,3,$1.29\n",
    "Apple,Green,9,$0.99\n",
    "Pear,Red,25,$2.59\n",
    "Pear,Green,26,$2.79\n",
    "Lime,Green,99,$0.39\n",
    "''')\n",
    "df_unindexed = pandas.read_csv(data)\n",
    "df_unindexed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add a multi-index based on two columns\n",
    "\n",
    "Note, `set_index()` creates a new DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fruit</th>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Apple</th>\n",
       "      <th>Red</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Green</th>\n",
       "      <td>9</td>\n",
       "      <td>$0.99</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Pear</th>\n",
       "      <th>Red</th>\n",
       "      <td>25</td>\n",
       "      <td>$2.59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Green</th>\n",
       "      <td>26</td>\n",
       "      <td>$2.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lime</th>\n",
       "      <th>Green</th>\n",
       "      <td>99</td>\n",
       "      <td>$0.39</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Count  Price\n",
       "Fruit Color              \n",
       "Apple Red        3  $1.29\n",
       "      Green      9  $0.99\n",
       "Pear  Red       25  $2.59\n",
       "      Green     26  $2.79\n",
       "Lime  Green     99  $0.39"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df_unindexed.set_index(['Fruit', 'Color'])\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Querying the data frame with `.xs()`\n",
    "\n",
    "This works pretty simply, but the resulting DataFrames no longer have the multi-index. Also `.xs()` is not the most powerful way to subset a DataFrame.\n",
    "\n",
    "Find all Apples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Red</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Green</th>\n",
       "      <td>9</td>\n",
       "      <td>$0.99</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Count  Price\n",
       "Color              \n",
       "Red        3  $1.29\n",
       "Green      9  $0.99"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.xs('Apple')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Find all red fruits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fruit</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Apple</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pear</th>\n",
       "      <td>25</td>\n",
       "      <td>$2.59</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Count  Price\n",
       "Fruit              \n",
       "Apple      3  $1.29\n",
       "Pear      25  $2.59"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.xs('Red', level='Color')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Querying the data frame with `.loc[]` for single values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`.loc[]` is the preferred way to subset a DataFrame based on labels.\n",
    "\n",
    "Find all rows with the label \"Apple\". Extract all columns\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Red</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Green</th>\n",
       "      <td>9</td>\n",
       "      <td>$0.99</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Count  Price\n",
       "Color              \n",
       "Red        3  $1.29\n",
       "Green      9  $0.99"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc['Apple', :]        # the , : means \"all columns\"; we could name a subset of columns here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Find all red apples, using a tuple"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Count        3\n",
       "Price    $1.29\n",
       "Name: (Apple, Red), dtype: object"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[('Apple', 'Red'), :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Find all apples using a tuple. This prints a warning in Pandas 0.18.1, see below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/nelson/src/lol-winrate-by-tier/venv/lib/python3.4/site-packages/pandas/core/indexing.py:1294: PerformanceWarning: indexing past lexsort depth may impact performance.\n",
      "  return self._getitem_tuple(key)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Red</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Green</th>\n",
       "      <td>9</td>\n",
       "      <td>$0.99</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Count  Price\n",
       "Color              \n",
       "Red        3  $1.29\n",
       "Green      9  $0.99"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[('Apple', ), :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lexsorting\n",
    "\n",
    "Unfortunately we've been cheating with our MultiIndex all along. Pandas really wants your DataFrame to be sorted if you are doing complicated queries with a MultiIndex. Slice queries require sorting. This is documented as [the need for sortedness with MultiIndex](http://pandas.pydata.org/pandas-docs/stable/advanced.html#the-need-for-sortedness-with-multiindex). Sometimes this is called \"lexsorting\", because lexicographic sorting of the index keys is common.\n",
    "\n",
    "If your DataFrame is not sorted, you will see several possible errors or warnings.\n",
    "\n",
    "`PerformanceWarning: indexing past lexsort depth may impact performance.`  \n",
    "`KeyError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'`  \n",
    "`KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'`\n",
    "\n",
    "The simple solution is to sort the DataFrame using [sortlevel()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sortlevel.html), which sorts lexicographically. If you don't like that sorting order, more control is available with [sort_index()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html#pandas.DataFrame.sort_index). I believe all that Pandas cares about is that your DataFrame was sorted; it tracks whether that happened with one flag per MultiIndex level. It doesn't care what order they were actually sorted in, although presumably it affects the semantics of label slicing.\n",
    "\n",
    "Anyway, let's sort our DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "df.sortlevel(inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 💥 Ta-Da!💥\n",
    "\n",
    "Note that the dataframe is now reordered, with Lime in the middle and Green before Red."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fruit</th>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Apple</th>\n",
       "      <th>Green</th>\n",
       "      <td>9</td>\n",
       "      <td>$0.99</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Red</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lime</th>\n",
       "      <th>Green</th>\n",
       "      <td>99</td>\n",
       "      <td>$0.39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Pear</th>\n",
       "      <th>Green</th>\n",
       "      <td>26</td>\n",
       "      <td>$2.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Red</th>\n",
       "      <td>25</td>\n",
       "      <td>$2.59</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Count  Price\n",
       "Fruit Color              \n",
       "Apple Green      9  $0.99\n",
       "      Red        3  $1.29\n",
       "Lime  Green     99  $0.39\n",
       "Pear  Green     26  $2.79\n",
       "      Red       25  $2.59"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now querying with a tuple won't print a warning about lexsort"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Green</th>\n",
       "      <td>9</td>\n",
       "      <td>$0.99</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Red</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Count  Price\n",
       "Color              \n",
       "Green      9  $0.99\n",
       "Red        3  $1.29"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[('Apple', ), :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying the data frame with .loc[] for ranges of values\n",
    "\n",
    "We can uses slices to find ranges of things in our DataFrame. Note the full MultiIndex is preserved in the result sets, which is nice.\n",
    "\n",
    "Slicing with index labels is a bit weird. As the docs say, \"contrary to usual python slices, both the start and the stop are included!\" Also the order implied by slicing is a bit ambiguous (see discussion above about Lexsorting)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Find all Apples with an explicit slice construction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fruit</th>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Apple</th>\n",
       "      <th>Green</th>\n",
       "      <td>9</td>\n",
       "      <td>$0.99</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Red</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Count  Price\n",
       "Fruit Color              \n",
       "Apple Green      9  $0.99\n",
       "      Red        3  $1.29"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[slice('Apple', 'Apple'), :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Find everything between Limes and Pears"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fruit</th>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Lime</th>\n",
       "      <th>Green</th>\n",
       "      <td>99</td>\n",
       "      <td>$0.39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Pear</th>\n",
       "      <th>Green</th>\n",
       "      <td>26</td>\n",
       "      <td>$2.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Red</th>\n",
       "      <td>25</td>\n",
       "      <td>$2.59</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Count  Price\n",
       "Fruit Color              \n",
       "Lime  Green     99  $0.39\n",
       "Pear  Green     26  $2.79\n",
       "      Red       25  $2.59"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[slice('Lime', 'Pear'), :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So far we've only been querying by the first level of the index. How do we query by the second, the color?  \n",
    "First, let's make the \"all apples\" a bit more explicit, using a tuple of two slices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fruit</th>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Apple</th>\n",
       "      <th>Green</th>\n",
       "      <td>9</td>\n",
       "      <td>$0.99</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Red</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Count  Price\n",
       "Fruit Color              \n",
       "Apple Green      9  $0.99\n",
       "      Red        3  $1.29"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[(slice('Apple', 'Apple'), slice(None)), :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`slice(None)` is a wildcard of sorts. How about all red fruits?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fruit</th>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Apple</th>\n",
       "      <th>Red</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pear</th>\n",
       "      <th>Red</th>\n",
       "      <td>25</td>\n",
       "      <td>$2.59</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Count  Price\n",
       "Fruit Color              \n",
       "Apple Red        3  $1.29\n",
       "Pear  Red       25  $2.59"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[(slice(None), slice('Red', 'Red')), :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Typing `slice()` a lot is tedious, so Pandas has a (poorly documented) helper called `IndexSlice` that allows for some syntactic sugar. Here's all red fruits using that syntax."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fruit</th>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Apple</th>\n",
       "      <th>Red</th>\n",
       "      <td>3</td>\n",
       "      <td>$1.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pear</th>\n",
       "      <th>Red</th>\n",
       "      <td>25</td>\n",
       "      <td>$2.59</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Count  Price\n",
       "Fruit Color              \n",
       "Apple Red        3  $1.29\n",
       "Pear  Red       25  $2.59"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "idx = pandas.IndexSlice\n",
    "\n",
    "df.loc[idx[:,['Red']], :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And finally all green Limes and Pears"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Count</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fruit</th>\n",
       "      <th>Color</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Lime</th>\n",
       "      <th>Green</th>\n",
       "      <td>99</td>\n",
       "      <td>$0.39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pear</th>\n",
       "      <th>Green</th>\n",
       "      <td>26</td>\n",
       "      <td>$2.79</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Count  Price\n",
       "Fruit Color              \n",
       "Lime  Green     99  $0.39\n",
       "Pear  Green     26  $2.79"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[idx['Lime':'Pear','Green'],:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Slicing DataFrames can get quite complex. For more examples, see [the Pandas docs](http://pandas.pydata.org/pandas-docs/stable/advanced.html?highlight=indexslice#using-slicers)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## A demonstration of simple uses of MultiIndex\n",
	"\n",
	"Pandas Dataframes generally have an \"index\", one column of a dataset that gives the name for each row. It works like a primary key in a database table. But Pandas also supports a [MultiIndex](http://pandas.pydata.org/pandas-docs/stable/advanced.html), in which the index for a row is some composite key of several columns. It's quite confusing at first, here's a simple demo of creating a multi-indexed DataFrame and then querying subsets with various syntax."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"import pandas, io"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Create an unindexed DataFrame from a CSV file"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Fruit</th>\n",
	" <th>Color</th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>0</th>\n",
	" <td>Apple</td>\n",
	" <td>Red</td>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>1</th>\n",
	" <td>Apple</td>\n",
	" <td>Green</td>\n",
	" <td>9</td>\n",
	" <td>$0.99</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>2</th>\n",
	" <td>Pear</td>\n",
	" <td>Red</td>\n",
	" <td>25</td>\n",
	" <td>$2.59</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>3</th>\n",
	" <td>Pear</td>\n",
	" <td>Green</td>\n",
	" <td>26</td>\n",
	" <td>$2.79</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>4</th>\n",
	" <td>Lime</td>\n",
	" <td>Green</td>\n",
	" <td>99</td>\n",
	" <td>$0.39</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Fruit Color Count Price\n",
	"0 Apple Red 3 $1.29\n",
	"1 Apple Green 9 $0.99\n",
	"2 Pear Red 25 $2.59\n",
	"3 Pear Green 26 $2.79\n",
	"4 Lime Green 99 $0.39"
	]
	},
	"execution_count": 2,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"data = io.StringIO('''Fruit,Color,Count,Price\n",
	"Apple,Red,3,$1.29\n",
	"Apple,Green,9,$0.99\n",
	"Pear,Red,25,$2.59\n",
	"Pear,Green,26,$2.79\n",
	"Lime,Green,99,$0.39\n",
	"''')\n",
	"df_unindexed = pandas.read_csv(data)\n",
	"df_unindexed"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Add a multi-index based on two columns\n",
	"\n",
	"Note, `set_index()` creates a new DataFrame"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Fruit</th>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th rowspan=\"2\" valign=\"top\">Apple</th>\n",
	" <th>Red</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Green</th>\n",
	" <td>9</td>\n",
	" <td>$0.99</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th rowspan=\"2\" valign=\"top\">Pear</th>\n",
	" <th>Red</th>\n",
	" <td>25</td>\n",
	" <td>$2.59</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Green</th>\n",
	" <td>26</td>\n",
	" <td>$2.79</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Lime</th>\n",
	" <th>Green</th>\n",
	" <td>99</td>\n",
	" <td>$0.39</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Fruit Color \n",
	"Apple Red 3 $1.29\n",
	" Green 9 $0.99\n",
	"Pear Red 25 $2.59\n",
	" Green 26 $2.79\n",
	"Lime Green 99 $0.39"
	]
	},
	"execution_count": 3,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df = df_unindexed.set_index(['Fruit', 'Color'])\n",
	"df"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Querying the data frame with `.xs()`\n",
	"\n",
	"This works pretty simply, but the resulting DataFrames no longer have the multi-index. Also `.xs()` is not the most powerful way to subset a DataFrame.\n",
	"\n",
	"Find all Apples"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>Red</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Green</th>\n",
	" <td>9</td>\n",
	" <td>$0.99</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Color \n",
	"Red 3 $1.29\n",
	"Green 9 $0.99"
	]
	},
	"execution_count": 4,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.xs('Apple')"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Find all red fruits"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Fruit</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>Apple</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Pear</th>\n",
	" <td>25</td>\n",
	" <td>$2.59</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Fruit \n",
	"Apple 3 $1.29\n",
	"Pear 25 $2.59"
	]
	},
	"execution_count": 5,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.xs('Red', level='Color')"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Querying the data frame with `.loc[]` for single values"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"`.loc[]` is the preferred way to subset a DataFrame based on labels.\n",
	"\n",
	"Find all rows with the label \"Apple\". Extract all columns\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>Red</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Green</th>\n",
	" <td>9</td>\n",
	" <td>$0.99</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Color \n",
	"Red 3 $1.29\n",
	"Green 9 $0.99"
	]
	},
	"execution_count": 6,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.loc['Apple', :] # the , : means \"all columns\"; we could name a subset of columns here"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Find all red apples, using a tuple"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"Count 3\n",
	"Price $1.29\n",
	"Name: (Apple, Red), dtype: object"
	]
	},
	"execution_count": 7,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.loc[('Apple', 'Red'), :]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Find all apples using a tuple. This prints a warning in Pandas 0.18.1, see below"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 8,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stderr",
	"output_type": "stream",
	"text": [
	"/home/nelson/src/lol-winrate-by-tier/venv/lib/python3.4/site-packages/pandas/core/indexing.py:1294: PerformanceWarning: indexing past lexsort depth may impact performance.\n",
	" return self._getitem_tuple(key)\n"
	]
	},
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>Red</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Green</th>\n",
	" <td>9</td>\n",
	" <td>$0.99</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Color \n",
	"Red 3 $1.29\n",
	"Green 9 $0.99"
	]
	},
	"execution_count": 8,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.loc[('Apple', ), :]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Lexsorting\n",
	"\n",
	"Unfortunately we've been cheating with our MultiIndex all along. Pandas really wants your DataFrame to be sorted if you are doing complicated queries with a MultiIndex. Slice queries require sorting. This is documented as [the need for sortedness with MultiIndex](http://pandas.pydata.org/pandas-docs/stable/advanced.html#the-need-for-sortedness-with-multiindex). Sometimes this is called \"lexsorting\", because lexicographic sorting of the index keys is common.\n",
	"\n",
	"If your DataFrame is not sorted, you will see several possible errors or warnings.\n",
	"\n",
	"`PerformanceWarning: indexing past lexsort depth may impact performance.` \n",
	"`KeyError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'` \n",
	"`KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'`\n",
	"\n",
	"The simple solution is to sort the DataFrame using [sortlevel()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sortlevel.html), which sorts lexicographically. If you don't like that sorting order, more control is available with [sort_index()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html#pandas.DataFrame.sort_index). I believe all that Pandas cares about is that your DataFrame was sorted; it tracks whether that happened with one flag per MultiIndex level. It doesn't care what order they were actually sorted in, although presumably it affects the semantics of label slicing.\n",
	"\n",
	"Anyway, let's sort our DataFrame."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 9,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"df.sortlevel(inplace=True)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### 💥 Ta-Da!💥\n",
	"\n",
	"Note that the dataframe is now reordered, with Lime in the middle and Green before Red."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 10,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Fruit</th>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th rowspan=\"2\" valign=\"top\">Apple</th>\n",
	" <th>Green</th>\n",
	" <td>9</td>\n",
	" <td>$0.99</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Red</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Lime</th>\n",
	" <th>Green</th>\n",
	" <td>99</td>\n",
	" <td>$0.39</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th rowspan=\"2\" valign=\"top\">Pear</th>\n",
	" <th>Green</th>\n",
	" <td>26</td>\n",
	" <td>$2.79</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Red</th>\n",
	" <td>25</td>\n",
	" <td>$2.59</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Fruit Color \n",
	"Apple Green 9 $0.99\n",
	" Red 3 $1.29\n",
	"Lime Green 99 $0.39\n",
	"Pear Green 26 $2.79\n",
	" Red 25 $2.59"
	]
	},
	"execution_count": 10,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Now querying with a tuple won't print a warning about lexsort"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 11,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>Green</th>\n",
	" <td>9</td>\n",
	" <td>$0.99</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Red</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Color \n",
	"Green 9 $0.99\n",
	"Red 3 $1.29"
	]
	},
	"execution_count": 11,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.loc[('Apple', ), :]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Querying the data frame with .loc[] for ranges of values\n",
	"\n",
	"We can uses slices to find ranges of things in our DataFrame. Note the full MultiIndex is preserved in the result sets, which is nice.\n",
	"\n",
	"Slicing with index labels is a bit weird. As the docs say, \"contrary to usual python slices, both the start and the stop are included!\" Also the order implied by slicing is a bit ambiguous (see discussion above about Lexsorting)."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Find all Apples with an explicit slice construction"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 12,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Fruit</th>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th rowspan=\"2\" valign=\"top\">Apple</th>\n",
	" <th>Green</th>\n",
	" <td>9</td>\n",
	" <td>$0.99</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Red</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Fruit Color \n",
	"Apple Green 9 $0.99\n",
	" Red 3 $1.29"
	]
	},
	"execution_count": 12,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.loc[slice('Apple', 'Apple'), :]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Find everything between Limes and Pears"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 13,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Fruit</th>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>Lime</th>\n",
	" <th>Green</th>\n",
	" <td>99</td>\n",
	" <td>$0.39</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th rowspan=\"2\" valign=\"top\">Pear</th>\n",
	" <th>Green</th>\n",
	" <td>26</td>\n",
	" <td>$2.79</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Red</th>\n",
	" <td>25</td>\n",
	" <td>$2.59</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Fruit Color \n",
	"Lime Green 99 $0.39\n",
	"Pear Green 26 $2.79\n",
	" Red 25 $2.59"
	]
	},
	"execution_count": 13,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.loc[slice('Lime', 'Pear'), :]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"So far we've only been querying by the first level of the index. How do we query by the second, the color? \n",
	"First, let's make the \"all apples\" a bit more explicit, using a tuple of two slices."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 14,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Fruit</th>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th rowspan=\"2\" valign=\"top\">Apple</th>\n",
	" <th>Green</th>\n",
	" <td>9</td>\n",
	" <td>$0.99</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Red</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Fruit Color \n",
	"Apple Green 9 $0.99\n",
	" Red 3 $1.29"
	]
	},
	"execution_count": 14,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.loc[(slice('Apple', 'Apple'), slice(None)), :]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"`slice(None)` is a wildcard of sorts. How about all red fruits?"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 15,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Fruit</th>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>Apple</th>\n",
	" <th>Red</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Pear</th>\n",
	" <th>Red</th>\n",
	" <td>25</td>\n",
	" <td>$2.59</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Fruit Color \n",
	"Apple Red 3 $1.29\n",
	"Pear Red 25 $2.59"
	]
	},
	"execution_count": 15,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.loc[(slice(None), slice('Red', 'Red')), :]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Typing `slice()` a lot is tedious, so Pandas has a (poorly documented) helper called `IndexSlice` that allows for some syntactic sugar. Here's all red fruits using that syntax."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 16,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Fruit</th>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>Apple</th>\n",
	" <th>Red</th>\n",
	" <td>3</td>\n",
	" <td>$1.29</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Pear</th>\n",
	" <th>Red</th>\n",
	" <td>25</td>\n",
	" <td>$2.59</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Fruit Color \n",
	"Apple Red 3 $1.29\n",
	"Pear Red 25 $2.59"
	]
	},
	"execution_count": 16,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"idx = pandas.IndexSlice\n",
	"\n",
	"df.loc[idx[:,['Red']], :]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"And finally all green Limes and Pears"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 17,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th></th>\n",
	" <th>Count</th>\n",
	" <th>Price</th>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Fruit</th>\n",
	" <th>Color</th>\n",
	" <th></th>\n",
	" <th></th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>Lime</th>\n",
	" <th>Green</th>\n",
	" <td>99</td>\n",
	" <td>$0.39</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>Pear</th>\n",
	" <th>Green</th>\n",
	" <td>26</td>\n",
	" <td>$2.79</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" Count Price\n",
	"Fruit Color \n",
	"Lime Green 99 $0.39\n",
	"Pear Green 26 $2.79"
	]
	},
	"execution_count": 17,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.loc[idx['Lime':'Pear','Green'],:]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Slicing DataFrames can get quite complex. For more examples, see [the Pandas docs](http://pandas.pydata.org/pandas-docs/stable/advanced.html?highlight=indexslice#using-slicers)"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.4.3"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 1
	}