Skip to content

Instantly share code, notes, and snippets.

@ktprezes
Last active August 15, 2023 12:28
Show Gist options
  • Save ktprezes/22d96ee39cc2e9654a7cc13846dc88d3 to your computer and use it in GitHub Desktop.
Save ktprezes/22d96ee39cc2e9654a7cc13846dc88d3 to your computer and use it in GitHub Desktop.
Python - Pandas - MultiIndex DataFrame selections
import pandas as pd
# suppose we have a simple 'stocks' dataframe:
stocks = pd.read_csv('http://bit.ly/smallstocks')
"""# stocks
Date Close Volume Symbol
0 2016-10-03 31.50 14070500 CSCO
1 2016-10-03 112.52 21701800 AAPL
2 2016-10-03 57.42 19189500 MSFT
3 2016-10-04 113.00 29736800 AAPL
4 2016-10-04 57.24 20085900 MSFT
5 2016-10-04 31.35 18460400 CSCO
6 2016-10-05 57.64 16726400 MSFT
7 2016-10-05 31.59 11808600 CSCO
8 2016-10-05 113.05 21453100 AAPL
"""
# and we set MultiIndex:
stocks_mi = stocks.set_index(['Symbol','Date'])
"""# stocks_mi
Close Volume
Symbol Date
CSCO 2016-10-03 31.50 14070500
AAPL 2016-10-03 112.52 21701800
MSFT 2016-10-03 57.42 19189500
AAPL 2016-10-04 113.00 29736800
MSFT 2016-10-04 57.24 20085900
CSCO 2016-10-04 31.35 18460400
MSFT 2016-10-05 57.64 16726400
CSCO 2016-10-05 31.59 11808600
AAPL 2016-10-05 113.05 21453100
"""
# selecting rows according to the first (0th actually) MultiIndex level values is easy:
stocks_mi.loc["CSCO"]
"""
Close Volume
Date
2016-10-03 31.50 14070500
2016-10-04 31.35 18460400
2016-10-05 31.59 11808600
"""
# but when it comes to selecting rows according to values of deeper levels of MultiIndex,
# it doesn't work :(
# these don't work either:
# stocks_mi.loc[(,'2016-10-04'),:]
# SyntaxError: invalid syntax
# stocks_mi.loc[(:,'2016-10-04'),:]
# SyntaxError: invalid syntax
# stocks_mi.loc[(::,'2016-10-04'),:]
# SyntaxError: invalid syntax
# we can do it that way:
stocks_mi.loc[(slice(None),'2016-10-04'),:]
"""
Close Volume
Symbol Date
AAPL 2016-10-04 113.00 29736800
MSFT 2016-10-04 57.24 20085900
CSCO 2016-10-04 31.35 18460400
"""
# but we can also do it that way:
# selection by one value:
stocks_mi[stocks_mi.index.get_level_values(1) == '2016-10-04']
# and selection by values from list:
stocks_mi[stocks_mi.index.get_level_values(1).isin(['2016-10-03', '2016-10-04'])]
# and when the MultiIndex levels have their names (like here: 'Symbol', 'Date'),
# we can use these names in the 'get_level_values' method instead of numbers, e.g.:
stocks.index.get_level_values('Date")
# with this approach, we can create logical expressions,
# and select rows according to values from different levels of MultiIndex
# for example:
stocks_mi[
stocks_mi.index.get_level_values('Date').isin(['2016-10-04','2016-10-03']) &
stocks_mi.index.get_level_values('Symbol').isin(["CSCO", "AAPL"])
].sort_index()
"""
Close Volume
Symbol Date
AAPL 2016-10-03 112.52 21701800
2016-10-04 113.00 29736800
CSCO 2016-10-03 31.50 14070500
2016-10-04 31.35 18460400
"""
# we can also use the '.xs' method:
stocks_mi.xs('2016-10-04', level='Date')
"""
Close Volume
Symbol
AAPL 113.00 29736800
MSFT 57.24 20085900
CSCO 31.35 18460400
"""
# and for tuples of values from different levels we can use:
stocks_mi.xs(('AAPL', '2016-10-04'), level=['Symbol', 'Date'])
"""
Close Volume
Symbol Date
AAPL 2016-10-04 113.0 29736800
"""
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment