Skip to content

Instantly share code, notes, and snippets.

@jing-jin-mc
Created December 2, 2020 20:46
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jing-jin-mc/426d4d4a7a5f9499201d260a85131d41 to your computer and use it in GitHub Desktop.
Save jing-jin-mc/426d4d4a7a5f9499201d260a85131d41 to your computer and use it in GitHub Desktop.
select_data.ipynb
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "## Select data\n### Functions:\n* iloc\n* sample\n* isin\n* where\n* query\n* not\n* get\n* lookup"
},
{
"metadata": {
"trusted": false,
"collapsed": true
},
"cell_type": "code",
"source": "import pandas as pd ",
"execution_count": 1,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### create a test dataset to demo functions"
},
{
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"trusted": false
},
"cell_type": "code",
"source": "df = pd.DataFrame(data = {'date':['week1','week1','week2','week3','week3','week3'],\n 'sales':[50,20,35,10,20,40],\n 'category':['a','b','a','a','b','a'],\n 'revenue':[14,8,16,None,7,9]\n })\ndf.date = df.date.astype('string')\ndf.category = df.category.astype('string')\ndf",
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 b 8.0\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 b 7.0\n5 week3 40 a 9.0"
},
"metadata": {},
"execution_count": 2
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.info()",
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 date 6 non-null string \n 1 sales 6 non-null int64 \n 2 category 6 non-null string \n 3 revenue 5 non-null float64\ndtypes: float64(1), int64(1), string(2)\nmemory usage: 320.0 bytes\n"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.iloc[x]\n\nPurely integer-location based indexing for selection by position\n\nHere you are using square brackets since what you put in [ ] is list-like data, e.g 1:3 means index from 1 to 3\n\n#### example 1: \nselect value by integer index"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.iloc[0]",
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "date week1\nsales 50\ncategory a\nrevenue 14\nName: 0, dtype: object"
},
"metadata": {},
"execution_count": 4
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example 2:\nrows from n to m (m is not included)"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.iloc[1:3]",
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n1 week1 20 b 8.0\n2 week2 35 a 16.0"
},
"metadata": {},
"execution_count": 5
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### excercise: \nSelect the third row (index:2) to the last row (not include the last row). The last index can use -1 when you don't know how long the dataset"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.iloc[2:-1]",
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 b 7.0"
},
"metadata": {},
"execution_count": 6
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.iloc[2:]",
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 b 7.0\n5 week3 40 a 9.0"
},
"metadata": {},
"execution_count": 7
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### FYI: Using . after a dataframe means you are accessing attributes of a DataFrame object in Pandas.\n* All the following functions are attribute fucntions of DataFrame. \n* All the functions in python are defined in a format: \n * function_name(Parameters)\n * Parameters can be optional"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.sample( n, frac, replace, weights, random_state, axis)\n\nReturn a random sample of items from an axis of object \n\nParameters:\n\n* n: int, optional. Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.\n* frac: float, optioanl. Fraction of axis items to return. Cannot be used with n.\n* replace: bool, default False. Allow or disallow sampling of the same row more than once.\n* weights: str or ndarray-like, optional\n* random_state: int, array-like, BitGenerator, np.random.RandomState, optional. Used for reproducibility.\n* axis: {0 or ‘index’, 1 or ‘columns’, None}, default None\n\n#### example 1: \nGet n random rows"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.sample(n=3, random_state = 1)",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n2 week2 35 a 16.0\n1 week1 20 b 8.0\n4 week3 20 b 7.0"
},
"metadata": {},
"execution_count": 8
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example 2: \nGet 50% sample rows"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.sample(frac=0.5, random_state = 1)",
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n2 week2 35 a 16.0\n1 week1 20 b 8.0\n4 week3 20 b 7.0"
},
"metadata": {},
"execution_count": 9
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example 3: \nUsing one column in the dataframe as weights. Rows with larger value in that column are more likely to be sampled."
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.sample(n=3, random_state = 1, weights = 'revenue')",
"execution_count": 10,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n2 week2 35 a 16.0\n4 week3 20 b 7.0\n0 week1 50 a 14.0"
},
"metadata": {},
"execution_count": 10
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### excercise: \nuse random_state =1 and get 2 sample rows from the dataset."
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.sample(n=2)",
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n3 week3 10 a NaN\n5 week3 40 a 9.0"
},
"metadata": {},
"execution_count": 11
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.isin(values)\n\nTo check whether each element in the dataframe is contained in values\n\nParamters: values\n* series or dict\n\nReturns: \n* dataframe of booleans"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example 1: values in a series"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.isin([10,20])",
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 False False False False\n1 False True False False\n2 False False False False\n3 False True False False\n4 False True False False\n5 False False False False"
},
"metadata": {},
"execution_count": 12
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.category.isin(['a','b'])",
"execution_count": 13,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "0 True\n1 True\n2 True\n3 True\n4 True\n5 True\nName: category, dtype: bool"
},
"metadata": {},
"execution_count": 13
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "pd.DataFrame(df['category'].isin(['a','b']))",
"execution_count": 14,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>True</td>\n </tr>\n <tr>\n <th>4</th>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category\n0 True\n1 True\n2 True\n3 True\n4 True\n5 True"
},
"metadata": {},
"execution_count": 14
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example 2: values in a dict"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df",
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 b 8.0\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 b 7.0\n5 week3 40 a 9.0"
},
"metadata": {},
"execution_count": 15
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.isin({'sales':[10,20],\n 'category':['c','d']\n }\n )",
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 False False False False\n1 False True False False\n2 False False False False\n3 False True False False\n4 False True False False\n5 False False False False"
},
"metadata": {},
"execution_count": 16
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### excercise: \ncheck whether the data is before week5. ['week1','week2','week3','week4']"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.isin({'date':['week1','week2','week3','week4']\n }\n )",
"execution_count": 17,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 True False False False\n1 True False False False\n2 True False False False\n3 True False False False\n4 True False False False\n5 True False False False"
},
"metadata": {},
"execution_count": 17
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.where( cond, other = nan, ...)\n\nReplace values where the condition is False. This fucntion is an application of the if-then idiom.\n\nParameters:\n\n* cond:bool Series/DataFrame, array-like, or callable. Where cond is True, keep the original value. Where False, replace with corresponding value from other.\n* other: scalar, Series/DataFrame, or callable\n\n#### example 1: \nfind rows meet the condition and replace with 0"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.info()",
"execution_count": 18,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 date 6 non-null string \n 1 sales 6 non-null int64 \n 2 category 6 non-null string \n 3 revenue 5 non-null float64\ndtypes: float64(1), int64(1), string(2)\nmemory usage: 320.0 bytes\n"
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.sales.where(df['sales']>20,0)",
"execution_count": 19,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "0 50\n1 0\n2 35\n3 0\n4 0\n5 40\nName: sales, dtype: int64"
},
"metadata": {},
"execution_count": 19
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### excercise: \nfind data for 'a' category and set data as None for other categories "
},
{
"metadata": {
"trusted": false,
"collapsed": true
},
"cell_type": "code",
"source": "df_eg = df.copy()",
"execution_count": 20,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df_eg['category'] = df['category'].where(df['category']=='a',None)\ndf_eg",
"execution_count": 21,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>&lt;NA&gt;</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>&lt;NA&gt;</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 <NA> 8.0\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 <NA> 7.0\n5 week3 40 a 9.0"
},
"metadata": {},
"execution_count": 21
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.query(expr)\nUsed to query the columns of a Dataframe with a boolean expression\n\nParameters: expr\n* query expresion in string format\n\nReturns: \n* DataFrame resulting from the provided query expression\n\n#### example 1:"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.query('sales > revenue')",
"execution_count": 22,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 b 8.0\n2 week2 35 a 16.0\n4 week3 20 b 7.0\n5 week3 40 a 9.0"
},
"metadata": {},
"execution_count": 22
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df[df.sales>df.revenue]",
"execution_count": 23,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 b 8.0\n2 week2 35 a 16.0\n4 week3 20 b 7.0\n5 week3 40 a 9.0"
},
"metadata": {},
"execution_count": 23
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### excercise: \nselect out data for category 'a' in week1. Symbol & is used for 'and' operation. | is for 'or'"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df[(df.category =='a') & (df.date =='week1')]",
"execution_count": 24,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 week1 50 a 14.0"
},
"metadata": {},
"execution_count": 24
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.notna()\n\nDetect existing (non-missing) values. opposite function is .isna()\n\nReturn a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. \n\nIn python, pandas.dataframe.notna() and pandas.dataframe.isna() are doing exactly same thing as pandas.dataframe.notnull() and pandas.dataframe.isnull()"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.notna()",
"execution_count": 25,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 True True True True\n1 True True True True\n2 True True True True\n3 True True True False\n4 True True True True\n5 True True True True"
},
"metadata": {},
"execution_count": 25
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.isna()",
"execution_count": 26,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>True</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 False False False False\n1 False False False False\n2 False False False False\n3 False False False True\n4 False False False False\n5 False False False False"
},
"metadata": {},
"execution_count": 26
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.isnull()",
"execution_count": 27,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>True</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 False False False False\n1 False False False False\n2 False False False False\n3 False False False True\n4 False False False False\n5 False False False False"
},
"metadata": {},
"execution_count": 27
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Pandas.notna(obj)\n\nDetect non-missing values for an array-like object. Opposite fucntion pandas.isna(obj)\n\nsimilar function like pandas.isnull(), pandas.notnull()"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "pd.notna(df)",
"execution_count": 28,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 True True True True\n1 True True True True\n2 True True True True\n3 True True True False\n4 True True True True\n5 True True True True"
},
"metadata": {},
"execution_count": 28
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "pd.isna(df)",
"execution_count": 29,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>True</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 False False False False\n1 False False False False\n2 False False False False\n3 False False False True\n4 False False False False\n5 False False False False"
},
"metadata": {},
"execution_count": 29
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.get(key)\nGet item from object for given key, same function as df.['key'] and df.key\n\nParameters: key\n* query expresion in string format\n\nReturns: \n* value same type as items contained in object\n\n#### example :"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.get('sales')",
"execution_count": 30,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "0 50\n1 20\n2 35\n3 10\n4 20\n5 40\nName: sales, dtype: int64"
},
"metadata": {},
"execution_count": 30
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.get(['sales','revenue'])",
"execution_count": 31,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>sales</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>50</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>20</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>35</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>10</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>20</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>40</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " sales revenue\n0 50 14.0\n1 20 8.0\n2 35 16.0\n3 10 NaN\n4 20 7.0\n5 40 9.0"
},
"metadata": {},
"execution_count": 31
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df[['sales','revenue','date']]",
"execution_count": 32,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>sales</th>\n <th>revenue</th>\n <th>date</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>50</td>\n <td>14.0</td>\n <td>week1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>20</td>\n <td>8.0</td>\n <td>week1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>35</td>\n <td>16.0</td>\n <td>week2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>10</td>\n <td>NaN</td>\n <td>week3</td>\n </tr>\n <tr>\n <th>4</th>\n <td>20</td>\n <td>7.0</td>\n <td>week3</td>\n </tr>\n <tr>\n <th>5</th>\n <td>40</td>\n <td>9.0</td>\n <td>week3</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " sales revenue date\n0 50 14.0 week1\n1 20 8.0 week1\n2 35 16.0 week2\n3 10 NaN week3\n4 20 7.0 week3\n5 40 9.0 week3"
},
"metadata": {},
"execution_count": 32
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.lookup()\n\n'fancy indexing' function for DataFrame\n\nGiven equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair\n\n#### example"
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df",
"execution_count": 33,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 b 8.0\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 b 7.0\n5 week3 40 a 9.0"
},
"metadata": {},
"execution_count": 33
}
]
},
{
"metadata": {
"trusted": false
},
"cell_type": "code",
"source": "df.lookup([0,1,2],['sales','category','revenue'])",
"execution_count": 34,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "array([50, 'b', 16.0], dtype=object)"
},
"metadata": {},
"execution_count": 34
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## End\n\nThey are only a means to the end of learning how to do good data analysis "
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"gist": {
"id": "",
"data": {
"description": "select_data.ipynb",
"public": true
}
},
"language_info": {
"file_extension": ".py",
"mimetype": "text/x-python",
"version": "3.5.4",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment