Created
December 2, 2020 20:46
-
-
Save jing-jin-mc/426d4d4a7a5f9499201d260a85131d41 to your computer and use it in GitHub Desktop.
select_data.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Select data\n### Functions:\n* iloc\n* sample\n* isin\n* where\n* query\n* not\n* get\n* lookup" | |
}, | |
{ | |
"metadata": { | |
"trusted": false, | |
"collapsed": true | |
}, | |
"cell_type": "code", | |
"source": "import pandas as pd ", | |
"execution_count": 1, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### create a test dataset to demo functions" | |
}, | |
{ | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
}, | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df = pd.DataFrame(data = {'date':['week1','week1','week2','week3','week3','week3'],\n 'sales':[50,20,35,10,20,40],\n 'category':['a','b','a','a','b','a'],\n 'revenue':[14,8,16,None,7,9]\n })\ndf.date = df.date.astype('string')\ndf.category = df.category.astype('string')\ndf", | |
"execution_count": 2, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 b 8.0\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 b 7.0\n5 week3 40 a 9.0" | |
}, | |
"metadata": {}, | |
"execution_count": 2 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.info()", | |
"execution_count": 3, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 date 6 non-null string \n 1 sales 6 non-null int64 \n 2 category 6 non-null string \n 3 revenue 5 non-null float64\ndtypes: float64(1), int64(1), string(2)\nmemory usage: 320.0 bytes\n" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.iloc[x]\n\nPurely integer-location based indexing for selection by position\n\nHere you are using square brackets since what you put in [ ] is list-like data, e.g 1:3 means index from 1 to 3\n\n#### example 1: \nselect value by integer index" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.iloc[0]", | |
"execution_count": 4, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": "date week1\nsales 50\ncategory a\nrevenue 14\nName: 0, dtype: object" | |
}, | |
"metadata": {}, | |
"execution_count": 4 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### example 2:\nrows from n to m (m is not included)" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.iloc[1:3]", | |
"execution_count": 5, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n1 week1 20 b 8.0\n2 week2 35 a 16.0" | |
}, | |
"metadata": {}, | |
"execution_count": 5 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### excercise: \nSelect the third row (index:2) to the last row (not include the last row). The last index can use -1 when you don't know how long the dataset" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.iloc[2:-1]", | |
"execution_count": 6, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 b 7.0" | |
}, | |
"metadata": {}, | |
"execution_count": 6 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.iloc[2:]", | |
"execution_count": 7, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 b 7.0\n5 week3 40 a 9.0" | |
}, | |
"metadata": {}, | |
"execution_count": 7 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### FYI: Using . after a dataframe means you are accessing attributes of a DataFrame object in Pandas.\n* All the following functions are attribute fucntions of DataFrame. \n* All the functions in python are defined in a format: \n * function_name(Parameters)\n * Parameters can be optional" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.sample( n, frac, replace, weights, random_state, axis)\n\nReturn a random sample of items from an axis of object \n\nParameters:\n\n* n: int, optional. Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.\n* frac: float, optioanl. Fraction of axis items to return. Cannot be used with n.\n* replace: bool, default False. Allow or disallow sampling of the same row more than once.\n* weights: str or ndarray-like, optional\n* random_state: int, array-like, BitGenerator, np.random.RandomState, optional. Used for reproducibility.\n* axis: {0 or ‘index’, 1 or ‘columns’, None}, default None\n\n#### example 1: \nGet n random rows" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.sample(n=3, random_state = 1)", | |
"execution_count": 8, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n2 week2 35 a 16.0\n1 week1 20 b 8.0\n4 week3 20 b 7.0" | |
}, | |
"metadata": {}, | |
"execution_count": 8 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### example 2: \nGet 50% sample rows" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.sample(frac=0.5, random_state = 1)", | |
"execution_count": 9, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n2 week2 35 a 16.0\n1 week1 20 b 8.0\n4 week3 20 b 7.0" | |
}, | |
"metadata": {}, | |
"execution_count": 9 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### example 3: \nUsing one column in the dataframe as weights. Rows with larger value in that column are more likely to be sampled." | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.sample(n=3, random_state = 1, weights = 'revenue')", | |
"execution_count": 10, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n2 week2 35 a 16.0\n4 week3 20 b 7.0\n0 week1 50 a 14.0" | |
}, | |
"metadata": {}, | |
"execution_count": 10 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### excercise: \nuse random_state =1 and get 2 sample rows from the dataset." | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.sample(n=2)", | |
"execution_count": 11, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n3 week3 10 a NaN\n5 week3 40 a 9.0" | |
}, | |
"metadata": {}, | |
"execution_count": 11 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.isin(values)\n\nTo check whether each element in the dataframe is contained in values\n\nParamters: values\n* series or dict\n\nReturns: \n* dataframe of booleans" | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### example 1: values in a series" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.isin([10,20])", | |
"execution_count": 12, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 False False False False\n1 False True False False\n2 False False False False\n3 False True False False\n4 False True False False\n5 False False False False" | |
}, | |
"metadata": {}, | |
"execution_count": 12 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.category.isin(['a','b'])", | |
"execution_count": 13, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": "0 True\n1 True\n2 True\n3 True\n4 True\n5 True\nName: category, dtype: bool" | |
}, | |
"metadata": {}, | |
"execution_count": 13 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "pd.DataFrame(df['category'].isin(['a','b']))", | |
"execution_count": 14, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>True</td>\n </tr>\n <tr>\n <th>4</th>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " category\n0 True\n1 True\n2 True\n3 True\n4 True\n5 True" | |
}, | |
"metadata": {}, | |
"execution_count": 14 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### example 2: values in a dict" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df", | |
"execution_count": 15, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 b 8.0\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 b 7.0\n5 week3 40 a 9.0" | |
}, | |
"metadata": {}, | |
"execution_count": 15 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.isin({'sales':[10,20],\n 'category':['c','d']\n }\n )", | |
"execution_count": 16, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 False False False False\n1 False True False False\n2 False False False False\n3 False True False False\n4 False True False False\n5 False False False False" | |
}, | |
"metadata": {}, | |
"execution_count": 16 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### excercise: \ncheck whether the data is before week5. ['week1','week2','week3','week4']" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.isin({'date':['week1','week2','week3','week4']\n }\n )", | |
"execution_count": 17, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>True</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 True False False False\n1 True False False False\n2 True False False False\n3 True False False False\n4 True False False False\n5 True False False False" | |
}, | |
"metadata": {}, | |
"execution_count": 17 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.where( cond, other = nan, ...)\n\nReplace values where the condition is False. This fucntion is an application of the if-then idiom.\n\nParameters:\n\n* cond:bool Series/DataFrame, array-like, or callable. Where cond is True, keep the original value. Where False, replace with corresponding value from other.\n* other: scalar, Series/DataFrame, or callable\n\n#### example 1: \nfind rows meet the condition and replace with 0" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.info()", | |
"execution_count": 18, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 date 6 non-null string \n 1 sales 6 non-null int64 \n 2 category 6 non-null string \n 3 revenue 5 non-null float64\ndtypes: float64(1), int64(1), string(2)\nmemory usage: 320.0 bytes\n" | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.sales.where(df['sales']>20,0)", | |
"execution_count": 19, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": "0 50\n1 0\n2 35\n3 0\n4 0\n5 40\nName: sales, dtype: int64" | |
}, | |
"metadata": {}, | |
"execution_count": 19 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### excercise: \nfind data for 'a' category and set data as None for other categories " | |
}, | |
{ | |
"metadata": { | |
"trusted": false, | |
"collapsed": true | |
}, | |
"cell_type": "code", | |
"source": "df_eg = df.copy()", | |
"execution_count": 20, | |
"outputs": [] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df_eg['category'] = df['category'].where(df['category']=='a',None)\ndf_eg", | |
"execution_count": 21, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td><NA></td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td><NA></td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 <NA> 8.0\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 <NA> 7.0\n5 week3 40 a 9.0" | |
}, | |
"metadata": {}, | |
"execution_count": 21 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.query(expr)\nUsed to query the columns of a Dataframe with a boolean expression\n\nParameters: expr\n* query expresion in string format\n\nReturns: \n* DataFrame resulting from the provided query expression\n\n#### example 1:" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.query('sales > revenue')", | |
"execution_count": 22, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 b 8.0\n2 week2 35 a 16.0\n4 week3 20 b 7.0\n5 week3 40 a 9.0" | |
}, | |
"metadata": {}, | |
"execution_count": 22 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df[df.sales>df.revenue]", | |
"execution_count": 23, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 b 8.0\n2 week2 35 a 16.0\n4 week3 20 b 7.0\n5 week3 40 a 9.0" | |
}, | |
"metadata": {}, | |
"execution_count": 23 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### excercise: \nselect out data for category 'a' in week1. Symbol & is used for 'and' operation. | is for 'or'" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df[(df.category =='a') & (df.date =='week1')]", | |
"execution_count": 24, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 week1 50 a 14.0" | |
}, | |
"metadata": {}, | |
"execution_count": 24 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.notna()\n\nDetect existing (non-missing) values. opposite function is .isna()\n\nReturn a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. \n\nIn python, pandas.dataframe.notna() and pandas.dataframe.isna() are doing exactly same thing as pandas.dataframe.notnull() and pandas.dataframe.isnull()" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.notna()", | |
"execution_count": 25, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 True True True True\n1 True True True True\n2 True True True True\n3 True True True False\n4 True True True True\n5 True True True True" | |
}, | |
"metadata": {}, | |
"execution_count": 25 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.isna()", | |
"execution_count": 26, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>True</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 False False False False\n1 False False False False\n2 False False False False\n3 False False False True\n4 False False False False\n5 False False False False" | |
}, | |
"metadata": {}, | |
"execution_count": 26 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.isnull()", | |
"execution_count": 27, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>True</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 False False False False\n1 False False False False\n2 False False False False\n3 False False False True\n4 False False False False\n5 False False False False" | |
}, | |
"metadata": {}, | |
"execution_count": 27 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### Pandas.notna(obj)\n\nDetect non-missing values for an array-like object. Opposite fucntion pandas.isna(obj)\n\nsimilar function like pandas.isnull(), pandas.notnull()" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "pd.notna(df)", | |
"execution_count": 28, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>1</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>2</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>3</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>False</td>\n </tr>\n <tr>\n <th>4</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n <tr>\n <th>5</th>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n <td>True</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 True True True True\n1 True True True True\n2 True True True True\n3 True True True False\n4 True True True True\n5 True True True True" | |
}, | |
"metadata": {}, | |
"execution_count": 28 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "pd.isna(df)", | |
"execution_count": 29, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>1</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>2</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>3</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>True</td>\n </tr>\n <tr>\n <th>4</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n <tr>\n <th>5</th>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n <td>False</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 False False False False\n1 False False False False\n2 False False False False\n3 False False False True\n4 False False False False\n5 False False False False" | |
}, | |
"metadata": {}, | |
"execution_count": 29 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.get(key)\nGet item from object for given key, same function as df.['key'] and df.key\n\nParameters: key\n* query expresion in string format\n\nReturns: \n* value same type as items contained in object\n\n#### example :" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.get('sales')", | |
"execution_count": 30, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": "0 50\n1 20\n2 35\n3 10\n4 20\n5 40\nName: sales, dtype: int64" | |
}, | |
"metadata": {}, | |
"execution_count": 30 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.get(['sales','revenue'])", | |
"execution_count": 31, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>sales</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>50</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>20</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>35</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>10</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>20</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>40</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " sales revenue\n0 50 14.0\n1 20 8.0\n2 35 16.0\n3 10 NaN\n4 20 7.0\n5 40 9.0" | |
}, | |
"metadata": {}, | |
"execution_count": 31 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df[['sales','revenue','date']]", | |
"execution_count": 32, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>sales</th>\n <th>revenue</th>\n <th>date</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>50</td>\n <td>14.0</td>\n <td>week1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>20</td>\n <td>8.0</td>\n <td>week1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>35</td>\n <td>16.0</td>\n <td>week2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>10</td>\n <td>NaN</td>\n <td>week3</td>\n </tr>\n <tr>\n <th>4</th>\n <td>20</td>\n <td>7.0</td>\n <td>week3</td>\n </tr>\n <tr>\n <th>5</th>\n <td>40</td>\n <td>9.0</td>\n <td>week3</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " sales revenue date\n0 50 14.0 week1\n1 20 8.0 week1\n2 35 16.0 week2\n3 10 NaN week3\n4 20 7.0 week3\n5 40 9.0 week3" | |
}, | |
"metadata": {}, | |
"execution_count": 32 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.lookup()\n\n'fancy indexing' function for DataFrame\n\nGiven equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair\n\n#### example" | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df", | |
"execution_count": 33, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>sales</th>\n <th>category</th>\n <th>revenue</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>50</td>\n <td>a</td>\n <td>14.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week1</td>\n <td>20</td>\n <td>b</td>\n <td>8.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week2</td>\n <td>35</td>\n <td>a</td>\n <td>16.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week3</td>\n <td>10</td>\n <td>a</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>week3</td>\n <td>20</td>\n <td>b</td>\n <td>7.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>week3</td>\n <td>40</td>\n <td>a</td>\n <td>9.0</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " date sales category revenue\n0 week1 50 a 14.0\n1 week1 20 b 8.0\n2 week2 35 a 16.0\n3 week3 10 a NaN\n4 week3 20 b 7.0\n5 week3 40 a 9.0" | |
}, | |
"metadata": {}, | |
"execution_count": 33 | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": false | |
}, | |
"cell_type": "code", | |
"source": "df.lookup([0,1,2],['sales','category','revenue'])", | |
"execution_count": 34, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": "array([50, 'b', 16.0], dtype=object)" | |
}, | |
"metadata": {}, | |
"execution_count": 34 | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## End\n\nThey are only a means to the end of learning how to do good data analysis " | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3", | |
"language": "python" | |
}, | |
"gist": { | |
"id": "", | |
"data": { | |
"description": "select_data.ipynb", | |
"public": true | |
} | |
}, | |
"language_info": { | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"version": "3.5.4", | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment