jing-jin-mc/Pandas - Select Data.ipynb

## Pandas - Select Data.ipynb
{
  "cells": [
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Select data\n### Functions:\n* iloc\n* sample\n* isin\n* where\n* query\n* not\n* get\n* lookup"
    },
    {
      "metadata": {
        "trusted": false,
        "collapsed": true
      },
      "cell_type": "code",
      "source": "import pandas as pd ",
      "execution_count": 1,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### create a test dataset to demo functions"
    },
    {
      "metadata": {
        "slideshow": {
          "slide_type": "slide"
        },
        "trusted": false
      },
      "cell_type": "code",
      "source": "df = pd.DataFrame(data = {'date':['week1','week1','week2','week3','week3','week3'],\n                          'sales':[50,20,35,10,20,40],\n                          'category':['a','b','a','a','b','a'],\n                          'revenue':[14,8,16,None,7,9]\n                         })\ndf.date = df.date.astype('string')\ndf.category = df.category.astype('string')\ndf",
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>week1</td>\n      <td>50</td>\n      <td>a</td>\n      <td>14.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>week1</td>\n      <td>20</td>\n      <td>b</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>week3</td>\n      <td>10</td>\n      <td>a</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>b</td>\n      <td>7.0</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>week3</td>\n      <td>40</td>\n      <td>a</td>\n      <td>9.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n0  week1     50        a     14.0\n1  week1     20        b      8.0\n2  week2     35        a     16.0\n3  week3     10        a      NaN\n4  week3     20        b      7.0\n5  week3     40        a      9.0"
          },
          "metadata": {},
          "execution_count": 2
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.info()",
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\n #   Column    Non-Null Count  Dtype  \n---  ------    --------------  -----  \n 0   date      6 non-null      string \n 1   sales     6 non-null      int64  \n 2   category  6 non-null      string \n 3   revenue   5 non-null      float64\ndtypes: float64(1), int64(1), string(2)\nmemory usage: 320.0 bytes\n"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### DataFrame.iloc[x]\n\nPurely integer-location based indexing for selection by position\n\nHere you are using square brackets since what you put in [ ] is list-like data, e.g 1:3 means index from 1 to 3\n\n#### example 1: \nselect value by integer index"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.iloc[0]",
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": "date        week1\nsales          50\ncategory        a\nrevenue        14\nName: 0, dtype: object"
          },
          "metadata": {},
          "execution_count": 4
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example 2:\nrows from n to m (m is not included)"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.iloc[1:3]",
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>1</th>\n      <td>week1</td>\n      <td>20</td>\n      <td>b</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n1  week1     20        b      8.0\n2  week2     35        a     16.0"
          },
          "metadata": {},
          "execution_count": 5
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### excercise: \nSelect the third row (index:2) to the last row (not include the last row). The last index can use -1 when you don't know how long the dataset"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.iloc[2:-1]",
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>week3</td>\n      <td>10</td>\n      <td>a</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>b</td>\n      <td>7.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n2  week2     35        a     16.0\n3  week3     10        a      NaN\n4  week3     20        b      7.0"
          },
          "metadata": {},
          "execution_count": 6
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.iloc[2:]",
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>week3</td>\n      <td>10</td>\n      <td>a</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>b</td>\n      <td>7.0</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>week3</td>\n      <td>40</td>\n      <td>a</td>\n      <td>9.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n2  week2     35        a     16.0\n3  week3     10        a      NaN\n4  week3     20        b      7.0\n5  week3     40        a      9.0"
          },
          "metadata": {},
          "execution_count": 7
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### FYI: Using . after a dataframe means you are accessing attributes of a DataFrame object in Pandas.\n* All the following functions are attribute fucntions of DataFrame. \n* All the functions in python are defined in a format: \n    * function_name(Parameters)\n    * Parameters can be optional"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### DataFrame.sample( n, frac, replace, weights, random_state, axis)\n\nReturn a random sample of items from an axis of object \n\nParameters:\n\n* n: int, optional. Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.\n* frac: float, optioanl. Fraction of axis items to return. Cannot be used with n.\n* replace: bool, default False. Allow or disallow sampling of the same row more than once.\n* weights: str or ndarray-like, optional\n* random_state: int, array-like, BitGenerator, np.random.RandomState, optional. Used for reproducibility.\n* axis: {0 or ‘index’, 1 or ‘columns’, None}, default None\n\n#### example 1: \nGet n random rows"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.sample(n=3, random_state = 1)",
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>week1</td>\n      <td>20</td>\n      <td>b</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>b</td>\n      <td>7.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n2  week2     35        a     16.0\n1  week1     20        b      8.0\n4  week3     20        b      7.0"
          },
          "metadata": {},
          "execution_count": 8
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example 2: \nGet 50% sample rows"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.sample(frac=0.5, random_state = 1)",
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>week1</td>\n      <td>20</td>\n      <td>b</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>b</td>\n      <td>7.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n2  week2     35        a     16.0\n1  week1     20        b      8.0\n4  week3     20        b      7.0"
          },
          "metadata": {},
          "execution_count": 9
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example 3: \nUsing one column in the dataframe as weights. Rows with larger value in that column are more likely to be sampled."
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.sample(n=3, random_state = 1, weights = 'revenue')",
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>b</td>\n      <td>7.0</td>\n    </tr>\n    <tr>\n      <th>0</th>\n      <td>week1</td>\n      <td>50</td>\n      <td>a</td>\n      <td>14.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n2  week2     35        a     16.0\n4  week3     20        b      7.0\n0  week1     50        a     14.0"
          },
          "metadata": {},
          "execution_count": 10
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### excercise: \nuse random_state =1 and get 2 sample rows from the dataset."
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.sample(n=2)",
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>3</th>\n      <td>week3</td>\n      <td>10</td>\n      <td>a</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>week3</td>\n      <td>40</td>\n      <td>a</td>\n      <td>9.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n3  week3     10        a      NaN\n5  week3     40        a      9.0"
          },
          "metadata": {},
          "execution_count": 11
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### DataFrame.isin(values)\n\nTo check whether each element in the dataframe is contained in values\n\nParamters: values\n* series or dict\n\nReturns: \n* dataframe of booleans"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example 1: values in a series"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.isin([10,20])",
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales  category  revenue\n0  False  False     False    False\n1  False   True     False    False\n2  False  False     False    False\n3  False   True     False    False\n4  False   True     False    False\n5  False  False     False    False"
          },
          "metadata": {},
          "execution_count": 12
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.category.isin(['a','b'])",
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": "0    True\n1    True\n2    True\n3    True\n4    True\n5    True\nName: category, dtype: bool"
          },
          "metadata": {},
          "execution_count": 13
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "pd.DataFrame(df['category'].isin(['a','b']))",
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>category</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>True</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "   category\n0      True\n1      True\n2      True\n3      True\n4      True\n5      True"
          },
          "metadata": {},
          "execution_count": 14
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example 2: values in a dict"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df",
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>week1</td>\n      <td>50</td>\n      <td>a</td>\n      <td>14.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>week1</td>\n      <td>20</td>\n      <td>b</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>week3</td>\n      <td>10</td>\n      <td>a</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>b</td>\n      <td>7.0</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>week3</td>\n      <td>40</td>\n      <td>a</td>\n      <td>9.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n0  week1     50        a     14.0\n1  week1     20        b      8.0\n2  week2     35        a     16.0\n3  week3     10        a      NaN\n4  week3     20        b      7.0\n5  week3     40        a      9.0"
          },
          "metadata": {},
          "execution_count": 15
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.isin({'sales':[10,20],\n         'category':['c','d']\n        }\n       )",
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>False</td>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales  category  revenue\n0  False  False     False    False\n1  False   True     False    False\n2  False  False     False    False\n3  False   True     False    False\n4  False   True     False    False\n5  False  False     False    False"
          },
          "metadata": {},
          "execution_count": 16
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### excercise: \ncheck whether the data is before week5. ['week1','week2','week3','week4']"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.isin({'date':['week1','week2','week3','week4']\n        }\n       )",
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>True</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "   date  sales  category  revenue\n0  True  False     False    False\n1  True  False     False    False\n2  True  False     False    False\n3  True  False     False    False\n4  True  False     False    False\n5  True  False     False    False"
          },
          "metadata": {},
          "execution_count": 17
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### DataFrame.where( cond, other = nan, ...)\n\nReplace values where the condition is False. This fucntion is an application of the if-then idiom.\n\nParameters:\n\n* cond:bool Series/DataFrame, array-like, or callable. Where cond is True, keep the original value. Where False, replace with corresponding value from other.\n* other: scalar, Series/DataFrame, or callable\n\n#### example 1: \nfind rows meet the condition and replace with 0"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.info()",
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\n #   Column    Non-Null Count  Dtype  \n---  ------    --------------  -----  \n 0   date      6 non-null      string \n 1   sales     6 non-null      int64  \n 2   category  6 non-null      string \n 3   revenue   5 non-null      float64\ndtypes: float64(1), int64(1), string(2)\nmemory usage: 320.0 bytes\n"
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.sales.where(df['sales']>20,0)",
      "execution_count": 19,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": "0    50\n1     0\n2    35\n3     0\n4     0\n5    40\nName: sales, dtype: int64"
          },
          "metadata": {},
          "execution_count": 19
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### excercise: \nfind data for 'a' category and set data as None for other categories "
    },
    {
      "metadata": {
        "trusted": false,
        "collapsed": true
      },
      "cell_type": "code",
      "source": "df_eg = df.copy()",
      "execution_count": 20,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df_eg['category'] = df['category'].where(df['category']=='a',None)\ndf_eg",
      "execution_count": 21,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>week1</td>\n      <td>50</td>\n      <td>a</td>\n      <td>14.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>week1</td>\n      <td>20</td>\n      <td>&lt;NA&gt;</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>week3</td>\n      <td>10</td>\n      <td>a</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>&lt;NA&gt;</td>\n      <td>7.0</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>week3</td>\n      <td>40</td>\n      <td>a</td>\n      <td>9.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n0  week1     50        a     14.0\n1  week1     20     <NA>      8.0\n2  week2     35        a     16.0\n3  week3     10        a      NaN\n4  week3     20     <NA>      7.0\n5  week3     40        a      9.0"
          },
          "metadata": {},
          "execution_count": 21
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "###  DataFrame.query(expr)\nUsed to query the columns of a Dataframe with a boolean expression\n\nParameters: expr\n* query expresion in string format\n\nReturns: \n* DataFrame resulting from the provided query expression\n\n#### example 1:"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.query('sales > revenue')",
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>week1</td>\n      <td>50</td>\n      <td>a</td>\n      <td>14.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>week1</td>\n      <td>20</td>\n      <td>b</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>b</td>\n      <td>7.0</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>week3</td>\n      <td>40</td>\n      <td>a</td>\n      <td>9.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n0  week1     50        a     14.0\n1  week1     20        b      8.0\n2  week2     35        a     16.0\n4  week3     20        b      7.0\n5  week3     40        a      9.0"
          },
          "metadata": {},
          "execution_count": 22
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df[df.sales>df.revenue]",
      "execution_count": 23,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>week1</td>\n      <td>50</td>\n      <td>a</td>\n      <td>14.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>week1</td>\n      <td>20</td>\n      <td>b</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>b</td>\n      <td>7.0</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>week3</td>\n      <td>40</td>\n      <td>a</td>\n      <td>9.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n0  week1     50        a     14.0\n1  week1     20        b      8.0\n2  week2     35        a     16.0\n4  week3     20        b      7.0\n5  week3     40        a      9.0"
          },
          "metadata": {},
          "execution_count": 23
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### excercise: \nselect out data for category 'a' in week1. Symbol & is used for 'and' operation. | is for 'or'"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df[(df.category =='a') & (df.date =='week1')]",
      "execution_count": 24,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>week1</td>\n      <td>50</td>\n      <td>a</td>\n      <td>14.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n0  week1     50        a     14.0"
          },
          "metadata": {},
          "execution_count": 24
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### DataFrame.notna()\n\nDetect existing (non-missing) values. opposite function is .isna()\n\nReturn a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. \n\nIn python, pandas.dataframe.notna() and pandas.dataframe.isna() are doing exactly same thing as pandas.dataframe.notnull() and pandas.dataframe.isnull()"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.notna()",
      "execution_count": 25,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "   date  sales  category  revenue\n0  True   True      True     True\n1  True   True      True     True\n2  True   True      True     True\n3  True   True      True    False\n4  True   True      True     True\n5  True   True      True     True"
          },
          "metadata": {},
          "execution_count": 25
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.isna()",
      "execution_count": 26,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales  category  revenue\n0  False  False     False    False\n1  False  False     False    False\n2  False  False     False    False\n3  False  False     False     True\n4  False  False     False    False\n5  False  False     False    False"
          },
          "metadata": {},
          "execution_count": 26
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.isnull()",
      "execution_count": 27,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales  category  revenue\n0  False  False     False    False\n1  False  False     False    False\n2  False  False     False    False\n3  False  False     False     True\n4  False  False     False    False\n5  False  False     False    False"
          },
          "metadata": {},
          "execution_count": 27
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Pandas.notna(obj)\n\nDetect non-missing values for an array-like object. Opposite fucntion pandas.isna(obj)\n\nsimilar function like pandas.isnull(), pandas.notnull()"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "pd.notna(df)",
      "execution_count": 28,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n      <td>True</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "   date  sales  category  revenue\n0  True   True      True     True\n1  True   True      True     True\n2  True   True      True     True\n3  True   True      True    False\n4  True   True      True     True\n5  True   True      True     True"
          },
          "metadata": {},
          "execution_count": 28
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "pd.isna(df)",
      "execution_count": 29,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>True</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n      <td>False</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales  category  revenue\n0  False  False     False    False\n1  False  False     False    False\n2  False  False     False    False\n3  False  False     False     True\n4  False  False     False    False\n5  False  False     False    False"
          },
          "metadata": {},
          "execution_count": 29
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "###  DataFrame.get(key)\nGet item from object for given key, same function as df.['key'] and df.key\n\nParameters: key\n* query expresion in string format\n\nReturns: \n* value same type as items contained in object\n\n#### example :"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.get('sales')",
      "execution_count": 30,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": "0    50\n1    20\n2    35\n3    10\n4    20\n5    40\nName: sales, dtype: int64"
          },
          "metadata": {},
          "execution_count": 30
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.get(['sales','revenue'])",
      "execution_count": 31,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>sales</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>50</td>\n      <td>14.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>20</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>35</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>10</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>20</td>\n      <td>7.0</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>40</td>\n      <td>9.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "   sales  revenue\n0     50     14.0\n1     20      8.0\n2     35     16.0\n3     10      NaN\n4     20      7.0\n5     40      9.0"
          },
          "metadata": {},
          "execution_count": 31
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df[['sales','revenue','date']]",
      "execution_count": 32,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>sales</th>\n      <th>revenue</th>\n      <th>date</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>50</td>\n      <td>14.0</td>\n      <td>week1</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>20</td>\n      <td>8.0</td>\n      <td>week1</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>35</td>\n      <td>16.0</td>\n      <td>week2</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>10</td>\n      <td>NaN</td>\n      <td>week3</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>20</td>\n      <td>7.0</td>\n      <td>week3</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>40</td>\n      <td>9.0</td>\n      <td>week3</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "   sales  revenue   date\n0     50     14.0  week1\n1     20      8.0  week1\n2     35     16.0  week2\n3     10      NaN  week3\n4     20      7.0  week3\n5     40      9.0  week3"
          },
          "metadata": {},
          "execution_count": 32
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### DataFrame.lookup()\n\n'fancy indexing' function for DataFrame\n\nGiven equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair\n\n#### example"
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df",
      "execution_count": 33,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>sales</th>\n      <th>category</th>\n      <th>revenue</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>week1</td>\n      <td>50</td>\n      <td>a</td>\n      <td>14.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>week1</td>\n      <td>20</td>\n      <td>b</td>\n      <td>8.0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>week2</td>\n      <td>35</td>\n      <td>a</td>\n      <td>16.0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>week3</td>\n      <td>10</td>\n      <td>a</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>week3</td>\n      <td>20</td>\n      <td>b</td>\n      <td>7.0</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>week3</td>\n      <td>40</td>\n      <td>a</td>\n      <td>9.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "    date  sales category  revenue\n0  week1     50        a     14.0\n1  week1     20        b      8.0\n2  week2     35        a     16.0\n3  week3     10        a      NaN\n4  week3     20        b      7.0\n5  week3     40        a      9.0"
          },
          "metadata": {},
          "execution_count": 33
        }
      ]
    },
    {
      "metadata": {
        "trusted": false
      },
      "cell_type": "code",
      "source": "df.lookup([0,1,2],['sales','category','revenue'])",
      "execution_count": 34,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": "array([50, 'b', 16.0], dtype=object)"
          },
          "metadata": {},
          "execution_count": 34
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## End\n\nThey are only a means to the end of learning how to do good data analysis "
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3",
      "language": "python"
    },
    "gist": {
      "id": "",
      "data": {
        "description": "select_data.ipynb",
        "public": true
      }
    },
    "language_info": {
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "version": "3.5.4",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}