jing-jin-mc/Pandas - Common pitfalls and tips.ipynb

## Pandas - Common pitfalls and tips.ipynb
{
  "cells": [
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Common errors and mistakes\n* Key Error\n* Index Error\n* Attribute Error\n\n## Best practices\n* Get or set values quickly\n* Leftover DataFrames\n* Set data type for the columns in the dataset\n* Check and know your data before calculation"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### create a test dataset to demo functions"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "import pandas as pd ",
      "execution_count": 1,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df = pd.DataFrame(data = {'date':['2020-01-03',\n                                  '2020-03-04',\n                                  '2020-04-05',\n                                  '2020-02-01',\n                                  '2020-02-09',\n                                  '2020-03-12'\n                                 ],\n                          'name':['Peter','Max','Ella','Maria','Tom','Emma'],\n                          'rate':[4.5,3.5,4,4.5,3,3.5],\n                          'review':['It is a really nice restaurant! I would recommend it.',\n                                     'Good price. Not really satisfied with the service. Not recommend.',\n                                     'Like it. Staff there was quite nice. Recommend.',\n                                     'Looooove it!!!!! Really good! Highly recommend!',\n                                    '',\n                                    'Not so bad. I will probably come back again'\n                                    ]\n                         })\ndf",
      "execution_count": 86,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 86,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2020-01-03</td>\n      <td>Peter</td>\n      <td>4.5</td>\n      <td>It is a really nice restaurant! I would recomm...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>2020-03-12</td>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "         date   name  rate                                             review\n0  2020-01-03  Peter   4.5  It is a really nice restaurant! I would recomm...\n1  2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...\n2  2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n3  2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n4  2020-02-09    Tom   3.0                                                   \n5  2020-03-12   Emma   3.5        Not so bad. I will probably come back again"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.info()",
      "execution_count": 87,
      "outputs": [
        {
          "output_type": "stream",
          "text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\ndate      6 non-null object\nname      6 non-null object\nrate      6 non-null float64\nreview    6 non-null object\ndtypes: float64(1), object(3)\nmemory usage: 272.0+ bytes\n",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Key Error\nA Python KeyError exception  is what is raised when you try to access a key that isn’t in a dataset\n\n* Index or column label cannot be found in the dataframe"
    },
    {
      "metadata": {
        "trusted": true,
        "scrolled": true
      },
      "cell_type": "code",
      "source": "### it should be df['date']: column name is case sensitive \n#df['Date']",
      "execution_count": 54,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "* Confuse index name with column name "
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### 'date' is not index, it is one column of the dataframe\n#df.loc['2020-01-03']",
      "execution_count": 56,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### when 'date' is the index of the dataframe, it works\ndf_eg = df.set_index('date')\ndf_eg",
      "execution_count": 97,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 97,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n    <tr>\n      <th>date</th>\n      <th></th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>2020-01-03</th>\n      <td>Peter</td>\n      <td>4.5</td>\n      <td>It is a really nice restaurant! I would recomm...</td>\n    </tr>\n    <tr>\n      <th>2020-03-04</th>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n    <tr>\n      <th>2020-04-05</th>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>2020-02-01</th>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>2020-02-09</th>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>2020-03-12</th>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "             name  rate                                             review\ndate                                                                      \n2020-01-03  Peter   4.5  It is a really nice restaurant! I would recomm...\n2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...\n2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n2020-02-09    Tom   3.0                                                   \n2020-03-12   Emma   3.5        Not so bad. I will probably come back again"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df_eg.loc['2020-01-03']",
      "execution_count": 58,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 58,
          "data": {
            "text/plain": "name                                                  Peter\nrate                                                    4.5\nreview    It is a really nice restaurant! I would recomm...\nName: 2020-01-03, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "* Cannot find the index"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### '2020-01-04' is not one of the index. The dataframe has index : \n###'2020-01-03','2020-03-04','2020-04-05','2020-02-01','2020-02-09','2020-03-12'\n#df_eg.loc['2020-01-04']",
      "execution_count": 62,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Index Error\nIt is raised whenever attempting to access an index that is outside the bounds of a index list"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### df has index from 0 to 5, 6 is out of this list bound.\n#df.iloc[6]",
      "execution_count": 64,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### you could using -1 to access the last entry of a dataframe\ndf.iloc[-1]",
      "execution_count": 65,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 65,
          "data": {
            "text/plain": "date                                       2020-03-12\nname                                             Emma\nrate                                              3.5\nreview    Not so bad. I will probably come back again\nName: 5, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Attribute Error \nAttribute errors in Python are generally raised when you try to access or call an attribute that a particular object type doesn’t possess.\n\n* .str: Vectorized string functions for Series and Index."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### missing accessor str since split is an function for a string not a serie\n#df['review'].split()",
      "execution_count": 69,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df['review'].str.split()",
      "execution_count": 70,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 70,
          "data": {
            "text/plain": "0    [It, is, a, really, nice, restaurant!, I, woul...\n1    [Good, price., Not, really, satisfied, with, t...\n2    [Like, it., Staff, there, was, quite, nice., R...\n3    [Looooove, it!!!!!, Really, good!, Highly, rec...\n4                                                   []\n5    [Not, so, bad., I, will, probably, come, back,...\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "* .dt: Accessor object for datetimelike properties of the Series values."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### the serie is not the datetimelike property. It is a stringlike serie\n#df.date.dt.year",
      "execution_count": 81,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.info()",
      "execution_count": 88,
      "outputs": [
        {
          "output_type": "stream",
          "text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\ndate      6 non-null object\nname      6 non-null object\nrate      6 non-null float64\nreview    6 non-null object\ndtypes: float64(1), object(3)\nmemory usage: 272.0+ bytes\n",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### set the data type to datetime64[ns]\ndf.date = df.date.astype('datetime64[ns]')",
      "execution_count": 90,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### another way of set the date type into datetime\n# df.date = pd.to_datetime(df.date)",
      "execution_count": 93,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.info()",
      "execution_count": 91,
      "outputs": [
        {
          "output_type": "stream",
          "text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\ndate      6 non-null datetime64[ns]\nname      6 non-null object\nrate      6 non-null float64\nreview    6 non-null object\ndtypes: datetime64[ns](1), float64(1), object(2)\nmemory usage: 272.0+ bytes\n",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.date.dt.year",
      "execution_count": 92,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 92,
          "data": {
            "text/plain": "0    2020\n1    2020\n2    2020\n3    2020\n4    2020\n5    2020\nName: date, dtype: int64"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Get or set values quickly\nThere are so many ways to get and set values in Pandas, by index, value, label, etc."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### using index location to get data\ndf.iloc[1].iat[2]",
      "execution_count": 94,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 94,
          "data": {
            "text/plain": "3.5"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Leftover DataFrames\n\n* Don’t leave extra DataFrames sitting around in memory, if you’re using a laptop it’s hurting the performance of almost everything you do. If you’re on a server, it’s hurting the performance of everyone else on that server (or at some point, you’ll get an “out of memory” error)."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "### delet the dataframe we created in between\ndel df_eg",
      "execution_count": 98,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "* Chain together multiple DataFrame modifications in one line (so long as it doesn’t make your code unreadable)"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.set_index('date').loc['2020-03-04']",
      "execution_count": 115,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 115,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n    <tr>\n      <th>date</th>\n      <th></th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>2020-03-04</th>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "           name  rate                                             review\ndate                                                                    \n2020-03-04  Max   3.5  Good price. Not really satisfied with the serv..."
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Set data type for the columns in the dataset\n\n* Do not let pandas guess what kind of data type in your dataset. Set data type for your dataset to avoid errors and computation cost"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.info()",
      "execution_count": 116,
      "outputs": [
        {
          "output_type": "stream",
          "text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\ndate      6 non-null datetime64[ns]\nname      6 non-null object\nrate      6 non-null float64\nreview    6 non-null object\ndtypes: datetime64[ns](1), float64(1), object(2)\nmemory usage: 272.0+ bytes\n",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Check and know your data before calculation\n\nCommonly used functions:\n\n* DataFrame.head()\n* DataFrame.sample()\n* DataFrame.tail()\n* DataFrame.describe()\n* DataFrame.info()"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.head()",
      "execution_count": 117,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 117,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2020-01-03</td>\n      <td>Peter</td>\n      <td>4.5</td>\n      <td>It is a really nice restaurant! I would recomm...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td></td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "        date   name  rate                                             review\n0 2020-01-03  Peter   4.5  It is a really nice restaurant! I would recomm...\n1 2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...\n2 2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n3 2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n4 2020-02-09    Tom   3.0                                                   "
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.sample(n=5)",
      "execution_count": 118,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 118,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>2020-03-12</td>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "        date   name  rate                                             review\n3 2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n4 2020-02-09    Tom   3.0                                                   \n5 2020-03-12   Emma   3.5        Not so bad. I will probably come back again\n2 2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n1 2020-03-04    Max   3.5  Good price. Not really satisfied with the serv..."
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.tail()",
      "execution_count": 119,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 119,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>2020-03-12</td>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "        date   name  rate                                             review\n1 2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...\n2 2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n3 2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n4 2020-02-09    Tom   3.0                                                   \n5 2020-03-12   Emma   3.5        Not so bad. I will probably come back again"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.describe()",
      "execution_count": 120,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 120,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>rate</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>count</th>\n      <td>6.000000</td>\n    </tr>\n    <tr>\n      <th>mean</th>\n      <td>3.833333</td>\n    </tr>\n    <tr>\n      <th>std</th>\n      <td>0.605530</td>\n    </tr>\n    <tr>\n      <th>min</th>\n      <td>3.000000</td>\n    </tr>\n    <tr>\n      <th>25%</th>\n      <td>3.500000</td>\n    </tr>\n    <tr>\n      <th>50%</th>\n      <td>3.750000</td>\n    </tr>\n    <tr>\n      <th>75%</th>\n      <td>4.375000</td>\n    </tr>\n    <tr>\n      <th>max</th>\n      <td>4.500000</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "           rate\ncount  6.000000\nmean   3.833333\nstd    0.605530\nmin    3.000000\n25%    3.500000\n50%    3.750000\n75%    4.375000\nmax    4.500000"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.info()",
      "execution_count": 121,
      "outputs": [
        {
          "output_type": "stream",
          "text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\ndate      6 non-null datetime64[ns]\nname      6 non-null object\nrate      6 non-null float64\nreview    6 non-null object\ndtypes: datetime64[ns](1), float64(1), object(2)\nmemory usage: 272.0+ bytes\n",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## End"
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3",
      "language": "python"
    },
    "language_info": {
      "file_extension": ".py",
      "nbconvert_exporter": "python",
      "version": "3.5.4",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "name": "python",
      "mimetype": "text/x-python",
      "pygments_lexer": "ipython3"
    },
    "gist": {
      "id": "c39bf6335c2d92f86aef393c6cf0c44b",
      "data": {
        "description": "Pandas - Common pitfalls and tips.ipynb",
        "public": false
      }
    },
    "_draft": {
      "nbviewer_url": "https://gist.github.com/c39bf6335c2d92f86aef393c6cf0c44b"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}