jing-jin-mc/Pandas - Working with text data.ipynb

## Pandas - Working with text data.ipynb
{
  "cells": [
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Working with text data \n### Functions:\n* split\n* replace\n* extract\n* wrap\n* partition\n* swapcase\n* capitalize\n* rfind"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "import pandas as pd ",
      "execution_count": 1,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### Create a test dataset to demonstrate the functions"
    },
    {
      "metadata": {
        "slideshow": {
          "slide_type": "slide"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "df = pd.DataFrame(data = {'date':['2020-01-03',\n                                  '2020-03-04',\n                                  '2020-04-05',\n                                  '2020-02-01',\n                                  '2020-02-09',\n                                  '2020-03-12',\n                                  '2020-03-19'\n                                 ],\n                          'name':['Peter','Max','Ella','Maria','Tom','Emma','Lisa'],\n                          'rate':[4.5,3.5,4,4.5,3,3.5,5],\n                          'review':['It is a really nice restaurant! I would recommend it.',\n                                     'Good price. Not really satisfied with the service. Not recommend.',\n                                     'Like it. Staff there was quite nice. Recommend.',\n                                     'Looooove it!!!!! Really good! Highly recommend!',\n                                    '',\n                                    'Not so bad. I will probably come back again',\n                                    'Such a nice restaurant!'\n                                    ]\n                         })\ndf",
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 2,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2020-01-03</td>\n      <td>Peter</td>\n      <td>4.5</td>\n      <td>It is a really nice restaurant! I would recomm...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>2020-03-12</td>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>2020-03-19</td>\n      <td>Lisa</td>\n      <td>5.0</td>\n      <td>Such a nice restaurant!</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "         date   name  rate                                             review\n0  2020-01-03  Peter   4.5  It is a really nice restaurant! I would recomm...\n1  2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...\n2  2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n3  2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n4  2020-02-09    Tom   3.0                                                   \n5  2020-03-12   Emma   3.5        Not so bad. I will probably come back again\n6  2020-03-19   Lisa   5.0                            Such a nice restaurant!"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.info()",
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 7 entries, 0 to 6\nData columns (total 4 columns):\ndate      7 non-null object\nname      7 non-null object\nrate      7 non-null float64\nreview    7 non-null object\ndtypes: float64(1), object(3)\nmemory usage: 304.0+ bytes\n",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### pandas.Series.str.split(pat, n, expand)\n\nSplits each string in the Series/Index, starting from the beginning, at the specified separator/delimiter.\n\n* pat: string or regular expression to split on. If not specified, split on whitespace.\n* n: int, limits the number of splits in the output. None, 0 and -1 are all interpreted as return all splits.\n* expand: bool, default False, indicating whether to expand the split strings into separate columns.\n    * If True, return a DataFrame/MultiIndex, expanding dimensionality.\n    * If False, return a Series/Index containing lists of strings.\n\n#### example: \nWith n = 2, each string is split at most twice, giving at most three substrings."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review",
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 4,
          "data": {
            "text/plain": "0    It is a really nice restaurant! I would recomm...\n1    Good price. Not really satisfied with the serv...\n2      Like it. Staff there was quite nice. Recommend.\n3      Looooove it!!!!! Really good! Highly recommend!\n4                                                     \n5          Not so bad. I will probably come back again\n6                              Such a nice restaurant!\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.split()",
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 5,
          "data": {
            "text/plain": "0    [It, is, a, really, nice, restaurant!, I, woul...\n1    [Good, price., Not, really, satisfied, with, t...\n2    [Like, it., Staff, there, was, quite, nice., R...\n3    [Looooove, it!!!!!, Really, good!, Highly, rec...\n4                                                   []\n5    [Not, so, bad., I, will, probably, come, back,...\n6                         [Such, a, nice, restaurant!]\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "eg = df.copy()\neg['split_result'] = df.review.str.split(n=2)\neg",
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 6,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n      <th>split_result</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2020-01-03</td>\n      <td>Peter</td>\n      <td>4.5</td>\n      <td>It is a really nice restaurant! I would recomm...</td>\n      <td>[It, is, a really nice restaurant! I would rec...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n      <td>[Good, price., Not really satisfied with the s...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n      <td>[Like, it., Staff there was quite nice. Recomm...</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n      <td>[Looooove, it!!!!!, Really good! Highly recomm...</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td></td>\n      <td>[]</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>2020-03-12</td>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n      <td>[Not, so, bad. 
I will probably come back again]</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>2020-03-19</td>\n      <td>Lisa</td>\n      <td>5.0</td>\n      <td>Such a nice restaurant!</td>\n      <td>[Such, a, nice restaurant!]</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "         date   name  rate                                             review  \\\n0  2020-01-03  Peter   4.5  It is a really nice restaurant! I would recomm...   \n1  2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...   \n2  2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.   \n3  2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!   \n4  2020-02-09    Tom   3.0                                                      \n5  2020-03-12   Emma   3.5        Not so bad. I will probably come back again   \n6  2020-03-19   Lisa   5.0                            Such a nice restaurant!   \n\n                                        split_result  \n0  [It, is, a really nice restaurant! I would rec...  \n1  [Good, price., Not really satisfied with the s...  \n2  [Like, it., Staff there was quite nice. Recomm...  \n3  [Looooove, it!!!!!, Really good! Highly recomm...  \n4                                                 []  \n5    [Not, so, bad. I will probably come back again]  \n6                        [Such, a, nice restaurant!]  "
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example for using pattern and expand:\n* The pattern can be a str or a regular expression \n    * r'\\W' splits on any non-word character\n* Consider using expand together with n, otherwise the result may have too many columns"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.split(pat='!', n=2,expand = True)",
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 7,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n      <th>1</th>\n      <th>2</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>It is a really nice restaurant</td>\n      <td>I would recommend it.</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Good price. Not really satisfied with the serv...</td>\n      <td>None</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n      <td>None</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Looooove it</td>\n      <td></td>\n      <td>!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td></td>\n      <td>None</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>Not so bad. I will probably come back again</td>\n      <td>None</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>Such a nice restaurant</td>\n      <td></td>\n      <td>None</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "                                                   0                       1  \\\n0                     It is a really nice restaurant   I would recommend it.   \n1  Good price. Not really satisfied with the serv...                    None   \n2    Like it. Staff there was quite nice. Recommend.                    None   \n3                                        Looooove it                           \n4                                                                       None   \n5        Not so bad. I will probably come back again                    None   \n6                             Such a nice restaurant                           \n\n                                    2  \n0                                None  \n1                                None  \n2                                None  \n3  !!! Really good! Highly recommend!  \n4                                None  \n5                                None  \n6                                None  "
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "tmp = df.review.str.split(pat= r'\\W', expand = True)\ntmp",
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 8,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n      <th>1</th>\n      <th>2</th>\n      <th>3</th>\n      <th>4</th>\n      <th>5</th>\n      <th>6</th>\n      <th>7</th>\n      <th>8</th>\n      <th>9</th>\n      <th>10</th>\n      <th>11</th>\n      <th>12</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>It</td>\n      <td>is</td>\n      <td>a</td>\n      <td>really</td>\n      <td>nice</td>\n      <td>restaurant</td>\n      <td></td>\n      <td>I</td>\n      <td>would</td>\n      <td>recommend</td>\n      <td>it</td>\n      <td></td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Good</td>\n      <td>price</td>\n      <td></td>\n      <td>Not</td>\n      <td>really</td>\n      <td>satisfied</td>\n      <td>with</td>\n      <td>the</td>\n      <td>service</td>\n      <td></td>\n      <td>Not</td>\n      <td>recommend</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Like</td>\n      <td>it</td>\n      <td></td>\n      <td>Staff</td>\n      <td>there</td>\n      <td>was</td>\n      <td>quite</td>\n      <td>nice</td>\n      <td></td>\n      <td>Recommend</td>\n      <td></td>\n      <td>None</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Looooove</td>\n      <td>it</td>\n      <td></td>\n      <td></td>\n      <td></td>\n      <td></td>\n      <td></td>\n      <td>Really</td>\n      <td>good</td>\n      <td></td>\n      <td>Highly</td>\n      <td>recommend</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td></td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      
<td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>Not</td>\n      <td>so</td>\n      <td>bad</td>\n      <td></td>\n      <td>I</td>\n      <td>will</td>\n      <td>probably</td>\n      <td>come</td>\n      <td>back</td>\n      <td>again</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>Such</td>\n      <td>a</td>\n      <td>nice</td>\n      <td>restaurant</td>\n      <td></td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "         0      1     2           3       4           5         6       7   \\\n0        It     is     a      really    nice  restaurant                 I   \n1      Good  price               Not  really   satisfied      with     the   \n2      Like     it             Staff   there         was     quite    nice   \n3  Looooove     it                                                  Really   \n4             None  None        None    None        None      None    None   \n5       Not     so   bad                   I        will  probably    come   \n6      Such      a  nice  restaurant                None      None    None   \n\n        8          9       10         11    12  \n0    would  recommend      it             None  \n1  service                Not  recommend        \n2           Recommend               None  None  \n3     good             Highly  recommend        \n4     None       None    None       None  None  \n5     back      again    None       None  None  \n6     None       None    None       None  None  "
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### exercise:\nSplit the reviews on full stops and exclamation marks and expand the result into at most 3 columns.\n\nHint: all the characters to split on can be put inside [   ] as a regular expression character class."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.split(pat = '[.!]',n=2,expand = True)",
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 9,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n      <th>1</th>\n      <th>2</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>It is a really nice restaurant</td>\n      <td>I would recommend it</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Good price</td>\n      <td>Not really satisfied with the service</td>\n      <td>Not recommend.</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Like it</td>\n      <td>Staff there was quite nice</td>\n      <td>Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Looooove it</td>\n      <td></td>\n      <td>!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td></td>\n      <td>None</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>Not so bad</td>\n      <td>I will probably come back again</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>Such a nice restaurant</td>\n      <td></td>\n      <td>None</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "                                0                                       1  \\\n0  It is a really nice restaurant                    I would recommend it   \n1                      Good price   Not really satisfied with the service   \n2                         Like it              Staff there was quite nice   \n3                     Looooove it                                           \n4                                                                    None   \n5                      Not so bad         I will probably come back again   \n6          Such a nice restaurant                                           \n\n                                    2  \n0                                      \n1                      Not recommend.  \n2                          Recommend.  \n3  !!! Really good! Highly recommend!  \n4                                None  \n5                                None  \n6                                None  "
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### DataFrame.replace( to_replace, value, inplace ... )\n\nReplace values given in to_replace with value. Values of the DataFrame are replaced with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.\n\nParameters:\n\n* to_replace: str, regex, list, etc., used to indicate how to find the values that will be replaced.\n* value: str, regex, list, etc., default None, the value that replaces any values matching to_replace.\n* inplace: bool, default False. If True, modify the object in place instead of returning a new one.\n\nReturn:\n\nDataFrame, after replacement.\n\n#### example:"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df_example = df.copy()\ndf_example.review.replace(to_replace = '', value = 'Good!')",
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 10,
          "data": {
            "text/plain": "0    It is a really nice restaurant! I would recomm...\n1    Good price. Not really satisfied with the serv...\n2      Like it. Staff there was quite nice. Recommend.\n3      Looooove it!!!!! Really good! Highly recommend!\n4                                                Good!\n5          Not so bad. I will probably come back again\n6                              Such a nice restaurant!\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df_example",
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 11,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2020-01-03</td>\n      <td>Peter</td>\n      <td>4.5</td>\n      <td>It is a really nice restaurant! I would recomm...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>2020-03-12</td>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>2020-03-19</td>\n      <td>Lisa</td>\n      <td>5.0</td>\n      <td>Such a nice restaurant!</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "         date   name  rate                                             review\n0  2020-01-03  Peter   4.5  It is a really nice restaurant! I would recomm...\n1  2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...\n2  2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n3  2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n4  2020-02-09    Tom   3.0                                                   \n5  2020-03-12   Emma   3.5        Not so bad. I will probably come back again\n6  2020-03-19   Lisa   5.0                            Such a nice restaurant!"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example of using dict format"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df_example.replace({'review':{'':'Good!'},\n                    'name':{'Peter':'peter'}\n                   }\n                  )",
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 12,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2020-01-03</td>\n      <td>peter</td>\n      <td>4.5</td>\n      <td>It is a really nice restaurant! I would recomm...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td>Good!</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>2020-03-12</td>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>2020-03-19</td>\n      <td>Lisa</td>\n      <td>5.0</td>\n      <td>Such a nice restaurant!</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "         date   name  rate                                             review\n0  2020-01-03  peter   4.5  It is a really nice restaurant! I would recomm...\n1  2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...\n2  2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n3  2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n4  2020-02-09    Tom   3.0                                              Good!\n5  2020-03-12   Emma   3.5        Not so bad. I will probably come back again\n6  2020-03-19   Lisa   5.0                            Such a nice restaurant!"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df_example",
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 13,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2020-01-03</td>\n      <td>Peter</td>\n      <td>4.5</td>\n      <td>It is a really nice restaurant! I would recomm...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>2020-03-12</td>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>2020-03-19</td>\n      <td>Lisa</td>\n      <td>5.0</td>\n      <td>Such a nice restaurant!</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "         date   name  rate                                             review\n0  2020-01-03  Peter   4.5  It is a really nice restaurant! I would recomm...\n1  2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...\n2  2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n3  2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n4  2020-02-09    Tom   3.0                                                   \n5  2020-03-12   Emma   3.5        Not so bad. I will probably come back again\n6  2020-03-19   Lisa   5.0                            Such a nice restaurant!"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example for using 'inplace = True'"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
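      "source": "# Call replace on the DataFrame itself: an inplace replace on df_example.review\n# may act on a temporary copy in recent pandas versions and leave df_example unchanged.\ndf_example.replace({'review': {'': 'Good!'}}, inplace = True)\ndf_example",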
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 14,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2020-01-03</td>\n      <td>Peter</td>\n      <td>4.5</td>\n      <td>It is a really nice restaurant! I would recomm...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>3.0</td>\n      <td>Good!</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>2020-03-12</td>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>2020-03-19</td>\n      <td>Lisa</td>\n      <td>5.0</td>\n      <td>Such a nice restaurant!</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "         date   name  rate                                             review\n0  2020-01-03  Peter   4.5  It is a really nice restaurant! I would recomm...\n1  2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...\n2  2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n3  2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n4  2020-02-09    Tom   3.0                                              Good!\n5  2020-03-12   Emma   3.5        Not so bad. I will probably come back again\n6  2020-03-19   Lisa   5.0                            Such a nice restaurant!"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### exercise:\nReplace all the rates below 4 with 4. Rates range from 0 to 5 in steps of 0.5.\n\nMultiple values to replace can be passed as a list inside [ ], for example [3, 3.5]."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.rate.replace(to_replace = [3.5,3.0],value = 4.0)",
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 15,
          "data": {
            "text/plain": "0    4.5\n1    4.0\n2    4.0\n3    4.5\n4    4.0\n5    4.0\n6    5.0\nName: rate, dtype: float64"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.replace({'rate':{3.0:4.0\n                   },\n            'name':{'Peter':'peter'}\n           })",
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 16,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>date</th>\n      <th>name</th>\n      <th>rate</th>\n      <th>review</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2020-01-03</td>\n      <td>peter</td>\n      <td>4.5</td>\n      <td>It is a really nice restaurant! I would recomm...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2020-03-04</td>\n      <td>Max</td>\n      <td>3.5</td>\n      <td>Good price. Not really satisfied with the serv...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2020-04-05</td>\n      <td>Ella</td>\n      <td>4.0</td>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2020-02-01</td>\n      <td>Maria</td>\n      <td>4.5</td>\n      <td>Looooove it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2020-02-09</td>\n      <td>Tom</td>\n      <td>4.0</td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>2020-03-12</td>\n      <td>Emma</td>\n      <td>3.5</td>\n      <td>Not so bad. I will probably come back again</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>2020-03-19</td>\n      <td>Lisa</td>\n      <td>5.0</td>\n      <td>Such a nice restaurant!</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "         date   name  rate                                             review\n0  2020-01-03  peter   4.5  It is a really nice restaurant! I would recomm...\n1  2020-03-04    Max   3.5  Good price. Not really satisfied with the serv...\n2  2020-04-05   Ella   4.0    Like it. Staff there was quite nice. Recommend.\n3  2020-02-01  Maria   4.5    Looooove it!!!!! Really good! Highly recommend!\n4  2020-02-09    Tom   4.0                                                   \n5  2020-03-12   Emma   3.5        Not so bad. I will probably come back again\n6  2020-03-19   Lisa   5.0                            Such a nice restaurant!"
          },
          "metadata": {}
        }
      ]
    },
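    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### alternative:\nThe list above only covers the low rates that happen to appear in this dataset. As a hedged alternative (not part of the original solution), `Series.mask` with a boolean condition replaces every rate below 4, whatever its value. The sketch reuses `df` from above and returns a new Series without changing `df`."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# replace every rate below 4 with 4.0, using a condition rather than a list of values\ndf.rate.mask(df.rate < 4, 4.0)",
      "execution_count": null,
      "outputs": []
    },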
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### pandas.Series.str.extract( pat, expand, etc ...)\n\nExtract capture groups in the regex pat as columns in a DataFrame.\n\nParameters:\n\n* pat: str, regular expression pattern with capturing groups\n* expand: bool, default True. If True, return DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group or DataFrame if there are multiple capture groups.\n\n#### example :"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.extract('(good)',expand=True)",
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 17,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>good</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>NaN</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "      0\n0   NaN\n1   NaN\n2   NaN\n3  good\n4   NaN\n5   NaN\n6   NaN"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.extract(r'(good|nice)',expand=True)",
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 18,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>nice</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>nice</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>good</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>nice</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "      0\n0  nice\n1   NaN\n2  nice\n3  good\n4   NaN\n5   NaN\n6  nice"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.extract(r'(?P<positive>good|nice)',expand=False)",
      "execution_count": 19,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 19,
          "data": {
            "text/plain": "0    nice\n1     NaN\n2    nice\n3    good\n4     NaN\n5     NaN\n6    nice\nName: positive, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
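    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example : ignoring case\nThe patterns above are case-sensitive, so the capitalized 'Good' in the second review is not matched. A minimal sketch, assuming case-insensitive matching is wanted: pass `flags=re.IGNORECASE` from Python's `re` module to `str.extract`."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "import re\n\n# match 'good' or 'nice' regardless of letter case; rows without a match stay NaN\ndf.review.str.extract(r'(?P<positive>good|nice)', flags=re.IGNORECASE, expand=False)",
      "execution_count": null,
      "outputs": []
    },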
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### pandas.Series.str.wrap(width, break_long_words, etc...)\n\nWrap strings in Series/Index at specified line width.\n\nParameters:\n* width: int, maximum line width \n* break_long_words: bool, optional. default: True\n    * If True, words longer than width are broken so that no line is longer than width. \n    * If False, long words are not broken, and some lines may be longer than width."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "print('abc\\ndfg')",
      "execution_count": 20,
      "outputs": [
        {
          "output_type": "stream",
          "text": "abc\ndfg\n",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.wrap(10)",
      "execution_count": 21,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 21,
          "data": {
            "text/plain": "0    It is a\\nreally\\nnice resta\\nurant! I\\nwould\\n...\n1    Good\\nprice. Not\\nreally\\nsatisfied\\nwith the\\...\n2    Like it.\\nStaff\\nthere was\\nquite\\nnice.\\nReco...\n3    Looooove\\nit!!!!!\\nReally\\ngood!\\nHighly\\nreco...\n4                                                     \n5     Not so\\nbad. I\\nwill\\nprobably\\ncome back\\nagain\n6                           Such a\\nnice resta\\nurant!\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "print (df.review.str.wrap(10)[0])",
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "stream",
          "text": "It is a\nreally\nnice resta\nurant! I\nwould\nrecommend\nit.\n",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.wrap(10,\n                   break_long_words=False)",
      "execution_count": 23,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 23,
          "data": {
            "text/plain": "0    It is a\\nreally\\nnice\\nrestaurant!\\nI would\\nr...\n1    Good\\nprice. Not\\nreally\\nsatisfied\\nwith the\\...\n2    Like it.\\nStaff\\nthere was\\nquite\\nnice.\\nReco...\n3    Looooove\\nit!!!!!\\nReally\\ngood!\\nHighly\\nreco...\n4                                                     \n5     Not so\\nbad. I\\nwill\\nprobably\\ncome back\\nagain\n6                            Such a\\nnice\\nrestaurant!\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "###  pandas.Series.str.partition(sep,expand ...)\n\nThis method splits the string at the first occurrence of sep, and returns 3 elements containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return 3 elements containing the string itself, followed by two empty strings.\n\nParameters: \n* sep: str, default whitespace. The string to split on.\n* expand: bool, default True\n    * If True, return DataFrame/MultiIndex expanding dimensionality. \n    * If False, return Series/Index.\n\nReturns: \n* DataFrame/MultiIndex or Series/Index of objects\n\n#### example :"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.partition()",
      "execution_count": 24,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 24,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n      <th>1</th>\n      <th>2</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>It</td>\n      <td></td>\n      <td>is a really nice restaurant! I would recommend...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Good</td>\n      <td></td>\n      <td>price. Not really satisfied with the service. ...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Like</td>\n      <td></td>\n      <td>it. Staff there was quite nice. Recommend.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Looooove</td>\n      <td></td>\n      <td>it!!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td></td>\n      <td></td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>Not</td>\n      <td></td>\n      <td>so bad. I will probably come back again</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>Such</td>\n      <td></td>\n      <td>a nice restaurant!</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "          0  1                                                  2\n0        It     is a really nice restaurant! I would recommend...\n1      Good     price. Not really satisfied with the service. ...\n2      Like            it. Staff there was quite nice. Recommend.\n3  Looooove                it!!!!! Really good! Highly recommend!\n4                                                                \n5       Not               so bad. I will probably come back again\n6      Such                                    a nice restaurant!"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.partition('!')",
      "execution_count": 25,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 25,
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n      <th>1</th>\n      <th>2</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>It is a really nice restaurant</td>\n      <td>!</td>\n      <td>I would recommend it.</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Good price. Not really satisfied with the serv...</td>\n      <td></td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Like it. Staff there was quite nice. Recommend.</td>\n      <td></td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Looooove it</td>\n      <td>!</td>\n      <td>!!!! Really good! Highly recommend!</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td></td>\n      <td></td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>Not so bad. I will probably come back again</td>\n      <td></td>\n      <td></td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>Such a nice restaurant</td>\n      <td>!</td>\n      <td></td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "                                                   0  1  \\\n0                     It is a really nice restaurant  !   \n1  Good price. Not really satisfied with the serv...      \n2    Like it. Staff there was quite nice. Recommend.      \n3                                        Looooove it  !   \n4                                                         \n5        Not so bad. I will probably come back again      \n6                             Such a nice restaurant  !   \n\n                                     2  \n0                I would recommend it.  \n1                                       \n2                                       \n3  !!!! Really good! Highly recommend!  \n4                                       \n5                                       \n6                                       "
          },
          "metadata": {}
        }
      ]
    },
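    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example : splitting at the last occurrence\n`partition` splits at the first occurrence of sep. As a related sketch, `str.rpartition` takes the same arguments but splits at the last occurrence, which here separates the final '!' from the rest of each review. If the separator is not found, the whole string ends up in the last element instead of the first."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# split each review at the last '!' instead of the first one\ndf.review.str.rpartition('!')",
      "execution_count": null,
      "outputs": []
    },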
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### pandas.Series.str.swapcase()\n\nConvert strings in the Series/Index to be swapcased. Lowercase letters become uppercase, and uppercase letters become lowercase.\n\n#### example"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.swapcase()",
      "execution_count": 26,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 26,
          "data": {
            "text/plain": "0    iT IS A REALLY NICE RESTAURANT! i WOULD RECOMM...\n1    gOOD PRICE. nOT REALLY SATISFIED WITH THE SERV...\n2      lIKE IT. sTAFF THERE WAS QUITE NICE. rECOMMEND.\n3      lOOOOOVE IT!!!!! rEALLY GOOD! hIGHLY RECOMMEND!\n4                                                     \n5          nOT SO BAD. i WILL PROBABLY COME BACK AGAIN\n6                              sUCH A NICE RESTAURANT!\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### Other similar functions "
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.lower()",
      "execution_count": 27,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 27,
          "data": {
            "text/plain": "0    it is a really nice restaurant! i would recomm...\n1    good price. not really satisfied with the serv...\n2      like it. staff there was quite nice. recommend.\n3      looooove it!!!!! really good! highly recommend!\n4                                                     \n5          not so bad. i will probably come back again\n6                              such a nice restaurant!\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.upper()",
      "execution_count": 28,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 28,
          "data": {
            "text/plain": "0    IT IS A REALLY NICE RESTAURANT! I WOULD RECOMM...\n1    GOOD PRICE. NOT REALLY SATISFIED WITH THE SERV...\n2      LIKE IT. STAFF THERE WAS QUITE NICE. RECOMMEND.\n3      LOOOOOVE IT!!!!! REALLY GOOD! HIGHLY RECOMMEND!\n4                                                     \n5          NOT SO BAD. I WILL PROBABLY COME BACK AGAIN\n6                              SUCH A NICE RESTAURANT!\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.title()",
      "execution_count": 29,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 29,
          "data": {
            "text/plain": "0    It Is A Really Nice Restaurant! I Would Recomm...\n1    Good Price. Not Really Satisfied With The Serv...\n2      Like It. Staff There Was Quite Nice. Recommend.\n3      Looooove It!!!!! Really Good! Highly Recommend!\n4                                                     \n5          Not So Bad. I Will Probably Come Back Again\n6                              Such A Nice Restaurant!\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "###  pandas.Series.str.capitalize()\n\nConvert strings in the Series/Index to be capitalized. The first character is uppercased and the rest of the string is lowercased.\n\n#### example :"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.capitalize()",
      "execution_count": 30,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 30,
          "data": {
            "text/plain": "0    It is a really nice restaurant! i would recomm...\n1    Good price. not really satisfied with the serv...\n2      Like it. staff there was quite nice. recommend.\n3      Looooove it!!!!! really good! highly recommend!\n4                                                     \n5          Not so bad. i will probably come back again\n6                              Such a nice restaurant!\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "###  pandas.Series.str.rfind()\n\nReturn the highest index in each string in the Series/Index at which the substring is found. Each returned index is the position where the substring is fully contained within [start:end]. Return -1 on failure.\n\nParameters: \n* sub: str. Substring being searched.\n* start: int. Left edge index.\n* end: int. Right edge index.\n\nReturns: \n* Series or Index of int.\n\n#### example :"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review",
      "execution_count": 31,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 31,
          "data": {
            "text/plain": "0    It is a really nice restaurant! I would recomm...\n1    Good price. Not really satisfied with the serv...\n2      Like it. Staff there was quite nice. Recommend.\n3      Looooove it!!!!! Really good! Highly recommend!\n4                                                     \n5          Not so bad. I will probably come back again\n6                              Such a nice restaurant!\nName: review, dtype: object"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review.str.rfind('good')",
      "execution_count": 32,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 32,
          "data": {
            "text/plain": "0    -1\n1    -1\n2    -1\n3    24\n4    -1\n5    -1\n6    -1\nName: review, dtype: int64"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.review[3][24]",
      "execution_count": 33,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 33,
          "data": {
            "text/plain": "'g'"
          },
          "metadata": {}
        }
      ]
    },
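    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "#### example : find vs. rfind\n`rfind` returns the highest index of the substring, while `str.find` returns the lowest. A small sketch comparing the two for '!' side by side; both return -1 when the substring is absent. The column names `first` and `last` are chosen here only for illustration."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# lowest and highest index of '!' in each review\npd.DataFrame({'first': df.review.str.find('!'),\n              'last': df.review.str.rfind('!')})",
      "execution_count": null,
      "outputs": []
    },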
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## End\n\nThese functions are only a means to an end: learning how to do good data analysis."
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3",
      "language": "python"
    },
    "gist": {
      "id": "",
      "data": {
        "description": "Pandas - Working with text data.ipynb",
        "public": false
      }
    },
    "language_info": {
      "file_extension": ".py",
      "nbconvert_exporter": "python",
      "version": "3.5.4",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "name": "python",
      "mimetype": "text/x-python",
      "pygments_lexer": "ipython3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}