@jing-jin-mc
Created December 3, 2020 16:44
Pandas - Working with text data.ipynb
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "## Working with text data \n### Functions:\n* split\n* replace\n* extract\n* wrap\n* partition\n* swapcase\n* capitalize\n* rfind"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd ",
"execution_count": 1,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### create a test dataset to demo functions"
},
{
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"trusted": true
},
"cell_type": "code",
"source": "df = pd.DataFrame(data = {'date':['2020-01-03',\n '2020-03-04',\n '2020-04-05',\n '2020-02-01',\n '2020-02-09',\n '2020-03-12',\n '2020-03-19'\n ],\n 'name':['Peter','Max','Ella','Maria','Tom','Emma','Lisa'],\n 'rate':[4.5,3.5,4,4.5,3,3.5,5],\n 'review':['It is a really nice restaurant! I would recommend it.',\n 'Good price. Not really satisfied with the service. Not recommend.',\n 'Like it. Staff there was quite nice. Recommend.',\n 'Looooove it!!!!! Really good! Highly recommend!',\n '',\n 'Not so bad. I will probably come back again',\n 'Such a nice restaurant!'\n ]\n })\ndf",
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 2,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>name</th>\n <th>rate</th>\n <th>review</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2020-01-03</td>\n <td>Peter</td>\n <td>4.5</td>\n <td>It is a really nice restaurant! I would recomm...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2020-03-04</td>\n <td>Max</td>\n <td>3.5</td>\n <td>Good price. Not really satisfied with the serv...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2020-04-05</td>\n <td>Ella</td>\n <td>4.0</td>\n <td>Like it. Staff there was quite nice. Recommend.</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2020-02-01</td>\n <td>Maria</td>\n <td>4.5</td>\n <td>Looooove it!!!!! Really good! Highly recommend!</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2020-02-09</td>\n <td>Tom</td>\n <td>3.0</td>\n <td></td>\n </tr>\n <tr>\n <th>5</th>\n <td>2020-03-12</td>\n <td>Emma</td>\n <td>3.5</td>\n <td>Not so bad. I will probably come back again</td>\n </tr>\n <tr>\n <th>6</th>\n <td>2020-03-19</td>\n <td>Lisa</td>\n <td>5.0</td>\n <td>Such a nice restaurant!</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date name rate review\n0 2020-01-03 Peter 4.5 It is a really nice restaurant! I would recomm...\n1 2020-03-04 Max 3.5 Good price. Not really satisfied with the serv...\n2 2020-04-05 Ella 4.0 Like it. Staff there was quite nice. Recommend.\n3 2020-02-01 Maria 4.5 Looooove it!!!!! Really good! Highly recommend!\n4 2020-02-09 Tom 3.0 \n5 2020-03-12 Emma 3.5 Not so bad. I will probably come back again\n6 2020-03-19 Lisa 5.0 Such a nice restaurant!"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.info()",
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 7 entries, 0 to 6\nData columns (total 4 columns):\ndate 7 non-null object\nname 7 non-null object\nrate 7 non-null float64\nreview 7 non-null object\ndtypes: float64(1), object(3)\nmemory usage: 304.0+ bytes\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### pandas.Series.str.split(pat, n, expand)\n\nSplits each string in the Series/Index from the beginning, at the specified separator/delimiter string.\n\n* pat: str or regular expression to split on. If not specified, split on whitespace.\n* n: int, limits the number of splits in the output. None, 0 and -1 are all interpreted as returning all splits.\n* expand: bool, default False, indicates whether to expand the split strings into separate columns.\n * If True, return a DataFrame/MultiIndex, expanding dimensionality.\n * If False, return a Series/Index containing lists of strings.\n\n#### example: \nWhen using n = 2, each string is split at most two times, into at most three substrings."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review",
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 4,
"data": {
"text/plain": "0 It is a really nice restaurant! I would recomm...\n1 Good price. Not really satisfied with the serv...\n2 Like it. Staff there was quite nice. Recommend.\n3 Looooove it!!!!! Really good! Highly recommend!\n4 \n5 Not so bad. I will probably come back again\n6 Such a nice restaurant!\nName: review, dtype: object"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.split()",
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 5,
"data": {
"text/plain": "0 [It, is, a, really, nice, restaurant!, I, woul...\n1 [Good, price., Not, really, satisfied, with, t...\n2 [Like, it., Staff, there, was, quite, nice., R...\n3 [Looooove, it!!!!!, Really, good!, Highly, rec...\n4 []\n5 [Not, so, bad., I, will, probably, come, back,...\n6 [Such, a, nice, restaurant!]\nName: review, dtype: object"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "eg = df.copy()\neg['split_result'] = df.review.str.split(n=2)\neg",
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 6,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>name</th>\n <th>rate</th>\n <th>review</th>\n <th>split_result</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2020-01-03</td>\n <td>Peter</td>\n <td>4.5</td>\n <td>It is a really nice restaurant! I would recomm...</td>\n <td>[It, is, a really nice restaurant! I would rec...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2020-03-04</td>\n <td>Max</td>\n <td>3.5</td>\n <td>Good price. Not really satisfied with the serv...</td>\n <td>[Good, price., Not really satisfied with the s...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2020-04-05</td>\n <td>Ella</td>\n <td>4.0</td>\n <td>Like it. Staff there was quite nice. Recommend.</td>\n <td>[Like, it., Staff there was quite nice. Recomm...</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2020-02-01</td>\n <td>Maria</td>\n <td>4.5</td>\n <td>Looooove it!!!!! Really good! Highly recommend!</td>\n <td>[Looooove, it!!!!!, Really good! Highly recomm...</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2020-02-09</td>\n <td>Tom</td>\n <td>3.0</td>\n <td></td>\n <td>[]</td>\n </tr>\n <tr>\n <th>5</th>\n <td>2020-03-12</td>\n <td>Emma</td>\n <td>3.5</td>\n <td>Not so bad. I will probably come back again</td>\n <td>[Not, so, bad. I will probably come back again]</td>\n </tr>\n <tr>\n <th>6</th>\n <td>2020-03-19</td>\n <td>Lisa</td>\n <td>5.0</td>\n <td>Such a nice restaurant!</td>\n <td>[Such, a, nice restaurant!]</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date name rate review \\\n0 2020-01-03 Peter 4.5 It is a really nice restaurant! I would recomm... \n1 2020-03-04 Max 3.5 Good price. Not really satisfied with the serv... \n2 2020-04-05 Ella 4.0 Like it. Staff there was quite nice. Recommend. \n3 2020-02-01 Maria 4.5 Looooove it!!!!! Really good! Highly recommend! \n4 2020-02-09 Tom 3.0 \n5 2020-03-12 Emma 3.5 Not so bad. I will probably come back again \n6 2020-03-19 Lisa 5.0 Such a nice restaurant! \n\n split_result \n0 [It, is, a really nice restaurant! I would rec... \n1 [Good, price., Not really satisfied with the s... \n2 [Like, it., Staff there was quite nice. Recomm... \n3 [Looooove, it!!!!!, Really good! Highly recomm... \n4 [] \n5 [Not, so, bad. I will probably come back again] \n6 [Such, a, nice restaurant!] "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example for using pattern and expand:\n* The pattern can be a str or a regular expression \n * r'\\W' splits on any non-word character\n* Consider using expand together with n, to avoid getting too many columns "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.split(pat='!', n=2,expand = True)",
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 7,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>It is a really nice restaurant</td>\n <td>I would recommend it.</td>\n <td>None</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Good price. Not really satisfied with the serv...</td>\n <td>None</td>\n <td>None</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Like it. Staff there was quite nice. Recommend.</td>\n <td>None</td>\n <td>None</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Looooove it</td>\n <td></td>\n <td>!!! Really good! Highly recommend!</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td>None</td>\n <td>None</td>\n </tr>\n <tr>\n <th>5</th>\n <td>Not so bad. I will probably come back again</td>\n <td>None</td>\n <td>None</td>\n </tr>\n <tr>\n <th>6</th>\n <td>Such a nice restaurant</td>\n <td></td>\n <td>None</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " 0 1 \\\n0 It is a really nice restaurant I would recommend it. \n1 Good price. Not really satisfied with the serv... None \n2 Like it. Staff there was quite nice. Recommend. None \n3 Looooove it \n4 None \n5 Not so bad. I will probably come back again None \n6 Such a nice restaurant \n\n 2 \n0 None \n1 None \n2 None \n3 !!! Really good! Highly recommend! \n4 None \n5 None \n6 None "
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "tmp = df.review.str.split(pat= r'\\W', expand = True)\ntmp",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>2</th>\n <th>3</th>\n <th>4</th>\n <th>5</th>\n <th>6</th>\n <th>7</th>\n <th>8</th>\n <th>9</th>\n <th>10</th>\n <th>11</th>\n <th>12</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>It</td>\n <td>is</td>\n <td>a</td>\n <td>really</td>\n <td>nice</td>\n <td>restaurant</td>\n <td></td>\n <td>I</td>\n <td>would</td>\n <td>recommend</td>\n <td>it</td>\n <td></td>\n <td>None</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Good</td>\n <td>price</td>\n <td></td>\n <td>Not</td>\n <td>really</td>\n <td>satisfied</td>\n <td>with</td>\n <td>the</td>\n <td>service</td>\n <td></td>\n <td>Not</td>\n <td>recommend</td>\n <td></td>\n </tr>\n <tr>\n <th>2</th>\n <td>Like</td>\n <td>it</td>\n <td></td>\n <td>Staff</td>\n <td>there</td>\n <td>was</td>\n <td>quite</td>\n <td>nice</td>\n <td></td>\n <td>Recommend</td>\n <td></td>\n <td>None</td>\n <td>None</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Looooove</td>\n <td>it</td>\n <td></td>\n <td></td>\n <td></td>\n <td></td>\n <td></td>\n <td>Really</td>\n <td>good</td>\n <td></td>\n <td>Highly</td>\n <td>recommend</td>\n <td></td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n </tr>\n <tr>\n <th>5</th>\n <td>Not</td>\n <td>so</td>\n <td>bad</td>\n <td></td>\n <td>I</td>\n <td>will</td>\n <td>probably</td>\n <td>come</td>\n <td>back</td>\n <td>again</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n </tr>\n <tr>\n <th>6</th>\n <td>Such</td>\n <td>a</td>\n <td>nice</td>\n <td>restaurant</td>\n <td></td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n <td>None</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " 0 1 2 3 4 5 6 7 \\\n0 It is a really nice restaurant I \n1 Good price Not really satisfied with the \n2 Like it Staff there was quite nice \n3 Looooove it Really \n4 None None None None None None None \n5 Not so bad I will probably come \n6 Such a nice restaurant None None None \n\n 8 9 10 11 12 \n0 would recommend it None \n1 service Not recommend \n2 Recommend None None \n3 good Highly recommend \n4 None None None None None \n5 back again None None None \n6 None None None None None "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### exercise:\nSplit the reviews on full stops and exclamation marks and expand the result into at most 3 columns.\n\nAll the separators can be included in [ ] as a regex character class, e.g. [.!]."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.split(pat = '[.!]',n=2,expand = True)",
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>It is a really nice restaurant</td>\n <td>I would recommend it</td>\n <td></td>\n </tr>\n <tr>\n <th>1</th>\n <td>Good price</td>\n <td>Not really satisfied with the service</td>\n <td>Not recommend.</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Like it</td>\n <td>Staff there was quite nice</td>\n <td>Recommend.</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Looooove it</td>\n <td></td>\n <td>!!! Really good! Highly recommend!</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td>None</td>\n <td>None</td>\n </tr>\n <tr>\n <th>5</th>\n <td>Not so bad</td>\n <td>I will probably come back again</td>\n <td>None</td>\n </tr>\n <tr>\n <th>6</th>\n <td>Such a nice restaurant</td>\n <td></td>\n <td>None</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " 0 1 \\\n0 It is a really nice restaurant I would recommend it \n1 Good price Not really satisfied with the service \n2 Like it Staff there was quite nice \n3 Looooove it \n4 None \n5 Not so bad I will probably come back again \n6 Such a nice restaurant \n\n 2 \n0 \n1 Not recommend. \n2 Recommend. \n3 !!! Really good! Highly recommend! \n4 None \n5 None \n6 None "
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.replace( to_replace, value, inplace ... )\n\nReplace values given in to_replace with value. Values of the DataFrame are replaced with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.\n\nParameters:\n\n* to_replace: str, regex, list, etc., used to indicate how to find the values that will be replaced.\n* value: str, regex, list, etc., default None, the value to replace any values matching to_replace with.\n* inplace: bool, default False. If True, modify the object in place instead of returning a new one.\n\nReturn:\n\nDataFrame, after replacement.\n\n#### example:"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_example = df.copy()\ndf_example.review.replace(to_replace = '', value = 'Good!')",
"execution_count": 10,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 10,
"data": {
"text/plain": "0 It is a really nice restaurant! I would recomm...\n1 Good price. Not really satisfied with the serv...\n2 Like it. Staff there was quite nice. Recommend.\n3 Looooove it!!!!! Really good! Highly recommend!\n4 Good!\n5 Not so bad. I will probably come back again\n6 Such a nice restaurant!\nName: review, dtype: object"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_example",
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>name</th>\n <th>rate</th>\n <th>review</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2020-01-03</td>\n <td>Peter</td>\n <td>4.5</td>\n <td>It is a really nice restaurant! I would recomm...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2020-03-04</td>\n <td>Max</td>\n <td>3.5</td>\n <td>Good price. Not really satisfied with the serv...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2020-04-05</td>\n <td>Ella</td>\n <td>4.0</td>\n <td>Like it. Staff there was quite nice. Recommend.</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2020-02-01</td>\n <td>Maria</td>\n <td>4.5</td>\n <td>Looooove it!!!!! Really good! Highly recommend!</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2020-02-09</td>\n <td>Tom</td>\n <td>3.0</td>\n <td></td>\n </tr>\n <tr>\n <th>5</th>\n <td>2020-03-12</td>\n <td>Emma</td>\n <td>3.5</td>\n <td>Not so bad. I will probably come back again</td>\n </tr>\n <tr>\n <th>6</th>\n <td>2020-03-19</td>\n <td>Lisa</td>\n <td>5.0</td>\n <td>Such a nice restaurant!</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date name rate review\n0 2020-01-03 Peter 4.5 It is a really nice restaurant! I would recomm...\n1 2020-03-04 Max 3.5 Good price. Not really satisfied with the serv...\n2 2020-04-05 Ella 4.0 Like it. Staff there was quite nice. Recommend.\n3 2020-02-01 Maria 4.5 Looooove it!!!!! Really good! Highly recommend!\n4 2020-02-09 Tom 3.0 \n5 2020-03-12 Emma 3.5 Not so bad. I will probably come back again\n6 2020-03-19 Lisa 5.0 Such a nice restaurant!"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example of using dict format"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_example.replace({'review':{'':'Good!'},\n 'name':{'Peter':'peter'}\n }\n )",
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 12,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>name</th>\n <th>rate</th>\n <th>review</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2020-01-03</td>\n <td>peter</td>\n <td>4.5</td>\n <td>It is a really nice restaurant! I would recomm...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2020-03-04</td>\n <td>Max</td>\n <td>3.5</td>\n <td>Good price. Not really satisfied with the serv...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2020-04-05</td>\n <td>Ella</td>\n <td>4.0</td>\n <td>Like it. Staff there was quite nice. Recommend.</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2020-02-01</td>\n <td>Maria</td>\n <td>4.5</td>\n <td>Looooove it!!!!! Really good! Highly recommend!</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2020-02-09</td>\n <td>Tom</td>\n <td>3.0</td>\n <td>Good!</td>\n </tr>\n <tr>\n <th>5</th>\n <td>2020-03-12</td>\n <td>Emma</td>\n <td>3.5</td>\n <td>Not so bad. I will probably come back again</td>\n </tr>\n <tr>\n <th>6</th>\n <td>2020-03-19</td>\n <td>Lisa</td>\n <td>5.0</td>\n <td>Such a nice restaurant!</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date name rate review\n0 2020-01-03 peter 4.5 It is a really nice restaurant! I would recomm...\n1 2020-03-04 Max 3.5 Good price. Not really satisfied with the serv...\n2 2020-04-05 Ella 4.0 Like it. Staff there was quite nice. Recommend.\n3 2020-02-01 Maria 4.5 Looooove it!!!!! Really good! Highly recommend!\n4 2020-02-09 Tom 3.0 Good!\n5 2020-03-12 Emma 3.5 Not so bad. I will probably come back again\n6 2020-03-19 Lisa 5.0 Such a nice restaurant!"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_example",
"execution_count": 13,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 13,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>name</th>\n <th>rate</th>\n <th>review</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2020-01-03</td>\n <td>Peter</td>\n <td>4.5</td>\n <td>It is a really nice restaurant! I would recomm...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2020-03-04</td>\n <td>Max</td>\n <td>3.5</td>\n <td>Good price. Not really satisfied with the serv...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2020-04-05</td>\n <td>Ella</td>\n <td>4.0</td>\n <td>Like it. Staff there was quite nice. Recommend.</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2020-02-01</td>\n <td>Maria</td>\n <td>4.5</td>\n <td>Looooove it!!!!! Really good! Highly recommend!</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2020-02-09</td>\n <td>Tom</td>\n <td>3.0</td>\n <td></td>\n </tr>\n <tr>\n <th>5</th>\n <td>2020-03-12</td>\n <td>Emma</td>\n <td>3.5</td>\n <td>Not so bad. I will probably come back again</td>\n </tr>\n <tr>\n <th>6</th>\n <td>2020-03-19</td>\n <td>Lisa</td>\n <td>5.0</td>\n <td>Such a nice restaurant!</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date name rate review\n0 2020-01-03 Peter 4.5 It is a really nice restaurant! I would recomm...\n1 2020-03-04 Max 3.5 Good price. Not really satisfied with the serv...\n2 2020-04-05 Ella 4.0 Like it. Staff there was quite nice. Recommend.\n3 2020-02-01 Maria 4.5 Looooove it!!!!! Really good! Highly recommend!\n4 2020-02-09 Tom 3.0 \n5 2020-03-12 Emma 3.5 Not so bad. I will probably come back again\n6 2020-03-19 Lisa 5.0 Such a nice restaurant!"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example for using 'inplace = True'"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df_example.review.replace(to_replace = '', value = 'Good!', inplace = True)\ndf_example",
"execution_count": 14,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 14,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>name</th>\n <th>rate</th>\n <th>review</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2020-01-03</td>\n <td>Peter</td>\n <td>4.5</td>\n <td>It is a really nice restaurant! I would recomm...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2020-03-04</td>\n <td>Max</td>\n <td>3.5</td>\n <td>Good price. Not really satisfied with the serv...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2020-04-05</td>\n <td>Ella</td>\n <td>4.0</td>\n <td>Like it. Staff there was quite nice. Recommend.</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2020-02-01</td>\n <td>Maria</td>\n <td>4.5</td>\n <td>Looooove it!!!!! Really good! Highly recommend!</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2020-02-09</td>\n <td>Tom</td>\n <td>3.0</td>\n <td>Good!</td>\n </tr>\n <tr>\n <th>5</th>\n <td>2020-03-12</td>\n <td>Emma</td>\n <td>3.5</td>\n <td>Not so bad. I will probably come back again</td>\n </tr>\n <tr>\n <th>6</th>\n <td>2020-03-19</td>\n <td>Lisa</td>\n <td>5.0</td>\n <td>Such a nice restaurant!</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date name rate review\n0 2020-01-03 Peter 4.5 It is a really nice restaurant! I would recomm...\n1 2020-03-04 Max 3.5 Good price. Not really satisfied with the serv...\n2 2020-04-05 Ella 4.0 Like it. Staff there was quite nice. Recommend.\n3 2020-02-01 Maria 4.5 Looooove it!!!!! Really good! Highly recommend!\n4 2020-02-09 Tom 3.0 Good!\n5 2020-03-12 Emma 3.5 Not so bad. I will probably come back again\n6 2020-03-19 Lisa 5.0 Such a nice restaurant!"
},
"metadata": {}
}
]
},
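{
"metadata": {},
"cell_type": "markdown",
"source": "Calling replace with inplace = True on a selected column, as above, relies on df_example.review being a view of the DataFrame. A minimal sketch of an alternative, assuming a recent pandas version where this chained pattern may emit a warning or leave df_example unchanged (for example with copy-on-write enabled): assign the result back to the column. The name df_assign is hypothetical and used only for this illustration."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Assumption: df_assign is a hypothetical name used only in this sketch.\ndf_assign = df.copy()\n# Assign the replaced column back instead of relying on inplace = True.\ndf_assign['review'] = df_assign['review'].replace(to_replace = '', value = 'Good!')\ndf_assign",
"execution_count": null,
"outputs": []
},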
{
"metadata": {},
"cell_type": "markdown",
"source": "#### exercise:\nReplace all the rates below 4 with 4. Rates range from 0 to 5 in steps of 0.5.\n\nMultiple values to replace can be passed in [ ] as a list, e.g. [3, 3.5]."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.rate.replace(to_replace = [3.5,3.0],value = 4.0)",
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 15,
"data": {
"text/plain": "0 4.5\n1 4.0\n2 4.0\n3 4.5\n4 4.0\n5 4.0\n6 5.0\nName: rate, dtype: float64"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.replace({'rate':{3.0:4.0\n },\n 'name':{'Peter':'peter'}\n })",
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 16,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>name</th>\n <th>rate</th>\n <th>review</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2020-01-03</td>\n <td>peter</td>\n <td>4.5</td>\n <td>It is a really nice restaurant! I would recomm...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2020-03-04</td>\n <td>Max</td>\n <td>3.5</td>\n <td>Good price. Not really satisfied with the serv...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2020-04-05</td>\n <td>Ella</td>\n <td>4.0</td>\n <td>Like it. Staff there was quite nice. Recommend.</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2020-02-01</td>\n <td>Maria</td>\n <td>4.5</td>\n <td>Looooove it!!!!! Really good! Highly recommend!</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2020-02-09</td>\n <td>Tom</td>\n <td>4.0</td>\n <td></td>\n </tr>\n <tr>\n <th>5</th>\n <td>2020-03-12</td>\n <td>Emma</td>\n <td>3.5</td>\n <td>Not so bad. I will probably come back again</td>\n </tr>\n <tr>\n <th>6</th>\n <td>2020-03-19</td>\n <td>Lisa</td>\n <td>5.0</td>\n <td>Such a nice restaurant!</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date name rate review\n0 2020-01-03 peter 4.5 It is a really nice restaurant! I would recomm...\n1 2020-03-04 Max 3.5 Good price. Not really satisfied with the serv...\n2 2020-04-05 Ella 4.0 Like it. Staff there was quite nice. Recommend.\n3 2020-02-01 Maria 4.5 Looooove it!!!!! Really good! Highly recommend!\n4 2020-02-09 Tom 4.0 \n5 2020-03-12 Emma 3.5 Not so bad. I will probably come back again\n6 2020-03-19 Lisa 5.0 Such a nice restaurant!"
},
"metadata": {}
}
]
},
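{
"metadata": {},
"cell_type": "markdown",
"source": "The list form used in this exercise needs every value below 4 to be spelled out. As a sketch of an alternative (not part of the original exercise), Series.clip(lower = 4.0) raises every rate below 4 to 4 without listing the values. It is a different method from replace and only fits numeric thresholds like this one."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Raise every rate below 4.0 to 4.0; rates of 4.0 and above are unchanged.\ndf.rate.clip(lower = 4.0)",
"execution_count": null,
"outputs": []
},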
{
"metadata": {},
"cell_type": "markdown",
"source": "### pandas.Series.str.extract( pat, expand, etc ...)\n\nExtract capture groups in the regex pat as columns in a DataFrame. Only the first match in each string is extracted, and rows without a match contain NaN.\n\nParameters:\n\n* pat: str, regular expression pattern with capturing groups\n* expand: bool, default True. If True, return a DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group or a DataFrame if there are multiple capture groups.\n\n#### example :"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.extract('(good)',expand=True)",
"execution_count": 17,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 17,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>0</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>NaN</td>\n </tr>\n <tr>\n <th>1</th>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>NaN</td>\n </tr>\n <tr>\n <th>3</th>\n <td>good</td>\n </tr>\n <tr>\n <th>4</th>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>NaN</td>\n </tr>\n <tr>\n <th>6</th>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " 0\n0 NaN\n1 NaN\n2 NaN\n3 good\n4 NaN\n5 NaN\n6 NaN"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.extract(r'(good|nice)',expand=True)",
"execution_count": 18,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 18,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>0</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>nice</td>\n </tr>\n <tr>\n <th>1</th>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>nice</td>\n </tr>\n <tr>\n <th>3</th>\n <td>good</td>\n </tr>\n <tr>\n <th>4</th>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>NaN</td>\n </tr>\n <tr>\n <th>6</th>\n <td>nice</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " 0\n0 nice\n1 NaN\n2 nice\n3 good\n4 NaN\n5 NaN\n6 nice"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.extract(r'(?P<positive>good|nice)',expand=False)",
"execution_count": 19,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 19,
"data": {
"text/plain": "0 nice\n1 NaN\n2 nice\n3 good\n4 NaN\n5 NaN\n6 nice\nName: positive, dtype: object"
},
"metadata": {}
}
]
},
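{
"metadata": {},
"cell_type": "markdown",
"source": "The patterns above are case-sensitive, so 'Good price.' in the second review is not matched. A minimal sketch, assuming the flags parameter of str.extract is combined with re.IGNORECASE from the standard library re module, to match regardless of case:"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import re\n\n# flags is passed through to the regex engine; IGNORECASE also matches 'Good' and 'Nice'.\ndf.review.str.extract(r'(?P<positive>good|nice)', flags = re.IGNORECASE, expand = False)",
"execution_count": null,
"outputs": []
},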
{
"metadata": {},
"cell_type": "markdown",
"source": "### pandas.Series.str.wrap(width, break_long_words, etc...)\n\nWrap strings in Series/Index at specified line width.\n\nParameters:\n* width: int, maximum line width \n* break_long_words: bool, optional. default: True\n * If True, then words longer than width will be broken in order to ensure that no lines are longer than width. \n * If False, long words will not be broken, and some lines may be longer than width."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print('abc\\ndfg')",
"execution_count": 20,
"outputs": [
{
"output_type": "stream",
"text": "abc\ndfg\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.wrap(10)",
"execution_count": 21,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 21,
"data": {
"text/plain": "0 It is a\\nreally\\nnice resta\\nurant! I\\nwould\\n...\n1 Good\\nprice. Not\\nreally\\nsatisfied\\nwith the\\...\n2 Like it.\\nStaff\\nthere was\\nquite\\nnice.\\nReco...\n3 Looooove\\nit!!!!!\\nReally\\ngood!\\nHighly\\nreco...\n4 \n5 Not so\\nbad. I\\nwill\\nprobably\\ncome back\\nagain\n6 Such a\\nnice resta\\nurant!\nName: review, dtype: object"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print (df.review.str.wrap(10)[0])",
"execution_count": 22,
"outputs": [
{
"output_type": "stream",
"text": "It is a\nreally\nnice resta\nurant! I\nwould\nrecommend\nit.\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.wrap(10,\n break_long_words=False)",
"execution_count": 23,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 23,
"data": {
"text/plain": "0 It is a\\nreally\\nnice\\nrestaurant!\\nI would\\nr...\n1 Good\\nprice. Not\\nreally\\nsatisfied\\nwith the\\...\n2 Like it.\\nStaff\\nthere was\\nquite\\nnice.\\nReco...\n3 Looooove\\nit!!!!!\\nReally\\ngood!\\nHighly\\nreco...\n4 \n5 Not so\\nbad. I\\nwill\\nprobably\\ncome back\\nagain\n6 Such a\\nnice\\nrestaurant!\nName: review, dtype: object"
},
"metadata": {}
}
]
},
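{
"metadata": {},
"cell_type": "markdown",
"source": "The effect of `break_long_words` is easier to see by measuring the wrapped lines. This is a small sketch on a single made-up string (an assumption for illustration); it prints the length of every line produced by each setting."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd\n\n# Hypothetical sample data (an assumption for illustration)\ns = pd.Series(['Such a nice restaurant!'])\n\n# Default: 'restaurant!' (11 characters) is broken so every line fits width 10\nstrict = s.str.wrap(10)[0]\nprint([len(line) for line in strict.split('\\n')])\n\n# break_long_words=False keeps the word whole, so one line is longer than 10\nloose = s.str.wrap(10, break_long_words=False)[0]\nprint([len(line) for line in loose.split('\\n')])",
"execution_count": null,
"outputs": []
},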
{
"metadata": {},
"cell_type": "markdown",
"source": "### pandas.Series.str.partition(sep,expand ...)\n\nThis method splits each string at the first occurrence of sep and returns 3 elements: the part before the separator, the separator itself, and the part after the separator. If the separator is not found, it returns the string itself followed by two empty strings.\n\nParameters: \n* sep: str, default whitespace. The string to split on.\n* expand: bool, default True\n    * If True, return DataFrame/MultiIndex expanding dimensionality. \n    * If False, return Series/Index.\n\nReturns: \n* DataFrame/MultiIndex or Series/Index of objects\n\n#### example :"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.partition()",
"execution_count": 24,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 24,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>It</td>\n <td></td>\n <td>is a really nice restaurant! I would recommend...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Good</td>\n <td></td>\n <td>price. Not really satisfied with the service. ...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Like</td>\n <td></td>\n <td>it. Staff there was quite nice. Recommend.</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Looooove</td>\n <td></td>\n <td>it!!!!! Really good! Highly recommend!</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td></td>\n <td></td>\n </tr>\n <tr>\n <th>5</th>\n <td>Not</td>\n <td></td>\n <td>so bad. I will probably come back again</td>\n </tr>\n <tr>\n <th>6</th>\n <td>Such</td>\n <td></td>\n <td>a nice restaurant!</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " 0 1 2\n0 It is a really nice restaurant! I would recommend...\n1 Good price. Not really satisfied with the service. ...\n2 Like it. Staff there was quite nice. Recommend.\n3 Looooove it!!!!! Really good! Highly recommend!\n4 \n5 Not so bad. I will probably come back again\n6 Such a nice restaurant!"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.partition('!')",
"execution_count": 25,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 25,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>2</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>It is a really nice restaurant</td>\n <td>!</td>\n <td>I would recommend it.</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Good price. Not really satisfied with the serv...</td>\n <td></td>\n <td></td>\n </tr>\n <tr>\n <th>2</th>\n <td>Like it. Staff there was quite nice. Recommend.</td>\n <td></td>\n <td></td>\n </tr>\n <tr>\n <th>3</th>\n <td>Looooove it</td>\n <td>!</td>\n <td>!!!! Really good! Highly recommend!</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td></td>\n <td></td>\n </tr>\n <tr>\n <th>5</th>\n <td>Not so bad. I will probably come back again</td>\n <td></td>\n <td></td>\n </tr>\n <tr>\n <th>6</th>\n <td>Such a nice restaurant</td>\n <td>!</td>\n <td></td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " 0 1 \\\n0 It is a really nice restaurant ! \n1 Good price. Not really satisfied with the serv... \n2 Like it. Staff there was quite nice. Recommend. \n3 Looooove it ! \n4 \n5 Not so bad. I will probably come back again \n6 Such a nice restaurant ! \n\n 2 \n0 I would recommend it. \n1 \n2 \n3 !!!! Really good! Highly recommend! \n4 \n5 \n6 "
},
"metadata": {}
}
]
},
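{
"metadata": {},
"cell_type": "markdown",
"source": "A short sketch of two variations on the calls above: `expand=False`, which keeps a single Series of 3-tuples, and `str.rpartition`, which splits at the last occurrence of the separator instead of the first. The Series `s` is made-up sample data (an assumption for illustration)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd\n\n# Hypothetical sample data (an assumption for illustration)\ns = pd.Series(['Good! Really good!', 'No exclamation mark here'])\n\n# expand=False returns one Series whose values are (before, sep, after) tuples\nprint(s.str.partition('!', expand=False))\n\n# rpartition splits at the LAST '!' rather than the first one\nprint(s.str.rpartition('!'))",
"execution_count": null,
"outputs": []
},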
{
"metadata": {},
"cell_type": "markdown",
"source": "### pandas.Series.str.swapcase()\n\nConvert strings in the Series/Index to be swapcased: lowercase letters become uppercase, and uppercase letters become lowercase.\n\n#### example"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.swapcase()",
"execution_count": 26,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 26,
"data": {
"text/plain": "0 iT IS A REALLY NICE RESTAURANT! i WOULD RECOMM...\n1 gOOD PRICE. nOT REALLY SATISFIED WITH THE SERV...\n2 lIKE IT. sTAFF THERE WAS QUITE NICE. rECOMMEND.\n3 lOOOOOVE IT!!!!! rEALLY GOOD! hIGHLY RECOMMEND!\n4 \n5 nOT SO BAD. i WILL PROBABLY COME BACK AGAIN\n6 sUCH A NICE RESTAURANT!\nName: review, dtype: object"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### Other similar functions "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.lower()",
"execution_count": 27,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 27,
"data": {
"text/plain": "0 it is a really nice restaurant! i would recomm...\n1 good price. not really satisfied with the serv...\n2 like it. staff there was quite nice. recommend.\n3 looooove it!!!!! really good! highly recommend!\n4 \n5 not so bad. i will probably come back again\n6 such a nice restaurant!\nName: review, dtype: object"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.upper()",
"execution_count": 28,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 28,
"data": {
"text/plain": "0 IT IS A REALLY NICE RESTAURANT! I WOULD RECOMM...\n1 GOOD PRICE. NOT REALLY SATISFIED WITH THE SERV...\n2 LIKE IT. STAFF THERE WAS QUITE NICE. RECOMMEND.\n3 LOOOOOVE IT!!!!! REALLY GOOD! HIGHLY RECOMMEND!\n4 \n5 NOT SO BAD. I WILL PROBABLY COME BACK AGAIN\n6 SUCH A NICE RESTAURANT!\nName: review, dtype: object"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.title()",
"execution_count": 29,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 29,
"data": {
"text/plain": "0 It Is A Really Nice Restaurant! I Would Recomm...\n1 Good Price. Not Really Satisfied With The Serv...\n2 Like It. Staff There Was Quite Nice. Recommend.\n3 Looooove It!!!!! Really Good! Highly Recommend!\n4 \n5 Not So Bad. I Will Probably Come Back Again\n6 Such A Nice Restaurant!\nName: review, dtype: object"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### pandas.Series.str.capitalize()\n\nConvert strings in the Series/Index to be capitalized. The first character of each string is uppercased and all remaining characters are lowercased.\n\n#### example :"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.capitalize()",
"execution_count": 30,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 30,
"data": {
"text/plain": "0 It is a really nice restaurant! i would recomm...\n1 Good price. not really satisfied with the serv...\n2 Like it. staff there was quite nice. recommend.\n3 Looooove it!!!!! really good! highly recommend!\n4 \n5 Not so bad. i will probably come back again\n6 Such a nice restaurant!\nName: review, dtype: object"
},
"metadata": {}
}
]
},
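{
"metadata": {},
"cell_type": "markdown",
"source": "To make the difference between the case functions concrete, here is a small sketch that applies `capitalize`, `title` and `swapcase` to the same made-up string (an assumption for illustration). Note that `capitalize` lowercases everything after the first character."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd\n\n# Hypothetical sample data (an assumption for illustration)\ns = pd.Series(['hello WORLD. see YOU'])\n\nprint(s.str.capitalize()[0])  # Hello world. see you\nprint(s.str.title()[0])       # Hello World. See You\nprint(s.str.swapcase()[0])    # HELLO world. SEE you",
"execution_count": null,
"outputs": []
},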
{
"metadata": {},
"cell_type": "markdown",
"source": "### pandas.Series.str.rfind()\n\nReturn the highest index in each string of the Series/Index at which the substring is found. Each returned index corresponds to a position where the substring is fully contained within [start:end]. Return -1 on failure.\n\nParameters: \n* sub: str. Substring being searched.\n* start: int. Left edge index.\n* end: int. Right edge index.\n\nReturns: \n* Series or Index of int.\n\n#### example :"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review",
"execution_count": 31,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 31,
"data": {
"text/plain": "0 It is a really nice restaurant! I would recomm...\n1 Good price. Not really satisfied with the serv...\n2 Like it. Staff there was quite nice. Recommend.\n3 Looooove it!!!!! Really good! Highly recommend!\n4 \n5 Not so bad. I will probably come back again\n6 Such a nice restaurant!\nName: review, dtype: object"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review.str.rfind('good')",
"execution_count": 32,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 32,
"data": {
"text/plain": "0 -1\n1 -1\n2 -1\n3 24\n4 -1\n5 -1\n6 -1\nName: review, dtype: int64"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.review[3][24]",
"execution_count": 33,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 33,
"data": {
"text/plain": "'g'"
},
"metadata": {}
}
]
},
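{
"metadata": {},
"cell_type": "markdown",
"source": "A minimal sketch contrasting `find` (lowest index) with `rfind` (highest index), and showing how `start` and `end` restrict the search window. The Series `s` is made-up sample data (an assumption for illustration)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd\n\n# Hypothetical sample data (an assumption for illustration)\ns = pd.Series(['good food, good mood', 'bad'])\n\nprint(s.str.find('good').tolist())          # [0, -1]  first occurrence\nprint(s.str.rfind('good').tolist())         # [11, -1] last occurrence\n\n# Only search inside the slice [0:10], which excludes the second 'good'\nprint(s.str.rfind('good', 0, 10).tolist())  # [0, -1]",
"execution_count": null,
"outputs": []
},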
{
"metadata": {},
"cell_type": "markdown",
"source": "## End\n\nThese functions are only a means to an end: learning how to do good data analysis."
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"gist": {
"id": "",
"data": {
"description": "Pandas - Working with text data.ipynb",
"public": false
}
},
"language_info": {
"file_extension": ".py",
"nbconvert_exporter": "python",
"version": "3.5.4",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"name": "python",
"mimetype": "text/x-python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}