{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Women, men, and money\n",
"<span style=\"color: #48c2ff; font-family: Arial; font-size: 3em;\">Women, men and money</span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![waiter_waitress](https://user-images.githubusercontent.com/22870395/53466314-6edec080-3a1f-11e9-8d64-091db778aaa8.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Using a dataset of restaurant tips, we will investigate how much money women and men make relative to each other, and *why*.\n",
"\n",
"## Most people don't tip more or less because the server is a man or a woman; for equal service, we tip according to the size of the bill.\n",
"\n",
"\n",
"### Based on this data analysis, I will show that the real reason women make less is mostly because:\n",
"# 1. women work proportionally more lunch shifts than men\n",
"# 2. lunch bills are smaller than dinner ones.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![momdaughtercooking](https://user-images.githubusercontent.com/22870395/53700405-08132b80-3dc0-11e9-9f4c-849440357c9d.PNG)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# - The difficult question is: \n",
"\n",
"## Should women work more dinner shifts and leave the important job of taking care of children to ...?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TOC:\n",
"* [Tips data](#first-bullet)\n",
"* [Men make more tip money](#second-bullet)\n",
"* [So why](#third-bullet)\n",
"* [Here is why](#fourth-bullet)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import some libraries\n",
"and set some parameters for plotting"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import r2_score\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn import linear_model\n",
"from sklearn.linear_model import LinearRegression\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('ggplot')\n",
"plt.rcParams['axes.labelsize'] = 14\n",
"plt.rcParams['axes.titlesize'] = 24\n",
"plt.rcParams['xtick.labelsize'] = 12\n",
"plt.rcParams['ytick.labelsize'] = 12\n",
"plt.rcParams[\"figure.figsize\"] = (15,4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns; sns.set(color_codes=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pd.options.display.float_format = '{:20,.1f}'.format"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tips data <a class=\"anchor\" id=\"first-bullet\"></a>\n",
"\n",
"This dataset is available through seaborn's `load_dataset` function, so you don't have to download it by hand: seaborn fetches it from the seaborn-data repository the first time and caches it locally. \n",
"\n",
"It has 244 rows. You can see the data at https://github.com/mwaskom/seaborn-data/blob/master/tips.csv\n",
"\n",
"So it is a very small sample, and no history or background was provided. \n",
"\n",
"Even so, we can see some patterns that are observed elsewhere too. \n",
"\n",
"We start with small data to answer a small question: how much money do women and men make relative to each other, and why? \n",
"\n",
"Then we will move on to bigger data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips = sns.load_dataset(\"tips\")"
]
},
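{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check on the loaded table (a sketch added here, not part of the original analysis): its shape, the column types, and how many bills were served by women and by men. The cell reloads the data so it can run on its own. `sns.load_dataset` needs internet access the first time it is called."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"\n",
"# Reload so this cell is self-contained (seaborn caches the file after the first download)\n",
"tips = sns.load_dataset('tips')\n",
"\n",
"print(tips.shape)                  # (rows, columns)\n",
"print(tips.dtypes)                 # sex, smoker, day and time are categorical columns\n",
"print(tips['sex'].value_counts())  # number of bills served by each sex"
]
},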
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"tips.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips.tip.hist(bins=30,color='blue')\n",
"plt.ylabel(\"count\")\n",
"plt.xlabel('tips')\n",
"plt.title('herstogram of tips')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the same bin edges for both groups so the bars line up and can be compared\n",
"bins = np.linspace(tips['tip'].min(), tips['tip'].max(), 31)\n",
"plt.hist(tips.loc[tips['sex'] == 'Female', 'tip'], alpha=0.6, label='women', bins=bins, color='red')\n",
"plt.hist(tips.loc[tips['sex'] == 'Male', 'tip'], alpha=0.5, label='men', bins=bins, color='steelblue')\n",
"plt.legend(loc=2)\n",
"plt.xlabel(\"tip\")\n",
"plt.ylabel(\"count\")\n",
"plt.title('herstogram of tips by gender')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style=\"color: #ff4867; font-family: Arial; font-size: 3em;\">Men make more tip money </span> <a class=\"anchor\" id=\"second-bullet\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hey, what's going on? Why do women's tips mostly stop short of 6, while men's go as high as 10?\n",
"Look at the herstogram above: women's tips are in red, and men's are in blue. \n",
"\n",
"## It seems that men make more in tips on average"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips.groupby('sex').tip.mean().plot(kind='bar',rot=0, title='Average tip by gender',color=['steelblue','red'])"
]
},
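{
"cell_type": "markdown",
"metadata": {},
"source": [
"The bar chart shows only the averages. A small table (again a sketch, not from the original notebook) adds the number of bills, the median and the maximum tip for each group, which makes it easy to see that the two groups are not the same size."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"\n",
"tips = sns.load_dataset('tips')\n",
"\n",
"# count, mean, median and max tip per server sex;\n",
"# observed=True keeps only categories that actually occur in the data\n",
"summary = tips.groupby('sex', observed=True)['tip'].agg(['count', 'mean', 'median', 'max'])\n",
"\n",
"# to_string with its own float_format overrides the 1-decimal display option set earlier\n",
"print(summary.to_string(float_format='{:.2f}'.format))"
]
},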
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## But I don't tip differently because the wait person is a man or a woman\n",
"\n",
"## Neither do you.\n",
"\n",
"<span style=\"color: #ff4867; font-family: Arial; font-size: 3em;\">So why </span> <a class=\"anchor\" id=\"third-bullet\"></a>\n",
"\n",
"\n",
"## Let's see what the machine says first. \n",
"(Because machines don't lie and don't get angry)\n",
"\n",
"###### We will fit a linear regression model that uses every column in this data to predict the tip.\n",
"In other words, if you tell me:\n",
"1. if the wait person is a man or a woman\n",
"2. the bill for the food\n",
"3. the party size (number of people at the table)\n",
"4. smoker or non-smoker\n",
"5. day of the week\n",
"6. dinner or lunch (sorry, there is no pancake breakfast in this data)\n",
"\n",
"the model will predict the tip."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips_num = tips[['total_bill','tip','size']]\n",
"tips_cate = pd.get_dummies(tips[['sex','smoker', 'day', 'time']],drop_first=True)\n",
"tips_final = pd.concat([tips_num, tips_cate],axis=1)\n",
"\n",
"y = tips_final['tip']\n",
"x = tips_final.drop('tip', axis =1)\n",
"X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state=1)\n",
"\n",
"regr = linear_model.LinearRegression()\n",
"\n",
"# Train the model using the training sets\n",
"regr.fit(X_train, y_train)\n",
"\n",
"# Make predictions using the testing set\n",
"y_pred = regr.predict(X_test)\n",
"\n",
"\n",
"# The mean squared error\n",
"print(\"Mean squared error: %.2f\"\n",
" % mean_squared_error(y_test, y_pred))\n",
"# R^2 (coefficient of determination): 1 is a perfect prediction\n",
"print('R square: %.2f' % r2_score(y_test, y_pred))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The coefficients\n",
"print('Coefficients: \\n', regr.coef_.round(2))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(X_train.columns)"
]
},
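{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reading the coefficient array and the column list side by side is error-prone. The sketch below (an addition, not the author's code) pairs each coefficient with its column name and also prints the intercept. It refits the same model, with the same features and the same `random_state=1` split, so the cell runs on its own. The names `features`, `X_tr`, `y_tr` and `model` are illustrative and chosen so they do not overwrite the variables above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import seaborn as sns\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"tips = sns.load_dataset('tips')\n",
"\n",
"# Same features as above: numeric columns plus drop_first dummies\n",
"features = pd.concat([tips[['total_bill', 'size']],\n",
"                      pd.get_dummies(tips[['sex', 'smoker', 'day', 'time']], drop_first=True)], axis=1)\n",
"X_tr, X_te, y_tr, y_te = train_test_split(features, tips['tip'], test_size=0.25, random_state=1)\n",
"model = LinearRegression().fit(X_tr, y_tr)\n",
"\n",
"# One labelled row per coefficient instead of a bare array\n",
"print(pd.Series(model.coef_, index=X_tr.columns).to_string(float_format='{:.2f}'.format))\n",
"print('intercept: %.2f' % model.intercept_)"
]
},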
{
"cell_type": "markdown",
"metadata": {},
"source": [
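"The fitted model says that\n",
"### tip = intercept\n",
"###### + 0.09 * total_bill + 0.17 * size + 0.09 * (1 if the server is a woman) + 0.1 * (1 if the party is non-smoking)\n",
"###### + 0.53 * (1 if it is Friday) + 0.56 * (1 if it is Saturday) + 0.6 * (1 if it is Sunday)\n",
"###### - 0.62 * (1 if it is dinner)\n",
"\n",
"Each term in parentheses is 1 when the condition holds and 0 otherwise. The baseline is therefore a male server, a smoking party, Thursday, and lunch, and `regr.intercept_` holds the intercept.\n",
"\n",
"##### The model does not say that you subtract something when the server is a woman.\n",
"#### On the contrary, with the bill, party size, smoker status, day and time held fixed, it adds a little.\n",
"## hmm..."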
]
},
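{
"cell_type": "markdown",
"metadata": {},
"source": [
"With only 244 rows, a coefficient of 0.09 could easily be noise. One way to check is an ordinary least squares fit that reports a p-value for each coefficient. The sketch below does this with `statsmodels`, which the original notebook does not import. `C(...)` marks categorical columns, and patsy uses each column's first category as the baseline, which should match the `drop_first=True` dummies above. Because this fit uses every row rather than the 75% training split, its coefficients will not match the scikit-learn ones exactly. A large p-value on the `C(sex)[T.Female]` row would mean the data cannot distinguish that effect from zero."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import seaborn as sns\n",
"import statsmodels.formula.api as smf\n",
"\n",
"tips = sns.load_dataset('tips')\n",
"\n",
"# Same predictors as the scikit-learn model, fitted on all rows\n",
"result = smf.ols('tip ~ total_bill + size + C(sex) + C(smoker) + C(day) + C(time)', data=tips).fit()\n",
"\n",
"# Coefficient and p-value side by side for every term, including the intercept\n",
"table = pd.DataFrame({'coef': result.params, 'p_value': result.pvalues})\n",
"print(table.to_string(float_format='{:.3f}'.format))"
]
},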
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.regplot(x=y_test, y=y_pred, ci=None)\n",
"plt.xlabel('actual tip in the test data')\n",
"plt.ylabel('predicted tip in the test data')\n",
"plt.title(\"actual vs predicted tip in test data\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred_train = regr.predict(X_train)  # separate name keeps the test-set predictions in y_pred intact\n",
"sns.regplot(x=y_train, y=y_pred_train, color='g')\n",
"plt.xlabel('actual tip in the training data')\n",
"plt.ylabel('predicted tip in the training data')\n",
"plt.title(\"actual vs predicted tip in training data\")"
]
},
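{
"cell_type": "markdown",
"metadata": {},
"source": [
"A single 75/25 split of 244 rows leaves only 61 rows for testing, so the R square printed above may swing noticeably with a different `random_state`. The sketch below (an addition, not the author's method) uses 5-fold cross-validation to score the same linear model on five different held-out folds."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import seaborn as sns\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.model_selection import KFold, cross_val_score\n",
"\n",
"tips = sns.load_dataset('tips')\n",
"features = pd.concat([tips[['total_bill', 'size']],\n",
"                      pd.get_dummies(tips[['sex', 'smoker', 'day', 'time']], drop_first=True)], axis=1)\n",
"\n",
"# Every row lands in a test fold exactly once, so the score depends less on one split\n",
"cv = KFold(n_splits=5, shuffle=True, random_state=1)\n",
"scores = cross_val_score(LinearRegression(), features, tips['tip'], cv=cv, scoring='r2')\n",
"print('R square per fold:', scores.round(2))\n",
"print('mean R square: %.2f' % scores.mean())"
]
},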
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style=\"color: #ff4867; font-family: Arial; font-size: 3em;\">Here is why </span> <a class=\"anchor\" id=\"fourth-bullet\"></a>\n",
"\n",
"### We don't tip based on whether the server is a man or a woman, but the data shows that waiters get bigger total bills than waitresses"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.regplot(x='total_bill',y='tip', data=tips, color='orange')\n",
"plt.title(\"total bill vs tip\")"
]
},
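{
"cell_type": "markdown",
"metadata": {},
"source": [
"If customers tip according to the bill rather than the server's sex, the tip as a share of the bill should look similar for women and men. The short sketch below (not in the original notebook) computes that rate for each group."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"\n",
"tips = sns.load_dataset('tips')\n",
"\n",
"# Tip as a fraction of the total bill, e.g. 0.15 means a 15% tip\n",
"tip_rate = tips['tip'] / tips['total_bill']\n",
"rate_by_sex = tip_rate.groupby(tips['sex'], observed=True).agg(['mean', 'median'])\n",
"print(rate_by_sex.to_string(float_format='{:.3f}'.format))"
]
},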
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Comparing men's and women's average bills"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips.groupby('sex').total_bill.mean().plot(kind='bar',rot=0, title='the real reason why women get smaller tips: total bill!',color=['steelblue','red'])\n",
"plt.ylabel('average total bill')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparing dinner and lunch average bills\n",
"\n",
"### Dinner bills are larger. Dinner menus are usually more expensive. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips.groupby('time').total_bill.mean().plot(kind='bar',rot=0, title='dinner bills are higher',color=['green','purple'])\n",
"plt.ylabel('average total bill')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Women work fewer dinner shifts than men\n",
"## And we know people spend more at dinner, and the bills are larger\n",
"\n",
"<span style=\"color: #ff4867; font-family: Arial; font-size: 3em;\">Why women work fewer dinners </span>\n",
"\n",
"### *Maybe they are not allowed to work dinner if they have less experience, and maybe they have to go home to look after their families in the evenings.*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips.groupby(['sex','time']).day.count()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, axs = plt.subplots(1,2,figsize=(8,4))\n",
"tips[tips.sex=='Female'].groupby('time').size().plot(kind='pie',ax=axs[0], title=\"women\")\n",
"\n",
"tips[tips.sex=='Male'].groupby('time').size().sort_values().plot(kind='pie', ax=axs[1],title=\"men\")\n",
"plt.ylabel('')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tips.groupby(['sex','time']).day.count().reset_index()"
]
},
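{
"cell_type": "markdown",
"metadata": {},
"source": [
"The counts above are easier to compare as proportions. The sketch below uses `pd.crosstab` (an addition) to give, for each sex, the share of bills served at lunch and at dinner. The data records bills, not shifts, so these shares are only a proxy for the shifts each group works."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import seaborn as sns\n",
"\n",
"tips = sns.load_dataset('tips')\n",
"\n",
"# normalize='index' divides each row by its total, so each row sums to 1\n",
"shares = pd.crosstab(tips['sex'], tips['time'], normalize='index')\n",
"print(shares.to_string(float_format='{:.2f}'.format))"
]
},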
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scatterplots of total bill vs tip for men and women"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"g = sns.FacetGrid(tips, col=\"time\", hue=\"sex\", hue_order=[\"Female\", \"Male\"],palette=\"Set1\")\n",
"g = (g.map(plt.scatter, \"total_bill\", \"tip\", edgecolor=\"w\").add_legend())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Herstogram of total bills for men and women, at lunch and dinner"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"g = sns.FacetGrid(tips, col=\"time\", row=\"sex\", hue=\"sex\", hue_order=[\"Female\", \"Male\"],palette=\"Set1\")\n",
"\n",
"g = g.map(plt.hist, \"total_bill\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style=\"color: #48c2ff; font-family: Arial; font-size: 3em;\">Pairplot for data analysis </span>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style=\"color: #ff4867; font-family: Arial; font-size: 3em;\">Lunch bill and tip: women vs men </span>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"amnt= ['total_bill','tip']\n",
"df_amnt= pd.concat([tips[amnt],tips.sex, tips.time], axis=1)\n",
"sns.set(style=\"ticks\", color_codes=True)\n",
"g = sns.pairplot(df_amnt[df_amnt.time=='Lunch'], palette=\"Set1\",kind='reg',hue='sex', hue_order=[\"Female\", \"Male\"], plot_kws={'scatter_kws': {'alpha': 0.1}})\n",
"g.fig.suptitle(\"Lunch\", y=1.02)  # title the whole grid; plt.title would only label the last panel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style=\"color: #ff4867; font-family: Arial; font-size: 3em;\">Dinner bill and tip: women vs men </span>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"amnt= ['total_bill','tip']\n",
"df_amnt= pd.concat([tips[amnt],tips.sex, tips.time], axis=1)\n",
"sns.set(style=\"ticks\", color_codes=True)\n",
"g = sns.pairplot(df_amnt[df_amnt.time=='Dinner'], kind='reg',hue='sex',palette=\"Set1\",hue_order=[\"Female\", \"Male\"], plot_kws={'scatter_kws': {'alpha': 0.1}})\n",
"g.fig.suptitle(\"Dinner\", y=1.02)  # title the whole grid; plt.title would only label the last panel"
]
},
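{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pairplots compare women and men within each shift visually. A compact numeric version (a sketch, not from the original notebook) is a pivot table of the average bill and tip for every combination of shift and server sex. If the overall gap mostly comes from the mix of shifts, the women-versus-men difference should be smaller within each shift than in the overall averages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"\n",
"tips = sns.load_dataset('tips')\n",
"\n",
"# Rows: lunch/dinner; columns: (total_bill or tip) x server sex; cells: means\n",
"table = tips.pivot_table(index='time', columns='sex', values=['total_bill', 'tip'],\n",
"                         aggfunc='mean', observed=True)\n",
"print(table.to_string(float_format='{:.2f}'.format))"
]
},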
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Indeed, in every industry, including food service, women have shouldered a larger share of the most difficult job in the world, parenting, and have dedicated their time and energy to their families. \n",
"\n",
"## As a result, they have earned less money than men on average. \n",
"# Let's give our appreciation to women for their roles in taking care of families!\n",
"\n",
"### There is a profound reason why we say \n",
"<span style=\"color: #48c255; font-family: Arial; font-size: 3em;\">Mother nature</span>\n",
"![mothernature](https://user-images.githubusercontent.com/22870395/53511940-83f33800-3a8f-11e9-8eb0-57aeb29e27d6.jpg)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Should women work more dinner shifts and leave the important job of taking care of children to ...?\n",
"<span style=\"color: #48c2ff; font-family: Arial; font-size: 3em;\">Should women work more dinner shifts and leave the important job of taking care of children to ...?</span>\n",
"## That is a difficult question..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![family](https://user-images.githubusercontent.com/22870395/53700487-d8185800-3dc0-11e9-86a8-88fc89f6225a.PNG)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}