@BrendaHali
Last active October 26, 2023
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pandas Cheat Sheet\n",
"Collected by Brenda Hali"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Installation\n",
"Pandas is built on top of NumPy, so NumPy needs to be installed as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install numpy\n",
"!pip install pandas #Installs the library"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Import"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Data Structure\n",
"Pandas supports one- and two-dimensional data structures:\n",
"\n",
"- 1D objects are called Series. \n",
"- 2D objects are called DataFrames. \n",
"- The structure is rows and columns. \n",
"\n",
"#### The basics\n",
"Pandas documentation https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html\n",
"\n",
"#### Reading data from a file\n",
"Pandas can read csv, sql, json, html, excel and more. \n",
"- Full list of `read_csv` options: \n",
"https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html"
]
},
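{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Series and DataFrame structures described above can also be built directly; the column names and values below are illustrative:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s = pd.Series([10, 20, 30]) #1D labeled array\n",
"df_example = pd.DataFrame({'col_1': [1, 2], 'col_2': [3, 4]}) #2D table of rows and columns"
]
},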
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('file.csv') #Regular import\n",
"df = pd.read_csv('file.csv', index_col=0) #Takes column 0 as the index\n",
"dfs = pd.read_html(url) #Returns a list of DataFrames; needs an HTML parser such as lxml or BeautifulSoup\n",
"#Other options, not an exhaustive list\n",
"df = pd.read_json()\n",
"df = pd.read_sql()\n",
"df = pd.read_excel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Saving a DataFrame to a file\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('file_name.csv', index=False) #Saves the DataFrame to CSV without writing the index\n",
"#Other options\n",
"df.to_json()\n",
"df.to_sql()\n",
"df.to_excel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Data type change"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pd.to_datetime(df['column']) #Converts a column to datetime\n",
"pd.to_timedelta(df['column']) #Converts a column to timedelta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Attributes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.columns #Returns the columns\n",
"df.dtypes #Returns the datatypes\n",
"df.index #Returns the index\n",
"df.shape #Returns the shape\n",
"df.T #Returns the transposed dataframe\n",
"df.values #Returns the values as a NumPy array"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Join, Merge and Concat\n",
"Link to documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html\n",
"\n",
"Common adjustments: \n",
"\n",
"- `ignore_index=True` - When the original index is not relevant for the result\n",
"- `axis=0` - Concatenates rows | `axis=1` - Concatenates columns\n",
"- `keys=['a', 'b', 'c']` - Labels each concatenated frame with the given key\n",
"- `left` join - Use keys from the left frame only\n",
"- `right` join - Use keys from the right frame only\n",
"- `outer` - Use the union of keys from both frames\n",
"- `inner` - Use the intersection of keys from both frames\n",
"\n",
"These operations return a new DataFrame. For methods that support it, pass `inplace=True` to modify the dataframe in place."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_1.join(df_2) #Joins the dataframes on their index - they must share the same index\n",
"\n",
"pd.concat([df_1, df_2], axis=1, join='outer', ignore_index=False, keys=None,\n",
"          levels=None, names=None, verify_integrity=False, copy=True) #Values can be changed as needed\n",
"\n",
"df_1 = df_1.append(df_2, sort=True) #Stacks dataframes with the same column structure; append returns a new DataFrame\n",
"\n",
"dataframes = {'a': df_1, 'b': df_2, 'c': df_3}\n",
"new_df = pd.concat(dataframes) #Concatenates the dataframes in the dictionary and adds an outer index level indicating the origin of each row\n",
"\n",
"pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,\n",
"         left_index=False, right_index=False, sort=True,\n",
"         suffixes=('_x', '_y'), copy=True, indicator=False,\n",
"         validate=None)\n"
]
},
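{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal worked example of `merge` (the frame and column names here are illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"left = pd.DataFrame({'key': ['a', 'b'], 'val_l': [1, 2]})\n",
"right = pd.DataFrame({'key': ['b', 'c'], 'val_r': [3, 4]})\n",
"pd.merge(left, right, on='key', how='inner') #Keeps only key 'b'\n",
"pd.merge(left, right, on='key', how='outer') #Keeps keys 'a', 'b' and 'c', filling missing values with NaN"
]
},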
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Columns addition and creation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df['sum_col1_col2'] = df['col_1'] + df['col_2']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Inspecting and visualizing data without changing the dataframe\n",
"Use these if you want to see the values of certain columns or rows. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.head() #Returns the first 5 rows\n",
"\n",
"df.tail() #Returns the last 5 rows\n",
"\n",
"df['column_name'] #Returns all the data in the column\n",
"df.column_name #Same, as long as the name is a valid identifier\n",
"df[['column_name', 'second_column']] #Returns the columns as a new DataFrame (note the double brackets)\n",
"df[7:9] #Returns rows 7 and 8 (the end of the slice is exclusive)\n",
"df['column_name'].value_counts(dropna=False) #Returns unique values and their counts\n",
"df.sort_index(axis=0, ascending=False) #Returns the dataframe sorted by index\n",
"df.apply(pd.Series.value_counts) #Returns values and counts for all columns\n",
"df.sort_values(by='column_name') #Returns the dataframe sorted by the selected column\n",
"df.groupby('column_name').mean() #Returns the dataframe grouped by column_name with the mean of each group\n",
"df.pivot_table() #Creates a spreadsheet-style pivot table\n",
"df.iloc[0] #Selection by position\n",
"df.loc['index_one'] #Selection by index label\n",
"df.iloc[0,:] #Returns the first row\n",
"df.iloc[0,0] #Returns the first element (row 0, column 0)\n",
"\n",
"#Simple examples that can be adapted as needed\n",
"\n",
"df[df['is_my_value'] == 1][['what_im_looking_for']]\n",
"\n",
"df[df['column_1'] < 10].groupby('column_2').mean()[['what_im_looking_for']]\n",
"df[df['column_1'] == 0].sort_values(by='column_2', ascending=False).head()\n",
"\n"
]
},
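{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of `pivot_table`, with illustrative column names:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.pivot_table(values='column_1', index='column_2', columns='column_3', aggfunc='mean') #Mean of column_1 for each combination of column_2 (rows) and column_3 (columns)"
]
},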
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Replacing and renaming"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.columns = ['column_1', 'column_2'] #Renames all columns\n",
"df.rename(columns={'old_name': 'new_name'}) #Selective renaming\n",
"\n",
"df.replace(1, 'one') #Replaces all values equal to 1 with 'one'\n",
"df.replace([1, 3], ['one', 'three']) #Replaces all 1 with 'one' and 3 with 'three'\n",
"\n",
"df.set_index('column_1') #Sets column_1 as the index (returns a new DataFrame)\n",
"df.astype(int) #Converts all columns to integer - any datatype can be used\n",
"\n",
"df['column_1'].astype(int) #Converts the datatype of a single column"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Null Values\n",
"- `isna` = `isnull`\n",
"- `notna` = `notnull`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.fillna(value='my_selected_value', inplace=True) #Fills all NaNs with the selected value and makes the change permanent\n",
"df.fillna(x) #Replaces all null values with x\n",
"df.notna().sum() #Counts non-null values per column\n",
"df.interpolate() #Fills NaNs by interpolating between surrounding values\n",
"df.isna() #Returns True/False per cell for NaNs\n",
"df.isna().sum() #Counts null values per column\n",
"df.dropna(inplace=True) #Drops rows with null values permanently\n",
"df.isna().sum()[df.isna().sum() != 0].sort_values().plot(kind='barh'); #Plots the null counts per column\n",
"\n",
"#Advanced replacing:\n",
"df.fillna(df.mean()) #Replaces null values with the mean of each column\n",
"\n",
"#Another way to overwrite the dataframe without the NaNs in a specific column\n",
"df['column'].fillna(value='no_info', inplace=True)\n",
"df = df.loc[df['column'] != 'no_info']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Dropping"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.drop(['column_1', 'column_2'], axis=1, inplace=True) #Drops specific columns permanently\n",
"df.drop_duplicates(inplace=True) #Drops duplicate rows permanently\n",
"df.drop('row_1', axis=0, inplace=True) #Drops the row with index label 'row_1' permanently"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Aggregation methods, statistical methods and summaries\n",
"Call these as methods, with parentheses: `df.sum()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.count() # Returns the number of non-null values in each DataFrame column\n",
"df.describe() # Summary statistics for numerical columns\n",
"df.max() #Returns the highest value in each column\n",
"df.mean() #Returns the mean of all columns\n",
"df.median() # Returns the median of each column\n",
"df.min() # Returns the lowest value in each column\n",
"df.mode() #Returns the mode of each column\n",
"df.std() # Returns the standard deviation of each column\n",
"df.var() #Returns the variance of each column\n",
"df.abs() #Returns absolute values\n",
"df.corr() # Returns the correlation between columns in a DataFrame\n",
"df.round() #Rounds the values"
]
},
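{
"cell_type": "markdown",
"metadata": {},
"source": [
"Several of these statistics can be computed in one call with `agg` (on numeric columns):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.agg(['mean', 'std', 'min', 'max']) #One row per statistic, one column per numeric column"
]
},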
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Other to explore"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.clip() #Trims values at given thresholds\n",
"df.nunique() #Number of distinct values per column\n",
"df.idxmax() #Index of the first occurrence of the maximum\n",
"df.idxmin() #Index of the first occurrence of the minimum\n",
"df.cov() #Covariance between columns\n",
"df.cummax() #Cumulative maximum\n",
"df.cummin() #Cumulative minimum\n",
"df.cumprod() #Cumulative product\n",
"df.cumsum() #Cumulative sum\n",
"df.diff() #Difference with the previous row\n",
"df.nlargest(5, 'column_1') #5 largest rows by column_1\n",
"df.nsmallest(5, 'column_1') #5 smallest rows by column_1\n",
"df.pct_change() #Percent change from the previous row\n",
"df.prod() #Product of the values\n",
"df.quantile() #Quantiles (median by default)\n",
"df.rank() #Rank of the values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Simple Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Dummy Variables\n",
"https://socialresearchmethods.net/kb/dummyvar.php"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.get_dummies(df, columns=['my_column'], drop_first=True) #Drop the first dummy column so the encoding is not redundant"
]
},
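{
"cell_type": "markdown",
"metadata": {},
"source": [
"What `get_dummies` produces on a tiny illustrative frame:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tiny = pd.DataFrame({'color': ['red', 'blue', 'red']})\n",
"pd.get_dummies(tiny, columns=['color'], drop_first=True) #One column color_red with values 1, 0, 1 ('blue' is dropped as the first category)"
]
},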
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Does this column have the value I'm looking for?\n",
"def is_the_value_im_looking(i):\n",
"    if 'value' in str(i).split():\n",
"        return 1\n",
"    else:\n",
"        return 0\n",
"\n",
"#Creates a column named after the value I'm looking for, filled with 0 or 1\n",
"df['value'] = df['col_1'].apply(is_the_value_im_looking)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Extracts the title from each name into a new column, using a list comprehension\n",
"#This can be adapted as needed\n",
"\n",
"df['Title'] = [each.split(',')[1].split('.')[0].strip() for each in df['Name']]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Mean of the target value for each indicator column\n",
"#This can be adapted as needed\n",
"\n",
"for i in ['column_1', 'column_2', 'column_3']:\n",
"    print(i, ':')\n",
"    print(df[df[i] == 1][['the_value_im_lookingfor']].mean())\n",
"    print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plotting\n",
"Simple plotting examples"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"df.groupby('column_1').mean()[['value']].plot(kind='barh')\n",
"plt.title(\"plot title\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.groupby(['column_1', 'column_2']).mean()[['value']].plot(kind='barh');"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}