Skip to content

Instantly share code, notes, and snippets.

@lbourbon
Created May 4, 2017 14:02
Show Gist options
  • Save lbourbon/ae785610765eb0c0626bec638fd82ae6 to your computer and use it in GitHub Desktop.
Save lbourbon/ae785610765eb0c0626bec638fd82ae6 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## AGRUPAMENTO:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Em alguns momentos é importante agrupar os dados de nosso dataframe de acordo com determinada característica. <br>\n",
"Para isso, usamos a função <i>pd.groupby()</i>"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Time</th>\n",
" <th>Jogador</th>\n",
" <th>Gols</th>\n",
" <th>Casamentos</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Flamengo</td>\n",
" <td>Romário</td>\n",
" <td>95</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Flamengo</td>\n",
" <td>Zico</td>\n",
" <td>105</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Vasco</td>\n",
" <td>Edmundo</td>\n",
" <td>24</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Vasco</td>\n",
" <td>Amaral</td>\n",
" <td>-1</td>\n",
" <td>-2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Time Jogador Gols Casamentos\n",
"0 Flamengo Romário 95 8\n",
"1 Flamengo Zico 105 3\n",
"2 Vasco Edmundo 24 2\n",
"3 Vasco Amaral -1 -2"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d = {\"Time\": ['Flamengo','Flamengo','Vasco','Vasco'],\n",
" \"Jogador\": ['Romário', 'Zico' , 'Edmundo', 'Amaral'],\n",
" \"Gols\": [95, 105, 24, -1],\n",
" \"Casamentos\": [8, 3, 2, -2]}\n",
"\n",
"df = pd.DataFrame(d)\n",
"df = df[['Time', 'Jogador', 'Gols', 'Casamentos']] # para colocar as colunas na ordem desejada\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Gols</th>\n",
" <th>Casamentos</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Time</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Flamengo</th>\n",
" <td>200</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Vasco</th>\n",
" <td>23</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Gols Casamentos\n",
"Time \n",
"Flamengo 200 11\n",
"Vasco 23 0"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Caso queiramos agrupar nosso dataframe por time para saber a número total de gols ou a média de gols\n",
"# É interessante perceber que temos tb que passar um função, caso contrário temos apenas um objeto pandas\n",
"\n",
"df.groupby(by='Time').sum() # maneira mais pythonica!\n",
"\n",
"agrupado = df.groupby(by='Time') # outra forma de fazer a mesma coisa\n",
"agrupado.sum()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Time\n",
"Flamengo 100.0\n",
"Vasco 11.5\n",
"Name: Gols, dtype: float64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Para a função ser aplicada apenas nas colunas de interesse:\n",
"\n",
"df.groupby('Time')['Gols'].mean()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Casamentos</th>\n",
" <th>Gols</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Time</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"8\" valign=\"top\">Flamengo</th>\n",
" <th>count</th>\n",
" <td>2.000000</td>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>5.500000</td>\n",
" <td>100.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>3.535534</td>\n",
" <td>7.071068</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>3.000000</td>\n",
" <td>95.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>4.250000</td>\n",
" <td>97.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>5.500000</td>\n",
" <td>100.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>6.750000</td>\n",
" <td>102.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>8.000000</td>\n",
" <td>105.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"8\" valign=\"top\">Vasco</th>\n",
" <th>count</th>\n",
" <td>2.000000</td>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>0.000000</td>\n",
" <td>11.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>2.828427</td>\n",
" <td>17.677670</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>-2.000000</td>\n",
" <td>-1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>-1.000000</td>\n",
" <td>5.250000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>0.000000</td>\n",
" <td>11.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>1.000000</td>\n",
" <td>17.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>2.000000</td>\n",
" <td>24.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Casamentos Gols\n",
"Time \n",
"Flamengo count 2.000000 2.000000\n",
" mean 5.500000 100.000000\n",
" std 3.535534 7.071068\n",
" min 3.000000 95.000000\n",
" 25% 4.250000 97.500000\n",
" 50% 5.500000 100.000000\n",
" 75% 6.750000 102.500000\n",
" max 8.000000 105.000000\n",
"Vasco count 2.000000 2.000000\n",
" mean 0.000000 11.500000\n",
" std 2.828427 17.677670\n",
" min -2.000000 -1.000000\n",
" 25% -1.000000 5.250000\n",
" 50% 0.000000 11.500000\n",
" 75% 1.000000 17.750000\n",
" max 2.000000 24.000000"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Perceba que quando aplicamos groupby no df, ele nos retorna outro df. Isso significa que podemos \n",
"# usar todas aquelas funções dos tutoriais anteriores\n",
"\n",
"df.groupby('Time').describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DADOS AUSENTES:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nessa seção vamos aprender como lidar com tabelas incompletas"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 418 entries, 0 to 417\n",
"Data columns (total 11 columns):\n",
"PassengerId 418 non-null int64\n",
"Pclass 418 non-null int64\n",
"Name 418 non-null object\n",
"Sex 418 non-null object\n",
"Age 332 non-null float64\n",
"SibSp 418 non-null int64\n",
"Parch 418 non-null int64\n",
"Ticket 418 non-null object\n",
"Fare 417 non-null float64\n",
"Cabin 91 non-null object\n",
"Embarked 418 non-null object\n",
"dtypes: float64(2), int64(4), object(5)\n",
"memory usage: 36.0+ KB\n"
]
}
],
"source": [
"# Usaremos os dados dos passageiros do Titanic, disponível em kaggle.com\n",
"\n",
"df = pd.read_csv('titanic.csv')\n",
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>892</td>\n",
" <td>3</td>\n",
" <td>Kelly, Mr. James</td>\n",
" <td>male</td>\n",
" <td>34.5</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>330911</td>\n",
" <td>7.8292</td>\n",
" <td>NaN</td>\n",
" <td>Q</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>893</td>\n",
" <td>3</td>\n",
" <td>Wilkes, Mrs. James (Ellen Needs)</td>\n",
" <td>female</td>\n",
" <td>47.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>363272</td>\n",
" <td>7.0000</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>894</td>\n",
" <td>2</td>\n",
" <td>Myles, Mr. Thomas Francis</td>\n",
" <td>male</td>\n",
" <td>62.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>240276</td>\n",
" <td>9.6875</td>\n",
" <td>NaN</td>\n",
" <td>Q</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>895</td>\n",
" <td>3</td>\n",
" <td>Wirz, Mr. Albert</td>\n",
" <td>male</td>\n",
" <td>27.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>315154</td>\n",
" <td>8.6625</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>896</td>\n",
" <td>3</td>\n",
" <td>Hirvonen, Mrs. Alexander (Helga E Lindqvist)</td>\n",
" <td>female</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3101298</td>\n",
" <td>12.2875</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Pclass Name Sex \\\n",
"0 892 3 Kelly, Mr. James male \n",
"1 893 3 Wilkes, Mrs. James (Ellen Needs) female \n",
"2 894 2 Myles, Mr. Thomas Francis male \n",
"3 895 3 Wirz, Mr. Albert male \n",
"4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female \n",
"\n",
" Age SibSp Parch Ticket Fare Cabin Embarked \n",
"0 34.5 0 0 330911 7.8292 NaN Q \n",
"1 47.0 1 0 363272 7.0000 NaN S \n",
"2 62.0 0 0 240276 9.6875 NaN Q \n",
"3 27.0 0 0 315154 8.6625 NaN S \n",
"4 22.0 1 1 3101298 12.2875 NaN S "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perceba que algumas células do nosso dataframe contém <b>NaN</b>, que significa <i>not a number</i>, o que é apenas uma forma\n",
"de dizer que a não se tem aquela informação, o dado está ausente.\n",
"\n",
"A função <i>isnull()</i> retorna um df de variáveis booleanas indicando se a informação está ausente"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin \\\n",
"0 False False False False False False False False False True \n",
"1 False False False False False False False False False True \n",
"2 False False False False False False False False False True \n",
"3 False False False False False False False False False True \n",
"4 False False False False False False False False False True \n",
"\n",
" Embarked \n",
"0 False \n",
"1 False \n",
"2 False \n",
"3 False \n",
"4 False "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isnull().head()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId 0\n",
"Pclass 0\n",
"Name 0\n",
"Sex 0\n",
"Age 86\n",
"SibSp 0\n",
"Parch 0\n",
"Ticket 0\n",
"Fare 1\n",
"Cabin 327\n",
"Embarked 0\n",
"dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Para saber a quantidade de células com Nan, podemos fazer:\n",
"\n",
"df.isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x1ddbbfc6390>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAj8AAAEuCAYAAABhznOUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFKVJREFUeJzt3Xu0bVddH/BveIXYJEpBjFAUFJkoLxUIwYqERzAVLUIF\ngyAEQiWGh1gFaWEYEW0hJRkIUlEQgqA4sFCVhww6EoFgQOgDVGJ+RUClKIFiIEFjwuP2j7lu7vHm\nnHNPctY+Z+89P58xMnLuOXufM9dee831nb8519pHHThwIAAAo7jRfjcAAGAvCT8AwFCEHwBgKMIP\nADAU4QcAGIrwAwAM5SY7feA1V3zWNfEAcJh73e0R+92E2fyPP33TfjdhNjc7/pZHbfUzlR8AYCjC\nDwAwFOEHABiK8AMADEX4AQCGIvwAAEPZ8aXuwJG55BVg+R114MDObt/jPj8AwKrY7j4/Kj8AsAsq\nvqvHmh8AYCjCDwAwFOEHABiK8AMADEX4AQCGIvwAAENxnx8AYO1sd58flR8AYChucggAu+Amh6tH\n+IEZ6QQBlp/wAzMSGACWnzU/AMBQVH5gRqa9AJaf8AMzEhgAlp/wAzNS+QFYfsIPzEhgAFh+wg/M\nSOUHYPkJPwCwCwYKq8dnewHALqj4LqftPttL+AEA1o4PNgUAmFjzAzNS/obxOO5Xj/ADMxql4wBY\nZdb8AABrZ7s1Pyo/ALALpr1Wj8oPALB2VH4AYEFUflaPS90BgKGo/MCMjAABlp81PwDA2rHmBwAW\nRMV39VjzAwAMxbQXALB2THsBwIKY9lo9pr0AgKGY9gIA1o5pLwBYENNeq8e0FwAwFNNeAMDa2W7a\nS+UHABiKNT8wI3P/MB7H/eoRfmBGo3QcAKvMtBcAMBQLngGAteM+P7BHzP3DeBz3q8e0FwAwFJUf\nmNEooyaAVabyAwAMRfgBAIYi/AAAQ3GpOwCwdlzqDnvEJa8wHsf96hF+YEajdBwAq8y0FwCwdkx7\nAcCCmPZaPa72AgCGIvwAAEMRfgCAoQg/AMBQhB8AYCiu9oIZueoDYPm5zw8AsHbc5wcAFkTFd/Wo\n/AAAa0flBwAWROVn9bjaCwAYisoPzMgIEGD5WfMDAKwda34AYEFUfFeP8AMz0gkCLD/hB2YkMAAs\nP+EHZqTyA7D8hB+YkcAAsPyEH5iRyg/A8hN+YEYCA8Dyc4dnAGAoKj8wI9NeAMtP5QcAGIrKD8xI\ntQRg+an8AABDEX4AgKEIPwDAUIQfAGAowg8AMBThBwAYylEHDhzY0QOvueKzO3sgDMxNDgGWw82O\nv+VRW/1M+AGAXTDoWU7bhR/TXgDAUFR+AIC1s13lx8dbALDn1mmqaJ2s07TXdoQfAPbcOp1kBbnV\nY80PADAUa34AgLXjai8AgIk1PwCwC+u05med1mJtR+UHABiKyg/MyAgQYPkJPzAjgQFg+bnaCwBY\nO+7wDAALYrp79VjwDAAMReUHZmQECLD8hB+YkcAAsPyEH5iRyg/A8hN+YEYCA8Dys+AZABiK+/wA\nAGvHp7oDAEyEHwBgKBY8w4xc7QXjcdyvHmt+AIC147O9AGBBVH5WjzU/AMBQhB8AYCjCDwAwFOEH\nABiK8AMADEX4AQCG4j4/AMDacZ8fAFgQ9/lZPcIPzEgnCLD8hB+YkcAAsPwseAYAhiL8AABDEX4A\ngKEIPwDAUIQfAGAowg8AMBThBwAYivADAAzFZ3sBAGvHZ3vBHvHxFjAex/3qUfkBANaOyg/sESNA\nGI/jfvUIPzCjUToOgFVm2gsAWDumvQBgQUx7rR73+QEAhqLyAzMyAgRYfsIPzEhgAFh+pr0AgKEI\nPwDAUEx7wYys+QFYfsIPzEhgAFh+bnIIAKwdNzkEgAUx3b16VH4AgLWj8gMAC6Lys3pUfgCAtaPy\nA3vECBDG47hfPcIPzGiUjgNglbnDMwAwFJUfmJHyN8DyE35gRgIDwPJztRcAsHZc7QUAC2K6e/UI\nPzAjnSDA8nO1FwAwFJUfmJFqCcDyU/kBAIYi/AAAQxF+AIChCD8AwFAseIYZudQdYPmp/AAAQ1H5\ngRmplgAsP5/tBQCsne0+28u0FwAwFNNeMCMLnmE8jvvVI/zAjEbpOABWmWkvAGAowg8AMBThBwAY\nivADAAzFgmeYkas+AJaf8AMzEhgAlp/wAzNS+QFYfj7eAgBYO9t9vIXKD8xI5QfG47hfPa72AgCG\nIvwAAEOx5gcAWDvW/ADAgljzs3pMewEAQxF+AIChCD8AwFAseIYZmfsHWA7bLXgWfgBgFwx6lpOr\nvWCP6AQBlp/KDwCwdlR+AGBBVHxXj6u9AIChCD8AwFBMe8GMlL8Blp/wAzMSGACWn2kvAGAoLnUH\nANaOS90BYEGs9Vs9pr0AgKGY9gIA1s52014qPwDAUKz5AYBdsOZn9Zj2AgDWjqu9AGBBVH5WjzU/\nAMBQhB8AYCimvWBGyt8Ay0/4gRkJDADLz9VeAMDacbUXACyI6e7VY8EzADAUlR+YkREgwPITfmBG\nAgPA8hN+YEYqPwDLz5ofAGAoKj8wI9USgOWn8gMADEX4AQCG4g7PAMDacYdnAFgQV3muHtNeAMBQ\nTHsBAGtnu2kvlR8AYCjW/MCMzP3DeBz3q0f4gRmN0nEArDLTXgDAUIQfAGAorvYCANaOmxzCHrHw\nEcbjuF89wg/MaJSOA2CVWfMDAAxF+AEAhmLBMwCwdix4BoAFseB59Zj2AgCGIvwAAEOx5gcAWDvW\n/ADAgljzs3pMewEAQ1H5gRkZAQIsP5UfAGAoKj8wI9USgOWn8gMADEX4AQCG4j4/AMDacZ8fAFgQ\nV3muHtNeAMBQTHsBAGvHtBcALIhpr9Vj2gsAGIrwAwAMRfgBAIYi/AAAQ3G1FwCwdlztBQAL4mqv\n1aPyAwCsHZUfAFgQlZ/VI/zAjHSCAMtP+IEZCQwAy0/4gRmp/AAsP+EHZiQwACw/4QdmpPIDsPx2\nfKk7AMA68PEWAMBQhB8AYCjCDwAwFOEHABiK8AMADEX4AQCGcsT7/LTWTk7yhiSXJDmQ5Jgkv1lV\nL11s0+bRWvtUVZ1w2PdOTXJaVZ2+g+efnB1uf2vtnUnOrKpLd9/yxZi25/eS3LWqPjF97wVJLq2q\n8/exabNorT07yYOT3DTJV5L8dFX9z/1t1Q3XWntWkp9Mcoeq+sf9bs9ObLYPkvxokvOSPDHJp6rq\n5Yc958Qkv5A+IDsuyRuq6ty9bPfh5u779qt/aK2dm+SeSU5I8lVJPpbki0kuqqqf3+I51+k3t3jc\nP09yalX91oxN3rHW2u2T/EmS/7Xh2xdutV3LprV2lyTnpO+XY5O8LcnPVdV17kHTWjs/yW9X1ds3\nfO+EJD9bVWctsI0n59BxcNBnquqRR3je6UnuXFXPvgF/8y+n5+64z2ut3Tz9PHb7nTx+pzc5vLCq\nTpv+wNFJqrX22qr63E4btuLWbfuvTvLq1topmx1kq6q19m1J/nWSf1lVB1pr357kNUnusb8t25XH\nJvntJKclOX9/m3JkW+2DqrrH9POtnvrLSR5XVZe21m6a5OLW2oVV9b/3pOFbW/ljv6p+KtndyWgb\nd0/f3/sSfiaXVNXJ+/j3b5DW2tekH9uPqKqPtNZunOR3kjw5ycu3ffKkqj6VZGHBZ4Nrj4N1cUPu\n8Hxcki8nuUdr7ez0kdqxSX4kyV+nJ8SvTk+yz6mqd7TWXp3kjukjp1+qqte21u6f5Ben3/XR9B3+\nmCTfNz33m5O8sKrOn0aFL0tyZZJPJ/nHqjq9tfa06e8eSE/EL5nS8S2n/x56sNGttW9N8qokfz/9\nd/kN2PbDt/8F0/Z/cmr7wb/1L5L8SpKbJ/n6JM+tqt9trf1ikgekv+5vrKoXttbOSvL49BHyB6rq\n6TewXdfHhVO7n5J+0jnY7v+U5F7pr92HquoJrbWfS993t5q+/7Ik/ybJnZI8vqret9l+2INt2Mzn\nk3xDkie21t5eVR9srZ3YWrtbkpckOSrJZ9OrD/dL8jNJ7p/k7CTHVNWz9qndm5pGXB9N7whfl+R6\nHQv70+ot98E7k5w5PebhrbVHpR/nT6+q9ye5LMlTp77ig+nh6ZrphP2D6cfdrZL8fFW9cW836Vrb\n9X3XJHlz+vvrbUneleTFuW7/cHZr7euS/LMkj66qj+3pFkym99aZVXVaa+2MJD+e5MZJfr+qzt7w\nuP+Y3p8/NckPJfl36a/Be6YQ9Zz01+PHqurX9ngzNjWFiF9Ncrv0/vf3q+q5m5wbnpXeD9w4yXlV\n9Tt73NSHpYeKjyRJVX25tfa4JF9srb3y8PZPzzmrtfbM9HPIGUm+lH68n9Ra+5P0993d0/uBh1XV\n5xfV+OmY/lCSuyb5QpKLknxvkq9J8pDpYfdtrV2Q5Pj0itZbW2s/lH7uuenUzodPv+OF6cfRr234\nG2dOv+vRSU7KdTPD0Ul+M8ktkvzF9Wn/Ttf8PLC19s7W2oXTH3pakrskeeyUuN+U5JHpgeVWSX5g\nauxNWmvHJfmeJI9IcmqSL7fWjkryivTEe//0zuH06W99dVV9f/po4uAI5eVJTq+qB04bfXCE+cNJ\nvjv9DfyD7dCw8sKq+q6q2hhw/nN6efDBSS7e4XZvt/2/lOSJVXWfJG9N8q0bHn/nJOdW1SlJfix9\nRye9A/yRqb0HR45PSPLUqrpvkj9vre3VR478eJKfbK3dcfr38Ukun9p8ryQntdZuO/3sqqo6Nckb\nk3xfVf1AkhckOe0I+2FPVdUnM1Udkry3tXZpku9Pf689ZXqvvi3Js6rqLeml8tekB6D/sB9tPoIn\nJXllVVWSq1tr98n1Pxb21Db7YKOPT+0/I4dGuI9JD0C/kh7qzp0qLUkPCqekd4Ln7eExkuy870v6\ntNJDquqc9JPvZv3DW6dt/4P0MLGvWmu3Tu9n75fkO5Mc3Vo7dvrZi5LcpKqekn5yeV6SB1XVdye5\nbWvtlPST0YX7HHy+bdpH75xOyCcleV9VfW+SE3ModCfTuWF6zB2mbXlAkudMlZi9dJv0KchrVdUX\n0gPPVu2/uKoelB4Uzjns9x2f5PUbzqn/asa2PnDjazwFsCR5/9Seo5P8w3T+uCS9T016oeHB6WHz\nl1trN0ofOD90eu0vSQ9MSXLzqrpfVb12+vfT0t+Xj0wPRZtlhjOT/FlVfU/6Mbdj13va66DW2sOS\nvKS19oUkt03yR1X14dbaryZ5fXqqe0lVXdlae0Z6mjs+fQT7tek7+A1TH31Mkv+entw+OP2JT6RX\nTpLkNlX14enri9KnAO6a5BuTXDB9/xZJvmX6ujbZhjslef/09R/ln4aVG7L9r6qqP0+Sqvr16XsH\nf/y3SZ47jagOpL8WSe/gX5DeSf7B9L0nJPnp1todkrw3vTqxcFX12Wm/vCb99bgqyTe01l6fnuKP\n3dDug/Ppn8uhed/L0/fPVvths32wUFOQu6Kqnjj9+17pr/PNk/yXaf/cNMlHpqeck+Svkjyqqr60\n1+3dTmvtFulV0FtPVZ2Do+/rcyws0z742w0Pe3eSTP3FCdNc/XdW1fOTPL/1dSSvTh84XJnkXVX1\nlSSXtdYuT+8/Nv6+RdpR3zf96ONVdc309Qlb9A8H1599Kr0f2G/flH7yuGr697OTZKpO3T2HRtN3\nTH/d3zZtx3Hpg91lWN/4T6a9WmvHJ3lca+0BSa5IPzEfdPCYuFuSe05hKen9wu1z6PyzF/4qPXBe\nazoP3C7Jvbdo/7un/1+cPqA/3MFp4o3nzzlsdhw8NNufG5JeITyQ5NOttc+nV90+neQ10/Fz5/Tz\nXnLd/urBSb40VcRunc0zw63TBxepqj9urX1xpxu0m6u9XpHkCdUXDf9NkqOm6YXjquqh6VM5L22t\nfX2Se1bVw9PT3znpL9T/TS/LnZxp9DD93s3WoHxiGt0mPbEn/YX6cJIHTL/j/PSFb0mfQjrcJUnu\nO3197+u7sZv4m9batyRJa+1nWmsP3/Cz5yf5jar60SR/mP7aHJ2eYB+dPtI4vbX2jUn+bXr5+f5J\nviPJd83Qth2pqjenv46np7+ZbldVj06vghyTQ0Fsu3VB2+2HvXb39NHFzaZ//5/099pfpK8nOTm9\n1P2W6ecvT/ITSZ43hY1l8tgkv15VD5mqbvdJr3xcdT2Phb221T748obHnJgkU3/x1+nH6+taa3dK\nkqr6u/QTw9XT4+85Pf7r0gdQn17wNhzJdfq+6fsb+52t+odlW2P30SR3Plhla63916nie1n6iPwu\nrV8g8vH0E+op03vspUnel77Ny3bV8OlJPldVj0lybpKvmmYbkkP76NIkfzhtywPTl2t8dI/b+ZYk\np7bWvjlJWl/rdl6Sb8/W7T9x+v/9kvzZJr9zr99fR/p7906uXZh9bHoF53npg7YnpQ+6Nzt+kj4t\nePk09fX/snlmuPa83lr7jhwasB/RbsrHr0tyUWvt79MPlNukj6jPbn0+/0ZJfjbTCKe1dnF6B/ii\naS7/J5K8dSqDXZHkcelrBTZzVpJXTUnxmiSfrKoPTXOJ75kO3Penl8K28lPpafOZST6TZLdXzjx5\natNX0kehL04/kSZ90dqLWmv/Pn2H3aqqrm6t/V16h3FVknekd/x/mv46Xjm1/4932a7r6xlJHpS+\n/uKbWmvvTn9Dfyx9n27rBuyHhamqN7W+tusD03vlRkmemd5p/8Y0XXIgyRnT+++yqnrZ9B5+Zfpa\npmXxpPQrpJIkVfUPrbU3ph9ruz0WFmabffCMDQ+7wzSNdHSSJ0/9waPSt+vgOoAPpK/Re2x6/3FB\nevXrrKraGKT2w2Z93+G26x+WRlV9prX2wiTvaq0dSPLmqvpkay3VF6yfkeTt6eH7vOlxN07yl+mB\n4RZJ7tZae0ZVvXifNuNwFyT5rdbafdMD9Edy3X305iQnt9YuSj8p/7equnIvG1lVV7TWHp/kFdN5\n8LipXdu1/6Tp2DmQvnZxT2YKMk17Hfa9Y3bwvGOm9h6bfkxckV4pfW/6eqXL07ft41s8/+np/dkF\n6cfP4Znh4vS+/T3pgfbqLX7PdazEp7q31p6SfunrZ1prv5DkmlqRSxlhTqMdC20xVygBg9vLhYO7\ncVmSd0wjyc+nT6nBiBwLALu0EpUfAIC5LNtCNQCAhRJ+AIChCD8AwFCEHwBgKMIPADAU4QcAGMr/\nB5R/xIlUgGu+AAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x1ddbbdd4cf8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Só para facilitar a visualização dos dados ausentes - para saber mais leia o tutorial 'Seaborn'\n",
"# Em preto, os dados ausentes\n",
"\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"plt.figure(figsize=(10,5))\n",
"sns.heatmap(df.isnull(), cbar=False, yticklabels=False)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"E agora? Temos 3 colunas com falta de informações - Age (idade), Fare (taxa) e Cabin (cabine). \n",
"O que fazer? Como resolver esse problema?\n",
"\n",
"Não existe uma maneira correta ou única para resolver isso, vai depender de cada caso. Algumas opções são:\n",
"\n",
"a) Apagar a coluna; b) Apagar a linha; c) Preencher com algum valor"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Vamos resolver o problema da Cabine. Nessa coluna temos 327 valores nulos. Vimos que nosso dataframe tem 418 linhas,\n",
"# ou seja, aqui não dá pra fazer muita coisa, melhor apagar toda coluna:\n",
"\n",
"df.drop('Cabin', axis=1, inplace = True) # axis é o eixo: 0 p linha e 1 p coluna\n",
" # lembre sempre do inplace = True, para que a mudança seja permanente no nosso df"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId 0\n",
"Pclass 0\n",
"Name 0\n",
"Sex 0\n",
"Age 86\n",
"SibSp 0\n",
"Parch 0\n",
"Ticket 0\n",
"Fare 1\n",
"Embarked 0\n",
"dtype: int64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"30.272590361445783"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Agora vamos para a coluna de idade, uma opção aqui seria preencher os valores ausentes com a média de todas as idades\n",
"\n",
"df['Age'].mean()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Pclass\n",
"1 40.918367\n",
"2 28.777500\n",
"3 24.027945\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Uma solução mais elegante e mais correta seria pegar a média dos passageiros de cada classe para preencher os\n",
"# valores, posso fazer isso com o groupby\n",
"\n",
"df.groupby('Pclass')['Age'].mean()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Agora é só criar uma função para preencher os valores que faltam com as médias calculadas:\n",
"\n",
"def preenchedor(cols): # a função será aplicada em uma lista de colunas\n",
" Idade = cols[0] # sendo que a primeira é a da Idade, index =[0]\n",
" Classe = cols[1] # e a segunda é a da Classe, index = [1]\n",
" \n",
" if pd.isnull(Idade): # se o valor na coluna da idade for NaN\n",
" if Classe == 1: # e se pertencer a primeira classe\n",
" return 41 # retorna a média de idade da 1ª classe = 41\n",
" elif Classe == 2:\n",
" return 29 # mesma coisa para 2ª classe\n",
" else:\n",
" return 24 # e para a 3ª classe\n",
" else: # mas se o valor da idade não for Nan \n",
" return Idade # recebe a própria Idade\n",
" \n",
"\n",
"# Créditos para a função acima: Jose Portilla - Pierian Data"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Aplicando a função para modificar a coluna Age\n",
"\n",
"df['Age'] = df[['Age','Pclass']].apply(preenchedor, axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Agora que resolver os maiores problemas, só sobrou um dado ausente na coluna Taxa(fare), como uma\n",
"# linha não vai fazer falta, podemos apagá-la\n",
"\n",
"df.dropna(inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId 0\n",
"Pclass 0\n",
"Name 0\n",
"Sex 0\n",
"Age 0\n",
"SibSp 0\n",
"Parch 0\n",
"Ticket 0\n",
"Fare 0\n",
"Embarked 0\n",
"dtype: int64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Agora nosso dataframe está tinindo, pronto para ser utilizado:\n",
"\n",
"df.isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x1ddbc3d45c0>"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAAEnCAYAAAAKMZAQAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEwRJREFUeJzt3XmQpWV1x/FvAzOiA0iJBFmMsnlAMURRjIoohEQwZkKi\noOAKokDYVMpSEWMUlABJQFlqhihCUHGJpQEXJCCLigpRElDkuOCGbLIMoMiwdf543gt32hmFcN/z\n9jjfT1VX32mgz6PT/bvvs09NT08jSaqx0tANkKQViaErSYUMXUkqZOhKUiFDV5IKGbqSVGiV3/UP\n77rtJteTSdJDNHeNtaaW9c980pWkQoauJBUydCWpkKErSYUMXUkqZOhKUiFDV5IKGbqSVMjQlaRC\nhq4kFTJ0JamQoStJhQxdSSpk6EpSIUNXkgoZupJUyNCVpEKGriQVMnQlqZChK0mFDF1JKmToSlIh\nQ1eSChm6klTI0JWkQoauJBUydCWpkKErSYUMXUkqZOhKUiFDV5IKGbqSVMjQlaRChq4kFTJ0JamQ\noStJhQxdSSpk6EpSIUNXkgoZupJUyNCVpEKGriQVMnQlqZChK0mFDF1JKmToSlIhQ1eSChm6klTI\n0JWkQoauJBUydCWpkKErSYUMXUkqZOhKUiFDV5IKGbqSVMjQlaRChq4kFTJ0JamQoStJhQxdSSpk\n6EpSIUNXkgoZupJUyNCVpEKGriQVMnQlqZChK0mFDF1JKmToSlIhQ1eSChm6klTI0JWkQoauJBUy\ndCWpkKErSYUMXUkqZOhKUiFDV5IKGbqSVMjQlaRChq4kFTJ0JamQoStJhQxdSSpk6EpSIUNXkgoZ\nupJUyNCVpEKGriQVMnQlqZChK0mFDF1JKmToSlIhQ1eSChm6klTI0JWkQoauJBUydCWpkKErSYUM\nXUkqZOhKUiFDV5IKGbqSVMjQlaRChq4kFTJ0JamQoStJhQxdSSpk6EpSIUNXkgoZupJUyNCVpEKG\nriQVMnQlqZChK0mFDF1JKmToSlIhQ1eSChm6klTI0JWkQoauJBUydCWpkKErSYUMXUkqZOhKUiFD\nV5IKGbqSVMjQlaRChq4kFTJ0JamQoStJhQxdSSpk6EpSIUNXkgoZupJUyNCVpEKGriQVMnQlqZCh\nK0mFDF1JKmToSlIhQ1eSChm6klTI0JWkQoauJBUydCWpkKErSYUMXUkqZOhKUiFDV5IKGbqSVMjQ\nlaRChq4kFTJ0JamQoStJhQxdSSpk6EpSIUNXkgoZupJUyNCVpEKGriQVMnQlqZChK0mFDF1JKmTo\nSlIhQ1eSChm6klTI0JWkQoauJBUydCWpkKErSYUMXUkqZOhKUiFDV5IKGbqSVMjQlaRChq4kFTJ0\nJamQoStJhQxdSSpk6EpSIUNXkgoZupJUyNCVpEKGriQVMnQlqZChK0mFDF1JKmToSlIhQ1eSChm6\nklTI0JWkQoauJBUydCWpkKErSYUMXUkqZOhKUiFDV5IKGbqSVMjQlaRChq4kFTJ0JamQoStJhQxd\nSSpk6EpSIUNXkgoZupJUyNCVpEKGriQVMnQlqZChK0mFDF1JKmToSlIhQ1eSChm6klRoanp6eug2\nSNIKwyddSSpk6EpSIUNXkgoZupJUyNCVpEKGriQVMnQlqdAqQzdAK46I2BTYFLgM+EVmukhcK5w/\n+NCNiJWAKeA5wDcz864B2rDCh01E7A/8LfAY4FRgE2D/gdqyErA2cMOK+HcxEhF7ZeYHx/58YGZ+\nYKC2rAE8EfhRZv56iDZUmWjoRsSHgaX+EGfmnpOs9WBExLHA94AnAE8HrgdeU9yGwcMmIv4JOCQz\n74uIRwMfzMxdKtsAvBzYFjg3M4+NiEuK6wMQEX8H/CtwC7B6ROybmf81QDueCLwUeNToa5n5nqLa\nuwHzge0iYvvuyysDWwDloRsRLwXeQcujT0bEdGYeXlj/xyyZW3cDc4DFmbn5pOtNekz348AnaAFz\nJfAh2tPdqhOu82A9MzMXAs/OzB2BDQZow8uBvwAWZeaxwLMGaMNi4JyI2Bm4EPjcAG1YifaDPfrh\nXjxAGwDeCWydmU8Dngu8d6B2nA7Moz0IjD6qnAUspP1uLuw+jqf9nA7hTcCfATcCh9MeUiptBjwZ\nOA94eWYG8BLgq30Um+iTbmZ+CSAiDs7Mo7ovfy0iyp8kOitHxFbATyJiLrD6AG2YDWHzj7Sn7E8C\nB2XmqQO04XRa4D8hIr4AfHaANgDclJk3AGTm9RFx20DtuCMz3z1E4cy8BTgfOD8i/hzYGPgGcPMQ\n7QHuzczF3RPudESUDi9k5mKAiNg4My/uvnZpREQf9foa012t67ZcQhtLHepJ99+BE4E9gaNo7+jV\nPsbwYXMB8C3amNmCiHhaZr6hsgGZeVxEnEPrwl6ZmZdX1h9ze0R8ifb/yTOAR0XE+7o2HtJ38Yh4\nUvfy+ojYnfb3Mt3V/37f9We05X203t/mtIeBtwO7Vbah89WI+BiwQUQsoOXGEBZFxGHAxbTcuraP\nIn2F7utoIfck4LsUj6OOZOaJtNAlIv4lM38+QBuOj4hzGTZsjszMz3ev50fEgdUNiIiTx/64U0Tc\nDfwcOKF78qoy/qb3i8K6I+Nv/K/vPqAF7/a//a/3apvM3DYizsvMUyNi3+L6I0cCzwYupf2OnDlQ\nO14B7AP8FXAFrYc4cb2EbmZ+D/jrPr73QxERbwEWAWsCe0TEWZn55uI2bE0b112VNnFBZv59ZRuA\nC7t38PVp47lfKK4P8EjgR8BXaON3zwRuoA17zK9oQERs2YXLXFrYLQZOzsz7KuoDZOZ2XVtWBTbv\nurE7A5//3f9lL1bp2jEdESsD9w7QBoDPZ+Y2tLHmId0J3Er7ubyMNhw58eHASa9euJb2jj3FjFUM\nmbneJGs9SC+hzZiflZlPjojzBmjDqbR38sqnuZlOBr4IvAC4jjbB+fziNqydmaOu65ci4uzMfGdE\nXFhRPCLeDLwsIp4LHE1b0fJT4BjgoIo2zPARWtBeSusR7grsXtyGY2jDG2sD3+z+PISbI+IgIIH7\nADLz7AHasRC4hjaheAltePJFky4y6Ym0dQEiYs3MXDTJ7/3/dC/wOB6YGX7kAG34QWaeMkDdcWtl\n5skR8crMvKhbp1ptjYjYLDOvjIjNaeP+awGrFdXfhTZON00Lt00zc1FEXFRUf6b1M/PDAJl51BAP\nBJn5qW6cfRPgxwz3pHsT8KfdB7S/oyFCd+PM3CsinpeZZ0bE2/oo0teY7ueAbXr63g/F+d3HKyPi\nGIbpwn06Ij5OGyMC6tZjjouIzbrPGwD3VNenrU3+aESsC/wGOAV4GXVLtm7PzHsj4unAVWMPBVNF\n9WeajognZeb3I2Jj2jrZUhFxXGYeAFwSES8EjqM9dZfKzD1mtGvd6jZ0VomIx9L+blane+qeeJE+\nvimzpLuQme+gLbomIi7JzLur2wDsB3yaNrY8lAOBD9PWIn6WByZvymTmxd1Ezf7AXwLrZOZhhU2Y\n7lYOvBY4A+7fKTjEGxDAG4FPRMQ6tC7t3gO04bZu48xqtInenQZoAxHxHmBfYC5ts8j3gacM0JRD\nga8B69KW0PUy7NRX6M6K7kJEzKeF3hxgKiIem5lPLW7GTZl5ZHFNALqnug8BW9PGMRcAawCPp40l\nVrRhLm0Z0n60SYk1gA0z8zcV9cccCpxGG9M+JCKeTxtXrd6ZN7Jtt0FjMJn5jog4GtgkM18wYFPm\n05auHUPbLXjiQO24KDMjItambdTYqI8ifa1e2KN7qtiENgt4TR91HoTDaU8Q+9B2m+wwQBtujIiF\nwLd5YD3mSUW1jwZek5l3R8ThwI7AD2mTamcUteEntI0Rr8jMH0TEFwcIXDLzEsZ2A0bE14GNBur9\nALwoIo7JzPJx1LEJb2jDK+tExDUw2IT3td3miNUz84fdG/UQTgdempm/jIi9gYPpYbill9Cdcd7A\nKbTDXoY43OTazPx6ROyTmadExGsHaMMPu8+P6z5XHrCycmZeFhHrAfMy89sAEVG2RAo4lrb+8YkR\n8UGGG0MFICKeQZulXgf4WUTsPdDa6bWBa8b2/U9n5nMqCo8mvAEiYl5m/joi1svMoR6Oro6IPYFf\nR8QRtCWeQzgnIk7r6i+ipy37fc1ij5838H6GOW8AYHFEbAvM6SYKHlvdgG6r50m0MdVTgMot0aOn\nuB2BcwAiYg6F26Ez86jM3JJ2kMruwDMj4siI2KKqDTN8AHhVZm5A6wUN1ZV9MW3Y52W035fynWAR\n8S66OQ/g/RHx1uL6h3Yv96YdTPUWWq+4dOlcRMztnq5PBv6X9jD6OqCX7ch9jenOhvMGoA3Ob0Yb\nZjis+1wqIj5E220zj7Zk7Sra5oAK50TE12hjuPO7WfLjaYcSlcrMC4ALImJN4FW08dUhxjR/k5lX\ndG26PCLKj/rszKGNJ8+hPf2vR/1k2vzM3AogM3fpflYq5x+2Bw7vTr97b2ZuT1tBUS1Zcrhl/GsT\nH9ftK3QHPdxkbH87tK2mAIdQ27Uf2ZI2E7uwa8N/VBXOzCMj4gzg1sy8pgvdkzLzM1VtWEqbFtF+\nsUp/uSJidNbE3RFxIu3nc2tgqANvPgZ8hra08hrq1iuPuy8i5mbmXV0PqHr99tQyXpfKzA0BunXs\nH+m7Xl8TaUMfbjK+v33mDrnq/e03dycnzcvMG6Ofg4uWqduSPXr9I9pW3BXRaBzz693noG35/J9h\nmsOvMvOIiNg0M/eMiK8M0IYFwHci4nJaj/Co3/PvT9r0Ml4P5fW0FS29mpqenvz/1hmHm0AbWyw/\n3GRp+9urZ6ujneR0M23i5vG05VJDjXGvsCJig8y8ekYvCKg/3atrz7m0CcbjaGuHvzrEErJuedRG\ntBsbbiyufSvtQKwp2hry0euyScUZ7fkG8AiW3F8w8fHlvoYXBj/cpDPY/vaIeHX38kra2tQrgZ/R\nlo6p3pu7j4W0p6rH0La93kpx7yfa1TTvBnamjW1f1X2uqn9oZh4eEacz9oTZHcZUOYn1J4W1HoyS\nicS+QnfQw03GDLm/feY1H1PAHsAd1HfjBB+JiEtpK2leTOtaL6KFX5luOeXBtJ1wB2TmWdStmR55\nXvd5QXHdJWTmT4esvxSXAy9kycnNCyZdpK/QHT/cZDPaXVSVh5uMDLa/PTPfPnrd1T6VdibFG6va\noCWMNorcNeBGEWg9raD1fk5jmOMM58D9K0r0gM/Qlq49lXbM4x19FOkrdEeHm6xH61LvR+3hJqMu\n3NsYeH97ROxHC9o3ZeYQd5OpWdZGkeoJnDuz3Uh944A7rzbu5hp+SxbcnjGLTWXmPt2c1F604dGJ\n62v1wsXAVjO+/N991FqaZXThSkXE+rQNETfTLkIc8jxdLXujyBBLtUaGWiZ1B22ySEu6p5t8n0cb\n6+4lH/vaBvxq2lPm/XejZWYvh0csw2zown2Xtinky8AJ40vFiicr1MyWjSJPiXYf2NTYa6D05+K6\nHOZy0tnuBNrNxGfTVlvN/tuAx7yVtkqh/E6yzmzowv3NQHW1FLNoo8iuY6+Hmsj61kB1Z7XM/DRA\nN//0qczsZeNMX+t0z8zMwe5Ii4gvd1sKl3gtScsSES+i9X4W0Yad3pCZ50+6Tl9PundExBdpu31G\nxxlWDtDPhi6cpOXLu4BndUc7Po52fMHEz0npK3SHuG123Gzowklavtyemb8EyMzrImK5OmXso7St\njX9Mm0j6Tk91lsr1h5IerLHlc6tExOdoE2hb09PpiH2F7gIKrjKWpAnIGZ8B/rOvYn0d5bZxZv4D\nbRXBmcCje6ojSQ9LZp7aLaH7LHAL7bbq0cfE9fWkW3KVsSRN0NnAFTxwc/c08MlJF+krdGdeZex5\nA5Jmu1szc4++i/SyTnekO6vzlsy8p7cikjQBEXEw7V60K0Zfy8yJn4zY1zbgV9DOKn0EcFREHJ2Z\n/9xHLUmakOfRMuv53Z+nadc6TVRfwwsHATsBH6ctGzsbMHQlzWarZeYOfRfpK3RHs363Z+biiOir\njiRNynciYjfa7S6jnbQTv8qprzC8ijaB9qaIeBdwWU91JGlStqRdITQ6cnNV4NmTLtLLOt1uBvBp\n3aHdCzJz3z7qSNLDFRGfAMjM7YAvZOZ23es7+6jXS+hGxA7ANt2pPRdFhIfMSJqt/mjs9fjO2V6W\ndvW1I+29wA+AA4HnAvv0VEeSJqn32zz6Ct07gOuBezLzOnp6x5CkCZhexute9DWRdhvtipyTuosZ\nb+ipjiQ9XEs7f3sKeHIfxfoK3V1ph95cERFbAP/WUx1JeriWdf52L2dx93VdzybALsAc2jvGeplZ\nfv25JM02fY3pjq7H2QbYEFirpzqStFzpK3R/lZlHAFdn5muBdXqqI0nLlb5Cd7q72G31iJhHu1lT\nklZ4Ew/diFgDeDewM3AabUvwuZOuI0nLo4lOpEXE/sDBwD3AAZl51sS+uST9AZj0k+7uQNAOiTho\nwt9bkpZ7kw7dOzPzrsy8EZg74e8tScu9vibSoGAPsyQtbyY9pns9bdJsCtiesQm0zPSkMUkrvElv\nA17WdjpJEj3fBixJWlKfY7qSpBkMXUkqZOhKUiFDV5IKGbqSVOj/AEcDinLQsygEAAAAAElFTkSu\nQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x1ddb9f3e438>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.heatmap(df.isnull(), cbar=False, yticklabels=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# JUNTAR DATAFRAMES"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As três principais funções para unir dataframes são: <i> .concat(), .join() e .merge()</i><br>\n",
"Vamos ver as diferenças entre elas e quando usar cada uma."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Temos 2 dataframes df1 e df2\n",
"\n",
"df1 = pd.DataFrame(columns= 'A B C'.split(), index= [0,1,2], data=[['a1','b1','c1'],\n",
" ['a2', 'b2', 'c2'], \n",
" ['a3', 'b3', 'c3']])\n",
"df2 = pd.DataFrame(columns= 'A B C'.split(), index= [3,4,5], data=[['a3','b3','c3'],\n",
" ['a4', 'b4', 'c4'], \n",
" ['a5', 'b5', 'c5']])\n",
"\n",
"df3 = pd.DataFrame(columns= 'D E F'.split(), index= [0, 1, 2], data=[['d1','e1','f1'],\n",
" ['d2', 'e2', 'f2'], \n",
" ['d3', 'e3', 'f3']])"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>a1</td>\n",
" <td>b1</td>\n",
" <td>c1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>a2</td>\n",
" <td>b2</td>\n",
" <td>c2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>a3</td>\n",
" <td>b3</td>\n",
" <td>c3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"0 a1 b1 c1\n",
"1 a2 b2 c2\n",
"2 a3 b3 c3"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>a3</td>\n",
" <td>b3</td>\n",
" <td>c3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>a4</td>\n",
" <td>b4</td>\n",
" <td>c4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>a5</td>\n",
" <td>b5</td>\n",
" <td>c5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"3 a3 b3 c3\n",
"4 a4 b4 c4\n",
"5 a5 b5 c5"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b> CONCAT ( ) : </b> basicamente cola dois ou mais dataframes juntos"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>a1</td>\n",
" <td>b1</td>\n",
" <td>c1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>a2</td>\n",
" <td>b2</td>\n",
" <td>c2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>a3</td>\n",
" <td>b3</td>\n",
" <td>c3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>a3</td>\n",
" <td>b3</td>\n",
" <td>c3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>a4</td>\n",
" <td>b4</td>\n",
" <td>c4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>a5</td>\n",
" <td>b5</td>\n",
" <td>c5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"0 a1 b1 c1\n",
"1 a2 b2 c2\n",
"2 a3 b3 c3\n",
"3 a3 b3 c3\n",
"4 a4 b4 c4\n",
"5 a5 b5 c5"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# A primeira função que veremos é a 'concat'. Ela basicamente junta dois dataframes.\n",
"\n",
"pd.concat([df1, df2]) # passamos uma lista com os df a serem concatenados"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>a1</td>\n",
" <td>b1</td>\n",
" <td>c1</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>a2</td>\n",
" <td>b2</td>\n",
" <td>c2</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>a3</td>\n",
" <td>b3</td>\n",
" <td>c3</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>a3</td>\n",
" <td>b3</td>\n",
" <td>c3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>a4</td>\n",
" <td>b4</td>\n",
" <td>c4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>a5</td>\n",
" <td>b5</td>\n",
" <td>c5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C A B C\n",
"0 a1 b1 c1 NaN NaN NaN\n",
"1 a2 b2 c2 NaN NaN NaN\n",
"2 a3 b3 c3 NaN NaN NaN\n",
"3 NaN NaN NaN a3 b3 c3\n",
"4 NaN NaN NaN a4 b4 c4\n",
"5 NaN NaN NaN a5 b5 c5"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Para juntar pelo eixo das colunas, use 'axis=1'. No nosso caso, resultou em alguns valores NaN, pois os df originais\n",
"# não possuem a informação das respectivas células\n",
"\n",
"pd.concat([df1, df2], axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>E</th>\n",
" <th>F</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>a1</td>\n",
" <td>b1</td>\n",
" <td>c1</td>\n",
" <td>d1</td>\n",
" <td>e1</td>\n",
" <td>f1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>a2</td>\n",
" <td>b2</td>\n",
" <td>c2</td>\n",
" <td>d2</td>\n",
" <td>e2</td>\n",
" <td>f2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>a3</td>\n",
" <td>b3</td>\n",
" <td>c3</td>\n",
" <td>d3</td>\n",
" <td>e3</td>\n",
" <td>f3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D E F\n",
"0 a1 b1 c1 d1 e1 f1\n",
"1 a2 b2 c2 d2 e2 f2\n",
"2 a3 b3 c3 d3 e3 f3"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Um bom uso para o 'axis=1' seria na concatenação do df1 com o df3, pois compartilham o mesmo índice com colunas diferentes\n",
"\n",
"pd.concat([df1, df3], axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b> JOIN( ): </b> faz a união de dataframes no índice ou em alguma coluna (transformado em índice)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Suponha que você tenha 2 df com informações sobre funcionários de uma empresa\n",
"\n",
"func1 = pd.DataFrame(columns=['Experiência', 'Vendas Totais'], index = ['José', 'João', 'Pedro'], data=[['15 anos', 20000 ],\n",
" ['10 anos',15000 ],\n",
" ['2 anos', 10000]])\n",
"func2 = pd.DataFrame(columns=['Idade', 'Salário'], index = ['José', 'João', 'Paulo'], data=[['55 anos', 10000 ],\n",
" ['40 anos', 7500 ],\n",
" ['22 anos', 5000]])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Experiência</th>\n",
" <th>Vendas Totais</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>José</th>\n",
" <td>15 anos</td>\n",
" <td>20000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>João</th>\n",
" <td>10 anos</td>\n",
" <td>15000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pedro</th>\n",
" <td>2 anos</td>\n",
" <td>10000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Experiência Vendas Totais\n",
"José 15 anos 20000\n",
"João 10 anos 15000\n",
"Pedro 2 anos 10000"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"func1 # func1 tem dados sobre Pedro, mas não sobre Paulo"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Idade</th>\n",
" <th>Salário</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>José</th>\n",
" <td>55 anos</td>\n",
" <td>10000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>João</th>\n",
" <td>40 anos</td>\n",
" <td>7500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Paulo</th>\n",
" <td>22 anos</td>\n",
" <td>5000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Idade Salário\n",
"José 55 anos 10000\n",
"João 40 anos 7500\n",
"Paulo 22 anos 5000"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"func2 # func2 tem dados sobre Paulo, mas não sobre Pedro"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Experiência</th>\n",
" <th>Vendas Totais</th>\n",
" <th>Idade</th>\n",
" <th>Salário</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>José</th>\n",
" <td>15 anos</td>\n",
" <td>20000</td>\n",
" <td>55 anos</td>\n",
" <td>10000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>João</th>\n",
" <td>10 anos</td>\n",
" <td>15000</td>\n",
" <td>40 anos</td>\n",
" <td>7500.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pedro</th>\n",
" <td>2 anos</td>\n",
" <td>10000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Experiência Vendas Totais Idade Salário\n",
"José 15 anos 20000 55 anos 10000.0\n",
"João 10 anos 15000 40 anos 7500.0\n",
"Pedro 2 anos 10000 NaN NaN"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# O parâmetro 'how' é a forma de união, são quatro opções: left, right, inner, outer \n",
"\n",
"func1.join(func2) # left é o padrão: vai usar o índice do df da esquerda, no nosso caso func1"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Experiência</th>\n",
" <th>Vendas Totais</th>\n",
" <th>Idade</th>\n",
" <th>Salário</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>José</th>\n",
" <td>15 anos</td>\n",
" <td>20000.0</td>\n",
" <td>55 anos</td>\n",
" <td>10000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>João</th>\n",
" <td>10 anos</td>\n",
" <td>15000.0</td>\n",
" <td>40 anos</td>\n",
" <td>7500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Paulo</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>22 anos</td>\n",
" <td>5000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Experiência Vendas Totais Idade Salário\n",
"José 15 anos 20000.0 55 anos 10000\n",
"João 10 anos 15000.0 40 anos 7500\n",
"Paulo NaN NaN 22 anos 5000"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"func1.join(func2, how='right') # right usa o índice do df da direita, no nosso caso func2"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Experiência</th>\n",
" <th>Vendas Totais</th>\n",
" <th>Idade</th>\n",
" <th>Salário</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>José</th>\n",
" <td>15 anos</td>\n",
" <td>20000</td>\n",
" <td>55 anos</td>\n",
" <td>10000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>João</th>\n",
" <td>10 anos</td>\n",
" <td>15000</td>\n",
" <td>40 anos</td>\n",
" <td>7500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Experiência Vendas Totais Idade Salário\n",
"José 15 anos 20000 55 anos 10000\n",
"João 10 anos 15000 40 anos 7500"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"func1.join(func2, how='inner') # inner só usa os índices em comum"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Experiência</th>\n",
" <th>Vendas Totais</th>\n",
" <th>Idade</th>\n",
" <th>Salário</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>José</th>\n",
" <td>15 anos</td>\n",
" <td>20000.0</td>\n",
" <td>55 anos</td>\n",
" <td>10000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>João</th>\n",
" <td>10 anos</td>\n",
" <td>15000.0</td>\n",
" <td>40 anos</td>\n",
" <td>7500.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Paulo</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>22 anos</td>\n",
" <td>5000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pedro</th>\n",
" <td>2 anos</td>\n",
" <td>10000.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Experiência Vendas Totais Idade Salário\n",
"José 15 anos 20000.0 55 anos 10000.0\n",
"João 10 anos 15000.0 40 anos 7500.0\n",
"Paulo NaN NaN 22 anos 5000.0\n",
"Pedro 2 anos 10000.0 NaN NaN"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"func1.join(func2, how='outer') # outter usa todos os índices"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b> MERGE( ): </b> é a mais completa das funções de junção, permitindo fazer praticamente qualquer operação de união de dataframes, apesar de ser mais complexa que as outras, por possuir mais parâmetros"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"f1 =pd.DataFrame(columns = ['Experiência', 'Vendas Totais', 'Vendedor'], index = [0, 1, 2], data = [['15 anos', 20000, 'José'],\n",
" ['10 anos',15000, 'João' ],\n",
" ['2 anos', 10000, 'Pedro']])\n",
"f2 =pd.DataFrame(columns = ['Idade', 'Salário', 'Vendedor'], index = [3, 4, 5], data = [['55 anos', 10000, 'José' ],\n",
" ['40 anos', 7500, 'João' ],\n",
" ['22 anos', 5000, 'Paulo']])"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Experiência</th>\n",
" <th>Vendas Totais</th>\n",
" <th>Vendedor</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15 anos</td>\n",
" <td>20000</td>\n",
" <td>José</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10 anos</td>\n",
" <td>15000</td>\n",
" <td>João</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2 anos</td>\n",
" <td>10000</td>\n",
" <td>Pedro</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Experiência Vendas Totais Vendedor\n",
"0 15 anos 20000 José\n",
"1 10 anos 15000 João\n",
"2 2 anos 10000 Pedro"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f1"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Idade</th>\n",
" <th>Salário</th>\n",
" <th>Vendedor</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>55 anos</td>\n",
" <td>10000</td>\n",
" <td>José</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>40 anos</td>\n",
" <td>7500</td>\n",
" <td>João</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>22 anos</td>\n",
" <td>5000</td>\n",
" <td>Paulo</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Idade Salário Vendedor\n",
"3 55 anos 10000 José\n",
"4 40 anos 7500 João\n",
"5 22 anos 5000 Paulo"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f2"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Experiência</th>\n",
" <th>Vendas Totais</th>\n",
" <th>Vendedor</th>\n",
" <th>Idade</th>\n",
" <th>Salário</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15 anos</td>\n",
" <td>20000</td>\n",
" <td>José</td>\n",
" <td>55 anos</td>\n",
" <td>10000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10 anos</td>\n",
" <td>15000</td>\n",
" <td>João</td>\n",
" <td>40 anos</td>\n",
" <td>7500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Experiência Vendas Totais Vendedor Idade Salário\n",
"0 15 anos 20000 José 55 anos 10000\n",
"1 10 anos 15000 João 40 anos 7500"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Para fundir 2 df's usando colunas em comum, basta passar as colunas no parâmetro 'on'\n",
"# Também podemos usar o parâmetro 'how' que funciona da mesma forma que explicado anteriormente\n",
"\n",
"pd.merge(f1, f2, on='Vendedor', how='inner')"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Experiência</th>\n",
" <th>Vendas Totais</th>\n",
" <th>Vendedor</th>\n",
" <th>Idade</th>\n",
" <th>Salário</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15 anos</td>\n",
" <td>20000.0</td>\n",
" <td>José</td>\n",
" <td>55 anos</td>\n",
" <td>10000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10 anos</td>\n",
" <td>15000.0</td>\n",
" <td>João</td>\n",
" <td>40 anos</td>\n",
" <td>7500.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2 anos</td>\n",
" <td>10000.0</td>\n",
" <td>Pedro</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Paulo</td>\n",
" <td>22 anos</td>\n",
" <td>5000.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Experiência Vendas Totais Vendedor Idade Salário\n",
"0 15 anos 20000.0 José 55 anos 10000.0\n",
"1 10 anos 15000.0 João 40 anos 7500.0\n",
"2 2 anos 10000.0 Pedro NaN NaN\n",
"3 NaN NaN Paulo 22 anos 5000.0"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(f1, f2, on='Vendedor', how='outer')"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Experiência</th>\n",
" <th>Vendas Totais</th>\n",
" <th>Vendedor</th>\n",
" <th>Idade</th>\n",
" <th>Salário</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15 anos</td>\n",
" <td>20000.0</td>\n",
" <td>José</td>\n",
" <td>55 anos</td>\n",
" <td>10000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10 anos</td>\n",
" <td>15000.0</td>\n",
" <td>João</td>\n",
" <td>40 anos</td>\n",
" <td>7500.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Paulo</td>\n",
" <td>22 anos</td>\n",
" <td>5000.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2 anos</td>\n",
" <td>10000.0</td>\n",
" <td>Pedro</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Experiência Vendas Totais Vendedor Idade Salário\n",
"0 15 anos 20000.0 José 55 anos 10000.0\n",
"1 10 anos 15000.0 João 40 anos 7500.0\n",
"2 NaN NaN Paulo 22 anos 5000.0\n",
"3 2 anos 10000.0 Pedro NaN NaN"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(f1, f2, on='Vendedor', how='outer', sort=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Outros parâmetros de pd.merge(): <br>\n",
" - left_index: recebe um bool, se True a chave da união será o índice do df da esquerda \n",
" - right_index: recebe um bool, se True a chave da união será o índice do df da direita \n",
" - left_on: passar coluna ou lista de colunas do df da esquerda que será usado na união\n",
" - right_on: passar coluna ou lista de colunas do df da direita que será usado na união\n",
"Para entender: no nosso caso tanto faz passar \"left_on='Vendedor', right_on = 'Vendedor'\" ou apenas \"on='Vendedor'\"\n",
" - sort: recebe um bool, se True coloca a chave em ordem alfabética"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Com esse tutorial rápido já dá pra fazer muita coisa em termo de união de dataframes. <br>\n",
"Para mais informações sobre merge, join e concat, leia a documentação: http://pandas.pydata.org/pandas-docs/stable/merging.html"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Mudando o formato de Dataframes:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>cor</th>\n",
" <th>letra</th>\n",
" <th>num</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>preto</td>\n",
" <td>A</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>preto</td>\n",
" <td>E</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>preto</td>\n",
" <td>I</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>branco</td>\n",
" <td>A</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>branco</td>\n",
" <td>E</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>branco</td>\n",
" <td>I</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" cor letra num\n",
"0 preto A 1\n",
"1 preto E 2\n",
"2 preto I 3\n",
"3 branco A 4\n",
"4 branco E 5\n",
"5 branco I 6"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame({'cor': ['preto','preto','preto','branco','branco','branco'],\n",
" 'letra': ['A', 'E', 'I', 'A', 'E', 'I'],\n",
" 'num': [1, 2, 3, 4, 5, 6]})\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>cor</th>\n",
" <td>preto</td>\n",
" <td>preto</td>\n",
" <td>preto</td>\n",
" <td>branco</td>\n",
" <td>branco</td>\n",
" <td>branco</td>\n",
" </tr>\n",
" <tr>\n",
" <th>letra</th>\n",
" <td>A</td>\n",
" <td>E</td>\n",
" <td>I</td>\n",
" <td>A</td>\n",
" <td>E</td>\n",
" <td>I</td>\n",
" </tr>\n",
" <tr>\n",
" <th>num</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3 4 5\n",
"cor preto preto preto branco branco branco\n",
"letra A E I A E I\n",
"num 1 2 3 4 5 6"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Uma forma rápida de trocar as colunas pelos índices \n",
"\n",
"df.T # ou\n",
"\n",
"df.transpose() # mesmo resultado"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>letra</th>\n",
" <th>A</th>\n",
" <th>E</th>\n",
" <th>I</th>\n",
" </tr>\n",
" <tr>\n",
" <th>cor</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>branco</th>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>preto</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"letra A E I\n",
"cor \n",
"branco 4 5 6\n",
"preto 1 2 3"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Com a função pivot, é possível escolher índice, colunas e valores para reformatar o df\n",
"\n",
"df.pivot(index ='cor', columns = 'letra', values = 'num')"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>cor</th>\n",
" <th>vogal</th>\n",
" <th>num</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>preto</td>\n",
" <td>A</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>preto</td>\n",
" <td>E</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>preto</td>\n",
" <td>I</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>branco</td>\n",
" <td>A</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>branco</td>\n",
" <td>E</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>branco</td>\n",
" <td>I</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" cor vogal num\n",
"0 preto A 1\n",
"1 preto E 2\n",
"2 preto I 3\n",
"3 branco A 4\n",
"4 branco E 5\n",
"5 branco I 6"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Para renomear as colunas:\n",
"\n",
"df.rename(columns= {\"letra\":'vogal'})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Aplicando funções e Operações nos dataframes:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 1\n",
"1 2\n",
"2 3\n",
"3 4\n",
"4 5\n",
"5 6\n",
"Name: num, dtype: int64"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Vamos usar a coluna 'num' para aplicar algumas funções\n",
"\n",
"df['num']"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Primeiro passo é escrever uma função\n",
"\n",
"def dobrar(x): # função simples que retorna o dobro do número passado\n",
" return x * 2"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Agora é só aplicar na coluna desejada com .apply(), passando a nossa função como argumento\n",
"\n",
"df['num'] = df['num'].apply(dobrar) # para tornar a alteração permanente, atribuímos o resultado a nossa coluna"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 2\n",
"1 4\n",
"2 6\n",
"3 8\n",
"4 10\n",
"5 12\n",
"Name: num, dtype: int64"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['num']"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 4\n",
"1 8\n",
"2 12\n",
"3 16\n",
"4 20\n",
"5 24\n",
"Name: num, dtype: int64"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# O mesmo resultado pode ser obtido através de uma função lambda\n",
"\n",
"df['num'].apply(lambda x: x*2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Acredito que deu pra trazer pelo menos o básico da biblioteca Pandas nesse tutorial. Eu sei que é muita coisa, mas tem muito mais\n",
"coisa ainda para ser estudado! Se quiser se aprofundar mais, leia a documentação oficial, links abaixo."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"<b>FONTES:</b>\n",
"\n",
"1 - http://pandas.pydata.org/ <br>\n",
"2 - http://pandas.pydata.org/pandas-docs/stable/<br>\n",
"3 - http://pandas.pydata.org/pandas-docs/stable/10min.html#min<br>\n",
"4 - http://pandas.pydata.org/pandas-docs/stable/api.html#general-functions<br>"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [conda env:k35]",
"language": "python",
"name": "conda-env-k35-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment