Skip to content

Instantly share code, notes, and snippets.

@pfokin92
Created December 24, 2019 18:17
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save pfokin92/1cdde6557f785b46565207f9ca66c2b8 to your computer and use it in GitHub Desktop.
Save pfokin92/1cdde6557f785b46565207f9ca66c2b8 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Домашнее задание к лекции «Продвинутый pandas 2»"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Задание 1\n",
"Используйте файл с оценками фильмов ml-latest-small/ratings.csv. Посчитайте среднее время жизни пользователей, которые выставили более 100 оценок. Под временем жизни понимается разница между максимальным и минимальным значением столбца timestamp для данного значения userId."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"ratings=pd.read_csv('ml-latest-small/ratings.csv', sep=',')"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>userId</th>\n",
" <th>rating</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>76</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>51</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>100</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" userId rating\n",
"0 1 20\n",
"1 2 76\n",
"2 3 51\n",
"3 4 204\n",
"4 5 100"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ratings_count=ratings.groupby('userId').count().reset_index()\n",
"ratings_count=ratings_count[['userId','rating']]\n",
"ratings_count.head()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\fokinpavel2007\\Anaconda3\\lib\\site-packages\\pandas\\core\\frame.py:4025: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame\n",
"\n",
"See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
" return super(DataFrame, self).rename(**kwargs)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>userId</th>\n",
" <th>rating_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td>116</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>15</td>\n",
" <td>1700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>17</td>\n",
" <td>363</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>19</td>\n",
" <td>423</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" userId rating_count\n",
"3 4 204\n",
"7 8 116\n",
"14 15 1700\n",
"16 17 363\n",
"18 19 423"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"count_more_than_100=ratings_count[ratings_count['rating']>100]\n",
"count_more_than_100.rename(columns={'rating': 'rating_count'}, inplace=True)\n",
"count_more_than_100.head()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"rating_max=ratings.groupby('userId').max().reset_index()\n",
"rating_min=ratings.groupby('userId').min().reset_index()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"rating_max_timestamp=rating_max[['userId','timestamp']]\n",
"rating_min_timestamp=rating_min[['userId','timestamp']]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"rating_max_timestamp.rename(columns={'timestamp': 'timestamp_max'}, inplace=True)\n",
"rating_min_timestamp.rename(columns={'timestamp': 'timestamp_min'}, inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>userId</th>\n",
" <th>timestamp_max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>1260759205</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>835356246</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1298932787</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>949982274</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>1163375145</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" userId timestamp_max\n",
"0 1 1260759205\n",
"1 2 835356246\n",
"2 3 1298932787\n",
"3 4 949982274\n",
"4 5 1163375145"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rating_max_timestamp.head()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>userId</th>\n",
" <th>timestamp_min</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>1260759108</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>835355395</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1298861589</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>949778714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>1163373044</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" userId timestamp_min\n",
"0 1 1260759108\n",
"1 2 835355395\n",
"2 3 1298861589\n",
"3 4 949778714\n",
"4 5 1163373044"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rating_min_timestamp.head()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"avg_timestamp=count_more_than_100.merge(rating_max_timestamp, how='inner').merge(rating_min_timestamp, how='inner')"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"def average(row):\n",
" mean_=(row['timestamp_max']-row['timestamp_min'])\n",
" return mean_"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"avg_timestamp['avg']=avg_timestamp.apply(average, axis=1)\n",
"len(count_more_than_100)==len(avg_timestamp)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>userId</th>\n",
" <th>rating_count</th>\n",
" <th>timestamp_max</th>\n",
" <th>timestamp_min</th>\n",
" <th>avg</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>4</td>\n",
" <td>204</td>\n",
" <td>949982274</td>\n",
" <td>949778714</td>\n",
" <td>203560</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>8</td>\n",
" <td>116</td>\n",
" <td>1154474527</td>\n",
" <td>1154389340</td>\n",
" <td>85187</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>15</td>\n",
" <td>1700</td>\n",
" <td>1469330735</td>\n",
" <td>997937239</td>\n",
" <td>471393496</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>17</td>\n",
" <td>363</td>\n",
" <td>1127476640</td>\n",
" <td>1127468587</td>\n",
" <td>8053</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>19</td>\n",
" <td>423</td>\n",
" <td>855195373</td>\n",
" <td>855190091</td>\n",
" <td>5282</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>21</td>\n",
" <td>162</td>\n",
" <td>854522908</td>\n",
" <td>853157476</td>\n",
" <td>1365432</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>22</td>\n",
" <td>220</td>\n",
" <td>1131753381</td>\n",
" <td>1131661890</td>\n",
" <td>91491</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>23</td>\n",
" <td>726</td>\n",
" <td>1166728253</td>\n",
" <td>1148386124</td>\n",
" <td>18342129</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>26</td>\n",
" <td>172</td>\n",
" <td>1371811577</td>\n",
" <td>1351544316</td>\n",
" <td>20267261</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>30</td>\n",
" <td>1011</td>\n",
" <td>1060795346</td>\n",
" <td>944943070</td>\n",
" <td>115852276</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>33</td>\n",
" <td>138</td>\n",
" <td>1037009003</td>\n",
" <td>1032676792</td>\n",
" <td>4332211</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>34</td>\n",
" <td>187</td>\n",
" <td>973749049</td>\n",
" <td>973746214</td>\n",
" <td>2835</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>36</td>\n",
" <td>104</td>\n",
" <td>853005800</td>\n",
" <td>847056510</td>\n",
" <td>5949290</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>38</td>\n",
" <td>111</td>\n",
" <td>1449693163</td>\n",
" <td>1389721498</td>\n",
" <td>59971665</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>41</td>\n",
" <td>199</td>\n",
" <td>1109817081</td>\n",
" <td>1093886586</td>\n",
" <td>15930495</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" userId rating_count timestamp_max timestamp_min avg\n",
"0 4 204 949982274 949778714 203560\n",
"1 8 116 1154474527 1154389340 85187\n",
"2 15 1700 1469330735 997937239 471393496\n",
"3 17 363 1127476640 1127468587 8053\n",
"4 19 423 855195373 855190091 5282\n",
"5 21 162 854522908 853157476 1365432\n",
"6 22 220 1131753381 1131661890 91491\n",
"7 23 726 1166728253 1148386124 18342129\n",
"8 26 172 1371811577 1351544316 20267261\n",
"9 30 1011 1060795346 944943070 115852276\n",
"10 33 138 1037009003 1032676792 4332211\n",
"11 34 187 973749049 973746214 2835\n",
"12 36 104 853005800 847056510 5949290\n",
"13 38 111 1449693163 1389721498 59971665\n",
"14 41 199 1109817081 1093886586 15930495"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"avg_timestamp.head(15)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Задание 2\n",
"Дана статистика услуг перевозок клиентов компании по типам (см. файл с кодом занятия). Необходимо сформировать две таблицы:\n",
"\n",
" таблицу с тремя типами выручки для каждого client_id без указания адреса клиента\n",
" аналогичную таблицу по типам выручки с указанием адреса клиента"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>client_id</th>\n",
" <th>rzd_revenue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>111</td>\n",
" <td>1093</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>112</td>\n",
" <td>2810</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>113</td>\n",
" <td>10283</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>114</td>\n",
" <td>5774</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>115</td>\n",
" <td>981</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" client_id rzd_revenue\n",
"0 111 1093\n",
"1 112 2810\n",
"2 113 10283\n",
"3 114 5774\n",
"4 115 981"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rzd = pd.DataFrame(\n",
" {\n",
" 'client_id': [111, 112, 113, 114, 115],\n",
" 'rzd_revenue': [1093, 2810, 10283, 5774, 981]\n",
" }\n",
")\n",
"rzd"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>client_id</th>\n",
" <th>auto_revenue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>113</td>\n",
" <td>57483</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>114</td>\n",
" <td>83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>115</td>\n",
" <td>912</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>116</td>\n",
" <td>4834</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>117</td>\n",
" <td>98</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" client_id auto_revenue\n",
"0 113 57483\n",
"1 114 83\n",
"2 115 912\n",
"3 116 4834\n",
"4 117 98"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"auto = pd.DataFrame(\n",
" {\n",
" 'client_id': [113, 114, 115, 116, 117],\n",
" 'auto_revenue': [57483, 83, 912, 4834, 98]\n",
" }\n",
")\n",
"auto"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>client_id</th>\n",
" <th>air_revenue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>115</td>\n",
" <td>81</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>116</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>117</td>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>118</td>\n",
" <td>173</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" client_id air_revenue\n",
"0 115 81\n",
"1 116 4\n",
"2 117 13\n",
"3 118 173"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"air = pd.DataFrame(\n",
" {\n",
" 'client_id': [115, 116, 117, 118],\n",
" 'air_revenue': [81, 4, 13, 173]\n",
" }\n",
")\n",
"air"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>client_id</th>\n",
" <th>address</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>111</td>\n",
" <td>Комсомольская 4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>112</td>\n",
" <td>Энтузиастов 8а</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>113</td>\n",
" <td>Левобережная 1а</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>114</td>\n",
" <td>Мира 14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>115</td>\n",
" <td>ЗЖБИиДК 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>116</td>\n",
" <td>Строителей 18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>117</td>\n",
" <td>Панфиловская 33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>118</td>\n",
" <td>Мастеркова 4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" client_id address\n",
"0 111 Комсомольская 4\n",
"1 112 Энтузиастов 8а\n",
"2 113 Левобережная 1а\n",
"3 114 Мира 14\n",
"4 115 ЗЖБИиДК 1\n",
"5 116 Строителей 18\n",
"6 117 Панфиловская 33\n",
"7 118 Мастеркова 4"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client_base = pd.DataFrame(\n",
" {\n",
" 'client_id': [111, 112, 113, 114, 115, 116, 117, 118],\n",
" 'address': ['Комсомольская 4', 'Энтузиастов 8а', 'Левобережная 1а', 'Мира 14', 'ЗЖБИиДК 1', \n",
" 'Строителей 18', 'Панфиловская 33', 'Мастеркова 4']\n",
" }\n",
")\n",
"client_base"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"table=rzd.merge(auto, how='outer', on='client_id')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"table=table.merge(air, how='outer', on='client_id')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"table.loc[table.rzd_revenue.isnull(), 'rzd_revenue',]=0\n",
"table.loc[table.auto_revenue.isnull(), 'auto_revenue',]=0\n",
"table.loc[table.air_revenue.isnull(), 'air_revenue',]=0"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"full_table=table.merge(client_base, how='outer', on='client_id')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>client_id</th>\n",
" <th>rzd_revenue</th>\n",
" <th>auto_revenue</th>\n",
" <th>air_revenue</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>111</td>\n",
" <td>1093.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>112</td>\n",
" <td>2810.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>113</td>\n",
" <td>10283.0</td>\n",
" <td>57483.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>114</td>\n",
" <td>5774.0</td>\n",
" <td>83.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>115</td>\n",
" <td>981.0</td>\n",
" <td>912.0</td>\n",
" <td>81.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>116</td>\n",
" <td>0.0</td>\n",
" <td>4834.0</td>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>117</td>\n",
" <td>0.0</td>\n",
" <td>98.0</td>\n",
" <td>13.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>118</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>173.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" client_id rzd_revenue auto_revenue air_revenue\n",
"0 111 1093.0 0.0 0.0\n",
"1 112 2810.0 0.0 0.0\n",
"2 113 10283.0 57483.0 0.0\n",
"3 114 5774.0 83.0 0.0\n",
"4 115 981.0 912.0 81.0\n",
"5 116 0.0 4834.0 4.0\n",
"6 117 0.0 98.0 13.0\n",
"7 118 0.0 0.0 173.0"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>client_id</th>\n",
" <th>rzd_revenue</th>\n",
" <th>auto_revenue</th>\n",
" <th>air_revenue</th>\n",
" <th>address</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>111</td>\n",
" <td>1093.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>Комсомольская 4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>112</td>\n",
" <td>2810.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>Энтузиастов 8а</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>113</td>\n",
" <td>10283.0</td>\n",
" <td>57483.0</td>\n",
" <td>0.0</td>\n",
" <td>Левобережная 1а</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>114</td>\n",
" <td>5774.0</td>\n",
" <td>83.0</td>\n",
" <td>0.0</td>\n",
" <td>Мира 14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>115</td>\n",
" <td>981.0</td>\n",
" <td>912.0</td>\n",
" <td>81.0</td>\n",
" <td>ЗЖБИиДК 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>116</td>\n",
" <td>0.0</td>\n",
" <td>4834.0</td>\n",
" <td>4.0</td>\n",
" <td>Строителей 18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>117</td>\n",
" <td>0.0</td>\n",
" <td>98.0</td>\n",
" <td>13.0</td>\n",
" <td>Панфиловская 33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>118</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>173.0</td>\n",
" <td>Мастеркова 4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" client_id rzd_revenue auto_revenue air_revenue address\n",
"0 111 1093.0 0.0 0.0 Комсомольская 4\n",
"1 112 2810.0 0.0 0.0 Энтузиастов 8а\n",
"2 113 10283.0 57483.0 0.0 Левобережная 1а\n",
"3 114 5774.0 83.0 0.0 Мира 14\n",
"4 115 981.0 912.0 81.0 ЗЖБИиДК 1\n",
"5 116 0.0 4834.0 4.0 Строителей 18\n",
"6 117 0.0 98.0 13.0 Панфиловская 33\n",
"7 118 0.0 0.0 173.0 Мастеркова 4"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"full_table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Задание 3 \n",
"В задаче сквозной аналитики вам предоставили данные по местоположению пользователей. Т. е. для каждого user_id известна последовательность координат (широта/долгота), когда они требовались приложению для полноценной работы. Как бы вы добавили эти сведения в таблицу визитов и покупок? Для составления ответа можно использовать вопросы:\n",
"\n",
" У каждого пользователя известен набор координат. А для связывания с визитом или фактом покупки скорее всего потребуется одно-два числа. Как их получить?\n",
" Наборы координат одного и того же пользователя могут быть значительно удалены друг от друга. Как это отразится на вопросе расчетах пункта 1?\n",
" Какие дополнительные признаки можно получить из координат? Ведь это просто числа, которые сами по себе мало что дают."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>user_id</th>\n",
" <th>visit_id</th>\n",
" <th>coordinate</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>11</td>\n",
" <td>101</td>\n",
" <td>55.755831 37.617673</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>22</td>\n",
" <td>301</td>\n",
" <td>56.755831 37.617673</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>55</td>\n",
" <td>305</td>\n",
" <td>55.755831 38.617673</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>11</td>\n",
" <td>896</td>\n",
" <td>55.755831 37.617673</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>99</td>\n",
" <td>896</td>\n",
" <td>55.255831 37.017673</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" user_id visit_id coordinate\n",
"0 11 101 55.755831 37.617673\n",
"1 22 301 56.755831 37.617673\n",
"2 55 305 55.755831 38.617673\n",
"3 11 896 55.755831 37.617673\n",
"4 99 896 55.255831 37.017673"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Для того что бы добавить координаты нужно ввести индификатор визита так как пользователь может зайти из разных мест\n",
"coordinates = pd.DataFrame(\n",
" {\n",
" 'user_id': [11, 22, 55, 11, 99],\n",
" 'visit_id': [101, 301, 305, 896, 896],\n",
" 'coordinate': ['55.755831 37.617673', '56.755831 37.617673', '55.755831 38.617673', '55.755831 37.617673', '55.255831 37.017673'],\n",
" }\n",
")\n",
"\n",
"coordinates"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Из координат можно получить данные откуда делаются больше всего запросов, что в этой области предпочитают и т. д."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment