@esuji5
Created June 28, 2018 05:20
{
"cells": [
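{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Predicting attendance for every match in the second half of the 2014 J.League season\n",
"\n",
"- Site: https://signate.jp/competitions/27/\n",
"- Deadline: none (open indefinitely)\n",
"- Submissions: 3354 entries / 352 participants\n",
"- Final result (as of 2018-06-09):\n",
"  - Score: 3,433.90189\n",
"  - Rank: 62nd\n"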
]
},
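{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Approach\n",
"- Stick to a single model and do not lean on ensembling\n",
"\n",
"### What improved accuracy\n",
"- Home team and away team information\n",
"- Information about the stadium hosting the match\n",
"  - The venue is not fixed per home team; some teams used two or three stadiums\n",
"- Whether the match is held on a Saturday, Sunday, or public holiday\n",
"- TV broadcast information, split into SKY PerfecTV!-only versus also broadcast on terrestrial or BS channels\n",
"- Unifying the team name, since ザスパ草津 had been renamed ザスパクサツ群馬\n",
"- Driving distance and travel time from the away team's home stadium to the host stadium, obtained from Google route search\n",
"  - I also wanted to include public-transit travel time, but the API limits made it too much trouble\n",
"- Log-transforming y before training\n",
"- Replacing any prediction that exceeds the stadium capacity with the capacity\n",
"- I started with XGBoost, but Ridge scored better, so I adopted Ridge\n",
"\n",
"### What did not improve accuracy\n",
"- Training on J1 and J2 separately, since attendance differs between the two divisions\n",
"- Using match (the round number)\n",
"  - I tried splitting the season into halves and into quarters, but neither helped in practice\n",
"- Rank in the previous half-season\n",
"  - Attendance did not seem to change much with rank\n",
"- Day game versus night game\n",
"- Various other things I tried but no longer remember"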
]
},
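{
"cell_type": "markdown",
"metadata": {},
"source": [
"The log transform of y and the capacity cap listed above can be sketched as follows. This is a minimal illustration on made-up numbers, not the pipeline used for the submission: the feature matrix, `attendance`, and `capacity` are hypothetical, and `Ridge(alpha=0.8)` only mirrors the setting used later in this notebook.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.linear_model import Ridge\n",
"\n",
"# Hypothetical toy data: one feature, attendance targets, stadium capacities\n",
"X = np.array([[1.0], [2.0], [3.0], [4.0]])\n",
"attendance = np.array([5000.0, 9000.0, 16000.0, 30000.0])\n",
"capacity = np.array([20000.0, 20000.0, 20000.0, 20000.0])\n",
"\n",
"# Fit on log(y), then map the predictions back with exp\n",
"model = Ridge(alpha=0.8).fit(X, np.log(attendance))\n",
"pred = np.exp(model.predict(X))\n",
"\n",
"# Cap each prediction at the stadium capacity\n",
"pred = np.minimum(pred, capacity)\n",
"print(pred)\n",
"```\n",
"\n",
"Fitting on log(y) keeps every back-transformed prediction positive, and the cap guarantees that no prediction exceeds what the venue can hold.\n"
]
},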
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Show all columns\n",
"pd.options.display.max_columns = None\n",
"\n",
"# Load the training data, the additional training data, and the test data\n",
"df = pd.read_csv('data/j_league/train.csv')\n",
"dfadd = pd.read_csv('data/j_league/train_add.csv')\n",
"dft = pd.read_csv('data/j_league/test.csv')\n",
"\n",
"# Concatenate them and tidy up the string columns\n",
"df = pd.concat([df, dfadd, dft], sort=False)\n",
"df['week'] = df.gameday.apply(lambda x: x[-3:])\n",
"df['day_night'] = df.time.apply(lambda x: 'day' if x < '16:00' else 'night')\n",
"df['match'] = df.match.apply(lambda x: x[:-3])\n",
"df['tv_rec'] = df['tv'].apply(lambda x: 1 if '(録)' in x else 0)\n",
"# The patterns contain regex metacharacters such as ( ) and +, so match them literally\n",
"df['tv'] = df['tv'].str.replace('(録)', '', regex=False)\n",
"df['tv'] = df['tv'].str.replace('スカパー!/スカパー!プレミアムサービス', '', regex=False)\n",
"df['tv'] = df['tv'].str.replace('スカパー!', '', regex=False)\n",
"df['tv'] = df['tv'].str.replace('スカパー/e2/スカパー光', '', regex=False)\n",
"df['tv'] = df['tv'].str.replace('スカパー/e2(スカイ・A sports+)/スカパー光', '', regex=False)\n",
"# 1 when nothing remains, i.e. the match was shown only on SKY PerfecTV! channels\n",
"df['tv_cs_only'] = df['tv'].apply(lambda x: 1 if x == '' else 0)\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Match-condition data\n",
"dfc = pd.read_csv('data/j_league/condition.csv')\n",
"dfca = pd.read_csv('data/j_league/condition_add.csv')\n",
"dfc = pd.concat([dfc,dfca])\n",
"\n",
"# pandas_profiling.ProfileReport(df)\n",
"dfc['score_diff'] = dfc.home_score - dfc.away_score\n",
"dfc['home_win'] = 0\n",
"dfc.loc[dfc['score_diff'] > 0, 'home_win'] = 1\n",
"dfc.loc[dfc['score_diff'] < 0, 'home_win'] = -1\n",
"dfc['weather'] = dfc['weather'].str.replace('一時雨', '')\n",
"dfc['weather'] = dfc['weather'].apply(lambda x: x[0] if len(x) >= 3 else x)\n",
"# dfc.head()"
]
},
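{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Stadium data\n",
"dfsta = pd.read_csv('data/j_league/stadium.csv')\n",
"stadium_capa_dict = dict(zip(dfsta.name.values, dfsta.capa.values))\n",
"# Home stadium per team. '名古屋グランパス' and 'セレッソ大阪' appear twice below, so only the later entry takes effect.\n",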
"home_dict = {'アスルクラロ沼津': '愛鷹広域公園多目的競技場', 'FC岐阜': '岐阜メモリアルセンター長良川競技場', '京都サンガF.C.': '京都市西京極総合運動公園陸上競技場兼球技場', '鹿島アントラーズ': '茨城県立カシマサッカースタジアム', '川崎フロンターレ': '等々力陸上競技場', '浦和レッズ': '埼玉スタジアム2002', '藤枝MYFC': '藤枝市総合運動公園サッカー場', 'カターレ富山': '富山県総合運動公園陸上競技場', 'ツエーゲン金沢': '石川県西部緑地公園陸上競技場', '栃木SC': '栃木県グリーンスタジアム', 'AC長野パルセイロ': '長野Uスタジアム', 'コンサドーレ札幌': '札幌ドーム', 'FC町田ゼルビア': '町田市立陸上競技場', '名古屋グランパス': '豊田スタジアム', '鹿児島ユナイテッドFC': '鹿児島県立鴨池陸上競技場', 'FC琉球': '沖縄県総合運動公園陸上競技場', '大宮アルディージャ': 'NACK5スタジアム大宮', 'ジュビロ磐田': 'ヤマハスタジアム', '柏レイソル': '三協フロンテア柏スタジアム', '清水エスパルス': 'IAIスタジアム日本平', 'ガンバ大阪': 'パナソニックスタジアム吹田', 'ガイナーレ鳥取': 'とりぎんバードスタジアム', 'グルージャ盛岡': 'いわぎんスタジアム', 'ベガルタ仙台': 'ユアテックスタジアム仙台', 'スタブラウブリッツ秋田': 'あきぎんスタジアム', 'ジェフユナイテッド千葉': 'フクダ電子アリーナ', 'Y.S.C.C.横浜': 'ニッパツ三ツ沢球技場', '横浜FC': 'ニッパツ三ツ沢球技場', '松本山雅FC': 'アルウィン', 'セレッソ大阪': 'キンチョウスタジアム', 'ヴィッセル神戸': 'ノエビアスタジアム神戸', 'ギラヴァンツ北九州': 'ミクニワールドスタジアム北九州', 'アビスパ福岡': 'レベルファイブスタジアム', 'サガン鳥栖': 'ベストアメニティスタジアム', 'モンテディオ山形': 'NDソフトスタジアム山形', '福島ユナイテッドFC': 'とうほう・みんなのスタジアム', '水戸ホーリーホック': 'ケーズデンキスタジアム水戸', 'ザスパ草津': '正田醤油スタジアム群馬','ザスパクサツ群馬': '正田醤油スタジアム群馬', 'FC東京': '味の素スタジアム', '東京ヴェルディ': '味の素スタジアム', '横浜F・マリノス': '日産スタジアム', 'SC相模原': '相模原ギオンスタジアム', '湘南ベルマーレ': 'Shonan BMWスタジアム平塚', 'ヴァンフォーレ甲府': '山梨中銀スタジアム', 'アルビレックス新潟': 'デンカビッグスワンスタジアム', '名古屋グランパス': 'パロマ瑞穂スタジアム', 'セレッソ大阪': 'ヤンマースタジアム長居', 'ファジアーノ岡山': 'シティライトスタジアム', 'サンフレッチェ広島': 'エディオンスタジアム広島', 'レノファ山口FC': '維新みらいふスタジアム', '徳島ヴォルティス': '徳島県鳴門総合運動公園陸上競技場', 'カマタマーレ讃岐': 'Pikaraスタジアム', '愛媛FC': 'ニンジニアスタジアム', 'V・ファーレン長崎': 'トランスコスモススタジアム長崎', 'ロアッソ熊本': 'えがお健康スタジアム', '大分トリニータ': '大分銀行ドーム'}"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import unicodedata\n",
"\n",
"# Build a single DataFrame that merges all the information\n",
"dfs = pd.read_csv('data/j_league/stadium.csv')\n",
"dfa = df.merge(dfc)\n",
"dfa = dfa.merge(dfs, left_on='stadium', right_on='name')\n",
"dfa = dfa.sort_values('id')\n",
"\n",
"# Drop the row where y = 0\n",
"dfa = dfa.drop(dfa[dfa.id == 15699].index)\n",
"\n",
"# Convert full-width alphanumerics to half-width\n",
"dfa.stage = dfa.stage.apply(lambda x: unicodedata.normalize('NFKC', x))\n",
"dfa.match = dfa.match.apply(lambda x: unicodedata.normalize('NFKC', x))\n",
"# Zero-pad single-digit rounds (第1節 -> 第01節) so that string comparison follows round order\n",
"dfa.match = dfa.match.apply(lambda x: x[:3].replace(x[1], '0{}'.format(x[1]))+x[3:] if not x[1:3].isdigit() else x)\n",
"dfa['half'] = dfa.match.apply(lambda x: 'first' if x <= \"第17節\" else 'second')\n",
"dfa['quarter'] = dfa.match.apply(lambda x: 'first' if x <= \"第06節\" else 'second' if x <= \"第17節\" else 'third' if x <= \"第23節\" else 'fourth')\n",
"\n",
"dfa['capa_per'] = dfa.y / dfa.capa\n",
"# Unify the renamed team: treat ザスパクサツ群馬 as ザスパ草津\n",
"dfa['home'] = dfa.home.apply(lambda x: x if x != 'ザスパクサツ群馬' else 'ザスパ草津')\n",
"dfa['home_team'] = dfa.home.apply(lambda x: x if x != 'ザスパクサツ群馬' else 'ザスパ草津')\n",
"dfa['away'] = dfa.away.apply(lambda x: x if x != 'ザスパクサツ群馬' else 'ザスパ草津')\n",
"dfa['away_team'] = dfa.away.apply(lambda x: x if x != 'ザスパクサツ群馬' else 'ザスパ草津')\n",
"# Key of the form 'host stadium,away team home stadium' for the distance lookup\n",
"dfa['aways_home'] = dfa.away.map(home_dict)\n",
"dfa['stadium_aways_home'] = dfa.stadium + ',' + dfa.aways_home\n"
]
},
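{
"cell_type": "markdown",
"metadata": {},
"source": [
"The round labels are zero-padded above so that plain string comparison such as `x <= '第17節'` orders rounds correctly. A minimal sketch of the same idea with a regular expression follows; `pad_round` is a hypothetical helper name and not part of the original code.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"def pad_round(label):\n",
"    # Insert a leading zero when exactly one digit sits between 第 and 節\n",
"    return re.sub(r'^第(\\d)節', r'第0\\1節', label)\n",
"\n",
"print(pad_round('第1節'))    # 第01節\n",
"print(pad_round('第17節'))   # 第17節 (unchanged)\n",
"print('第2節' <= '第17節')    # False: unpadded labels sort incorrectly\n",
"print(pad_round('第2節') <= '第17節')  # True\n",
"```\n"
]
},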
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import pickle\n",
"\n",
"import requests\n",
"\n",
"# Use Google route search to get the driving distance and travel time\n",
"# from the away team's home stadium to the stadium hosting the match.\n",
"# Read the API key from the environment; never commit a real key.\n",
"my_api_key = os.environ.get('GOOGLE_MAPS_API_KEY', 'set your key')\n",
"cache_path = 'data/j_league/fro_to_dict.pickle'\n",
"try:\n",
"    with open(cache_path, 'rb') as f:\n",
"        fro_to_dict = pickle.load(f)\n",
"except FileNotFoundError:\n",
"    fro_to_dict = {}\n",
"\n",
"url_temp = 'https://maps.googleapis.com/maps/api/directions/json'\n",
"\n",
"def calc_distance(fro, to):\n",
"    def json_proc(mode):\n",
"        # Let requests URL-encode the stadium names\n",
"        params = {'mode': mode, 'origin': fro, 'destination': to, 'key': my_api_key}\n",
"        res = requests.get(url_temp, params=params)\n",
"        orig_json = res.json()\n",
"        try:\n",
"            leg = orig_json['routes'][0]['legs'][0]\n",
"            dist, dura = leg['distance']['value'], leg['duration']['value']\n",
"        except (KeyError, IndexError):\n",
"            print('failed:', mode, fro, to)\n",
"            if 'error_message' in orig_json:\n",
"                # An API-level error (bad key, quota, ...) should stop the run\n",
"                print(orig_json)\n",
"                raise RuntimeError(orig_json['error_message'])\n",
"            # No route found: keep the pair with missing values\n",
"            dist, dura = None, None\n",
"        return fro, to, dist, dura, orig_json\n",
"\n",
"    driv_data = json_proc('driving')\n",
"    # tran_data = json_proc('transit')  # enable to also fetch public-transit travel\n",
"    return driv_data\n",
"\n",
"if not fro_to_dict:\n",
"    stadium_comb = set(dfa.stadium_aways_home.dropna().values)\n",
"    for idx, comb in enumerate(stadium_comb):\n",
"        if idx % 100 == 0:\n",
"            print(idx)\n",
"        home, away = comb.split(',')\n",
"        driv_data = calc_distance(away, home)\n",
"        fro_to_dict[comb] = driv_data\n",
"    with open(cache_path, 'wb') as f:\n",
"        pickle.dump(fro_to_dict, f)"
]
},
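{
"cell_type": "markdown",
"metadata": {},
"source": [
"The response shape that `json_proc` relies on can be illustrated with a hand-written dictionary. The sample values below are invented; only the key path routes, legs, distance / duration, value is taken from the code above. In the Directions API, `distance.value` is in meters and `duration.value` is in seconds. `extract_leg` is a hypothetical helper name.\n",
"\n",
"```python\n",
"# Hypothetical response fragment; the numbers are illustrative only\n",
"sample = {'routes': [{'legs': [{'distance': {'value': 1000},\n",
"                                'duration': {'value': 120}}]}]}\n",
"\n",
"def extract_leg(resp):\n",
"    # Return (meters, seconds), or (None, None) when no route is present\n",
"    try:\n",
"        leg = resp['routes'][0]['legs'][0]\n",
"        return leg['distance']['value'], leg['duration']['value']\n",
"    except (KeyError, IndexError):\n",
"        return None, None\n",
"\n",
"print(extract_leg(sample))          # (1000, 120)\n",
"print(extract_leg({'routes': []}))  # (None, None)\n",
"```\n"
]
},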
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>y</th>\n",
" <th>year</th>\n",
" <th>stage</th>\n",
" <th>match</th>\n",
" <th>gameday</th>\n",
" <th>time</th>\n",
" <th>home</th>\n",
" <th>away</th>\n",
" <th>stadium</th>\n",
" <th>tv</th>\n",
" <th>week</th>\n",
" <th>day_night</th>\n",
" <th>tv_rec</th>\n",
" <th>tv_cs_only</th>\n",
" <th>home_score</th>\n",
" <th>away_score</th>\n",
" <th>weather</th>\n",
" <th>temperature</th>\n",
" <th>humidity</th>\n",
" <th>referee</th>\n",
" <th>home_team</th>\n",
" <th>home_01</th>\n",
" <th>home_02</th>\n",
" <th>home_03</th>\n",
" <th>home_04</th>\n",
" <th>home_05</th>\n",
" <th>home_06</th>\n",
" <th>home_07</th>\n",
" <th>home_08</th>\n",
" <th>home_09</th>\n",
" <th>home_10</th>\n",
" <th>home_11</th>\n",
" <th>away_team</th>\n",
" <th>away_01</th>\n",
" <th>away_02</th>\n",
" <th>away_03</th>\n",
" <th>away_04</th>\n",
" <th>away_05</th>\n",
" <th>away_06</th>\n",
" <th>away_07</th>\n",
" <th>away_08</th>\n",
" <th>away_09</th>\n",
" <th>away_10</th>\n",
" <th>away_11</th>\n",
" <th>score_diff</th>\n",
" <th>home_win</th>\n",
" <th>name</th>\n",
" <th>address</th>\n",
" <th>capa</th>\n",
" <th>half</th>\n",
" <th>quarter</th>\n",
" <th>capa_per</th>\n",
" <th>aways_home</th>\n",
" <th>stadium_aways_home</th>\n",
" <th>driv_dist</th>\n",
" <th>driv_dura</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>13994</td>\n",
" <td>18250.0</td>\n",
" <td>2012</td>\n",
" <td>J1</td>\n",
" <td>第01節</td>\n",
" <td>03/10(土)</td>\n",
" <td>14:04</td>\n",
" <td>ベガルタ仙台</td>\n",
" <td>鹿島アントラーズ</td>\n",
" <td>ユアテックスタジアム仙台</td>\n",
" <td>/NHK総合</td>\n",
" <td>(土)</td>\n",
" <td>day</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>雨</td>\n",
" <td>3.8</td>\n",
" <td>66%</td>\n",
" <td>木村 博之</td>\n",
" <td>ベガルタ仙台</td>\n",
" <td>林 卓人</td>\n",
" <td>菅井 直樹</td>\n",
" <td>鎌田 次郎</td>\n",
" <td>上本 大海</td>\n",
" <td>田村 直也</td>\n",
" <td>富田 晋伍</td>\n",
" <td>角田 誠</td>\n",
" <td>太田 吉彰</td>\n",
" <td>関口 訓充</td>\n",
" <td>ウイルソン</td>\n",
" <td>赤嶺 真吾</td>\n",
" <td>鹿島アントラーズ</td>\n",
" <td>曽ヶ端 準</td>\n",
" <td>新井場 徹</td>\n",
" <td>岩政 大樹</td>\n",
" <td>中田 浩二</td>\n",
" <td>アレックス</td>\n",
" <td>青木 剛</td>\n",
" <td>増田 誓志</td>\n",
" <td>小笠原 満男</td>\n",
" <td>本山 雅志</td>\n",
" <td>大迫 勇也</td>\n",
" <td>ジュニーニョ</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>ユアテックスタジアム仙台</td>\n",
" <td>宮城県仙台市泉区七北田字柳78</td>\n",
" <td>19694</td>\n",
" <td>first</td>\n",
" <td>first</td>\n",
" <td>0.926678</td>\n",
" <td>茨城県立カシマサッカースタジアム</td>\n",
" <td>ユアテックスタジアム仙台,茨城県立カシマサッカースタジアム</td>\n",
" <td>312696</td>\n",
" <td>15286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>13995</td>\n",
" <td>24316.0</td>\n",
" <td>2012</td>\n",
" <td>J1</td>\n",
" <td>第01節</td>\n",
" <td>03/10(土)</td>\n",
" <td>14:04</td>\n",
" <td>名古屋グランパス</td>\n",
" <td>清水エスパルス</td>\n",
" <td>豊田スタジアム</td>\n",
" <td>(J SPORTS 4)/NHK名古屋</td>\n",
" <td>(土)</td>\n",
" <td>day</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>屋内</td>\n",
" <td>12.4</td>\n",
" <td>43%</td>\n",
" <td>西村 雄一</td>\n",
" <td>名古屋グランパス</td>\n",
" <td>楢﨑 正剛</td>\n",
" <td>田中 隼磨</td>\n",
" <td>田中 マルクス闘莉王</td>\n",
" <td>増川 隆洋</td>\n",
" <td>阿部 翔平</td>\n",
" <td>中村 直志</td>\n",
" <td>ダニルソン</td>\n",
" <td>藤本 淳吾</td>\n",
" <td>金崎 夢生</td>\n",
" <td>ケネディ</td>\n",
" <td>玉田 圭司</td>\n",
" <td>清水エスパルス</td>\n",
" <td>林 彰洋</td>\n",
" <td>吉田 豊</td>\n",
" <td>岩下 敬輔</td>\n",
" <td>カルフィン ヨン ア ピン</td>\n",
" <td>李 記帝</td>\n",
" <td>村松 大輔</td>\n",
" <td>河井 陽介</td>\n",
" <td>枝村 匠馬</td>\n",
" <td>高木 俊幸</td>\n",
" <td>アレックス</td>\n",
" <td>大前 元紀</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>豊田スタジアム</td>\n",
" <td>愛知県豊田市千石町7-2</td>\n",
" <td>40000</td>\n",
" <td>first</td>\n",
" <td>first</td>\n",
" <td>0.607900</td>\n",
" <td>IAIスタジアム日本平</td>\n",
" <td>豊田スタジアム,IAIスタジアム日本平</td>\n",
" <td>156539</td>\n",
" <td>8247</td>\n",
" </tr>\n",
" <tr>\n",
" <th>77</th>\n",
" <td>13996</td>\n",
" <td>17066.0</td>\n",
" <td>2012</td>\n",
" <td>J1</td>\n",
" <td>第01節</td>\n",
" <td>03/10(土)</td>\n",
" <td>14:04</td>\n",
" <td>ガンバ大阪</td>\n",
" <td>ヴィッセル神戸</td>\n",
" <td>万博記念競技場</td>\n",
" <td>(J SPORTS 1)/NHK大阪</td>\n",
" <td>(土)</td>\n",
" <td>day</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>晴</td>\n",
" <td>11.3</td>\n",
" <td>41%</td>\n",
" <td>高山 啓義</td>\n",
" <td>ガンバ大阪</td>\n",
" <td>藤ヶ谷 陽介</td>\n",
" <td>加地 亮</td>\n",
" <td>中澤 聡太</td>\n",
" <td>今野 泰幸</td>\n",
" <td>藤春 廣輝</td>\n",
" <td>明神 智和</td>\n",
" <td>遠藤 保仁</td>\n",
" <td>佐々木 勇人</td>\n",
" <td>二川 孝広</td>\n",
" <td>ラフィーニャ</td>\n",
" <td>パウリーニョ</td>\n",
" <td>ヴィッセル神戸</td>\n",
" <td>徳重 健太</td>\n",
" <td>近藤 岳登</td>\n",
" <td>北本 久仁衛</td>\n",
" <td>伊野波 雅彦</td>\n",
" <td>相馬 崇人</td>\n",
" <td>三原 雅俊</td>\n",
" <td>田中 英雄</td>\n",
" <td>野沢 拓也</td>\n",
" <td>橋本 英郎</td>\n",
" <td>森岡 亮太</td>\n",
" <td>大久保 嘉人</td>\n",
" <td>-1</td>\n",
" <td>-1</td>\n",
" <td>万博記念競技場</td>\n",
" <td>大阪府吹田市千里万博公園5-2</td>\n",
" <td>21000</td>\n",
" <td>first</td>\n",
" <td>first</td>\n",
" <td>0.812667</td>\n",
" <td>ノエビアスタジアム神戸</td>\n",
" <td>万博記念競技場,ノエビアスタジアム神戸</td>\n",
" <td>48402</td>\n",
" <td>2962</td>\n",
" </tr>\n",
" <tr>\n",
" <th>131</th>\n",
" <td>13997</td>\n",
" <td>29603.0</td>\n",
" <td>2012</td>\n",
" <td>J1</td>\n",
" <td>第01節</td>\n",
" <td>03/10(土)</td>\n",
" <td>14:06</td>\n",
" <td>サンフレッチェ広島</td>\n",
" <td>浦和レッズ</td>\n",
" <td>エディオンスタジアム広島</td>\n",
" <td>/NHK広島</td>\n",
" <td>(土)</td>\n",
" <td>day</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>曇</td>\n",
" <td>11.4</td>\n",
" <td>52%</td>\n",
" <td>松尾 一</td>\n",
" <td>サンフレッチェ広島</td>\n",
" <td>西川 周作</td>\n",
" <td>森脇 良太</td>\n",
" <td>千葉 和彦</td>\n",
" <td>水本 裕貴</td>\n",
" <td>ミキッチ</td>\n",
" <td>青山 敏弘</td>\n",
" <td>森﨑 和幸</td>\n",
" <td>山岸 智</td>\n",
" <td>石原 直樹</td>\n",
" <td>髙萩 洋次郎</td>\n",
" <td>佐藤 寿人</td>\n",
" <td>浦和レッズ</td>\n",
" <td>加藤 順大</td>\n",
" <td>濱田 水輝</td>\n",
" <td>阿部 勇樹</td>\n",
" <td>槙野 智章</td>\n",
" <td>平川 忠亮</td>\n",
" <td>鈴木 啓太</td>\n",
" <td>山田 直輝</td>\n",
" <td>梅崎 司</td>\n",
" <td>柏木 陽介</td>\n",
" <td>原口 元気</td>\n",
" <td>田中 達也</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>エディオンスタジアム広島</td>\n",
" <td>広島県広島市安佐南区大塚西5-1-1</td>\n",
" <td>50000</td>\n",
" <td>first</td>\n",
" <td>first</td>\n",
" <td>0.592060</td>\n",
" <td>埼玉スタジアム2002</td>\n",
" <td>エディオンスタジアム広島,埼玉スタジアム2002</td>\n",
" <td>843082</td>\n",
" <td>36437</td>\n",
" </tr>\n",
" <tr>\n",
" <th>181</th>\n",
" <td>13998</td>\n",
" <td>25353.0</td>\n",
" <td>2012</td>\n",
" <td>J1</td>\n",
" <td>第01節</td>\n",
" <td>03/10(土)</td>\n",
" <td>14:04</td>\n",
" <td>コンサドーレ札幌</td>\n",
" <td>ジュビロ磐田</td>\n",
" <td>札幌ドーム</td>\n",
" <td>(スカイ・A sports+)/NHK札幌</td>\n",
" <td>(土)</td>\n",
" <td>day</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>屋内</td>\n",
" <td>22.5</td>\n",
" <td>32%</td>\n",
" <td>廣瀬 格</td>\n",
" <td>コンサドーレ札幌</td>\n",
" <td>李 昊乗</td>\n",
" <td>高木 純平</td>\n",
" <td>ジェイド ノース</td>\n",
" <td>奈良 竜樹</td>\n",
" <td>岩沼 俊介</td>\n",
" <td>河合 竜二</td>\n",
" <td>山本 真希</td>\n",
" <td>近藤 祐介</td>\n",
" <td>内村 圭宏</td>\n",
" <td>岡本 賢明</td>\n",
" <td>前田 俊介</td>\n",
" <td>ジュビロ磐田</td>\n",
" <td>川口 能活</td>\n",
" <td>駒野 友一</td>\n",
" <td>チョ ビョングク</td>\n",
" <td>藤田 義明</td>\n",
" <td>山本 脩斗</td>\n",
" <td>小林 裕紀</td>\n",
" <td>山本 康裕</td>\n",
" <td>山田 大記</td>\n",
" <td>松浦 拓弥</td>\n",
" <td>菅沼 実</td>\n",
" <td>前田 遼一</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>札幌ドーム</td>\n",
" <td>北海道札幌市豊平区羊ヶ丘1</td>\n",
" <td>39232</td>\n",
" <td>first</td>\n",
" <td>first</td>\n",
" <td>0.646233</td>\n",
" <td>ヤマハスタジアム</td>\n",
" <td>札幌ドーム,ヤマハスタジアム</td>\n",
" <td>1379802</td>\n",
" <td>67252</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id y year stage match gameday time home away \\\n",
"0 13994 18250.0 2012 J1 第01節 03/10(土) 14:04 ベガルタ仙台 鹿島アントラーズ \n",
"49 13995 24316.0 2012 J1 第01節 03/10(土) 14:04 名古屋グランパス 清水エスパルス \n",
"77 13996 17066.0 2012 J1 第01節 03/10(土) 14:04 ガンバ大阪 ヴィッセル神戸 \n",
"131 13997 29603.0 2012 J1 第01節 03/10(土) 14:06 サンフレッチェ広島 浦和レッズ \n",
"181 13998 25353.0 2012 J1 第01節 03/10(土) 14:04 コンサドーレ札幌 ジュビロ磐田 \n",
"\n",
" stadium tv week day_night tv_rec tv_cs_only \\\n",
"0 ユアテックスタジアム仙台 /NHK総合 (土) day 0 0 \n",
"49 豊田スタジアム (J SPORTS 4)/NHK名古屋 (土) day 0 0 \n",
"77 万博記念競技場 (J SPORTS 1)/NHK大阪 (土) day 0 0 \n",
"131 エディオンスタジアム広島 /NHK広島 (土) day 0 0 \n",
"181 札幌ドーム (スカイ・A sports+)/NHK札幌 (土) day 0 0 \n",
"\n",
" home_score away_score weather temperature humidity referee home_team \\\n",
"0 1 0 雨 3.8 66% 木村 博之 ベガルタ仙台 \n",
"49 1 0 屋内 12.4 43% 西村 雄一 名古屋グランパス \n",
"77 2 3 晴 11.3 41% 高山 啓義 ガンバ大阪 \n",
"131 1 0 曇 11.4 52% 松尾 一 サンフレッチェ広島 \n",
"181 0 0 屋内 22.5 32% 廣瀬 格 コンサドーレ札幌 \n",
"\n",
" home_01 home_02 home_03 home_04 home_05 home_06 home_07 home_08 \\\n",
"0 林 卓人 菅井 直樹 鎌田 次郎 上本 大海 田村 直也 富田 晋伍 角田 誠 太田 吉彰 \n",
"49 楢﨑 正剛 田中 隼磨 田中 マルクス闘莉王 増川 隆洋 阿部 翔平 中村 直志 ダニルソン 藤本 淳吾 \n",
"77 藤ヶ谷 陽介 加地 亮 中澤 聡太 今野 泰幸 藤春 廣輝 明神 智和 遠藤 保仁 佐々木 勇人 \n",
"131 西川 周作 森脇 良太 千葉 和彦 水本 裕貴 ミキッチ 青山 敏弘 森﨑 和幸 山岸 智 \n",
"181 李 昊乗 高木 純平 ジェイド ノース 奈良 竜樹 岩沼 俊介 河合 竜二 山本 真希 近藤 祐介 \n",
"\n",
" home_09 home_10 home_11 away_team away_01 away_02 away_03 \\\n",
"0 関口 訓充 ウイルソン 赤嶺 真吾 鹿島アントラーズ 曽ヶ端 準 新井場 徹 岩政 大樹 \n",
"49 金崎 夢生 ケネディ 玉田 圭司 清水エスパルス 林 彰洋 吉田 豊 岩下 敬輔 \n",
"77 二川 孝広 ラフィーニャ パウリーニョ ヴィッセル神戸 徳重 健太 近藤 岳登 北本 久仁衛 \n",
"131 石原 直樹 髙萩 洋次郎 佐藤 寿人 浦和レッズ 加藤 順大 濱田 水輝 阿部 勇樹 \n",
"181 内村 圭宏 岡本 賢明 前田 俊介 ジュビロ磐田 川口 能活 駒野 友一 チョ ビョングク \n",
"\n",
" away_04 away_05 away_06 away_07 away_08 away_09 away_10 away_11 \\\n",
"0 中田 浩二 アレックス 青木 剛 増田 誓志 小笠原 満男 本山 雅志 大迫 勇也 ジュニーニョ \n",
"49 カルフィン ヨン ア ピン 李 記帝 村松 大輔 河井 陽介 枝村 匠馬 高木 俊幸 アレックス 大前 元紀 \n",
"77 伊野波 雅彦 相馬 崇人 三原 雅俊 田中 英雄 野沢 拓也 橋本 英郎 森岡 亮太 大久保 嘉人 \n",
"131 槙野 智章 平川 忠亮 鈴木 啓太 山田 直輝 梅崎 司 柏木 陽介 原口 元気 田中 達也 \n",
"181 藤田 義明 山本 脩斗 小林 裕紀 山本 康裕 山田 大記 松浦 拓弥 菅沼 実 前田 遼一 \n",
"\n",
" score_diff home_win name address capa half \\\n",
"0 1 1 ユアテックスタジアム仙台 宮城県仙台市泉区七北田字柳78 19694 first \n",
"49 1 1 豊田スタジアム 愛知県豊田市千石町7-2 40000 first \n",
"77 -1 -1 万博記念競技場 大阪府吹田市千里万博公園5-2 21000 first \n",
"131 1 1 エディオンスタジアム広島 広島県広島市安佐南区大塚西5-1-1 50000 first \n",
"181 0 0 札幌ドーム 北海道札幌市豊平区羊ヶ丘1 39232 first \n",
"\n",
" quarter capa_per aways_home stadium_aways_home \\\n",
"0 first 0.926678 茨城県立カシマサッカースタジアム ユアテックスタジアム仙台,茨城県立カシマサッカースタジアム \n",
"49 first 0.607900 IAIスタジアム日本平 豊田スタジアム,IAIスタジアム日本平 \n",
"77 first 0.812667 ノエビアスタジアム神戸 万博記念競技場,ノエビアスタジアム神戸 \n",
"131 first 0.592060 埼玉スタジアム2002 エディオンスタジアム広島,埼玉スタジアム2002 \n",
"181 first 0.646233 ヤマハスタジアム 札幌ドーム,ヤマハスタジアム \n",
"\n",
" driv_dist driv_dura \n",
"0 312696 15286 \n",
"49 156539 8247 \n",
"77 48402 2962 \n",
"131 843082 36437 \n",
"181 1379802 67252 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fro_to_dict_dist = {comb:t[2] for comb, t in fro_to_dict.items()}\n",
"fro_to_dict_dura = {comb:t[3] for comb, t in fro_to_dict.items()}\n",
"dfa['driv_dist'] = dfa.stadium_aways_home.map(fro_to_dict_dist)\n",
"dfa['driv_dura'] = dfa.stadium_aways_home.map(fro_to_dict_dura)\n",
"dfa.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import xgboost as xgb\n",
"from sklearn.linear_model import LinearRegression, Ridge\n",
"from sklearn.model_selection import cross_val_score, train_test_split\n",
"\n",
"def calc_prediction(dfta, use_log_y=False):\n",
"    # Select the columns used in the final model\n",
"    dfu = dfta[[\n",
"        # 'year',\n",
"        # 'stage',\n",
"        # 'match',\n",
"        'home',\n",
"        'away',\n",
"        'stadium',\n",
"        'week',\n",
"        'driv_dist',\n",
"        'driv_dura',\n",
"        # 'last_half_period_rank_home',\n",
"        # 'last_half_period_rank_away',\n",
"        # 'half',\n",
"        # 'quarter',\n",
"        'tv_cs_only',\n",
"        # 'day_night',\n",
"    ]]\n",
"    dfu = pd.get_dummies(dfu)\n",
"    if use_log_y:\n",
"        dfu['y'] = np.log(dfta.y)\n",
"    else:\n",
"        dfu['y'] = dfta.y\n",
"\n",
"    # Rows with a known y are training data; rows without y are the test set\n",
"    dfutr = dfu[dfu.y.notna()]\n",
"    dfutt = dfu[dfu.y.isna()]\n",
"    print(dfutr.shape)\n",
"    print(dfutt.shape)\n",
"\n",
"    def get_clf_datas(df, rate=0.3, verbose=True, clf=None):\n",
"        # Assumes df is in its final form with the target in the rightmost column\n",
"        datas = df.values\n",
"        X, y = datas[:, :-1], datas[:, -1]\n",
"        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=rate, random_state=0)\n",
"\n",
"        # Use whichever estimator you like\n",
"        if clf is None:\n",
"            clf = LinearRegression(n_jobs=-1)\n",
"        if verbose:\n",
"            print('train:', X_train.shape[0], '\\ntest: ', X_test.shape[0])\n",
"        return clf, X_train, X_test, y_train, y_test\n",
"\n",
"    # clf, X_train, X_test, y_train, y_test = get_clf_datas(dfutr, rate=0.2, clf=xgb.XGBRegressor())\n",
"    clf, X_train, X_test, y_train, y_test = get_clf_datas(dfutr, rate=0.2, clf=Ridge(alpha=.8))\n",
"    # clf, X_train, X_test, y_train, y_test = get_clf_datas(dfutr, rate=0.2)\n",
"    clf_test = clf\n",
"\n",
"    # Fit the model. The printed scores are R^2 (the default score for\n",
"    # regressors), not classification accuracy.\n",
"    clf.fit(X_train, y_train)\n",
"    scores = cross_val_score(clf, X_train, y_train, cv=6)\n",
"    print(\"train Accuracy: {:0.3f} (+/- {:0.3f})\".format(scores.mean(), scores.std() * 2), scores)\n",
"    scores = cross_val_score(clf, X_test, y_test, cv=6)\n",
"    print(\"test Accuracy: {:0.3f} (+/- {:0.3f})\".format(scores.mean(), scores.std() * 2), scores)\n",
"\n",
"    # Refit on all labelled rows, then predict the unlabelled test rows\n",
"    datas = dfutr.values\n",
"    X, y = datas[:, :-1], datas[:, -1]\n",
"    clf_test.fit(X, y)\n",
"    pred = clf_test.predict(dfutt.values[:, :-1])\n",
"    return pred, clf_test.predict(X)"
]
},
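{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scores printed by `calc_prediction` come from `cross_val_score` with the estimator's default scorer, which for scikit-learn regressors is R², and with `use_log_y=True` they are measured in log space. Assuming the competition ranks submissions by RMSE on raw attendance (an assumption not stated in this notebook), a holdout check on the original scale could look like the sketch below; `rmse_original_scale` is a hypothetical helper and the sample numbers are invented.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def rmse_original_scale(y_true_log, y_pred_log):\n",
"    # Both inputs are in log space; compare them after applying exp\n",
"    y_true = np.exp(np.asarray(y_true_log, dtype=float))\n",
"    y_pred = np.exp(np.asarray(y_pred_log, dtype=float))\n",
"    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))\n",
"\n",
"# Invented example: true attendance 10000 and 20000, predictions 11000 and 18000\n",
"y_true_log = np.log([10000.0, 20000.0])\n",
"y_pred_log = np.log([11000.0, 18000.0])\n",
"print(rmse_original_scale(y_true_log, y_pred_log))\n",
"```\n"
]
},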
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(1952, 156)\n",
"(313, 156)\n",
"train: 1561 \n",
"test: 391\n",
"train Accuracy: 0.836 (+/- 0.032) [0.84667975 0.83739686 0.81455509 0.82318292 0.82943072 0.86326184]\n",
"test Accuracy: 0.793 (+/- 0.076) [0.83189404 0.79744988 0.71563874 0.78313423 0.82008836 0.81191538]\n"
]
}
],
"source": [
"# Pass use_log_y=True to train on log-transformed y\n",
"pred, pred_tr = calc_prediction(dfa, use_log_y=True)\n",
"# pred,pred_tr = calc_prediction(dfa, use_log_y=False)\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"dfans = dfa[dfa.y.isna()][['id','y','capa']]\n",
"dfans.loc[:, 'y'] = pred\n",
"# Undo the log transform\n",
"dfans.y = np.exp(dfans.y)\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" id y capa capa_diff\n",
"613 15892 20281 20281 -755.639411\n",
"46 15912 19694 19694 -1938.952717\n"
]
}
],
"source": [
"# If a prediction exceeds the stadium capacity, replace it with the capacity\n",
"dfans['capa_diff'] = dfans.capa - dfans.y\n",
"dfans.loc[(dfans['capa_diff'] < 0), 'y'] = dfans[dfans['capa_diff'] < 0].capa * 1\n",
"\n",
"dfans.y = dfans.y.apply(lambda x: round(x))\n",
"print(dfans[dfans['capa_diff'] < 0])\n",
"dfans = dfans[['id','y']]\n",
"\n",
"# Write the submission CSV\n",
"dfans.to_csv('ans.csv', header=None, index=None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing the submission"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"#!open ans.csv"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"cp ans.csv /Users/esuji/Desktop/"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ideas below did not improve accuracy"
]
},
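{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Split into J1 and J2 so that previous rankings can be taken into account\n",
"# (.copy() so that later column assignments do not act on a view of dfa)\n",
"dfa_j1 = dfa[dfa.stage == 'J1'].copy()\n",
"dfa_j2 = dfa[dfa.stage == 'J2'].copy()"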
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Standings tables used for the previous-ranking feature (J1)\n",
"df_2011_j1 = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E6%9C%80%E6%96%B0%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%91&yearIdLabel=2011%E5%B9%B4&yearId=2011&competitionId=298&competitionSectionId=0&search=search')[0]\n",
"df_2011_j1h = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E7%AC%AC%EF%BC%91%EF%BC%97%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%91&yearIdLabel=2011%E5%B9%B4&yearId=2011&competitionId=298&competitionSectionId=17&search=search')[0]\n",
"df_2012_j1 = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E7%AC%AC%EF%BC%93%EF%BC%94%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%91&yearIdLabel=2012%E5%B9%B4&yearId=2012&competitionId=322&competitionSectionId=34&search=search')[0]\n",
"df_2012_j1h = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E7%AC%AC%EF%BC%93%EF%BC%94%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%91&yearIdLabel=2012%E5%B9%B4&yearId=2012&competitionId=322&competitionSectionId=17&search=search')[0]\n",
"df_2013_j1 = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E6%9C%80%E6%96%B0%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%91&yearIdLabel=2013%E5%B9%B4&yearId=2013&competitionId=347&competitionSectionId=0&search=search')[0]\n",
"df_2013_j1h = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E6%9C%80%E6%96%B0%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%91&yearIdLabel=2013%E5%B9%B4&yearId=2013&competitionId=347&competitionSectionId=17&search=search')[0]\n",
"df_2014_j1 = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E7%AC%AC%EF%BC%91%EF%BC%97%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%91&yearIdLabel=2014%E5%B9%B4&yearId=2014&competitionId=372&competitionSectionId=17&search=search')[0]\n",
"\n",
"# Standings tables used for the previous-ranking feature (J2)\n",
"df_2011_j2 = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E6%9C%80%E6%96%B0%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%92&yearIdLabel=2011%E5%B9%B4&yearId=2011&competitionId=299&competitionSectionId=0&search=search')[0]\n",
"df_2011_j2h = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E6%9C%80%E6%96%B0%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%92&yearIdLabel=2011%E5%B9%B4&yearId=2011&competitionId=299&competitionSectionId=17&search=search')[0]\n",
"df_2012_j2 = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E6%9C%80%E6%96%B0%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%92&yearIdLabel=2012%E5%B9%B4&yearId=2012&competitionId=323&competitionSectionId=0&search=search')[0]\n",
"df_2012_j2h = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E6%9C%80%E6%96%B0%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%92&yearIdLabel=2012%E5%B9%B4&yearId=2012&competitionId=323&competitionSectionId=17&search=search')[0]\n",
"df_2013_j2 = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E6%9C%80%E6%96%B0%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%92&yearIdLabel=2013%E5%B9%B4&yearId=2013&competitionId=348&competitionSectionId=0&search=search')[0]\n",
"df_2013_j2h = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E6%9C%80%E6%96%B0%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%92&yearIdLabel=2013%E5%B9%B4&yearId=2013&competitionId=348&competitionSectionId=17&search=search')[0]\n",
"df_2014_j2 = pd.read_html('https://data.j-league.or.jp/SFRT01/?competitionSectionIdLabel=%E7%AC%AC%EF%BC%91%EF%BC%97%E7%AF%80&competitionIdLabel=%EF%BC%AA%E3%83%AA%E3%83%BC%E3%82%B0%E3%80%80%E3%83%87%E3%82%A3%E3%83%93%E3%82%B8%E3%83%A7%E3%83%B3%EF%BC%92&yearIdLabel=2014%E5%B9%B4&yearId=2014&competitionId=373&competitionSectionId=17&search=search')[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_dfs_j1 = [df_2011_j1, df_2011_j1h, df_2012_j1, df_2012_j1h, df_2013_j1, df_2013_j1h, df_2014_j1]\n",
"data_dfs_j2 = [df_2011_j2, df_2011_j2h, df_2012_j2, df_2012_j2h, df_2013_j2, df_2013_j2h, df_2014_j2]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dfa_j1.loc[:, 'last_half_period_rank_home'] = ''\n",
"dfa_j1.loc[:, 'last_half_period_rank_away'] = ''\n",
"dfa_j2.loc[:, 'last_half_period_rank_home'] = ''\n",
"dfa_j2.loc[:, 'last_half_period_rank_away'] = ''\n",
"\n",
"new_dfs_j1 = []\n",
"new_dfs_j2 = []\n",
"for idx, data_dfs in enumerate(zip(data_dfs_j1, data_dfs_j2)):\n",
"    data_df_j1, data_df_j2 = data_dfs\n",
"    # Map team name -> rank label such as 'j1_03' (チーム = team, 順位 = rank)\n",
"    rank_dict_j1 = dict(zip(data_df_j1['チーム'], data_df_j1['順位'].apply(lambda x: 'j1_'+str(x).zfill(2))))\n",
"    rank_dict_j2 = dict(zip(data_df_j2['チーム'], data_df_j2['順位'].apply(lambda x: 'j2_'+str(x).zfill(2))))\n",
"    rank_dict = {**rank_dict_j1, **rank_dict_j2}\n",
"\n",
"    if idx % 2 == 0:  # first half of the season\n",
"        new_df_j1 = dfa_j1.query('year == {} & match <= \"第17節\"'.format(2012 + (idx//2))).copy()\n",
"        new_df_j2 = dfa_j2.query('year == {} & match <= \"第17節\"'.format(2012 + (idx//2))).copy()\n",
"    else:  # second half of the season\n",
"        new_df_j1 = dfa_j1.query('year == {} & match > \"第17節\"'.format(2012 + (idx//2))).copy()\n",
"        new_df_j2 = dfa_j2.query('year == {} & match > \"第17節\"'.format(2012 + (idx//2))).copy()\n",
"\n",
"    new_df_j1.loc[:, 'last_half_period_rank_home'] = new_df_j1.home.map(rank_dict)\n",
"    new_df_j1.loc[:, 'last_half_period_rank_away'] = new_df_j1.away.map(rank_dict)\n",
"    new_df_j2.loc[:, 'last_half_period_rank_home'] = new_df_j2.home.map(rank_dict)\n",
"    new_df_j2.loc[:, 'last_half_period_rank_away'] = new_df_j2.away.map(rank_dict)\n",
"\n",
"    new_dfs_j1.append(new_df_j1)\n",
"    new_dfs_j2.append(new_df_j2)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ranked_df_j1 = pd.concat(new_dfs_j1)# rank_map_j1\n",
"dfa_j1 = dfa_j1.merge(ranked_df_j1, how='right')\n",
"ranked_df_j2 = pd.concat(new_dfs_j2)# rank_map_j1\n",
"dfa_j2 = dfa_j2.merge(ranked_df_j2, how='right')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}