@reouno
Last active December 7, 2020 03:52
Does ES rise the day after gold futures fall?
{
"cells": [
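{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Mass search for small edges, stage 1: using data from other instruments\n",
"\n",
"The target is always E-mini S&P 500 futures (ES). The bar size is free, the prediction method is free, and so is what gets predicted (a rise in the next bar, a fall within N bars, and so on). The one constraint is that the estimate is a distribution, not a point estimate.\n",
"\n",
"The [previous analysis](https://gist.github.com/reouno/e67e01064b916de20b78a2bc0ac4e397) compared the return distribution on the day after an up day with the full-period distribution. This time, the ES return distribution on the day after gold futures fall is compared with the full-period distribution."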
]
},
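{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Does ES rise the day after gold futures fall?\n",
"\n",
"### Hypothesis\n",
"\n",
"Equities and gold are often said to be inversely correlated. If ES rises on the day after gold futures fall, the ES return distribution on those days should sit to the right of the full-period return distribution.\n",
"\n",
"### Conclusion and discussion\n",
"\n",
"See the cells near the end for the estimates and the comparison.\n",
"\n",
"The ES return distribution on the day after gold futures fall is almost indistinguishable from the unconditional full-period distribution, so predicting the next day's ES move from a rise or fall in gold futures looks close to impossible (under the conditions used here, that is, with daily bars).\n",
"\n",
"This result does not, however, refute the premise that equities and gold tend to be inversely correlated. No correlation was measured directly here. More importantly, the comparison only speaks to the relationship between gold futures returns and the next day's ES returns, not to the relationship between gold futures and ES on the same day.\n",
"\n",
"To use a correlation or inverse correlation between ES and another instrument as a signal, an alternative to looking at a single bar, as done here, is to estimate the other instrument's current trend from multi-bar statistics such as a moving average and build rules on the hypothesis that ES moves with or against that trend."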
]
},
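{
"cell_type": "markdown",
"metadata": {},
"source": [
"The distinction between the same-day and the next-day relationship can be illustrated with a minimal sketch on synthetic returns. None of this is the notebook's data: the series, the coefficient `-0.5`, and the name `toy` are assumptions, chosen so that ES moves against gold on the same day only.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Synthetic daily returns (assumption: not the notebook's data).\n",
"rng = np.random.default_rng(0)\n",
"n = 2000\n",
"gold = rng.normal(0.0, 1.0, n)\n",
"# ES is constructed to move against gold on the same day only.\n",
"es = -0.5 * gold + rng.normal(0.0, 1.0, n)\n",
"toy = pd.DataFrame({'Gold0': gold, 'ES0': es})\n",
"\n",
"# Next day's ES return, aligned with today's gold return.\n",
"toy['ES1'] = toy['ES0'].shift(-1)\n",
"toy = toy.dropna()\n",
"\n",
"same_day_corr = toy['Gold0'].corr(toy['ES0'])\n",
"next_day_corr = toy['Gold0'].corr(toy['ES1'])\n",
"print(f'same-day corr: {same_day_corr:.3f}')\n",
"print(f'next-day corr: {next_day_corr:.3f}')\n",
"```\n",
"\n",
"In a setup like this the same-day correlation is clearly negative while the next-day correlation stays near zero, which is why a null result for the next-day distribution says nothing about the same-day premise."
]
},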
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'en_US.UTF-8'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%matplotlib inline\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"from pandas.api.types import CategoricalDtype\n",
"import matplotlib as mpl\n",
"mpl.rcParams['font.family'] = 'sans-serif'\n",
"mpl.rcParams['font.sans-serif'] = ['Hiragino Maru Gothic Pro', 'Yu Gothic', 'Meiryo', 'Takao', 'IPAexGothic', 'IPAPGothic', 'VL PGothic', 'Noto Sans CJK JP']\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from scipy import stats\n",
"import datetime as dt\n",
"from dateutil.relativedelta import relativedelta\n",
"import locale\n",
"\n",
"# Set the time locale so that month and weekday names come out in English\n",
"locale.setlocale(locale.LC_TIME, 'en_US.UTF-8')"
]
},
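{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Loading the E-mini S&P 500 futures and gold futures data\n",
"\n",
"The data are CSV files exported from the [TradeStation Desktop Platform](https://www.tradestation.com/platforms-and-tools/desktop/). Redistribution is prohibited, so the files cannot be published here, but the latest daily data for [E-mini S&P 500 futures](https://www.quandl.com/data/CHRIS/CME_ES1-E-mini-S-P-500-Futures-Continuous-Contract-1-ES1-Front-Month) and [gold futures](https://www.quandl.com/data/CHRIS/CME_GC1-Gold-Futures-Continuous-Contract-1-GC1-Front-Month) (the same data used in this analysis) are available from Quandl via its API and can be used instead. A DataFrame obtained from Quandl has a slightly different structure, so the cells below need minor changes.\n",
"- A DataFrame obtained from Quandl already has a DatetimeIndex, and the closing price column is 'Last' rather than 'Close'. These are probably the only two differences."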
]
},
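{
"cell_type": "markdown",
"metadata": {},
"source": [
"The adjustment described above could look roughly like the sketch below. The helper name `quandl_to_notebook_format` and the toy frame are hypothetical, and only the two differences named in the text are handled. The notebook's `add_log_values` also reads `Vol` and `OI`; the text does not say what those columns are called in a Quandl frame, so they are left out here.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def quandl_to_notebook_format(qdf):\n",
"    # Hypothetical helper: rename the close column 'Last' to 'Close'.\n",
"    # The index is assumed to be a DatetimeIndex already, so the\n",
"    # to_datetime_index step would be skipped for such a frame.\n",
"    out = qdf.rename(columns={'Last': 'Close'})\n",
"    if not isinstance(out.index, pd.DatetimeIndex):\n",
"        raise TypeError('expected a DatetimeIndex')\n",
"    return out\n",
"\n",
"# Toy frame shaped like the description above (values are made up).\n",
"sample = pd.DataFrame(\n",
"    {'Open': [100.0, 101.0], 'High': [102.0, 103.0],\n",
"     'Low': [99.0, 100.5], 'Last': [101.0, 102.5]},\n",
"    index=pd.to_datetime(['2020-05-28', '2020-05-29']),\n",
")\n",
"converted = quandl_to_notebook_format(sample)\n",
"print(converted.columns.tolist())\n",
"```\n",
"\n",
"`rename` returns a new frame, so the frame passed in keeps its original column names."
]
},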
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/leo/src/pyproject/py-envs/py383env/lib/python3.6/site-packages/pandas/core/series.py:679: RuntimeWarning: divide by zero encountered in log\n",
" result = getattr(ufunc, method)(*inputs, **kwargs)\n"
]
}
],
"source": [
"dfsp_tmp = pd.read_csv('data/e-mini-sp500-200530/e-mini-sp500-daily.csv')\n",
"dfg_tmp = pd.read_csv('data/gold-200626/gold-daily.csv')\n",
"dfs = [dfsp_tmp, dfg_tmp]\n",
"\n",
"prods = ['S&P 500 futures', 'Gold futures']\n",
"\n",
"# Convert to a datetime index\n",
"def to_datetime_index(df):\n",
"    # Add a datetime column (this also modifies the DataFrame passed in)\n",
"    df['datetime'] = (df['Date'] + '-' + df['Time']).map(lambda s: dt.datetime.strptime(s, '%m/%d/%Y-%H:%M'))\n",
"    df = df.set_index('datetime', drop=True)\n",
"    df = df.drop(columns=['Date', 'Time'])\n",
"    return df\n",
"\n",
"dfs = [to_datetime_index(df) for df in dfs]\n",
"\n",
"# Add log-transformed columns in place\n",
"def add_log_values(df):\n",
"    # A zero in a source column becomes -inf and raises a RuntimeWarning\n",
"    df['logO'] = np.log(df['Open'])\n",
"    df['logH'] = np.log(df['High'])\n",
"    df['logL'] = np.log(df['Low'])\n",
"    df['logC'] = np.log(df['Close'])\n",
"    df['logV'] = np.log(df['Vol'])\n",
"    df['logOI'] = np.log(df['OI'])\n",
"\n",
"for frame in dfs:\n",
"    add_log_values(frame)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>Time</th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Vol</th>\n",
" <th>OI</th>\n",
" <th>datetime</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>09/11/1997</td>\n",
" <td>17:00</td>\n",
" <td>1071.25</td>\n",
" <td>1082.25</td>\n",
" <td>1062.75</td>\n",
" <td>1068.50</td>\n",
" <td>11825</td>\n",
" <td>2909</td>\n",
" <td>1997-09-11 17:00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>09/12/1997</td>\n",
" <td>17:00</td>\n",
" <td>1070.50</td>\n",
" <td>1089.00</td>\n",
" <td>1066.00</td>\n",
" <td>1071.25</td>\n",
" <td>9759</td>\n",
" <td>4059</td>\n",
" <td>1997-09-12 17:00:00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Date Time Open High Low Close Vol OI \\\n",
"0 09/11/1997 17:00 1071.25 1082.25 1062.75 1068.50 11825 2909 \n",
"1 09/12/1997 17:00 1070.50 1089.00 1066.00 1071.25 9759 4059 \n",
"\n",
" datetime \n",
"0 1997-09-11 17:00:00 \n",
"1 1997-09-12 17:00:00 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# For reference: the raw DataFrame (ES); the datetime column is there because to_datetime_index added it in place\n",
"dfsp_tmp.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Vol</th>\n",
" <th>OI</th>\n",
" <th>logO</th>\n",
" <th>logH</th>\n",
" <th>logL</th>\n",
" <th>logC</th>\n",
" <th>logV</th>\n",
" <th>logOI</th>\n",
" </tr>\n",
" <tr>\n",
" <th>datetime</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1997-09-11 17:00:00</th>\n",
" <td>1071.25</td>\n",
" <td>1082.25</td>\n",
" <td>1062.75</td>\n",
" <td>1068.50</td>\n",
" <td>11825</td>\n",
" <td>2909</td>\n",
" <td>6.976581</td>\n",
" <td>6.986797</td>\n",
" <td>6.968615</td>\n",
" <td>6.974011</td>\n",
" <td>9.377971</td>\n",
" <td>7.975565</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1997-09-12 17:00:00</th>\n",
" <td>1070.50</td>\n",
" <td>1089.00</td>\n",
" <td>1066.00</td>\n",
" <td>1071.25</td>\n",
" <td>9759</td>\n",
" <td>4059</td>\n",
" <td>6.975881</td>\n",
" <td>6.993015</td>\n",
" <td>6.971669</td>\n",
" <td>6.976581</td>\n",
" <td>9.185945</td>\n",
" <td>8.308692</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Vol OI \\\n",
"datetime \n",
"1997-09-11 17:00:00 1071.25 1082.25 1062.75 1068.50 11825 2909 \n",
"1997-09-12 17:00:00 1070.50 1089.00 1066.00 1071.25 9759 4059 \n",
"\n",
" logO logH logL logC logV \\\n",
"datetime \n",
"1997-09-11 17:00:00 6.976581 6.986797 6.968615 6.974011 9.377971 \n",
"1997-09-12 17:00:00 6.975881 6.993015 6.971669 6.976581 9.185945 \n",
"\n",
" logOI \n",
"datetime \n",
"1997-09-11 17:00:00 7.975565 \n",
"1997-09-12 17:00:00 8.308692 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# For reference: the DataFrame after adding the log-transformed columns (ES)\n",
"dfs[0].head(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build a DataFrame of price, log price, price difference, and log return (scaled by 100)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def to_log_return_ratio_df(df):\n",
"    diff_df = df.diff()\n",
"    close_df = df[['Close', 'logC']]\n",
"    diff_df = diff_df.rename(columns={'Close': 'CloseDiff', 'logC': 'logCDiff'})\n",
"    # Copy so the assignment below does not write to a slice of diff_df\n",
"    # (this avoids the SettingWithCopyWarning)\n",
"    close_diff_df = diff_df[['CloseDiff', 'logCDiff']].copy()\n",
"    # Express the log return in percent\n",
"    close_diff_df['logCDiff'] = close_diff_df['logCDiff'] * 100\n",
"    rr_df = pd.concat([close_df, close_diff_df], axis=1)\n",
"    rr_df = rr_df.dropna()\n",
"    return rr_df\n",
"\n",
"rr_dfs = [to_log_return_ratio_df(df) for df in dfs]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Close</th>\n",
" <th>logC</th>\n",
" <th>CloseDiff</th>\n",
" <th>logCDiff</th>\n",
" </tr>\n",
" <tr>\n",
" <th>datetime</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1997-09-12 17:00:00</th>\n",
" <td>1071.25</td>\n",
" <td>6.976581</td>\n",
" <td>2.75</td>\n",
" <td>0.257040</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1997-09-15 17:00:00</th>\n",
" <td>1083.75</td>\n",
" <td>6.988183</td>\n",
" <td>12.50</td>\n",
" <td>1.160106</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Close logC CloseDiff logCDiff\n",
"datetime \n",
"1997-09-12 17:00:00 1071.25 6.976581 2.75 0.257040\n",
"1997-09-15 17:00:00 1083.75 6.988183 12.50 1.160106"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# For reference: the first day drops out after taking log differences (ES)\n",
"rr_dfs[0].head(2)"
]
},
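{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Estimating and comparing the distributions\n",
"\n",
"First build the DataFrame used for the analysis. Then estimate and compare the full-period ES return distribution and the ES return distribution on the day after gold futures fall."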
]
},
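{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building the analysis DataFrame\n",
"The analysis uses only log returns, so the other columns are not needed.\n",
"\n",
"1. Build a DataFrame that joins the ES data and the gold futures data\n",
"2. Add a column holding the next bar's ES return\n",
"3. Drop rows with missing values\n",
"    - The last row is missing because it has no next bar\n",
"    - There may also be days on which either ES or gold futures has no data"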
]
},
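{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three steps above can be traced on a toy example. The dates and return values are made up for illustration, and `toy` is a hypothetical name; gold deliberately has no row for 2020-01-03 so that both kinds of missing row appear.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up returns; gold has no row for 2020-01-03.\n",
"es = pd.Series(\n",
"    [0.5, -0.2, 0.1, 0.3],\n",
"    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06', '2020-01-07']),\n",
"    name='ES0',\n",
")\n",
"gold = pd.Series(\n",
"    [-0.4, 0.2, 0.6],\n",
"    index=pd.to_datetime(['2020-01-02', '2020-01-06', '2020-01-07']),\n",
"    name='Gold0',\n",
")\n",
"\n",
"toy = es.to_frame()\n",
"toy['Gold0'] = gold                # step 1: aligned on the index, gaps become NaN\n",
"toy['ES1'] = toy['ES0'].shift(-1)  # step 2: next bar's ES return\n",
"toy = toy.dropna()                 # step 3: drop incomplete rows\n",
"print(toy)\n",
"```\n",
"\n",
"Two rows survive: 2020-01-03 is dropped because gold is missing, and 2020-01-07 is dropped because it has no next bar. Because the shift happens before the drop, the next-bar value is always the return of the following ES bar."
]
},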
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of samples: 4759\n"
]
}
],
"source": [
"# Start with the ES log returns\n",
"df = rr_dfs[0][['logCDiff']].copy()\n",
"df = df.rename(columns={'logCDiff': 'ES0'})\n",
"\n",
"# Gold futures log returns (aligned on the datetime index)\n",
"df['Gold0'] = rr_dfs[1]['logCDiff']\n",
"\n",
"# Next-bar ES return\n",
"df['ES1'] = df['ES0'].shift(-1)\n",
"\n",
"# Drop rows with missing values\n",
"df = df.dropna()\n",
"print('Number of samples:', df.shape[0])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ES0</th>\n",
" <th>Gold0</th>\n",
" <th>ES1</th>\n",
" </tr>\n",
" <tr>\n",
" <th>datetime</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2001-05-15 17:00:00</th>\n",
" <td>0.453609</td>\n",
" <td>0.114482</td>\n",
" <td>2.518923</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2001-05-16 17:00:00</th>\n",
" <td>2.518923</td>\n",
" <td>0.635440</td>\n",
" <td>0.240433</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ES0 Gold0 ES1\n",
"datetime \n",
"2001-05-15 17:00:00 0.453609 0.114482 2.518923\n",
"2001-05-16 17:00:00 2.518923 0.635440 0.240433"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# For reference: the analysis DataFrame\n",
"df.head(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ES returns on the day after gold futures fall"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Share of days on which gold futures fell = 2227 / 4759 = 46.80%\n"
]
}
],
"source": [
"gold_down = df[df['Gold0'] < 0]\n",
"n_all = df.shape[0]\n",
"n_down = gold_down.shape[0]\n",
"print(f'Share of days on which gold futures fell = {n_down} / {n_all} = {n_down / n_all * 100:.2f}%')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Estimate and compare the full-period ES return distribution and the distribution on the day after gold futures fall"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"t-distribution parameters for Full-period returns:\n",
"df=2.124709860191633, loc=0.08216264036605168, scale=0.6710383434367057\n",
"t-distribution parameters for ES returns on the day after gold futures fall:\n",
"df=2.034014010825797, loc=0.09135443345659722, scale=0.6607535462539509\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1440x1296 with 3 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig, ax = plt.subplots(3, 1, figsize=(20, 18))\n",
"\n",
"# Plot the full-period return distribution\n",
"# Plot the return distribution on the day after gold futures fall\n",
"# Overlay the two\n",
"# Assume a t-distribution for each and estimate its parameters\n",
"plot_data = [df['ES1'], gold_down['ES1']]\n",
"titles = ['Full-period returns', 'ES returns on the day after gold futures fall']\n",
"colors = ['tab:green', 'tab:pink']\n",
"\n",
"# Match the x-axis range to the wider sample (the full period)\n",
"xmin = plot_data[0].min()\n",
"xmax = plot_data[0].max()\n",
"\n",
"# Grid on which the fitted t-distributions are drawn\n",
"xs = np.linspace(xmin, xmax, 300)\n",
"\n",
"t_params = []\n",
"t_ys = []\n",
"for i in range(2):\n",
"    sns.histplot(plot_data[i], kde=False, stat='density', color='lightblue', ax=ax[i])\n",
"\n",
"    # Fit a t-distribution; stats.t.fit returns (df, loc, scale)\n",
"    t_params.append(stats.t.fit(plot_data[i]))\n",
"    t_ys.append(stats.t.pdf(xs, df=t_params[i][0], loc=t_params[i][1], scale=t_params[i][2]))\n",
"\n",
"    ax[i].set_xlim(xmin, xmax)\n",
"    ax[i].plot(xs, t_ys[i], color=colors[i])\n",
"\n",
"    ax[i].set_title(titles[i], fontweight='semibold', fontsize=16)\n",
"\n",
"# Overlay the fitted distributions for comparison\n",
"ax[2].plot(xs, t_ys[0], label=titles[0], color=colors[0])\n",
"ax[2].plot(xs, t_ys[1], label=titles[1], color=colors[1])\n",
"ax[2].set_xlim(xmin, xmax)\n",
"ax[2].legend()\n",
"\n",
"# Print the estimated parameters\n",
"for i in range(2):\n",
"    print(f't-distribution parameters for {titles[i]}:')\n",
"    print(f'df={t_params[i][0]}, loc={t_params[i][1]}, scale={t_params[i][2]}')"
]
},
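{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Result\n",
"\n",
"ES returns on the day after gold futures fall were almost unchanged from the full-period returns: the fitted t-distribution parameters are df 2.12 vs 2.03, loc 0.082 vs 0.091, and scale 0.671 vs 0.661. A formal statistical test is probably unnecessary."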
]
}
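,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comparison above is made by eye. If a formal check were wanted anyway, one option is a two-sample Kolmogorov-Smirnov test. This is not part of the original analysis, only a sketch on synthetic data; the variable names and the t-distribution parameters used to generate the samples are assumptions. The two groups are kept disjoint (days after gold fell vs. all other days), because the gold-down sample is a subset of the full sample and the test assumes independent samples.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"rng = np.random.default_rng(1)\n",
"# Synthetic next-day ES returns for two disjoint groups (assumption).\n",
"es_after_gold_down = stats.t.rvs(df=3, loc=0.08, scale=0.67, size=2000, random_state=rng)\n",
"es_after_gold_other = stats.t.rvs(df=3, loc=0.08, scale=0.67, size=2500, random_state=rng)\n",
"\n",
"# Two-sample KS test: the null hypothesis is that both groups\n",
"# come from the same distribution.\n",
"result = stats.ks_2samp(es_after_gold_down, es_after_gold_other)\n",
"print(f'KS statistic = {result.statistic:.4f}, p-value = {result.pvalue:.4f}')\n",
"```\n",
"\n",
"With the notebook's data, the two inputs would be `df.loc[df['Gold0'] < 0, 'ES1']` and `df.loc[df['Gold0'] >= 0, 'ES1']`."
]
}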
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}