{
"cells": [
{
"cell_type": "markdown",
"id": "ebbe8f0d-f857-4ea2-9eeb-7f851f91905c",
"metadata": {},
"source": [
"# Fama French Regression"
]
},
{
"cell_type": "markdown",
"id": "8cb3cc5c-b9a4-4d54-a31d-62e958e70af1",
"metadata": {},
"source": [
"The Fama–MacBeth regression is a method used to estimate parameters for asset pricing models such as the capital asset pricing model (CAPM) or multi factor models. <br>\n",
"The method estimates the betas and risk premia for any risk factors that are expected to determine asset prices. <br>\n",
"\n",
"The Fama MacBeth Regression is a two-step procedure based on:\n",
"1. Multiple Time series regressions simultaneously performed across different assets (Seemingly Unrelated Regression)\n",
"2. Multiple Cross Sectional regressions performed across time\n",
"\n",
"In this notebook I'll show you how to code the Fama MacBeth procedure and test for statistical significance of risk premia.\n",
"\n",
"A few links to relevant papers: <br>\n",
"https://www.jstor.org/stable/1831028 <br>\n",
"https://www.jstor.org/stable/2325486 <br>\n",
"https://www.sciencedirect.com/science/article/abs/pii/S0304405X1630215X <br>\n",
"https://www.sciencedirect.com/science/article/abs/pii/S0304405X14002323 <br>\n",
"https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1996.tb05233.x <br>"
]
},
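{
"cell_type": "markdown",
"id": "1c4d5e6f-0000-4abc-9def-cccccccccccc",
"metadata": {},
"source": [
"For reference, the final Fama–MacBeth estimates are the time-series averages of the second-pass coefficients, with the classic Fama–MacBeth standard errors:\n",
"\n",
"$\\hat{\\lambda} = \\frac{1}{T}\\sum_{t=1}^{T} \\hat{\\lambda}_t, \\qquad \\widehat{SE}(\\hat{\\lambda}) = \\frac{\\hat{\\sigma}(\\hat{\\lambda}_t)}{\\sqrt{T}}$\n",
"\n",
"This is exactly what the one-sample t-test at the end of section 3 computes."
]
},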
{
"cell_type": "code",
"execution_count": 1,
"id": "329b8de8-b394-491d-b4bc-9e137f819ecc",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import statsmodels.api as sm\n",
"from scipy.stats import chi2, f, ttest_1samp, skew, kurtosis\n",
"from datetime import datetime"
]
},
{
"cell_type": "markdown",
"id": "b065024c-9a4a-4313-bec5-4c263bd3b6f9",
"metadata": {},
"source": [
"## 1. Load Data"
]
},
{
"cell_type": "markdown",
"id": "97a73575-6849-4c77-9aab-d04bccb63f77",
"metadata": {},
"source": [
"As of data, we make use of stock returns from companies belonging to S&P500 and factors downloaded from Kenneth French website. <br>\n",
"Both series are resampled to monthly requency"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b66d8c76-0690-4095-aea9-ec642fbf2aa8",
"metadata": {},
"outputs": [],
"source": [
"returns = pd.read_excel(\"../sep500_returns.xlsx\")\n",
"factors = pd.read_excel(\"../FF_factors_monthly.xlsx\")\n",
"returns.set_index(\"Date\", inplace=True)\n",
"factors.set_index(\"Date\", inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2b05c4f0-f399-4d8a-89a9-9e6b54ae8ad0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"( A AAL AAPL ABT ACGL ACN \\\n",
" Date \n",
" 2010-02-28 0.122369 0.380414 0.065396 0.025312 0.034107 -0.024884 \n",
" 2010-03-31 0.093134 0.002728 0.148471 -0.029477 0.030684 0.049537 \n",
" 2010-04-30 0.054376 -0.038095 0.111021 -0.020672 -0.008787 0.049417 \n",
" \n",
" ADBE ADI ADM ADP ... CPT GOOG \\\n",
" Date ... \n",
" 2010-02-28 0.072755 0.084570 -0.015393 0.020103 ... 0.033015 -0.005925 \n",
" 2010-03-31 0.020779 -0.007663 -0.015667 0.077120 ... 0.050538 0.076538 \n",
" 2010-04-30 -0.050042 0.038514 -0.033218 -0.024735 ... 0.163344 -0.073036 \n",
" \n",
" GOOGL KMX LNT MMM MO T \\\n",
" Date \n",
" 2010-02-28 -0.005925 -0.021328 0.013782 0.002315 0.013092 -0.021688 \n",
" 2010-03-31 0.076538 0.244180 0.051533 0.042670 0.037319 0.041516 \n",
" 2010-04-30 -0.073036 -0.021895 0.040277 0.061026 0.032651 0.030290 \n",
" \n",
" TECH WRB \n",
" Date \n",
" 2010-02-28 -0.021751 0.057953 \n",
" 2010-03-31 -0.005319 0.015962 \n",
" 2010-04-30 0.041995 0.034879 \n",
" \n",
" [3 rows x 87 columns],\n",
" Mkt-RF SMB HML RF\n",
" Date \n",
" 2010-02-28 0.034190 0.011231 0.032191 0.00000\n",
" 2010-03-31 0.062818 0.013868 0.021340 0.00000\n",
" 2010-04-30 0.020073 0.047517 0.028705 0.00021)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"returns.head(3), factors.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "81619371-155e-4e0c-b396-ead5577f96be",
"metadata": {},
"outputs": [],
"source": [
"# Data Massaging\n",
"excess_returns = returns.sub(factors.RF, axis=0).reset_index(drop=True)\n",
"factors.drop(columns=[\"RF\"], inplace=True)\n",
"factors.reset_index(drop=True, inplace=True)\n",
"factor_names = factors.columns.to_list()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f1f9bcb6-e103-4c51-a994-33dfe78e2c8a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((162, 87), (162, 3))"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"excess_returns.shape, factors.shape"
]
},
{
"cell_type": "markdown",
"id": "4d2a1793-3b1c-4a8e-a4bb-4b4d886fc4b1",
"metadata": {},
"source": [
"## 2. FMB Step 1: Rolling Time Series Regression"
]
},
{
"cell_type": "markdown",
"id": "4d5913af-ce1d-416c-ab60-45a073b770e6",
"metadata": {},
"source": [
"The first step is a rolling time series regression to obtain estimates of stocks' exposures to factors (a.k.a. \"loadings)\" <br>\n",
"For each asset in our dataset, we estimate a regression that looks like:\n",
"\n",
"$r_{i,t} = \\alpha_i + \\beta_{i, F_i}F_{i,t} + ... + \\beta_{i, F_k}F_{k,t} + \\epsilon_{i,t}$\n",
"\n",
"where k is the number of factors. <br>\n",
"\n",
"We store each asset's beta exposure, alongside with the mean returns and the regression's residuals."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "7145704e-5c14-4095-84e0-3f2742847bf3",
"metadata": {},
"outputs": [],
"source": [
"window = 12\n",
"result_dict = dict()\n",
"mean_excess_ret = dict()\n",
"mean_residuals = dict()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d36363e8-e275-499a-8b9b-19e6a8327967",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"21.5 s ± 1.08 s per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"for j in excess_returns.columns:\n",
" stock_mean_ret = []\n",
" coeff_res = []\n",
" idiosyncratic_res = []\n",
" n_periods = excess_returns.shape[0] - window + 1\n",
"\n",
" # print(\"Running regressions for stock {j}\".format(j=j))\n",
"\n",
" for n in range(n_periods):\n",
" stock_excess_ret = excess_returns.loc[n: n + window, j].values\n",
" factor_returns = sm.add_constant(factors.loc[n : n + window, :].copy()).values\n",
" stock_mean_ret.append(stock_excess_ret.mean())\n",
"\n",
" model = sm.OLS(stock_excess_ret, factor_returns)\n",
" model_res = model.fit()\n",
" coeff_res.append(model_res.params)\n",
" idiosyncratic_res.append(model_res.resid.mean())\n",
"\n",
" result_dict[j] = pd.DataFrame(coeff_res, columns=[\"Intercept\"] + factor_names)\n",
" mean_excess_ret[j] = pd.DataFrame(stock_mean_ret, columns=[j])\n",
" mean_residuals[j] = pd.DataFrame(idiosyncratic_res, columns=[j])"
]
},
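{
"cell_type": "markdown",
"id": "b7c8d9e0-0001-4abc-8def-111111111111",
"metadata": {},
"source": [
"The explicit loop above takes roughly 20 seconds. As a sketch of a faster alternative (assuming `statsmodels >= 0.11`, which ships `RollingOLS`; the `rolling_betas` container name is just illustrative), the same rolling betas can be computed in one call per stock:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8d9e0f1-0002-4abc-8def-222222222222",
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: RollingOLS fits all rolling windows for one stock in a single call\n",
"from statsmodels.regression.rolling import RollingOLS\n",
"\n",
"X = sm.add_constant(factors)  # same regressors for every stock\n",
"\n",
"rolling_betas = {}\n",
"for j in excess_returns.columns:\n",
"    res = RollingOLS(excess_returns[j], X, window=window).fit()\n",
"    # res.params has one row per period; rows before the first full window are NaN\n",
"    rolling_betas[j] = res.params.dropna()"
]
},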
{
"cell_type": "code",
"execution_count": 8,
"id": "d6677dff-d3ec-4f1a-991a-e17d6dbdcc2d",
"metadata": {},
"outputs": [],
"source": [
"# Export output to dataframes for ease of access\n",
"expected_returns = pd.concat([mean_excess_ret[k] for k in result_dict.keys()], axis=1)\n",
"expected_returns.index = pd.to_datetime(returns.index[window- 1 :], format=\"%Y%m\")\n",
"\n",
"residuals = pd.concat([mean_residuals[k] for k in result_dict.keys()], axis=1)\n",
"residuals.index = pd.to_datetime(returns.index[window- 1 :], format=\"%Y%m\")\n",
"\n",
"alphas = pd.DataFrame({k: result_dict[k].Intercept for k in result_dict.keys()})\n",
"alphas.index = pd.to_datetime(returns.index[window- 1 :], format=\"%Y%m\")\n",
"\n",
"excess_mkt_ret = pd.DataFrame({k: result_dict[k][\"Mkt-RF\"] for k in result_dict.keys()})\n",
"excess_mkt_ret.index = pd.to_datetime(returns.index[window- 1 :], format=\"%Y%m\")\n",
"\n",
"smb = pd.DataFrame({k: result_dict[k][\"SMB\"] for k in result_dict.keys()})\n",
"smb.index = pd.to_datetime(returns.index[window- 1 :], format=\"%Y%m\")\n",
"\n",
"hml = pd.DataFrame({k: result_dict[k][\"HML\"] for k in result_dict.keys()})\n",
"hml.index = pd.to_datetime(returns.index[window- 1 :], format=\"%Y%m\")"
]
},
{
"cell_type": "markdown",
"id": "91622d55-a004-423b-8255-74c825c6bc5a",
"metadata": {},
"source": [
"Should we wish to test whether our factor model performs well in explaining stock returns, we may perform a joint test on alpha across all assets <br>\n",
"This can be done either via a GRS (Gibbons-Ross-Shanken) Test, or a Wald Test"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8654b4e8-25e9-45a9-94ee-ed9d98069ff8",
"metadata": {},
"outputs": [],
"source": [
"T = excess_returns.shape[0] # Number of observations\n",
"N = excess_returns.shape[1] # Number of assets\n",
"k = factors.shape[1] # Number of Factors included in our model"
]
},
{
"cell_type": "markdown",
"id": "cc920fc2-0b0e-4135-ae58-613a0ac062f0",
"metadata": {},
"source": [
"### GRS Test"
]
},
{
"cell_type": "markdown",
"id": "c704a783-2f03-443a-8eef-5ca08805052c",
"metadata": {},
"source": [
"The Gibbons-Ross-Shanken Test is used to jointly assess wheter the vector of alphas is statistically different from zero. <br>\n",
"Using the test in small samples might be problematic though, as it requires a normal distribution of residuals. Central Limit Theorem does not \"bite\" in finite sample, so we might have to perform another test. <br>\n",
"Let's have a look!"
]
},
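{
"cell_type": "markdown",
"id": "3a7f1c2e-1111-4abc-9def-aaaaaaaaaaaa",
"metadata": {},
"source": [
"The statistic computed below is the standard GRS one (stated here for reference):\n",
"\n",
"$GRS = \\frac{T - N - k}{N}\\left(1 + \\hat{\\mu}_F' \\hat{\\Omega}^{-1} \\hat{\\mu}_F\\right)^{-1} \\hat{\\alpha}' \\hat{\\Sigma}^{-1} \\hat{\\alpha} \\sim F_{N,\\,T-N-k}$\n",
"\n",
"where $\\hat{\\mu}_F$ and $\\hat{\\Omega}$ are the sample mean and covariance of the factors, $\\hat{\\Sigma}$ is the residual covariance matrix, and $\\hat{\\alpha}$ is the vector of estimated alphas."
]
},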
{
"cell_type": "code",
"execution_count": 10,
"id": "911c7968-b907-4dd0-8a56-28a7b49aa648",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p_value [[0.]]\n"
]
}
],
"source": [
"# GRS Test\n",
"sigma_hat = residuals.cov()\n",
"sigma_inv = np.linalg.inv(sigma_hat)\n",
"omega_hat = np.linalg.inv(factors.cov())\n",
"alpha_hat_roll = alphas.mean().to_frame()\n",
"mu_hat = factors.mean()\n",
"\n",
"a = (T - N - k) / N\n",
"b = np.dot(np.dot(alpha_hat_roll.T, sigma_inv), alpha_hat_roll)\n",
"c = (mu_hat.T @ omega_hat @ mu_hat) ** (-1)\n",
"\n",
"grs = a * b * c\n",
"dfn = N\n",
"dfd = T - N - k\n",
"\n",
"p_val = 1 - f.cdf(grs, dfn, dfd)\n",
"print(\"p_value\", p_val)"
]
},
{
"cell_type": "markdown",
"id": "ed3b72c2-cf4b-43d0-ae88-ec21d5dacff6",
"metadata": {},
"source": [
"We can now test for normality by implementing a Jarque-Bera Test"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b10d2505-684f-405e-a619-3028623954bb",
"metadata": {},
"outputs": [],
"source": [
"def jarque_bera_test(data, cutoff=0.05):\n",
" \"\"\"\n",
" Returns the result for a JB test. If the data come from a normal distribution, the JB stat\n",
" has a chi-squared distribution with two degrees of freedom.\n",
" JB = n(S^2 / 6 + (K - 3)^2 / 24)\n",
"\n",
" Parameters\n",
" ----------\n",
" data: univariate series\n",
" cutoff: level of significance of the test\n",
"\n",
" \"\"\"\n",
" n = data.shape[0]\n",
"\n",
" if n < 500:\n",
" print('Warning: JB Test works better with large sample sizes (> 500)')\n",
" skewness = skew(data)\n",
" kurt = kurtosis(data)\n",
"\n",
" S = float(n) * (skewness ** 2 / 6 + (kurt - 3) ** 2 / 24)\n",
"\n",
" t = chi2(2).ppf(cutoff)\n",
" if S < t:\n",
" print(f'No evidence to reject as non-Normal according to the Jarque-Bera test.'\n",
" f'\\n t_stat {S} < critical_value {t}')\n",
" else:\n",
" print(f\"Reject that is Normal according to the Jarque-Bera test.\"\n",
" f\"\\n t_stat {S} > critical_value {t}\")\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "ede04cb3-13c4-4e84-98d6-b03116e70f50",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Warning: JB Test works better with large sample sizes (> 500)\n",
"Reject that is Normal according to the Jarque-Bera test.\n",
" t_stat 15.111063033095249 > critical_value 0.10258658877510106\n"
]
}
],
"source": [
"jarque_bera_test(residuals.mean())"
]
},
{
"cell_type": "markdown",
"id": "978c6cbe-fa8d-4d74-a802-3330cabc2080",
"metadata": {},
"source": [
"Given that we reject the hypothesis of normality, we'd rather perform a Wald Test"
]
},
{
"cell_type": "markdown",
"id": "6507b17b-750e-41dd-8b26-c0b0b0bfa24b",
"metadata": {},
"source": [
"### Wald Test"
]
},
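{
"cell_type": "markdown",
"id": "5b8e2d3f-2222-4abc-9def-bbbbbbbbbbbb",
"metadata": {},
"source": [
"The Wald statistic drops the normality requirement and relies on asymptotics instead (same notation as the GRS test):\n",
"\n",
"$W = T\\left(1 + \\hat{\\mu}_F' \\hat{\\Omega}^{-1} \\hat{\\mu}_F\\right)^{-1} \\hat{\\alpha}' \\hat{\\Sigma}^{-1} \\hat{\\alpha} \\sim \\chi^2_N$"
]
},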
{
"cell_type": "code",
"execution_count": 13,
"id": "7e19b349-988f-4a91-b245-0e59f3b6c49e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p_value [[0.]]\n"
]
}
],
"source": [
"# Wald Test\n",
"alpha_hat_roll = alphas.mean().to_frame()\n",
"a = ((alpha_hat_roll.mean() / alpha_hat_roll.std()) ** 2).values\n",
"b = (1 + a) ** (-1)\n",
"sigma_hat = residuals.cov()\n",
"sigma_inv = np.linalg.inv(sigma_hat)\n",
"c = np.dot(np.dot(alpha_hat_roll.T, sigma_inv), alpha_hat_roll)\n",
"\n",
"w_test = T * b * c\n",
"\n",
"# print(w_test)\n",
"print(\"p_value\", 1 - chi2.cdf(w_test, N))"
]
},
{
"cell_type": "markdown",
"id": "da61fd48-d8d0-42e5-ac21-9dbd64576b13",
"metadata": {},
"source": [
"## 3. FMB Step 2: Cross-Sectional Regression at each point in time"
]
},
{
"cell_type": "markdown",
"id": "4b6c57d7-eb7f-496b-9f6f-58a5421f6be8",
"metadata": {},
"source": [
"The second step consists in running T cross-sectional regressions at each point in time, to obtain estimates of the risk premia.<br>\n",
"The cross sectional regressions will look like:\n",
"\n",
"$r_t = \\gamma_{t,0} + \\gamma_{t,1}\\beta_{1,F_1} + ... + \\gamma_{t,K}\\beta_{t,K} + \\epsilon_t$\n",
"\n",
"So, when running our T regressions, we store the vectors of risk premia estimates in order ot test for their significance."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "69562c4d-78dc-4650-99cd-9535eb8537ad",
"metadata": {},
"outputs": [],
"source": [
"cs_alphas = []\n",
"cs_lambdas = []\n",
"\n",
"for d in alphas.index:\n",
" y_d = (expected_returns.loc[d, :].copy()).values\n",
" x_d = sm.add_constant(pd.concat([excess_mkt_ret.loc[d], smb.loc[d], hml.loc[d]], axis=1)).values\n",
"\n",
" model = sm.OLS(y_d, x_d)\n",
" model_res = model.fit()\n",
" cs_alphas.append(model_res.params[0])\n",
" cs_lambdas.append(model_res.params[1 :])"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "8f8cb7b6-1808-45a2-856e-47aac5053cbf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"( Alpha\n",
" Date \n",
" 2011-01-31 0.017211\n",
" 2011-02-28 0.016512\n",
" 2011-03-31 0.016904\n",
" 2011-04-30 0.015093\n",
" 2011-05-31 0.022389,\n",
" Mkt-RF SMB HML\n",
" Date \n",
" 2011-01-31 0.009308 0.007133 0.004950\n",
" 2011-02-28 0.007738 0.006092 0.004076\n",
" 2011-03-31 0.004843 0.004566 0.000417\n",
" 2011-04-30 0.003518 0.001721 -0.002297\n",
" 2011-05-31 0.001506 0.003647 -0.001471)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cs_alphas = pd.DataFrame(cs_alphas, index=alphas.index, columns=[\"Alpha\"])\n",
"risk_premia = pd.DataFrame(cs_lambdas, index=alphas.index, columns=factor_names)\n",
"cs_alphas.head(), risk_premia.head()"
]
},
{
"cell_type": "markdown",
"id": "95f6922d-42ff-45e0-89c9-958c9de296bf",
"metadata": {},
"source": [
"### Test Risk Premia"
]
},
{
"cell_type": "markdown",
"id": "c050fad7-b35b-4470-8572-f553600c5c8f",
"metadata": {},
"source": [
"We now perform a ttest on each factor's risk premium to estblish whether it is statistically different from 0"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "91d94148-f10f-48a8-b4c7-a8d07076192a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Factor Mkt-RF risk premia is statistically significant!\n",
"Factor SMB risk premia is NOT statistically significant!\n",
"Factor HML risk premia is statistically significant!\n"
]
}
],
"source": [
"for factor in factor_names:\n",
" pval = ttest_1samp(risk_premia[factor], popmean=0.0)[1]\n",
" print(\"Factor {f} risk premia is statistically significant!\".format(f=factor)) if pval < 0.05 else print(\"Factor {f} risk premia is NOT statistically significant!\".format(f=factor))"
]
}
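,
{
"cell_type": "markdown",
"id": "d9e0f1a2-0003-4abc-8def-333333333333",
"metadata": {},
"source": [
"The plain t-test above assumes the monthly $\\gamma_t$ estimates are serially uncorrelated. As a sketch of a robustness check (the lag length of 6 is an arbitrary illustrative choice), Newey-West (HAC) standard errors can be obtained by regressing each premium series on a constant:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0f1a2b3-0004-4abc-8def-444444444444",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: mean of each risk-premium series with HAC (Newey-West) standard errors.\n",
"# Regressing on a constant recovers the time-series mean; cov_type=\"HAC\" adjusts its SE.\n",
"for factor in factor_names:\n",
"    res = sm.OLS(risk_premia[factor], np.ones(len(risk_premia))).fit(\n",
"        cov_type=\"HAC\", cov_kwds={\"maxlags\": 6}  # maxlags=6 is an arbitrary choice\n",
"    )\n",
"    print(factor, \"lambda:\", res.params[0], \"NW t-stat:\", res.tvalues[0])"
]
}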
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}