@degerhan
Created May 29, 2021 19:15
calculate_signals_target22.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "calculate_signals_target22.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyPzDXaONTnmxPJTV5zTW/tY",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/degerhan/d02d510213281f65ba80171aed7edccf/calculate_signals_target22.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YvLFuhf3VGRZ"
},
"source": [
"# Signals target22 calculation (22day_2day target)\n",
"\n",
"This notebook calculates the **unofficial** 22day_2day Signals target as proposed in [Longer Signals Target - A Proposal For Higher Payouts](https://forum.numer.ai/t/longer-signals-target-a-proposal-for-higher-payouts/3357). The goal is to start building models and receive corr20 feedback ahead of the official target22 release.\n",
"\n",
"The output is `historical_targets_22.csv`, which adds a `target22` column to the official historical target file. Date/ticker pairs for which target22 cannot be computed are dropped, which reduces the number of target rows from approximately 4.3 million to 2.7 million. That figure is for yfinance data; other data sources may retain a different number of rows.\n",
"\n",
"For reference, see [Decoding the signals target](https://forum.numer.ai/t/decoding-the-signals-target/2501),\n",
"which demonstrates that the official 6day_2day target is primarily (but maybe not exclusively) **the binned ranked 2day to 6day return**. We use the same method for the 22day_2day target."
]
},
{
"cell_type": "code",
"metadata": {
"id": "IZdGOWiJ-fmy"
},
"source": [
"!pip install yfinance"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "lqgL-pKI_EXW"
},
"source": [
"import os\n",
"import datetime\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"import yfinance"
],
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Dhn4ZxMJ-u9r"
},
"source": [
"# get ticker map\n",
"ticker_map = (\n",
" pd.read_csv(\n",
" \"https://numerai-signals-public-data.s3-us-west-2.amazonaws.com/signals_ticker_map_w_bbg.csv\"\n",
" )\n",
" .drop_duplicates(subset=\"yahoo\")\n",
" .apply(lambda x: x.astype(str).str.upper())\n",
")\n",
"map_bbg_to_yahoo = dict(zip(ticker_map[\"bloomberg_ticker\"], ticker_map[\"yahoo\"]))\n",
"map_yahoo_to_bbg = dict(zip(ticker_map[\"yahoo\"], ticker_map[\"bloomberg_ticker\"]))"
],
"execution_count": 3,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "trXT5sIs_Nxp"
},
"source": [
"# get ticker universe\n",
"live_ticker_universe = (\n",
"    pd.read_csv(\n",
"        \"https://numerai-signals-public-data.s3-us-west-2.amazonaws.com/latest_universe.csv\"\n",
"    )\n",
"    .squeeze(\"columns\")  # read_csv(squeeze=True) was removed in pandas 2.0\n",
"    .drop_duplicates()\n",
")\n",
"yfinance_tickers = live_ticker_universe.map(map_bbg_to_yahoo).dropna().to_list()"
],
"execution_count": 4,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "rKV4iYKVAilY"
},
"source": [
"# get official historical 6day_2day targets\n",
"targets = pd.read_csv(\n",
" \"https://numerai-signals-public-data.s3-us-west-2.amazonaws.com/signals_train_val_bbg.csv\"\n",
")\n",
"targets[\"Date\"] = pd.to_datetime(targets[\"friday_date\"], format=\"%Y%m%d\")\n",
"targets.set_index([\"Date\", \"bloomberg_ticker\"], inplace=True)"
],
"execution_count": 5,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "dngIc8TV_fjn"
},
"source": [
"# download yahoo prices or read from cached pickle\n",
"today = datetime.date.today().strftime(\"%Y-%m-%d\")\n",
"if os.path.exists(f\"{today}-prices.pkl\"):\n",
"    prices = pd.read_pickle(f\"{today}-prices.pkl\")\n",
"else:\n",
"    prices = yfinance.download(\n",
"        tickers=yfinance_tickers,\n",
"        start=\"2002-01-01\",\n",
"        end=today,\n",
"        interval=\"1d\",\n",
"        threads=True,\n",
"        group_by=\"column\",\n",
"        # newer yfinance releases adjust prices by default and may omit \"Adj Close\";\n",
"        # ask for unadjusted output so the column selected below exists\n",
"        auto_adjust=False,\n",
"    )[\"Adj Close\"]\n",
"    prices.to_pickle(f\"{today}-prices.pkl\")"
],
"execution_count": 6,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 446
},
"id": "IZP96vApSyxm",
"outputId": "3b48440f-9ed2-4b53-814a-37dc561a3ac0"
},
"source": [
"# calculate 22day-2day (20 market days) returns for every market date\n",
"period_returns = prices.pct_change(20)\n",
"\n",
"# calculate return rankings for every market date\n",
"period_ranks = pd.DataFrame(\n",
" period_returns.rank(axis=1, pct=True, method=\"first\").stack()\n",
").reset_index()\n",
"\n",
"period_ranks.columns = [\"end_date\", \"yahoo_ticker\", \"22day_2day_return_rank\"]\n",
"period_ranks[\"bloomberg_ticker\"] = period_ranks[\"yahoo_ticker\"].map(map_yahoo_to_bbg)\n",
"\n",
"# round prediction friday_date is 32 calendar days before the final pricing Tuesday\n",
"period_ranks[\"Date\"] = period_ranks[\"end_date\"] - datetime.timedelta(days=32)\n",
"period_ranks.set_index([\"Date\", \"bloomberg_ticker\"], inplace=True)\n",
"period_ranks"
],
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>end_date</th>\n",
" <th>yahoo_ticker</th>\n",
" <th>22day_2day_return_rank</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th>bloomberg_ticker</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"5\" valign=\"top\">2001-12-28</th>\n",
" <th>1 HK</th>\n",
" <td>2002-01-29</td>\n",
" <td>0001.HK</td>\n",
" <td>0.345725</td>\n",
" </tr>\n",
" <tr>\n",
" <th>000100 KS</th>\n",
" <td>2002-01-29</td>\n",
" <td>000100.KS</td>\n",
" <td>0.789963</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 HK</th>\n",
" <td>2002-01-29</td>\n",
" <td>0003.HK</td>\n",
" <td>0.672862</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 HK</th>\n",
" <td>2002-01-29</td>\n",
" <td>0004.HK</td>\n",
" <td>0.211896</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 HK</th>\n",
" <td>2002-01-29</td>\n",
" <td>0006.HK</td>\n",
" <td>0.391264</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"5\" valign=\"top\">2021-04-26</th>\n",
" <th>ZUMZ US</th>\n",
" <td>2021-05-28</td>\n",
" <td>ZUMZ</td>\n",
" <td>0.535614</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ZUO US</th>\n",
" <td>2021-05-28</td>\n",
" <td>ZUO</td>\n",
" <td>0.216313</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ZURN SW</th>\n",
" <td>2021-05-28</td>\n",
" <td>ZURN.SW</td>\n",
" <td>0.484120</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ZYXI US</th>\n",
" <td>2021-05-28</td>\n",
" <td>ZYXI</td>\n",
" <td>0.592746</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ZZZ CN</th>\n",
" <td>2021-05-28</td>\n",
" <td>ZZZ.TO</td>\n",
" <td>0.052058</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>20005663 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" end_date yahoo_ticker 22day_2day_return_rank\n",
"Date bloomberg_ticker \n",
"2001-12-28 1 HK 2002-01-29 0001.HK 0.345725\n",
" 000100 KS 2002-01-29 000100.KS 0.789963\n",
" 3 HK 2002-01-29 0003.HK 0.672862\n",
" 4 HK 2002-01-29 0004.HK 0.211896\n",
" 6 HK 2002-01-29 0006.HK 0.391264\n",
"... ... ... ...\n",
"2021-04-26 ZUMZ US 2021-05-28 ZUMZ 0.535614\n",
" ZUO US 2021-05-28 ZUO 0.216313\n",
" ZURN SW 2021-05-28 ZURN.SW 0.484120\n",
" ZYXI US 2021-05-28 ZYXI 0.592746\n",
" ZZZ CN 2021-05-28 ZZZ.TO 0.052058\n",
"\n",
"[20005663 rows x 3 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "1-o6INlydwDH"
},
"source": [
"# filter down to the official friday_dates by matching new targets to official targets\n",
"targets[\"22day_2day_return_rank\"] = period_ranks[\"22day_2day_return_rank\"]"
],
"execution_count": 8,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "hUHo2_A8dxlp"
},
"source": [
"# drop rows without target22 data so target6 and target22 models work off the same data\n",
"targets = targets.dropna().reset_index().drop(columns=[\"Date\"])"
],
"execution_count": 9,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "iD8_8xs0d1AX"
},
"source": [
"# re-rank the 22 day targets as many rows were lost in the filtering\n",
"targets[\"22day_2day_return_rank\"] = targets.groupby(\"friday_date\")[\n",
" \"22day_2day_return_rank\"\n",
"].rank(pct=True, method=\"first\")\n",
"\n",
"# bin target22 ranks into numerai target distribution\n",
"targets[\"target22\"] = pd.cut(\n",
" targets[\"22day_2day_return_rank\"],\n",
" bins=[0, 0.05, 0.25, 0.75, 0.95, 1],\n",
" right=True,\n",
" labels=[0, 0.25, 0.50, 0.75, 1],\n",
" include_lowest=True,\n",
")"
],
"execution_count": 10,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 415
},
"id": "jLijblgPSyo_",
"outputId": "85b8214b-b150-4838-ffc3-82d1b4c403f4"
},
"source": [
"targets.drop(columns=\"22day_2day_return_rank\", inplace=True)\n",
"targets"
],
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>bloomberg_ticker</th>\n",
" <th>friday_date</th>\n",
" <th>data_type</th>\n",
" <th>target</th>\n",
" <th>target22</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>002790 KS</td>\n",
" <td>20030131</td>\n",
" <td>train</td>\n",
" <td>0.25</td>\n",
" <td>0.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>003490 KS</td>\n",
" <td>20030131</td>\n",
" <td>train</td>\n",
" <td>0.25</td>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>004370 KS</td>\n",
" <td>20030131</td>\n",
" <td>train</td>\n",
" <td>0.50</td>\n",
" <td>0.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>005380 KS</td>\n",
" <td>20030131</td>\n",
" <td>train</td>\n",
" <td>0.75</td>\n",
" <td>0.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>005490 KS</td>\n",
" <td>20030131</td>\n",
" <td>train</td>\n",
" <td>0.75</td>\n",
" <td>0.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2746383</th>\n",
" <td>ZUMZ US</td>\n",
" <td>20210423</td>\n",
" <td>validation</td>\n",
" <td>0.75</td>\n",
" <td>0.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2746384</th>\n",
" <td>ZUO US</td>\n",
" <td>20210423</td>\n",
" <td>validation</td>\n",
" <td>0.50</td>\n",
" <td>0.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2746385</th>\n",
" <td>ZURN SW</td>\n",
" <td>20210423</td>\n",
" <td>validation</td>\n",
" <td>0.50</td>\n",
" <td>0.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2746386</th>\n",
" <td>ZYXI US</td>\n",
" <td>20210423</td>\n",
" <td>validation</td>\n",
" <td>0.00</td>\n",
" <td>0.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2746387</th>\n",
" <td>ZZZ CN</td>\n",
" <td>20210423</td>\n",
" <td>validation</td>\n",
" <td>0.50</td>\n",
" <td>0.25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2746388 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" bloomberg_ticker friday_date data_type target target22\n",
"0 002790 KS 20030131 train 0.25 0.25\n",
"1 003490 KS 20030131 train 0.25 0.75\n",
"2 004370 KS 20030131 train 0.50 0.50\n",
"3 005380 KS 20030131 train 0.75 0.50\n",
"4 005490 KS 20030131 train 0.75 0.25\n",
"... ... ... ... ... ...\n",
"2746383 ZUMZ US 20210423 validation 0.75 0.50\n",
"2746384 ZUO US 20210423 validation 0.50 0.25\n",
"2746385 ZURN SW 20210423 validation 0.50 0.50\n",
"2746386 ZYXI US 20210423 validation 0.00 0.25\n",
"2746387 ZZZ CN 20210423 validation 0.50 0.25\n",
"\n",
"[2746388 rows x 5 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 11
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gEGjvXsvVI_O",
"outputId": "bd88f90d-11be-4bec-87d5-ae25072033a0"
},
"source": [
"targets.value_counts(subset=\"target22\", normalize=True)"
],
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"target22\n",
"0.5 0.499999\n",
"0.25 0.200030\n",
"0.75 0.199959\n",
"1.0 0.050170\n",
"0.0 0.049843\n",
"dtype: float64"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "YsRTvRvQBelg"
},
"source": [
"targets.to_csv(\"historical_targets_22.csv\", header=True, index=False)"
],
"execution_count": 13,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "uXsBPrNhfm0Q"
},
"source": [
""
],
"execution_count": 13,
"outputs": []
}
]
}
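
The rank-then-bin step used in the notebook can be tried in isolation on synthetic data. The sketch below is a minimal illustration and not part of the original notebook: the `toy` frame, its random returns, and the `bin_ranks` helper are hypothetical, and only the bin edges and labels are copied from the notebook's `pd.cut` call.

```python
import numpy as np
import pandas as pd

# Bin edges and labels copied from the notebook's pd.cut call.
BINS = [0, 0.05, 0.25, 0.75, 0.95, 1]
LABELS = [0, 0.25, 0.50, 0.75, 1]


def bin_ranks(df, date_col, value_col):
    """Rank value_col within each date, then bin the percentile ranks.

    Hypothetical helper written for this sketch only.
    """
    ranks = df.groupby(date_col)[value_col].rank(pct=True, method="first")
    return pd.cut(ranks, bins=BINS, right=True, labels=LABELS, include_lowest=True)


# Synthetic data: 2 dates x 100 tickers with random returns (made up for illustration).
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "friday_date": np.repeat([20210416, 20210423], 100),
        "ret": rng.normal(size=200),
    }
)
toy["target22"] = bin_ranks(toy, "friday_date", "ret")

# Share of rows that land in each target bin.
dist = toy["target22"].astype(float).value_counts(normalize=True).sort_index()
print(dist)
```

Ranking within each `friday_date` before binning is what keeps the bin shares close to 5/20/50/20/5 percent per round, which is consistent with the `value_counts` output the notebook reports for the full data set.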