yhilpisch/00_odsc_east_2019.md Secret

## 00_odsc_east_2019.md

      
            
          
Artificial Intelligence in Finance

Workshop by Dr Yves J Hilpisch | The Python Quants GmbH
ODSC East, Boston, 30 April 2019

Short Link

http://bit.ly/odsc_east

Slides

http://hilpisch.com/odsc_east.pdf

Resources

http://tpq.io
http://hilpisch.com
http://twitter.com/dyjh
http://certificate.tpq.io
http://compfinance.tpq.io

Python for Finance (2nd ed.)

Sign up at http://py4fi.pqp.io to access all the Jupyter Notebooks and code and to execute them on our Quant Platform.

Cloud

Use this link to get a 10 USD bonus on DigitalOcean when signing up for a new account.


## 01_efficient_markets.ipynb

      
      
    
## 02_machine_learning.ipynb

      
      
    
## 03_reinforcement_learning.ipynb

      
      
    
## 04_deep_learning.ipynb

      
      
    
## 05_financial_features.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src='http://hilpisch.com/taim_logo.png' width=\"350px\" align=\"right\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Artificial Intelligence in Finance\n",
    "\n",
    "**Financial Features**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&copy; Dr Yves J Hilpisch | The Python Quants GmbH\n",
    "\n",
    "http://tpq.io | http://twitter.com/dyjh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the `tpqoa` package see http://github.com/yhilpisch/tpqoa."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from pylab import plt\n",
    "plt.style.use('seaborn')\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "url = 'http://hilpisch.com/oanda_eur_usd.csv'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# url = 'oanda_eur_usd.csv'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raw = pd.read_csv(url, index_col=0,\n",
    "                  parse_dates=True).dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raw.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = raw.copy()\n",
    "# log returns, standardized log returns and direction (1 = up, 0 = otherwise)\n",
    "data['r'] = np.log(data['c'] / data['c'].shift(1))\n",
    "data['rs'] = (data['r'] - data['r'].mean()) / data['r'].std()\n",
    "data['d'] = np.where(data['r'] > 0, 1, 0)\n",
    "# candle features from the open (o), high (h), low (l) and close (c) columns\n",
    "data['c-o'] = data['c'] - data['o']\n",
    "data['u-d'] = np.where(data['c'] - data['o'] > 0, 1, 0)\n",
    "data['h-l'] = data['h'] - data['l']\n",
    "data['h-o'] = data['h'] - data['o']\n",
    "data['o-l'] = data['o'] - data['l']\n",
    "data['h-c'] = data['h'] - data['c']\n",
    "data['c-l'] = data['c'] - data['l']\n",
    "# rolling volatility, simple moving averages and momentum (two window lengths each)\n",
    "data['v1'] = data['r'].rolling(20).std()\n",
    "data['v2'] = data['r'].rolling(100).std()\n",
    "data['sma1'] = data['c'].rolling(20).mean()\n",
    "data['sma2'] = data['c'].rolling(100).mean()\n",
    "data['mom1'] = data['r'].rolling(5).mean()\n",
    "data['mom2'] = data['r'].rolling(20).mean()\n",
    "data.dropna(inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features = list(data.columns)\n",
    "features.remove('complete')\n",
    "# features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ld = len(data)\n",
    "ld"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "split = int(len(data) * 0.6)\n",
    "val_size = int(split * 0.15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
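    "# explicit copies avoid pandas chained-assignment warnings when lag columns are added\n",
    "train = data.iloc[:split].copy()\n",
    "val = train[-val_size:].copy()\n",
    "train = train[:-val_size].copy()\n",
    "test = data.iloc[split:].copy()"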
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lags = 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def gaussian(x):\n",
    "    mean = x.mean()\n",
    "    std = x.std()\n",
    "    return (x - mean) / std, mean, std"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def normalize_and_lag():\n",
    "    global cols\n",
    "    cols = []\n",
    "    for f in features:\n",
    "        for lag in range(1, lags + 1):\n",
    "            col = f'{f}_lag_{lag}'\n",
    "            if f in ['r', 'rs', 'd', 'u-d']:\n",
    "                train[col] = train[f].shift(lag)\n",
    "                val[col] = val[f].shift(lag)\n",
    "                test[col] = test[f].shift(lag)\n",
    "            else:\n",
    "                train[col], mean, std = gaussian(train[f].shift(lag))\n",
    "                val[col] = (val[f].shift(lag) - mean) / std\n",
    "                test[col] = (test[f].shift(lag) - mean) / std\n",
    "            cols.append(col)\n",
    "    train.dropna(inplace=True)\n",
    "    val.dropna(inplace=True)\n",
    "    test.dropna(inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "normalize_and_lag()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(cols)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Backtesting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Scikit-Learn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.neural_network import MLPClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = MLPClassifier(hidden_layer_sizes=(256, 256),\n",
    "                      activation='relu',\n",
    "                      alpha=0.0001,\n",
    "                      random_state=100,\n",
    "                      max_iter=200,\n",
    "                      validation_fraction=0.1,\n",
    "                      shuffle=False,\n",
    "                      early_stopping=False,\n",
    "                      verbose=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%time model.fit(train[cols], train['d'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['p'] = model.predict(test[cols])\n",
    "test['p'] = np.where(test['p'] > 0, 1, -1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['s'] = test['p'] * test['r']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test[['r', 's']].sum().apply(np.exp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(test['p'].diff() != 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['p'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Keras"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "from keras.layers import Dense, Dropout\n",
    "from keras.models import Sequential\n",
    "from keras.callbacks import EarlyStopping, ModelCheckpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(100)\n",
    "tf.random.set_random_seed(100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential()\n",
    "\n",
    "model.add(Dense(128, activation='relu',\n",
    "                # kernel_regularizer=l2(0.001),\n",
    "                input_shape=(len(cols),)\n",
    "               )\n",
    "         )\n",
    "model.add(Dropout(0.3, seed=100))\n",
    "model.add(Dense(128, activation='relu',\n",
    "                # kernel_regularizer=l2(0.001)\n",
    "               )\n",
    "         )\n",
    "model.add(Dropout(0.3, seed=100))\n",
    "model.add(Dense(1, activation='sigmoid'))\n",
    "\n",
    "model.compile(optimizer='adam',\n",
    "              loss='binary_crossentropy',\n",
    "              metrics=['accuracy'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "callbacks = [EarlyStopping(monitor='val_acc', patience=25)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "model.fit(train[cols], train['d'],\n",
    "          epochs=250, batch_size=32, verbose=False,\n",
    "          validation_data=(val[cols], val['d']),\n",
    "          callbacks=callbacks);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res = pd.DataFrame(model.history.history)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res.tail(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res.plot(figsize=(10, 6), style=['--', '--', '-', '-']);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.evaluate(test[cols], test['d'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['p'] = model.predict_classes(test[cols])\n",
    "test['p'] = np.where(test['p'] > 0, 1, -1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['s'] = test['p'] * test['r']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test[['r', 's']].sum().apply(np.exp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(test['p'].diff() != 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['p'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feature Selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.feature_selection import SelectKBest, f_classif"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "selector = SelectKBest(f_classif, k=50)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cols_sel = selector.fit(train[cols], train['d']).get_support(indices=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cols_sel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "csel = train[cols].columns[cols_sel]\n",
    "csel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Scikit-Learn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.neural_network import MLPClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = MLPClassifier(hidden_layer_sizes=(128, 128),\n",
    "                      activation='relu',\n",
    "                      learning_rate_init=0.001,\n",
    "                      random_state=100,\n",
    "                      max_iter=500,\n",
    "                      validation_fraction=0.1,\n",
    "                      shuffle=False,\n",
    "                      early_stopping=False,\n",
    "                      verbose=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%time model.fit(train[csel], train['d'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['p'] = model.predict(test[csel])\n",
    "test['p'] = np.where(test['p'] > 0, 1, -1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['s'] = test['p'] * test['r']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test[['r', 's']].sum().apply(np.exp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(test['p'].diff() != 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['p'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Keras"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(100)\n",
    "tf.random.set_random_seed(100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential()\n",
    "\n",
    "model.add(Dense(128, activation='relu',\n",
    "                input_shape=(len(csel),)\n",
    "               )\n",
    "         )\n",
    "model.add(Dropout(0.3, seed=100))\n",
    "model.add(Dense(128, activation='relu',\n",
    "               )\n",
    "         )\n",
    "model.add(Dropout(0.3, seed=100))\n",
    "model.add(Dense(1, activation='sigmoid'))\n",
    "\n",
    "model.compile(optimizer='adam',\n",
    "              loss='binary_crossentropy',\n",
    "              metrics=['accuracy'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "callbacks = [ModelCheckpoint(filepath='weights.hdf5',\n",
    "                             monitor='val_acc',\n",
    "                             verbose=0,\n",
    "                             save_best_only=True,\n",
    "                             save_weights_only=True,\n",
    "                             mode='auto', period=1)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "model.fit(train[csel], train['d'],\n",
    "          epochs=125, batch_size=32, verbose=False,\n",
    "          validation_data=(val[csel], val['d']),\n",
    "          callbacks=callbacks);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Regular Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res = pd.DataFrame(model.history.history)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res.tail(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res[['acc', 'val_acc']].plot(figsize=(10, 6), style=['--', '--', '-', '-']);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.evaluate(test[csel], test['d'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['p'] = model.predict_classes(test[csel])\n",
    "test['p'] = np.where(test['p'] > 0, 1, -1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['s'] = test['p'] * test['r']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test[['r', 's']].sum().apply(np.exp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(test['p'].diff() != 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['p'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Best Weights (Validation)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.load_weights('weights.hdf5')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.evaluate(test[csel], test['d'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['p'] = model.predict_classes(test[csel])\n",
    "test['p'] = np.where(test['p'] > 0, 1, -1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['s'] = test['p'] * test['r']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test[['r', 's']].sum().apply(np.exp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(test['p'].diff() != 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['p'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"http://hilpisch.com/tpq_logo.png\" alt=\"The Python Quants\" width=\"35%\" align=\"right\" border=\"0\"><br>\n",
    "\n",
    "<a href=\"http://tpq.io\" target=\"_blank\">http://tpq.io</a> | <a href=\"http://twitter.com/dyjh\" target=\"_blank\">@dyjh</a> | <a href=\"mailto:training@tpq.io\">training@tpq.io</a>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
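
The `normalize_and_lag()` cell in the notebook above builds lagged feature columns and scales the non-binary ones with the mean and standard deviation of the training set only. A minimal sketch of the same idea as a pure function, without the `global cols` side effect, might look like the following. The function name `add_lags`, its `stats` and `skip` parameters, and the synthetic random-walk data are assumptions made for illustration; they are not part of the workshop code.

```python
import numpy as np
import pandas as pd


def add_lags(df, features, lags, stats=None, skip=()):
    """Return (frame with lag columns, lag column names, statistics used).

    Hypothetical helper: features listed in `skip` are lagged but not scaled.
    If `stats` is given (e.g. from the training set), its (mean, std) pairs
    are reused instead of being re-estimated on `df`.
    """
    out = df.copy()  # work on a copy so the caller's frame is untouched
    cols, used = [], {}
    for f in features:
        for lag in range(1, lags + 1):
            col = f'{f}_lag_{lag}'
            lagged = out[f].shift(lag)
            if f not in skip:
                if stats is None:
                    mean, std = lagged.mean(), lagged.std()
                else:
                    mean, std = stats[col]
                used[col] = (mean, std)
                lagged = (lagged - mean) / std
            out[col] = lagged
            cols.append(col)
    return out.dropna(), cols, used


# Synthetic price series (assumption: stands in for the EUR/USD close column)
rng = np.random.default_rng(100)
data = pd.DataFrame({'c': 1.1 + rng.normal(0, 0.001, 200).cumsum()})
data['r'] = np.log(data['c'] / data['c'].shift(1))
data['v1'] = data['r'].rolling(20).std()
data = data.dropna()

split = int(len(data) * 0.6)
# Statistics are estimated on the training part and reused for the test part
train, cols, stats = add_lags(data.iloc[:split], ['r', 'v1'],
                              lags=3, skip=('r',))
test, _, _ = add_lags(data.iloc[split:], ['r', 'v1'],
                      lags=3, stats=stats, skip=('r',))
print(cols)
```

Returning the statistics explicitly makes it harder to scale the test set with its own moments by accident, which would leak information into the backtest.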

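The backtesting cells in the notebook map the predicted class (0 or 1) to a short or long position (-1 or +1), multiply it by the same-bar log return and exponentiate the summed log returns to obtain gross performance. The sketch below wraps those steps in one hypothetical helper, `backtest`, and runs it on four made-up returns. Unlike the notebook's `sum(test['p'].diff() != 0)`, it does not count the initial position as a trade, and like the notebook it ignores transaction costs.

```python
import numpy as np
import pandas as pd


def backtest(returns, predictions):
    """Vectorized long/short backtest on log returns (illustrative helper)."""
    res = pd.DataFrame({'r': returns})
    # predicted class 1 -> long (+1), predicted class 0 -> short (-1)
    res['p'] = np.where(np.asarray(predictions) > 0, 1, -1)
    # strategy log return = position times market log return
    res['s'] = res['p'] * res['r']
    # gross performance of passive benchmark ('r') and strategy ('s')
    gross = res[['r', 's']].sum().apply(np.exp)
    # number of position changes after the first bar
    trades = int((res['p'].diff().fillna(0) != 0).sum())
    return res, gross, trades


# Made-up gross returns per bar, converted to log returns
r = pd.Series(np.log([1.01, 0.99, 1.02, 0.98]))
pred = [1, 0, 1, 0]  # hypothetical classifier output
res, gross, trades = backtest(r, pred)
print(gross)
print(trades)
```

Because log returns are additive, summing them and applying `np.exp` once gives the same gross performance as multiplying the simple returns bar by bar.
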
## aiif.txt


Artificial Intelligence in Finance
==================================

Slides  http://hilpisch.com/odsc_east.pdf

Gist    http://bit.ly/odsc_east
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"<img src='http://hilpisch.com/taim_logo.png' width=\"350px\" align=\"right\">"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# Artificial Intelligence in Finance\n",
	"\n",
	"**Financial Features**"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"© Dr Yves J Hilpisch | The Python Quants GmbH\n",
	"\n",
	"http://tpq.io | http://twitter.com/dyjh"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Imports"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"For the `tpqoa` package see http://github.com/yhilpisch/tpqoa."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"import math\n",
	"import numpy as np\n",
	"import pandas as pd\n",
	"from pylab import plt\n",
	"plt.style.use('seaborn')\n",
	"%matplotlib inline"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Data"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"url = 'http://hilpisch.com/oanda_eur_usd.csv'"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"# url = 'oanda_eur_usd.csv'"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"raw = pd.read_csv(url, index_col=0,\n",
	"                  parse_dates=True).dropna()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"raw.info()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"data = raw.copy()\n",
	"data['r'] = np.log(data['c'] / data['c'].shift(1))\n",
	"data['rs'] = (data['r'] - data['r'].mean()) / data['r'].std()\n",
	"data['d'] = np.where(data['r'] > 0, 1, 0)\n",
	"data['c-o'] = data['c'] - data['o']\n",
	"data['u-d'] = np.where(data['c'] - data['o'] > 0, 1, 0)\n",
	"data['h-l'] = data['h'] - data['l']\n",
	"data['h-o'] = data['h'] - data['o']\n",
	"data['o-l'] = data['o'] - data['l']\n",
	"data['h-c'] = data['h'] - data['c']\n",
	"data['c-l'] = data['c'] - data['l']\n",
	"data['v1'] = data['r'].rolling(20).std()\n",
	"data['v2'] = data['r'].rolling(100).std()\n",
	"data['sma1'] = data['c'].rolling(20).mean()\n",
	"data['sma2'] = data['c'].rolling(100).mean()\n",
	"data['mom1'] = data['r'].rolling(5).mean()\n",
	"data['mom2'] = data['r'].rolling(20).mean()\n",
	"data.dropna(inplace=True)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"features = list(data.columns)\n",
	"features.remove('complete')\n",
	"# features"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"ld = len(data)\n",
	"ld"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"split = int(len(data) * 0.6)\n",
	"val_size = int(split * 0.15)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
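	"# explicit copies avoid pandas chained-assignment warnings when lag columns are added\n",
	"train = data.iloc[:split].copy()\n",
	"val = train[-val_size:].copy()\n",
	"train = train[:-val_size].copy()\n",
	"test = data.iloc[split:].copy()"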
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"lags = 10"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"def gaussian(x):\n",
	"    mean = x.mean()\n",
	"    std = x.std()\n",
	"    return (x - mean) / std, mean, std"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"def normalize_and_lag():\n",
	"    global cols\n",
	"    cols = []\n",
	"    for f in features:\n",
	"        for lag in range(1, lags + 1):\n",
	"            col = f'{f}_lag_{lag}'\n",
	"            if f in ['r', 'rs', 'd', 'u-d']:\n",
	"                train[col] = train[f].shift(lag)\n",
	"                val[col] = val[f].shift(lag)\n",
	"                test[col] = test[f].shift(lag)\n",
	"            else:\n",
	"                train[col], mean, std = gaussian(train[f].shift(lag))\n",
	"                val[col] = (val[f].shift(lag) - mean) / std\n",
	"                test[col] = (test[f].shift(lag) - mean) / std\n",
	"            cols.append(col)\n",
	"    train.dropna(inplace=True)\n",
	"    val.dropna(inplace=True)\n",
	"    test.dropna(inplace=True)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"normalize_and_lag()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"len(cols)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"train.head(5)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Backtesting"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Scikit-Learn"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"from sklearn.neural_network import MLPClassifier"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"model = MLPClassifier(hidden_layer_sizes=(256, 256),\n",
	"                      activation='relu',\n",
	"                      alpha=0.0001,\n",
	"                      random_state=100,\n",
	"                      max_iter=200,\n",
	"                      validation_fraction=0.1,\n",
	"                      shuffle=False,\n",
	"                      early_stopping=False,\n",
	"                      verbose=False)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"%time model.fit(train[cols], train['d'])"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['p'] = model.predict(test[cols])\n",
	"test['p'] = np.where(test['p'] > 0, 1, -1)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['s'] = test['p'] * test['r']"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test[['r', 's']].sum().apply(np.exp)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"sum(test['p'].diff() != 0)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['p'].value_counts()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"scrolled": false
	},
	"outputs": [],
	"source": [
	"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Keras"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"import tensorflow as tf\n",
	"from keras.layers import Dense, Dropout\n",
	"from keras.models import Sequential\n",
	"from keras.callbacks import EarlyStopping, ModelCheckpoint"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"np.random.seed(100)\n",
	"tf.random.set_random_seed(100)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"model = Sequential()\n",
	"\n",
	"model.add(Dense(128, activation='relu',\n",
	"                # kernel_regularizer=l2(0.001),\n",
	"                input_shape=(len(cols),)\n",
	"               )\n",
	"         )\n",
	"model.add(Dropout(0.3, seed=100))\n",
	"model.add(Dense(128, activation='relu',\n",
	"                # kernel_regularizer=l2(0.001)\n",
	"               )\n",
	"         )\n",
	"model.add(Dropout(0.3, seed=100))\n",
	"model.add(Dense(1, activation='sigmoid'))\n",
	"\n",
	"model.compile(optimizer='adam',\n",
	"              loss='binary_crossentropy',\n",
	"              metrics=['accuracy'])"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"model.summary()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"callbacks = [EarlyStopping(monitor='val_acc', patience=25)]"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"%%time\n",
	"model.fit(train[cols], train['d'],\n",
	"          epochs=250, batch_size=32, verbose=False,\n",
	"          validation_data=(val[cols], val['d']),\n",
	"          callbacks=callbacks);"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"res = pd.DataFrame(model.history.history)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"res.tail(3)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"res.plot(figsize=(10, 6), style=['--', '--', '-', '-']);"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"model.evaluate(test[cols], test['d'])"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['p'] = model.predict_classes(test[cols])\n",
	"test['p'] = np.where(test['p'] > 0, 1, -1)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['s'] = test['p'] * test['r']"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test[['r', 's']].sum().apply(np.exp)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"sum(test['p'].diff() != 0)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['p'].value_counts()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"scrolled": false
	},
	"outputs": [],
	"source": [
	"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Feature Selection"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Selection"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"from sklearn.feature_selection import SelectKBest, f_classif"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"selector = SelectKBest(f_classif, k=50)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"cols_sel = selector.fit(train[cols], train['d']).get_support(indices=True)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"cols_sel"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"csel = train[cols].columns[cols_sel]\n",
	"csel"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Scikit-Learn"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"from sklearn.neural_network import MLPClassifier"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"model = MLPClassifier(hidden_layer_sizes=(128, 128),\n",
	" activation='relu',\n",
	" learning_rate_init=0.001,\n",
	" random_state=100,\n",
	" max_iter=500,\n",
	" validation_fraction=0.1,\n",
	" shuffle=False,\n",
	" early_stopping=False,\n",
	" verbose=False)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"%time model.fit(train[csel], train['d'])"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['p'] = model.predict(test[csel])\n",
	"test['p'] = np.where(test['p'] > 0, 1, -1)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['s'] = test['p'] * test['r']"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test[['r', 's']].sum().apply(np.exp)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"sum(test['p'].diff() != 0)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['p'].value_counts()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"scrolled": false
	},
	"outputs": [],
	"source": [
	"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Keras"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"np.random.seed(100)\n",
	"tf.random.set_random_seed(100)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"model = Sequential()\n",
	"\n",
	"model.add(Dense(128, activation='relu',\n",
	" input_shape=(len(csel),)\n",
	" )\n",
	" )\n",
	"model.add(Dropout(0.3, seed=100))\n",
	"model.add(Dense(128, activation='relu',\n",
	" )\n",
	" )\n",
	"model.add(Dropout(0.3, seed=100))\n",
	"model.add(Dense(1, activation='sigmoid'))\n",
	"\n",
	"model.compile(optimizer='adam',\n",
	" loss='binary_crossentropy',\n",
	" metrics=['accuracy'])"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"callbacks = [ModelCheckpoint(filepath='weights.hdf5',\n",
	" monitor='val_acc',\n",
	" verbose=0,\n",
	" save_best_only=True,\n",
	" save_weights_only=True,\n",
	" mode='auto', period=1)]"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"%%time\n",
	"model.fit(train[csel], train['d'],\n",
	" epochs=125, batch_size=32, verbose=False,\n",
	" validation_data=(val[csel], val['d']),\n",
	" callbacks=callbacks);"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Regular Results"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"res = pd.DataFrame(model.history.history)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"res.tail(3)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"res[['acc', 'val_acc']].plot(figsize=(10, 6), style=['--', '--', '-', '-']);"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"model.evaluate(test[csel], test['d'])"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['p'] = model.predict_classes(test[csel])\n",
	"test['p'] = np.where(test['p'] > 0, 1, -1)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['s'] = test['p'] * test['r']"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test[['r', 's']].sum().apply(np.exp)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"sum(test['p'].diff() != 0)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['p'].value_counts()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"scrolled": false
	},
	"outputs": [],
	"source": [
	"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Best Weights (Validation)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"model.load_weights('weights.hdf5')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"model.evaluate(test[csel], test['d'])"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['p'] = model.predict_classes(test[csel])\n",
	"test['p'] = np.where(test['p'] > 0, 1, -1)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['s'] = test['p'] * test['r']"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test[['r', 's']].sum().apply(np.exp)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"sum(test['p'].diff() != 0)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"test['p'].value_counts()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"scrolled": false
	},
	"outputs": [],
	"source": [
	"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"<img src=\"http://hilpisch.com/tpq_logo.png\" alt=\"The Python Quants\" width=\"35%\" align=\"right\" border=\"0\"><br>\n",
	"\n",
	"<a href=\"http://tpq.io\" target=\"_blank\">http://tpq.io</a> \| <a href=\"http://twitter.com/dyjh\" target=\"_blank\">@dyjh</a> \| <a href=\"mailto:training@tpq.io\">training@tpq.io</a>"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.6.8"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}


	Artificial Intelligence in Finance
	==================================

	Slides http://hilpisch.com/odsc_east.pdf

	Gist http://bit.ly/odsc_east