@yhilpisch
Last active April 1, 2024 01:00

AI in Finance: An Introduction in Python

Dr Yves J Hilpisch | The Python Quants & The AI Machine

Webinar, DataCamp, 21 May 2019

Short Link

http://bit.ly/aiif_webinar

Slides

http://hilpisch.com/aiif_webinar.pdf

Python Certificate Programs

Python for Finance (2nd ed.)

Learn all about my most recent book at http://py4fi.tpq.io.

Sign up at http://py4fi.pqp.io to access and execute all Jupyter Notebooks and code on our Quant Platform.

General Resources

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='http://hilpisch.com/taim_logo.png' width=\"350px\" align=\"right\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AI in Finance\n",
"\n",
"**Market Prediction**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"&copy; Dr Yves J Hilpisch | The Python Quants GmbH\n",
"\n",
"http://tpq.io | http://twitter.com/dyjh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import numpy as np\n",
"import pandas as pd\n",
"from pylab import plt\n",
"plt.style.use('seaborn')\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"url = 'http://hilpisch.com/oanda_eur_usd.csv'"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"raw = pd.read_csv(url, index_col=0,\n",
"                  parse_dates=True).dropna()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"DatetimeIndex: 15456 entries, 2018-01-01 22:00:00 to 2019-03-29 20:30:00\n",
"Data columns (total 6 columns):\n",
"c           15456 non-null float64\n",
"complete    15456 non-null bool\n",
"h           15456 non-null float64\n",
"l           15456 non-null float64\n",
"o           15456 non-null float64\n",
"volume      15456 non-null int64\n",
"dtypes: bool(1), float64(4), int64(1)\n",
"memory usage: 739.6 KB\n"
]
}
],
"source": [
"raw.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following creates a **set of financial features**."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"data = raw.copy()\n",
"\n",
"# log returns & direction\n",
"data['r'] = np.log(data['c'] / data['c'].shift(1))\n",
"data['rs'] = (data['r'] - data['r'].mean()) / data['r'].std()\n",
"data['d'] = np.where(data['r'] > 0, 1, 0)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# rolling statistics\n",
"data['v1'] = data['r'].rolling(20).std()\n",
"data['v2'] = data['r'].rolling(100).std()\n",
"data['sma1'] = data['c'].rolling(20).mean()\n",
"data['sma2'] = data['c'].rolling(100).mean()\n",
"data['mom1'] = data['r'].rolling(5).mean()\n",
"data['mom2'] = data['r'].rolling(20).mean()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"data.dropna(inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"features = list(data.columns)\n",
"features.remove('complete')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"15356"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ld = len(data)\n",
"ld"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Preprocessing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
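"The data set is split sequentially into **train, validation and test data sets**: the first 70% of the rows form the training block, whose final 15% is held out for validation, and the remaining 30% is the test set."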
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"split = int(len(data) * 0.7)\n",
"val_size = int(split * 0.15)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
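"# explicit copies so that adding lag columns later does not write into views of `data`\n",
"train = data.iloc[:split]\n",
"val = train.iloc[-val_size:].copy()\n",
"train = train.iloc[:-val_size].copy()\n",
"test = data.iloc[split:].copy()"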
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"lags = 10"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"def gaussian(x):\n",
"    mean = x.mean()\n",
"    std = x.std()\n",
"    return (x - mean) / std, mean, std"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
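"The following function creates **lags** of the feature columns and **normalizes** them with the mean and standard deviation of the training data; the return-based columns `r`, `rs` and `d` are lagged but not rescaled."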
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def normalize_and_lag():\n",
"    global cols\n",
"    cols = []\n",
"    for f in features:\n",
"        for lag in range(1, lags + 1):\n",
"            col = f'{f}_lag_{lag}'\n",
"            if f in ['r', 'rs', 'd', 'u-d']:\n",
"                train[col] = train[f].shift(lag)\n",
"                val[col] = val[f].shift(lag)\n",
"                test[col] = test[f].shift(lag)\n",
"            else:\n",
"                train[col], mean, std = gaussian(train[f].shift(lag))\n",
"                val[col] = (val[f].shift(lag) - mean) / std\n",
"                test[col] = (test[f].shift(lag) - mean) / std\n",
"            cols.append(col)\n",
"    train.dropna(inplace=True)\n",
"    val.dropna(inplace=True)\n",
"    test.dropna(inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"normalize_and_lag()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"140"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(cols)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# train.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Backtesting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scikit-Learn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, an `MLPClassifier` is trained."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.neural_network import MLPClassifier"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(100)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"model = MLPClassifier(hidden_layer_sizes=(256, 256),\n",
"                      activation='relu',\n",
"                      alpha=0.0001,\n",
"                      random_state=100,\n",
"                      max_iter=200,\n",
"                      validation_fraction=0.0,\n",
"                      shuffle=False,\n",
"                      early_stopping=False,\n",
"                      verbose=False)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1min 41s, sys: 9.83 s, total: 1min 51s\n",
"Wall time: 18.6 s\n"
]
},
{
"data": {
"text/plain": [
"MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,\n",
" beta_2=0.999, early_stopping=False, epsilon=1e-08,\n",
" hidden_layer_sizes=(256, 256), learning_rate='constant',\n",
" learning_rate_init=0.001, max_iter=200, momentum=0.9,\n",
" n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,\n",
" random_state=100, shuffle=False, solver='adam', tol=0.0001,\n",
" validation_fraction=0.0, verbose=False, warm_start=False)"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time model.fit(train[cols], train['d'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Second, the trained model is used to predict the **future market direction**."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"test['p'] = model.predict(test[cols])\n",
"test['p'] = np.where(test['p'] > 0, 1, -1)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"test['s'] = test['p'] * test['r']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A look at the **positions** and the **number of trades**."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" 1 3008\n",
"-1 1589\n",
"Name: p, dtype: int64"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test['p'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1764"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(test['p'].diff() != 0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model **outperforms** the passive benchmark investment by a few percentage points.\n",
"\n",
"<b style=\"color:red;\">CAUTION: The analysis is implemented under the assumption of zero transaction costs.</b>"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"r 0.992410\n",
"s 1.046602\n",
"dtype: float64"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test[['r', 's']].sum().apply(np.exp)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"r 0.000602\n",
"s 0.000602\n",
"dtype: float64"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test[['r', 's']].std()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Keras"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A dense neural network is built and trained with `Keras`."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using TensorFlow backend.\n"
]
}
],
"source": [
"import tensorflow as tf\n",
"tf.logging.set_verbosity(tf.logging.ERROR)\n",
"from keras.layers import Dense, Dropout\n",
"from keras.models import Sequential\n",
"from keras.regularizers import l2\n",
"from keras.callbacks import EarlyStopping"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(100)\n",
"tf.random.set_random_seed(100)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"model = Sequential()\n",
"\n",
"model.add(Dense(128, activation='relu',\n",
"                kernel_regularizer=l2(0.001),\n",
"                input_shape=(len(cols),)\n",
"                )\n",
"          )\n",
"model.add(Dropout(0.3, seed=100))\n",
"model.add(Dense(128, activation='relu',\n",
"                kernel_regularizer=l2(0.001)\n",
"                )\n",
"          )\n",
"model.add(Dropout(0.3, seed=100))\n",
"model.add(Dense(1, activation='sigmoid'))\n",
"\n",
"model.compile(optimizer='adam',\n",
"              loss='binary_crossentropy',\n",
"              metrics=['accuracy'])"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"_________________________________________________________________\n",
"Layer (type)                 Output Shape              Param #   \n",
"=================================================================\n",
"dense_1 (Dense)              (None, 128)               18048     \n",
"_________________________________________________________________\n",
"dropout_1 (Dropout)          (None, 128)               0         \n",
"_________________________________________________________________\n",
"dense_2 (Dense)              (None, 128)               16512     \n",
"_________________________________________________________________\n",
"dropout_2 (Dropout)          (None, 128)               0         \n",
"_________________________________________________________________\n",
"dense_3 (Dense)              (None, 1)                 129       \n",
"=================================================================\n",
"Total params: 34,689\n",
"Trainable params: 34,689\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"model.summary()"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"callbacks = [EarlyStopping(monitor='val_acc', patience=25)]"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 21.5 s, sys: 6.77 s, total: 28.2 s\n",
"Wall time: 10.6 s\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x1a39cac828>"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"model.fit(train[cols], train['d'],\n",
"          epochs=250, batch_size=32, verbose=False,\n",
"          validation_data=(val[cols], val['d']),\n",
"          callbacks=callbacks);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A look at how the **metrics** evolve over the training epochs."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"res = pd.DataFrame(model.history.history)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>val_loss</th>\n",
" <th>val_acc</th>\n",
" <th>loss</th>\n",
" <th>acc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>0.693369</td>\n",
" <td>0.530587</td>\n",
" <td>0.692790</td>\n",
" <td>0.522187</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>0.695514</td>\n",
" <td>0.503121</td>\n",
" <td>0.693074</td>\n",
" <td>0.521749</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>0.697411</td>\n",
" <td>0.503745</td>\n",
" <td>0.692624</td>\n",
" <td>0.526679</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"    val_loss   val_acc      loss       acc\n",
"29  0.693369  0.530587  0.692790  0.522187\n",
"30  0.695514  0.503121  0.693074  0.521749\n",
"31  0.697411  0.503745  0.692624  0.526679"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res.tail(3)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"res.plot(figsize=(10, 6), style=['--', '--', '-', '-']);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The trained model is used to generate **directional market predictions** on the test data set."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4597/4597 [==============================] - 0s 13us/step\n"
]
},
{
"data": {
"text/plain": [
"[0.6929880122218361, 0.522732216779735]"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.evaluate(test[cols], test['d'])"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"test['p'] = model.predict_classes(test[cols])\n",
"test['p'] = np.where(test['p'] > 0, 1, -1)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"test['s'] = test['p'] * test['r']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A look at the **positions** and the **number of trades**."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" 1 3042\n",
"-1 1555\n",
"Name: p, dtype: int64"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test['p'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1123"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(test['p'].diff() != 0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model **outperforms** the passive benchmark investment.\n",
"\n",
"<b style=\"color:red;\">CAUTION: The analysis is implemented under the assumption of zero transaction costs.</b>"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"r 0.992410\n",
"s 1.078215\n",
"dtype: float64"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test[['r', 's']].sum().apply(np.exp)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"r 0.000602\n",
"s 0.000602\n",
"dtype: float64"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test[['r', 's']].std()"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"http://hilpisch.com/tpq_logo.png\" alt=\"The Python Quants\" width=\"35%\" align=\"right\" border=\"0\"><br>\n",
"\n",
"<a href=\"http://tpq.io\" target=\"_blank\">http://tpq.io</a> | <a href=\"http://twitter.com/dyjh\" target=\"_blank\">@dyjh</a> | <a href=\"mailto:training@tpq.io\">training@tpq.io</a>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
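The lag-and-normalize step in the notebook relies on the globals `train`, `val`, `test`, `features`, `lags` and `cols`. As a rough sketch only, and not the author's implementation, the same idea can be written as a function that takes its inputs explicitly and returns new frames. The name `add_lags`, its parameters, and the synthetic `demo` data are assumptions made for illustration; the sketch handles one hold-out frame rather than the notebook's separate validation and test sets.

```python
import numpy as np
import pandas as pd


def add_lags(train, other, features, lags, raw_features=('r', 'rs', 'd')):
    """Return lagged copies of `train` and `other` plus the lag column names.

    Columns not listed in `raw_features` are standardized with the mean and
    standard deviation of the *training* lags only, so no statistics from
    the hold-out frame are used.
    """
    train, other = train.copy(), other.copy()
    cols = []
    for f in features:
        for lag in range(1, lags + 1):
            col = f'{f}_lag_{lag}'
            tr, ot = train[f].shift(lag), other[f].shift(lag)
            if f not in raw_features:
                mean, std = tr.mean(), tr.std()
                tr, ot = (tr - mean) / std, (ot - mean) / std
            train[col], other[col] = tr, ot
            cols.append(col)
    return train.dropna(), other.dropna(), cols


# Synthetic stand-in for the EUR/USD closing prices (hypothetical data).
rng = np.random.default_rng(100)
demo = pd.DataFrame({'c': 1.1 + rng.normal(0, 0.001, 200).cumsum()})
demo['r'] = np.log(demo['c'] / demo['c'].shift(1))
demo = demo.dropna()

train_demo, test_demo, demo_cols = add_lags(
    demo.iloc[:150], demo.iloc[150:], features=['c', 'r'], lags=3)
print(len(demo_cols), len(train_demo), len(test_demo))
```

Returning copies avoids writing into slices of the original frame, and passing the training statistics through one code path keeps the hold-out data scaled exactly like the training data.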
@syracusepro

I have TensorFlow 2. Does anyone have ideas on how to fix this so it works with TF 2? I don't want to downgrade to TF 1.14. Thanks.

@yhilpisch
Author

> I have TensorFlow 2. Does anyone have ideas on how to fix this so it works with TF 2? I don't want to downgrade to TF 1.14. Thanks.

What issues do you face? TensorFlow is used only indirectly via Keras.

@syracusepro

syracusepro commented Sep 8, 2019 via email

@yhilpisch
Author

This probably has something to do with inconsistent package versions overall.

I would try upgrading all relevant packages (keras, tf, etc.).
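For readers hitting the same problem, a hedged note rather than the author's fix: the notebook was recorded with TF 1.x and standalone Keras, and a few of the calls it uses changed in TF 2. To the best of my knowledge, the seed call is `tf.random.set_seed` in TF 2, the logging module is reachable as `tf.compat.v1.logging`, the Keras imports are usually taken from `tensorflow.keras`, the history key is typically `val_accuracy` rather than `val_acc`, and `Sequential.predict_classes` was removed in later TF 2 releases. The last point can be handled by thresholding the sigmoid output directly. The sketch below uses NumPy only; `to_positions` is a hypothetical helper and `probs` is a made-up array standing in for the output of `model.predict(test[cols])`.

```python
import numpy as np


def to_positions(probabilities, threshold=0.5):
    """Map sigmoid outputs to +1 (long) / -1 (short) positions.

    Thresholding at 0.5 stands in for the class labels that
    `predict_classes` returned for a single sigmoid output unit.
    """
    probabilities = np.asarray(probabilities).reshape(-1)
    classes = (probabilities > threshold).astype(int)
    return np.where(classes > 0, 1, -1)


# Made-up probabilities shaped like a Keras prediction array (n_samples, 1).
probs = np.array([[0.48], [0.53], [0.50], [0.71]])
positions = to_positions(probs)
print(positions)
```

In the notebook this would replace the two lines that compute `test['p']` in the Keras section, assuming the rest of the model code has been moved to the `tensorflow.keras` imports.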

@syracusepro

syracusepro commented Sep 8, 2019 via email

@yhilpisch
Author

I meant that you should try to upgrade all of your packages.

Have a look, e.g., here: http://certificate.tpq.io

This is our most comprehensive offering in Python for Algorithmic Trading.

@syracusepro

syracusepro commented Sep 8, 2019 via email
