@yhilpisch
Last active September 4, 2022 17:38

Artificial Intelligence in Finance

Workshop by Dr Yves J Hilpisch | The Python Quants GmbH

ODSC East, Boston, 30 April 2019

Short Link

http://bit.ly/odsc_east

Slides

http://hilpisch.com/odsc_east.pdf

Resources

Python for Finance (2nd ed.)

Sign up at http://py4fi.pqp.io to access all the Jupyter Notebooks and code and to execute them on our Quant Platform.

Cloud

Use this link to get a 10 USD bonus on DigitalOcean when signing up for a new account.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='http://hilpisch.com/taim_logo.png' width=\"350px\" align=\"right\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Artificial Intelligence in Finance\n",
"\n",
"**Financial Features**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"&copy; Dr Yves J Hilpisch | The Python Quants GmbH\n",
"\n",
"http://tpq.io | http://twitter.com/dyjh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the `tpqoa` package see http://github.com/yhilpisch/tpqoa."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import numpy as np\n",
"import pandas as pd\n",
"from pylab import plt\n",
"plt.style.use('seaborn')\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"url = 'http://hilpisch.com/oanda_eur_usd.csv'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# url = 'oanda_eur_usd.csv'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"raw = pd.read_csv(url, index_col=0,\n",
"                  parse_dates=True).dropna()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"raw.info()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = raw.copy()\n",
"# log returns of the close price and their standardized version\n",
"data['r'] = np.log(data['c'] / data['c'].shift(1))\n",
"data['rs'] = (data['r'] - data['r'].mean()) / data['r'].std()\n",
"# market direction: 1 for a positive return, 0 otherwise\n",
"data['d'] = np.where(data['r'] > 0, 1, 0)\n",
"# candle-based features from the open, high, low, and close prices\n",
"data['c-o'] = data['c'] - data['o']\n",
"data['u-d'] = np.where(data['c'] - data['o'] > 0, 1, 0)\n",
"data['h-l'] = data['h'] - data['l']\n",
"data['h-o'] = data['h'] - data['o']\n",
"data['o-l'] = data['o'] - data['l']\n",
"data['h-c'] = data['h'] - data['c']\n",
"data['c-l'] = data['c'] - data['l']\n",
"# rolling volatility of the log returns (20 and 100 periods)\n",
"data['v1'] = data['r'].rolling(20).std()\n",
"data['v2'] = data['r'].rolling(100).std()\n",
"# simple moving averages of the close price\n",
"data['sma1'] = data['c'].rolling(20).mean()\n",
"data['sma2'] = data['c'].rolling(100).mean()\n",
"# momentum as the rolling mean of the log returns\n",
"data['mom1'] = data['r'].rolling(5).mean()\n",
"data['mom2'] = data['r'].rolling(20).mean()\n",
"data.dropna(inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"features = list(data.columns)\n",
"features.remove('complete')\n",
"# features"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ld = len(data)\n",
"ld"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"split = int(len(data) * 0.6)\n",
"val_size = int(split * 0.15)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train = data.iloc[:split]\n",
"# copy the slices so that adding columns later does not act on views of data\n",
"val = train[-val_size:].copy()\n",
"train = train[:-val_size].copy()\n",
"test = data.iloc[split:].copy()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lags = 10"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def gaussian(x):\n",
"    mean = x.mean()\n",
"    std = x.std()\n",
"    return (x - mean) / std, mean, std"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def normalize_and_lag():\n",
"    global cols\n",
"    cols = []\n",
"    for f in features:\n",
"        for lag in range(1, lags + 1):\n",
"            col = f'{f}_lag_{lag}'\n",
"            if f in ['r', 'rs', 'd', 'u-d']:\n",
"                train[col] = train[f].shift(lag)\n",
"                val[col] = val[f].shift(lag)\n",
"                test[col] = test[f].shift(lag)\n",
"            else:\n",
"                train[col], mean, std = gaussian(train[f].shift(lag))\n",
"                val[col] = (val[f].shift(lag) - mean) / std\n",
"                test[col] = (test[f].shift(lag) - mean) / std\n",
"            cols.append(col)\n",
"    train.dropna(inplace=True)\n",
"    val.dropna(inplace=True)\n",
"    test.dropna(inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"normalize_and_lag()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"len(cols)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Backtesting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scikit-Learn"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.neural_network import MLPClassifier"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = MLPClassifier(hidden_layer_sizes=(256, 256),\n",
"                      activation='relu',\n",
"                      alpha=0.0001,\n",
"                      random_state=100,\n",
"                      max_iter=200,\n",
"                      validation_fraction=0.1,\n",
"                      shuffle=False,\n",
"                      early_stopping=False,\n",
"                      verbose=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%time model.fit(train[cols], train['d'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['p'] = model.predict(test[cols])\n",
"test['p'] = np.where(test['p'] > 0, 1, -1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['s'] = test['p'] * test['r']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test[['r', 's']].sum().apply(np.exp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sum(test['p'].diff() != 0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['p'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Keras"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"from keras.layers import Dense, Dropout\n",
"from keras.models import Sequential\n",
"from keras.callbacks import EarlyStopping, ModelCheckpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(100)\n",
"tf.random.set_random_seed(100)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = Sequential()\n",
"\n",
"model.add(Dense(128, activation='relu',\n",
"                # kernel_regularizer=l2(0.001),\n",
"                input_shape=(len(cols),)\n",
"                )\n",
"          )\n",
"model.add(Dropout(0.3, seed=100))\n",
"model.add(Dense(128, activation='relu',\n",
"                # kernel_regularizer=l2(0.001)\n",
"                )\n",
"          )\n",
"model.add(Dropout(0.3, seed=100))\n",
"model.add(Dense(1, activation='sigmoid'))\n",
"\n",
"model.compile(optimizer='adam',\n",
"              loss='binary_crossentropy',\n",
"              metrics=['accuracy'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.summary()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"callbacks = [EarlyStopping(monitor='val_acc', patience=25)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"model.fit(train[cols], train['d'],\n",
"          epochs=250, batch_size=32, verbose=False,\n",
"          validation_data=(val[cols], val['d']),\n",
"          callbacks=callbacks);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"res = pd.DataFrame(model.history.history)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"res.tail(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"res.plot(figsize=(10, 6), style=['--', '--', '-', '-']);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.evaluate(test[cols], test['d'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['p'] = model.predict_classes(test[cols])\n",
"test['p'] = np.where(test['p'] > 0, 1, -1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['s'] = test['p'] * test['r']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test[['r', 's']].sum().apply(np.exp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sum(test['p'].diff() != 0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['p'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Selection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selection"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_selection import SelectKBest, f_classif"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"selector = SelectKBest(f_classif, k=50)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cols_sel = selector.fit(train[cols], train['d']).get_support(indices=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cols_sel"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"csel = train[cols].columns[cols_sel]\n",
"csel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scikit-Learn"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.neural_network import MLPClassifier"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = MLPClassifier(hidden_layer_sizes=(128, 128),\n",
"                      activation='relu',\n",
"                      learning_rate_init=0.001,\n",
"                      random_state=100,\n",
"                      max_iter=500,\n",
"                      validation_fraction=0.1,\n",
"                      shuffle=False,\n",
"                      early_stopping=False,\n",
"                      verbose=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%time model.fit(train[csel], train['d'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['p'] = model.predict(test[csel])\n",
"test['p'] = np.where(test['p'] > 0, 1, -1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['s'] = test['p'] * test['r']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test[['r', 's']].sum().apply(np.exp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sum(test['p'].diff() != 0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['p'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Keras"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(100)\n",
"tf.random.set_random_seed(100)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = Sequential()\n",
"\n",
"model.add(Dense(128, activation='relu',\n",
"                input_shape=(len(csel),)\n",
"                )\n",
"          )\n",
"model.add(Dropout(0.3, seed=100))\n",
"model.add(Dense(128, activation='relu',\n",
"                )\n",
"          )\n",
"model.add(Dropout(0.3, seed=100))\n",
"model.add(Dense(1, activation='sigmoid'))\n",
"\n",
"model.compile(optimizer='adam',\n",
"              loss='binary_crossentropy',\n",
"              metrics=['accuracy'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"callbacks = [ModelCheckpoint(filepath='weights.hdf5',\n",
"                             monitor='val_acc',\n",
"                             verbose=0,\n",
"                             save_best_only=True,\n",
"                             save_weights_only=True,\n",
"                             mode='auto', period=1)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"model.fit(train[csel], train['d'],\n",
"          epochs=125, batch_size=32, verbose=False,\n",
"          validation_data=(val[csel], val['d']),\n",
"          callbacks=callbacks);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Regular Results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"res = pd.DataFrame(model.history.history)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"res.tail(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"res[['acc', 'val_acc']].plot(figsize=(10, 6), style=['--', '--', '-', '-']);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.evaluate(test[csel], test['d'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['p'] = model.predict_classes(test[csel])\n",
"test['p'] = np.where(test['p'] > 0, 1, -1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['s'] = test['p'] * test['r']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test[['r', 's']].sum().apply(np.exp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sum(test['p'].diff() != 0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['p'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Best Weights (Validation)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.load_weights('weights.hdf5')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.evaluate(test[csel], test['d'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['p'] = model.predict_classes(test[csel])\n",
"test['p'] = np.where(test['p'] > 0, 1, -1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['s'] = test['p'] * test['r']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test[['r', 's']].sum().apply(np.exp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sum(test['p'].diff() != 0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test['p'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"test[['r', 's']].cumsum().apply(np.exp).plot(figsize=(10, 6));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"http://hilpisch.com/tpq_logo.png\" alt=\"The Python Quants\" width=\"35%\" align=\"right\" border=\"0\"><br>\n",
"\n",
"<a href=\"http://tpq.io\" target=\"_blank\">http://tpq.io</a> | <a href=\"http://twitter.com/dyjh\" target=\"_blank\">@dyjh</a> | <a href=\"mailto:training@tpq.io\">training@tpq.io</a>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
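
The notebook's `normalize_and_lag` step standardizes lagged features with the mean and standard deviation of the training data and reuses those statistics for the validation and test data. A minimal, self-contained sketch of that idea on synthetic data could look as follows. The function name `lag_and_scale`, the column name `x`, and the random data are assumptions made for illustration; they are not part of the workshop code.

```python
import numpy as np
import pandas as pd


def lag_and_scale(train, other, column, lags):
    """Add lagged, standardized copies of `column` to both frames.

    Mean and standard deviation are computed on the training frame only
    and reused for `other`, so no statistics from later data are used.
    """
    train, other = train.copy(), other.copy()
    cols = []
    for lag in range(1, lags + 1):
        col = f'{column}_lag_{lag}'
        shifted = train[column].shift(lag)
        mean, std = shifted.mean(), shifted.std()
        train[col] = (shifted - mean) / std
        other[col] = (other[column].shift(lag) - mean) / std
        cols.append(col)
    # Shifting leaves NaN values in the first `lags` rows of each frame.
    return train.dropna(), other.dropna(), cols


# Synthetic stand-in for a price-derived feature (hypothetical data).
rng = np.random.default_rng(100)
frame = pd.DataFrame({'x': rng.normal(size=200)})
split = int(len(frame) * 0.6)
train, test, cols = lag_and_scale(frame.iloc[:split], frame.iloc[split:],
                                  'x', lags=3)
```

Returning new frames instead of mutating global ones, as the notebook does, keeps the sketch free of side effects; the scaling logic itself is the same.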
Artificial Intelligence in Finance
==================================
Slides http://hilpisch.com/odsc_east.pdf
Gist http://bit.ly/odsc_east