@yhilpisch
Last active September 4, 2022 17:38

Artificial Intelligence in Finance

Workshop by Dr Yves J Hilpisch | The Python Quants GmbH

ODSC East, Boston, 30 April 2019

Short Link

http://bit.ly/odsc_east

Slides

http://hilpisch.com/odsc_east.pdf

Resources

Python for Finance (2nd ed.)

Sign up at http://py4fi.pqp.io to access all the Jupyter Notebooks and code, and to run them on our Quant Platform.

Cloud

Use this link to get a 10 USD bonus on DigitalOcean when signing up for a new account.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='http://hilpisch.com/taim_logo.png' width=\"350px\" align=\"right\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Artificial Intelligence in Finance\n",
"\n",
"**Machine Learning Methods**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"&copy; Dr Yves J Hilpisch | The Python Quants GmbH\n",
"\n",
"http://tpq.io | http://twitter.com/dyjh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Machine Learning"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import datetime as dt\n",
"from pylab import mpl, plt\n",
"import warnings; warnings.simplefilter('ignore')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"plt.style.use('seaborn')\n",
"mpl.rcParams['font.family'] = 'serif'\n",
"np.random.seed(1000)\n",
"np.set_printoptions(suppress=True, precision=4)\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Unsupervised Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_blobs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X, y = make_blobs(n_samples=250, centers=4,\n",
"                  random_state=500, cluster_std=1.25)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(X[:, 0], X[:, 1], s=50);\n",
"# plt.savefig('../../images/ch13/ml_plot_01.png')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### K-Means Clustering"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.cluster import KMeans "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = KMeans(n_clusters=4, random_state=0) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_kmeans = model.predict(X) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"y_kmeans[:12] "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap='coolwarm');\n",
"# plt.savefig('../../images/ch13/ml_plot_02.png');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gaussian Mixtures"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.mixture import GaussianMixture"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = GaussianMixture(n_components=4, random_state=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_gm = model.predict(X)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_gm[:12]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"(y_gm == y_kmeans).all() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Supervised Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_classification"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"n_samples = 100"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X, y = make_classification(n_samples=n_samples, n_features=2, n_informative=2,\n",
"                           n_redundant=0, n_repeated=0, random_state=250)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X[:5] "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X.shape "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y[:5] "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y.shape "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(x=X[:, 0], y=X[:, 1], c=y, cmap='coolwarm');\n",
"# plt.savefig('../../images/ch13/ml_plot_03.png')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gaussian Naive Bayes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.naive_bayes import GaussianNB\n",
"from sklearn.metrics import accuracy_score"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = GaussianNB()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.predict_proba(X).round(4)[:5] "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred = model.predict(X) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred == y "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"accuracy_score(y, pred) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xc = X[y == pred] \n",
"Xf = X[y != pred] "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(x=Xc[:, 0], y=Xc[:, 1], c=y[y == pred],\n",
"            marker='o', cmap='coolwarm')\n",
"plt.scatter(x=Xf[:, 0], y=Xf[:, 1], c=y[y != pred],\n",
"            marker='x', cmap='coolwarm')\n",
"# plt.savefig('../../images/ch13/ml_plot_04.png')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Logistic Regression"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = LogisticRegression(C=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.predict_proba(X).round(4)[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred = model.predict(X)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"accuracy_score(y, pred)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xc = X[y == pred]\n",
"Xf = X[y != pred]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(x=Xc[:, 0], y=Xc[:, 1], c=y[y == pred],\n",
"            marker='o', cmap='coolwarm')\n",
"plt.scatter(x=Xf[:, 0], y=Xf[:, 1], c=y[y != pred],\n",
"            marker='x', cmap='coolwarm');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Decision Tree"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.tree import DecisionTreeClassifier"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = DecisionTreeClassifier(max_depth=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.predict_proba(X).round(4)[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred = model.predict(X)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"accuracy_score(y, pred)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xc = X[y == pred]\n",
"Xf = X[y != pred]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(x=Xc[:, 0], y=Xc[:, 1], c=y[y == pred],\n",
"            marker='o', cmap='coolwarm')\n",
"plt.scatter(x=Xf[:, 0], y=Xf[:, 1], c=y[y != pred],\n",
"            marker='x', cmap='coolwarm');\n",
"# plt.savefig('../../images/ch13/ml_plot_05.png')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('{:>8s} | {:8s}'.format('depth', 'accuracy'))\n",
"print(20 * '-')\n",
"for depth in range(1, 7):\n",
"    model = DecisionTreeClassifier(max_depth=depth)\n",
"    model.fit(X, y)\n",
"    acc = accuracy_score(y, model.predict(X))\n",
"    print('{:8d} | {:8.2f}'.format(depth, acc))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deep Neural Network"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### scikit-learn"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.neural_network import MLPClassifier"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = MLPClassifier(solver='lbfgs', alpha=1e-5,\n",
"                      hidden_layer_sizes=2 * [75], random_state=10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%time model.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred = model.predict(X)\n",
"pred"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"accuracy_score(y, pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### TensorFlow"
]
},
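{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# NOTE: this section requires TensorFlow 1.x; tf.logging and tf.contrib were removed in TensorFlow 2.\n",
"import tensorflow as tf\n",
"tf.logging.set_verbosity(tf.logging.ERROR)"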
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fc = [tf.contrib.layers.real_valued_column('features')] "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = tf.contrib.learn.DNNClassifier(hidden_units=5 * [250],\n",
"                                       n_classes=2, feature_columns=fc)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def input_fn():\n",
"    fc = {'features': tf.constant(X)}\n",
"    la = tf.constant(y)\n",
"    return fc, la"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%time model.fit(input_fn=input_fn, steps=100) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.evaluate(input_fn=input_fn, steps=1) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred = np.array(list(model.predict(input_fn=input_fn))) \n",
"pred[:10] "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%time model.fit(input_fn=input_fn, steps=750) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.evaluate(input_fn=input_fn, steps=1) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Transforms"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import preprocessing"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xs = preprocessing.StandardScaler().fit_transform(X) \n",
"Xs[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xm = preprocessing.MinMaxScaler().fit_transform(X) \n",
"Xm[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xn1 = preprocessing.Normalizer(norm='l1').transform(X) \n",
"Xn1[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xn2 = preprocessing.Normalizer(norm='l2').transform(X) \n",
"Xn2[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(10, 6))\n",
"markers = ['o', '.', 'x', '^', 'v']\n",
"data_sets = [X, Xs, Xm, Xn1, Xn2]\n",
"labels = ['raw', 'standard', 'minmax', 'norm(1)', 'norm(2)']\n",
"for x, m, l in zip(data_sets, markers, labels):\n",
"    plt.scatter(x=x[:, 0], y=x[:, 1], c=y,\n",
"                marker=m, cmap='coolwarm', label=l)\n",
"plt.legend();\n",
"# plt.savefig('../../images/ch13/ml_plot_06.png');"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xb = preprocessing.Binarizer().fit_transform(X) \n",
"Xb[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"2 ** 2  # possible feature-value combinations for two binary features"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xd = np.digitize(X, bins=[-1, 0, 1]) \n",
"Xd[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"4 ** 2  # possible feature-value combinations for two features with four bins each"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train-Test Splits "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.svm import SVC\n",
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.33,\n",
"                                                    random_state=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = SVC(C=1, kernel='linear')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.fit(train_x, train_y) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred_train = model.predict(train_x) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"accuracy_score(train_y, pred_train) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred_test = model.predict(test_x) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_y == pred_test "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"accuracy_score(test_y, pred_test) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_c = test_x[test_y == pred_test]\n",
"test_f = test_x[test_y != pred_test]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(x=test_c[:, 0], y=test_c[:, 1], c=test_y[test_y == pred_test],\n",
"            marker='o', cmap='coolwarm')\n",
"plt.scatter(x=test_f[:, 0], y=test_f[:, 1], c=test_y[test_y != pred_test],\n",
"            marker='x', cmap='coolwarm');\n",
"# plt.savefig('../../images/ch13/ml_plot_07.png');"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bins = np.linspace(-4.5, 4.5, 50)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xd = np.digitize(X, bins=bins)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Xd[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_x, test_x, train_y, test_y = train_test_split(Xd, y, test_size=0.33,\n",
"                                                    random_state=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('{:>8s} | {:8s}'.format('kernel', 'accuracy'))\n",
"print(20 * '-')\n",
"for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:\n",
"    model = SVC(C=1, kernel=kernel)\n",
"    model.fit(train_x, train_y)\n",
"    acc = accuracy_score(test_y, model.predict(test_x))\n",
"    print('{:>8s} | {:8.3f}'.format(kernel, acc))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"http://hilpisch.com/tpq_logo.png\" alt=\"The Python Quants\" width=\"35%\" align=\"right\" border=\"0\"><br>\n",
"\n",
"<a href=\"http://tpq.io\" target=\"_blank\">http://tpq.io</a> | <a href=\"http://twitter.com/dyjh\" target=\"_blank\">@dyjh</a> | <a href=\"mailto:training@tpq.io\">training@tpq.io</a>"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
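
In the notebook's Feature Transforms section, the cells `2 ** 2` and `4 ** 2` follow the `Binarizer` and `np.digitize` steps without comment. One plausible reading, which the notebook itself does not state, is that they count the possible feature-value combinations once each of the two features has been reduced to 2 or 4 discrete values. The sketch below illustrates that reading in plain Python. The helper names `binarize` and `digitize` and the `samples` values are hypothetical and made up for illustration; they only mimic the default behavior of the scikit-learn and NumPy calls used in the notebook.

```python
from bisect import bisect_right
from itertools import product


def binarize(row, threshold=0.0):
    """Map each value to 1 if it is strictly above the threshold, else 0."""
    return tuple(1 if v > threshold else 0 for v in row)


def digitize(row, bins):
    """Return, per value, the index of the bin it falls into.

    Assumes ascending bin edges; a value equal to an edge goes to the bin
    on the right of that edge.
    """
    return tuple(bisect_right(bins, v) for v in row)


# Hypothetical two-feature samples (not taken from the notebook's data).
samples = [(-1.7, 0.4), (0.2, -0.3), (1.5, 2.1), (-0.5, 0.9)]
bins = [-1, 0, 1]  # three edges give four bins: 0, 1, 2, 3

binarized = [binarize(row) for row in samples]
digitized = [digitize(row, bins) for row in samples]

# Enumerate every possible combination of discrete values for two features.
n_features = 2
n_binary_patterns = len(list(product(range(2), repeat=n_features)))
n_digit_patterns = len(list(product(range(len(bins) + 1), repeat=n_features)))

print(binarized)
print(digitized)
print(n_binary_patterns, n_digit_patterns)
```

With two features, binarizing leaves at most 2 ** 2 = 4 distinct patterns and digitizing with three edges leaves at most 4 ** 2 = 16, so a finer grid (such as the 50-point `bins` used later in the notebook) keeps more of the original information.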
Artificial Intelligence in Finance
==================================
Slides http://hilpisch.com/odsc_east.pdf
Gist http://bit.ly/odsc_east