@yhilpisch
Last active January 5, 2023 16:36

Financial Tick Data Analysis with pandas

From Data Preprocessing to AI-Powered Market Prediction

Professor Dr. Yves J. Hilpisch | CEO of The Python Quants & The AI Machine

Jason Ramchandani | Developer Community Advocate at Refinitiv

PyConf Hyderabad, Online, 4 December 2020

Short Link

https://bit.ly/pyconf_hyd_2020

Slides

https://certificate.tpq.io/pyconf_hyd_2020.pdf

Data

Data sponsored by Refinitiv (https://refinitiv.com)

https://certificate.tpq.io/nif_refinitiv_tick_data.csv.gz
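The notebook in this gist reads an already resampled file (`nif_refinitiv_resampled.csv`), whereas the link above points to the raw tick data. Below is a minimal sketch of a resampling step. It assumes the ticks come with a datetime index and `Price` and `Volume` columns (the column names the notebook uses). The one-minute bar size, the last-price and summed-volume aggregation, and the helper name `resample_ticks` are illustrative assumptions, not the authors' documented procedure. Synthetic ticks stand in for the real file.

```python
import numpy as np
import pandas as pd


def resample_ticks(ticks: pd.DataFrame, rule: str = '1min') -> pd.DataFrame:
    """Aggregate irregular ticks into regular bars (hypothetical helper).

    Assumes a DatetimeIndex and 'Price' and 'Volume' columns.
    """
    bars = ticks.resample(rule).agg({'Price': 'last', 'Volume': 'sum'})
    # Intervals without any tick have no last price; drop them instead of
    # forward-filling, so no artificial zero-return bars are created.
    return bars.dropna(subset=['Price'])


# Synthetic stand-in for the tick file: 500 ticks at random times in one hour.
rng = np.random.default_rng(0)
offsets = np.sort(rng.uniform(0, 3600, 500))
times = pd.Timestamp('2020-11-02 09:15') + pd.to_timedelta(offsets, unit='s')
ticks = pd.DataFrame({
    'Price': 12000 + rng.normal(0, 1, 500).cumsum(),
    'Volume': rng.integers(1, 100, 500),
}, index=times)

bars = resample_ticks(ticks)
print(bars.head())
```

For the real file, `pd.read_csv('nif_refinitiv_tick_data.csv.gz', index_col=0, parse_dates=True)` infers the gzip compression from the file extension. Its actual column names may differ, so they might need renaming before the helper is applied.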

Python Certificate Programs

Python for Algorithmic Trading

My new book about Python for Algorithmic Trading — From Idea to Cloud Deployment (https://home.tpq.io/books/pyalgo).

Artificial Intelligence in Finance

My new book about AI in finance, applied to algorithmic trading (https://home.tpq.io/books/aiif).

Python for Finance (2nd ed.)

My standard reference book about Python for Finance (https://home.tpq.io/books/py4fi).

General Resources

https://home.tpq.io (company page)

https://twitter.com/dyjh (Twitter)

https://linkedin.com/in/dyjh (LinkedIn)

https://github.com/yhilpisch (GitHub)

https://youtube.com/c/yves-hilpisch (YouTube)

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='https://hilpisch.com/taim_logo.png' width=\"350px\" align=\"right\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Financial Tick Data Analysis with `pandas`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Rolling Prediction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dr Yves J Hilpisch | The AI Machine\n",
"\n",
"Jason Ramchandani | Refinitiv\n",
"\n",
"http://aimachine.io | http://twitter.com/dyjh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('seaborn-v0_8')  # style was named 'seaborn' before matplotlib 3.6\n",
"pd.set_option('mode.chained_assignment', None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading the Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fn = 'nif_refinitiv_resampled.csv'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = pd.read_csv(fn, index_col=0, parse_dates=True).iloc[:]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data.index = data.index.tz_localize(None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing Features & Labels Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['RET'] = np.log(data['Price'] / data['Price'].shift(1))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"window = 1000"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['SMA1'] = data['Price'].rolling(window).mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['SMA2'] = data['Price'].rolling(2 * window).mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['MOM'] = data['RET'].rolling(window).mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['VOL'] = data['RET'].rolling(window).std()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['MIN'] = data['Price'].rolling(window).min()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['MAX'] = data['Price'].rolling(window).max()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data.dropna(inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['D'] = np.sign(data['RET']) # labels\n",
"data['D'] = data['D'].astype(int)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"features = ['Price', 'Volume', 'RET', 'SMA1', 'SMA2', 'MOM', 'VOL', 'MIN', 'MAX'][:]\n",
"features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lagging the Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lags = 3"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cols = []\n",
"for f in features:\n",
"    for lag in range(1, lags + 1):\n",
"        col = f'{f}_lag_{lag}'\n",
"        data[col] = data[f].shift(lag)\n",
"        cols.append(col)\n",
"data.dropna(inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"len(cols)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Rolling Windows"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"weeks = pd.date_range(start=data.index.min(), end=data.index.max(),\n",
"                      freq='W')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"weeks.freq=None"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"weeks[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Rolling Training & Testing"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.ensemble import BaggingClassifier\n",
"from sklearn.neural_network import MLPClassifier"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dtc = DecisionTreeClassifier(max_depth=3, min_samples_split=15)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rfc = RandomForestClassifier(n_estimators=15, max_features=0.75,\n",
"                             max_samples=0.75,\n",
"                             max_depth=3, min_samples_split=15)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 'estimator' was called 'base_estimator' before scikit-learn 1.2\n",
"bc = BaggingClassifier(estimator=dtc, n_estimators=15,\n",
"                       max_samples=0.75,\n",
"                       max_features=0.75,\n",
"                       bootstrap=True,\n",
"                       bootstrap_features=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mlp = MLPClassifier(\n",
"    hidden_layer_sizes=[12],\n",
"    random_state=100,\n",
"    validation_fraction=0.1,  # only used if early_stopping=True\n",
"    shuffle=False,\n",
"    max_iter=1000\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = dtc"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['PRED'] = 0"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"for batch in range(12, len(weeks) - 1):\n",
"    # train on all data up to the current week boundary\n",
"    train = data.loc[weeks[0]:weeks[batch]]\n",
"    # standardize with training-window statistics only\n",
"    mu, std = train.mean(), train.std()\n",
"    train_ = (train - mu) / std\n",
"    model.fit(train_[cols], train['D'])\n",
"    # predict the following week out of sample\n",
"    test = data.loc[weeks[batch]:weeks[batch + 1]]\n",
"    test_ = (test - mu) / std\n",
"    data.loc[weeks[batch]:weeks[batch + 1], 'PRED'] = model.predict(test_[cols])\n",
"    print(f'{batch}/{len(weeks)} | ' + str(weeks[batch])[:10],\n",
"          ' | test score=%.3f' % model.score(test_[cols], test['D']),\n",
"          ' | len train=%d' % len(train),\n",
"          end='\\r')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['PRED'][data['PRED'] != 0].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sum(data['PRED'][data['PRED'] != 0].diff() != 0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data['STRAT'] = data['PRED'] * data['RET']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data[['RET', 'STRAT']][data['PRED'] != 0].sum().apply(np.exp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data[['RET', 'STRAT']][data['PRED'] != 0].cumsum().apply(np.exp).plot(figsize=(10, 6));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src='https://hilpisch.com/taim_logo.png' width=\"350px\" align=\"right\">\n",
"\n",
"<br><br><br><a href=\"http://tpq.io\" target=\"_blank\">http://tpq.io</a> | <a href=\"http://twitter.com/dyjh\" target=\"_blank\">@dyjh</a> | <a href=\"mailto:ai@tpq.io\">ai@tpq.io</a>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
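The rolling training loop in the notebook is a walk-forward scheme. At each weekly boundary the model is refit on all rows up to that boundary, the features are standardized with the training window's mean and standard deviation only, and the fitted model predicts the rows of the following week. The sketch below reproduces that scheme on synthetic data so that it runs without the Refinitiv file. Everything specific to it is an assumption for illustration: the random-walk prices on a 15-minute grid, the shorter 50-bar window, the reduced feature list, the two-week warm-up, the hypothetical helpers `make_features` and `walk_forward`, and the use of `NaN` rather than `0` to mark rows that were never tested.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def make_features(data: pd.DataFrame, window: int, lags: int):
    """Rolling features, direction label and lagged copies (hypothetical helper)."""
    data = data.copy()
    data['RET'] = np.log(data['Price'] / data['Price'].shift(1))
    data['SMA1'] = data['Price'].rolling(window).mean()
    data['MOM'] = data['RET'].rolling(window).mean()
    data['VOL'] = data['RET'].rolling(window).std()
    data.dropna(inplace=True)
    data['D'] = np.sign(data['RET']).astype(int)  # label: direction of bar t
    cols = []
    for f in ['Price', 'RET', 'SMA1', 'MOM', 'VOL']:
        for lag in range(1, lags + 1):
            col = f'{f}_lag_{lag}'
            data[col] = data[f].shift(lag)  # information up to t - lag only
            cols.append(col)
    data.dropna(inplace=True)
    return data, cols


def walk_forward(data, cols, bounds, model, start=2):
    """Refit on [bounds[0], bounds[b]], then predict (bounds[b], bounds[b + 1]]."""
    pred = pd.Series(np.nan, index=data.index)  # NaN marks "not tested"
    for b in range(start, len(bounds) - 1):
        train = data.loc[bounds[0]:bounds[b]]
        test = data.loc[bounds[b]:bounds[b + 1]]
        test = test[test.index > bounds[b]]  # boundary row stays in training only
        if test.empty:
            continue
        # standardize with training-window statistics only
        mu, std = train[cols].mean(), train[cols].std()
        model.fit((train[cols] - mu) / std, train['D'])
        pred.loc[test.index] = model.predict((test[cols] - mu) / std)
    return pred


# Synthetic random-walk prices on a 15-minute grid (stand-in for the real data).
rng = np.random.default_rng(100)
idx = pd.date_range('2020-01-01', periods=8000, freq='15min')
prices = pd.DataFrame(
    {'Price': 100 * np.exp(np.cumsum(rng.normal(0, 0.001, len(idx))))},
    index=idx)

data, cols = make_features(prices, window=50, lags=3)
bounds = pd.date_range(data.index.min(), data.index.max(), freq='W')
model = DecisionTreeClassifier(max_depth=3, min_samples_split=15, random_state=0)
data['PRED'] = walk_forward(data, cols, bounds, model)

tested = data['PRED'].notna()
# position predicted from data up to t - 1, times the log return realized at t
data['STRAT'] = data['PRED'] * data['RET']
print(data.loc[tested, ['RET', 'STRAT']].sum().apply(np.exp))
```

Keeping untested rows as `NaN` separates them from genuine `0` predictions. With labels from `np.sign`, a class `0` can appear whenever the price does not change between two bars, so a filter such as `data['PRED'] != 0` drops those flat predictions together with the untested rows. The sketch also removes the boundary row from each test window, because `.loc` slicing on a `DatetimeIndex` includes both endpoints.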