yhilpisch/00_data_science.md

## 00_data_science.md

      
Python and Data Science

Empowering Quants, Traders & Asset Managers
Dr. Yves J. Hilpisch | The Python Quants & The AI Machine
Texas State University, April 2022
(short link to this Gist: http://bit.ly/py_ds_gist)

Slides

You can find the slides at http://certificate.tpq.io/python_data_science.pdf

Resources

This Gist contains selected resources used during the lecture.

Disclaimer

All the content, Python code, Jupyter Notebooks and other materials (the “Material”) come without warranties or representations, to the extent permitted by applicable law.
None of the Material represents any kind of recommendation or investment advice.
The Material is only meant as a technical illustration.
Leveraged and unleveraged trading of financial instruments, and of contracts for difference (CFDs) in particular, involves a number of risks (for example, losses in excess of deposits). Make sure to understand and manage these risks.


## 01_financial_data.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "158717b9-91f6-4fd9-b556-a45ad57a0964",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<img src=\"https://hilpisch.com/tpq_logo.png\" alt=\"The Python Quants\" width=\"35%\" align=\"right\" border=\"0\"><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd44fa75-f0e0-40aa-a994-cd7424924ae6",
   "metadata": {},
   "source": [
    "# Python and Data Science"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc344ce0-1069-4c8c-b354-215e91844069",
   "metadata": {},
   "source": [
    "### Getting Financial Data from APIs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "336e3c6b-9502-4b9c-87a7-a51efad295dd",
   "metadata": {},
   "source": [
    "&copy; Dr. Yves J. Hilpisch | The Python Quants GmbH\n",
    "\n",
    "http://tpq.io | [training@tpq.io](mailto:training@tpq.io) | [@dyjh](http://twitter.com/dyjh)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c049fe3",
   "metadata": {},
   "source": [
    "# EOD Historical Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d77b612-8f72-4258-a87f-ba1f9bb44960",
   "metadata": {},
   "source": [
    "See [company page](https://eodhistoricaldata.com/r/?ref=X8R79ISB)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad1fe1e7",
   "metadata": {},
   "source": [
    "## Imports\n",
    "\n",
    "Install the `eod` package via\n",
    "\n",
    "    pip install eod"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bbc4144d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import cufflinks\n",
    "import pandas as pd\n",
    "from io import StringIO\n",
    "from eod import EodHistoricalData\n",
    "cufflinks.set_config_file(offline=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df164f2f",
   "metadata": {},
   "source": [
    "## API Connection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1140932b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%run ../creds.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a71a4b48",
   "metadata": {},
   "outputs": [],
   "source": [
    "api = EodHistoricalData(api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8cc2d362-3da2-4700-858c-3163a9bcd21c",
   "metadata": {},
   "source": [
    "## Stock Price Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bcb12ac-87c6-4e39-bdd5-f5410181db54",
   "metadata": {},
   "source": [
    "* `period='d'` for daily\n",
    "* `period='w'` for weekly\n",
    "* `period='m'` for monthly\n",
    "* `order='a'` for ascending\n",
    "* `order='d'` for descending"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69145c19-3f63-480d-b07c-cfed3aa0046a",
   "metadata": {},
   "outputs": [],
   "source": [
    "symbol = 'AAPL.US'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05ba0b98-29ef-4a70-9bde-d51908da7ed4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%time prices = api.get_prices_eod(symbol, period='d', order='a')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28245bde",
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df = pd.DataFrame.from_dict(prices).set_index('date')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "962f5635",
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e966942",
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e2138f6-9f52-4d7f-ba96-c49af6e7c9b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df['adjusted_close'].iloc[-1000:].iplot(title=symbol)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "feb75c04-9bcf-4852-9c53-fc451cfed42d",
   "metadata": {},
   "source": [
    "## ETF Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07ebf590-1c7c-40f2-a99b-5a6b80f73e49",
   "metadata": {},
   "outputs": [],
   "source": [
    "sym = ['IQQW.XETRA', 'QDVE.XETRA', 'EXXT.XETRA', 'EGLN.LSE']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d25cf7d6-c787-4d08-878e-a1b2513cc3f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "%time prices = api.get_prices_eod(sym[2], period='d', order='a')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42ed75f5-8d1c-4ac8-929f-d9cfef1dfd61",
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df = pd.DataFrame.from_dict(prices).set_index('date')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f0bace-5d2b-42fe-836e-a1de5d60668e",
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a4c0084-6778-4113-9182-8b5a2ab8f750",
   "metadata": {},
   "outputs": [],
   "source": [
    "price_df.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ce77000e-7a74-4d97-b973-18a89f2f6095",
   "metadata": {},
   "outputs": [],
   "source": [
    "for s in sym:\n",
    "    prices = api.get_prices_eod(s, period='d', order='a')\n",
    "    price_df = pd.DataFrame.from_dict(prices).set_index('date')\n",
    "    price_df.to_csv(s.split('.')[0] + '.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b8f04bc-a3d9-43b2-8e65-97873a5da6c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "raw = {}\n",
    "for s in sym:\n",
    "    prices = api.get_prices_eod(s, period='d', order='a')\n",
    "    price_df = pd.DataFrame.from_dict(prices).set_index('date')\n",
    "    raw[s] = price_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94dabf32-584e-4dcf-8b76-42067d34e670",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.DataFrame()\n",
    "for s in raw:\n",
    "    data[s] = raw[s]['close']\n",
    "data.dropna(inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eca4639a-a192-43a1-ae15-2abffdb7526c",
   "metadata": {},
   "outputs": [],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65e5ef5d-9a1a-4365-8e9e-85c7d38eff91",
   "metadata": {},
   "outputs": [],
   "source": [
    "data.normalize().iplot();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed36b9c9-9bd6-41b3-ac03-7cbb99f0dd5f",
   "metadata": {},
   "outputs": [],
   "source": [
    "data.corr()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d46bebdb",
   "metadata": {},
   "source": [
    "## Bond Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3727b094-bd7a-4d28-a173-67ec778076f7",
   "metadata": {},
   "source": [
    "### Corporate Bonds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "946091c0-487b-48b0-8e49-3c72a4dd92e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "corporate_bond = 'US00213MAS35.BOND'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48f77af4-e00f-4e98-94d9-7216b07ee6f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "bond_prices = api.get_prices_eod(corporate_bond, period='w', order='a')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5ec2f013",
   "metadata": {},
   "outputs": [],
   "source": [
    "bond_df = pd.DataFrame.from_dict(bond_prices).set_index('date')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e363a80-1e7c-405b-9945-72f36360aa62",
   "metadata": {},
   "outputs": [],
   "source": [
    "bond_df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a4fea76",
   "metadata": {},
   "outputs": [],
   "source": [
    "bond_df.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93849c41-5f40-4dcd-85f2-b7df9dbd9fcb",
   "metadata": {},
   "outputs": [],
   "source": [
    "bond_df['price'].iplot()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5698f2af-12c5-4d5f-b715-e03d716884a0",
   "metadata": {},
   "source": [
    "### Government Bonds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b03cf93-0cdc-4cf7-9ee3-c0e520906733",
   "metadata": {},
   "outputs": [],
   "source": [
    "government_bond = 'SW10Y.GBOND'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e9f561f-38c6-4325-b95a-2f89d970d3e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "bond_prices = api.get_prices_eod(government_bond, period='w', order='a')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "549d622a",
   "metadata": {},
   "outputs": [],
   "source": [
    "bond_df = pd.DataFrame.from_dict(bond_prices).set_index('date')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20e41198-1331-4be0-a9a6-aae2cc52468d",
   "metadata": {},
   "outputs": [],
   "source": [
    "bond_df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7078178d",
   "metadata": {},
   "outputs": [],
   "source": [
    "bond_df.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5041e3f-293c-4c72-9651-22b332aa25f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "bond_df['close'].iplot()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2516b55",
   "metadata": {},
   "source": [
    "## Exchanges"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f45708ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "all_exchanges = api.get_exchanges()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8b4d709",
   "metadata": {},
   "outputs": [],
   "source": [
    "ex_df = pd.DataFrame.from_dict(all_exchanges)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0befb51",
   "metadata": {},
   "outputs": [],
   "source": [
    "sorted(list(ex_df['Country'].unique()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ab5329d",
   "metadata": {},
   "source": [
    "## Fundamentals Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "102dd81d",
   "metadata": {},
   "outputs": [],
   "source": [
    "fundamental = api.get_fundamental_equity('600487.SHG')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cd60485f-8e07-4cb6-bf3a-9e7b27dd68c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "fundamental['General']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e91d8197",
   "metadata": {},
   "outputs": [],
   "source": [
    "fundamental = api.get_fundamental_equity('ZUARI.NSE')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "92ca543d-49f0-48cb-95a8-e66e9403c1d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "fundamental['General']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f27bd41d",
   "metadata": {},
   "source": [
    "## Options Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "df7dfc34",
   "metadata": {},
   "outputs": [],
   "source": [
    "option_data = api.get_stock_options('AAPL.US')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d261037",
   "metadata": {},
   "outputs": [],
   "source": [
    "option_df = pd.DataFrame.from_dict(option_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3503aab",
   "metadata": {},
   "outputs": [],
   "source": [
    "option_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68de2113-3d34-45ee-942e-6419bfe77cef",
   "metadata": {},
   "outputs": [],
   "source": [
    "opt_data = option_df.loc[0, 'data']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87ec1378-1cf2-435d-b6c5-059c9143e43d",
   "metadata": {},
   "outputs": [],
   "source": [
    "opt_data.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4e96b19-cca8-4a27-a240-905befe1d642",
   "metadata": {},
   "outputs": [],
   "source": [
    "for key in opt_data.keys():\n",
    "    if key != 'options':\n",
    "        print(key, opt_data[key])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4bb470b-829c-4312-b041-18f5e7703546",
   "metadata": {},
   "source": [
    "<img src=\"http://hilpisch.com/tpq_logo.png\" alt=\"The Python Quants\" width=\"30%\" align=\"right\" border=\"0\"><br>\n",
    "\n",
    "<a href=\"http://tpq.io\" target=\"_blank\">http://tpq.io</a> | <a href=\"http://twitter.com/dyjh\" target=\"_blank\">@dyjh</a> | <a href=\"mailto:team@tpq.io\">team@tpq.io</a>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
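
The ETF cells in the notebook above collect one price table per symbol, keep only the dates available for every symbol, and then compare the series on a normalized basis. The following is a minimal plain-Python sketch of that alignment step, not the notebook's own code. The record layout (dictionaries with `date` and `close` keys) is inferred from the `set_index('date')` and `['close']` calls; the symbols and prices are invented illustration data, the helper names `align_closes` and `normalize` are hypothetical, and rebasing to 100 is an assumption about the normalization.

```python
# Invented sample records, shaped like the list of dicts that the
# notebook passes to pd.DataFrame.from_dict (not real API output).
raw = {
    'AAA.XETRA': [
        {'date': '2022-04-01', 'close': 50.0},
        {'date': '2022-04-04', 'close': 51.0},
        {'date': '2022-04-05', 'close': 52.5},
    ],
    'BBB.LSE': [
        {'date': '2022-04-01', 'close': 20.0},
        {'date': '2022-04-05', 'close': 21.0},
    ],
}


def align_closes(raw):
    ''' Keeps only the dates that are present for every symbol. '''
    by_symbol = {s: {r['date']: r['close'] for r in recs}
                 for s, recs in raw.items()}
    common = sorted(set.intersection(*(set(d) for d in by_symbol.values())))
    return common, {s: [d[day] for day in common]
                    for s, d in by_symbol.items()}


def normalize(series, base=100.0):
    ''' Rebases a price series so that its first value equals base. '''
    return [base * p / series[0] for p in series]


dates, closes = align_closes(raw)
normalized = {s: normalize(v) for s, v in closes.items()}
print(dates)
for s, v in normalized.items():
    print(s, [round(x, 2) for x in v])
```

Aligning on common dates before normalizing matters for symbols from different exchanges (here XETRA and LSE), because their trading calendars differ.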

## 02_data_logistics.ipynb

      
## 03_portfolio_management.ipynb

      
## 04_risk_budgeting.ipynb

      
## 05_text_processing.ipynb

      
## mvp_portfolio.py
#
# Mean-Variance Portfolio Class
# Markowitz (1952)
#
# Python for Asset Management
# (c) Dr. Yves J. Hilpisch
# The Python Quants GmbH
#
import math
import numpy as np
import pandas as pd

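def portfolio_return(weights, rets):
    ''' Annualized portfolio return (252 trading days). '''
    return np.dot(weights.T, rets.mean()) * 252

def portfolio_variance(weights, rets):
    ''' Annualized portfolio variance (252 trading days). '''
    return np.dot(weights.T, np.dot(rets.cov(), weights)) * 252

def portfolio_volatility(weights, rets):
    ''' Annualized portfolio volatility (square root of the variance). '''
    return math.sqrt(portfolio_variance(weights, rets))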
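
The three functions above annualize daily statistics with a factor of 252: the portfolio return is the weighted mean daily return times 252, and the portfolio variance is the quadratic form of the weights and the covariance matrix times 252. The following plain-Python sketch works through the same arithmetic without NumPy or pandas. The two return series and the equal weights are invented, and the helper names `mean` and `sample_cov` are hypothetical; the covariance uses the n - 1 denominator, which is the pandas default for `cov()`.

```python
import math

# Invented daily returns for two assets (illustration only).
rets = {
    'A': [0.01, -0.02, 0.03, 0.00],
    'B': [0.00, 0.01, -0.01, 0.02],
}
weights = [0.5, 0.5]
names = list(rets)


def mean(xs):
    ''' Arithmetic mean of a list of numbers. '''
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    ''' Sample covariance with the n - 1 denominator. '''
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Annualized portfolio return: weighted mean daily returns times 252.
port_ret = sum(w * mean(rets[n]) for w, n in zip(weights, names)) * 252

# Annualized portfolio variance: w' * Cov * w times 252.
cov = [[sample_cov(rets[a], rets[b]) for b in names] for a in names]
port_var = sum(weights[i] * cov[i][j] * weights[j]
               for i in range(len(names))
               for j in range(len(names))) * 252

# Annualized portfolio volatility: square root of the variance.
port_vol = math.sqrt(port_var)
print(round(port_ret, 4), round(port_var, 6), round(port_vol, 4))
```

With these numbers, the negative covariance between the two assets pulls the portfolio variance below the weighted average of the individual variances, which is the diversification effect that mean-variance analysis builds on.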

## nlp.py
#
# NLP Helper Functions
#
# Artificial Intelligence in Finance
# (c) Dr. Yves J. Hilpisch
# The Python Quants GmbH
#
import re
import nltk
import string
import pandas as pd
from pylab import plt
from wordcloud import WordCloud
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from lxml.html.clean import Cleaner
from sklearn.feature_extraction.text import TfidfVectorizer
plt.style.use('seaborn')

cleaner = Cleaner(style=True, links=True, allow_tags=[''],
                  remove_unknown_tags=False)

stop_words = stopwords.words('english')
stop_words.extend(['new', 'old', 'pro', 'open', 'menu', 'close'])


def remove_non_ascii(s):
    ''' Removes all non-ascii characters.
    '''
    return ''.join(i for i in s if ord(i) < 128)

def clean_up_html(t):
    ''' Strips HTML markup, line breaks, repeated spaces
        and non-ascii characters from a string.
    '''
    t = cleaner.clean_html(t)
    t = re.sub('[\n\t\r]', ' ', t)
    t = re.sub(' +', ' ', t)
    t = re.sub('<.*?>', '', t)
    t = remove_non_ascii(t)
    return t

def clean_up_text(t, numbers=False, punctuation=False):
    ''' Cleans up a text, e.g. HTML document,
        from HTML tags and also cleans up the
        text body.
    '''
    try:
        t = clean_up_html(t)
    except Exception:
        # input could not be parsed as HTML; continue with the raw text
        pass
    t = t.lower()
    t = re.sub(r"what's", "what is ", t)
    t = t.replace('(ap)', '')
    t = re.sub(r"\'ve", " have ", t)
    t = re.sub(r"can't", "cannot ", t)
    t = re.sub(r"n't", " not ", t)
    t = re.sub(r"i'm", "i am ", t)
    t = re.sub(r"\'s", "", t)
    t = re.sub(r"\'re", " are ", t)
    t = re.sub(r"\'d", " would ", t)
    t = re.sub(r"\'ll", " will ", t)
    t = re.sub(r'\s+', ' ', t)
    t = re.sub(r"\\", "", t)
    t = re.sub(r"\'", "", t)
    t = re.sub(r"\"", "", t)
    if numbers:
        t = re.sub('[^a-zA-Z ?!]+', '', t)
    if punctuation:
        t = re.sub(r'\W+', ' ', t)
    t = remove_non_ascii(t)
    t = t.strip()
    return t
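
The order of the substitutions in `clean_up_text` matters: specific contractions such as `can't` are replaced before the generic `n't` rule. The following is a reduced sketch of that idea with only a handful of rules; the function name `expand_contractions` is hypothetical and the sample sentence is invented.

```python
import re


def expand_contractions(t):
    ''' Expands a few English contractions; specific patterns come first. '''
    t = t.lower()
    t = re.sub(r"can't", "cannot ", t)  # must precede the generic n't rule
    t = re.sub(r"n't", " not ", t)
    t = re.sub(r"'re", " are ", t)
    t = re.sub(r"'ll", " will ", t)
    t = re.sub(r"\s+", " ", t)          # collapse repeated whitespace
    return t.strip()


print(expand_contractions("We can't wait, they don't care; you'll see."))
```

If the generic `n't` rule ran first, `can't` would turn into `ca not` instead of `cannot`.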

def nltk_lemma(word):
    ''' If one exists, returns the lemma of a word.
        I.e. the base or dictionary version of it.
    '''
    lemma = wn.morphy(word)
    if lemma is None:
        return word
    else:
        return lemma

def tokenize(text, min_char=3, lemma=True, stop=True,
             numbers=False):
    ''' Tokenizes a text and implements some
        transformations.
    '''
    tokens = nltk.word_tokenize(text)
    tokens = [t for t in tokens if len(t) >= min_char]
    if numbers:
        tokens = [t for t in tokens if t[0].lower()
                  in string.ascii_lowercase]
    if stop:
        tokens = [t for t in tokens if t not in stop_words]
    if lemma:
        tokens = [nltk_lemma(t) for t in tokens]
    return tokens

def generate_word_cloud(text, no, name=None, show=True):
    ''' Generates a word cloud bitmap given a
        text document (string).
        It uses the Term Frequency (TF) and
        Inverse Document Frequency (IDF)
        vectorization approach to derive the
        importance of a word -- represented
        by the size of the word in the word cloud.

    Parameters
    ==========
    text: str
        text as the basis
    no: int
        number of words to be included
    name: str
        path to save the image
    show: bool
        whether to show the generated image or not
    '''
    tokens = tokenize(text)
    vec = TfidfVectorizer(min_df=2,
                          analyzer='word',
                          ngram_range=(1, 2),
                          stop_words='english')
    vec.fit_transform(tokens)
    # get_feature_names() was removed in scikit-learn 1.2;
    # get_feature_names_out() is available from version 1.0
    wc = pd.DataFrame({'words': vec.get_feature_names_out(),
                       'tfidf': vec.idf_})
    words = ' '.join(wc.sort_values('tfidf', ascending=True)['words'].head(no))
    wordcloud = WordCloud(max_font_size=110,
                          background_color='white',
                          width=1024, height=768,
                          margin=10, max_words=150).generate(words)
    if show:
        plt.figure(figsize=(10, 10))
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis('off')
        plt.show()
    if name is not None:
        wordcloud.to_file(name)

def generate_key_words(text, no):
    ''' Returns up to no key words of a text, ranked by
        descending IDF value; returns an empty list on failure.
    '''
    try:
        tokens = tokenize(text)
        vec = TfidfVectorizer(min_df=2,
                              analyzer='word',
                              ngram_range=(1, 2),
                              stop_words='english')
        vec.fit_transform(tokens)
        # get_feature_names_out() requires scikit-learn >= 1.0
        wc = pd.DataFrame({'words': vec.get_feature_names_out(),
                           'tfidf': vec.idf_})
        words = wc.sort_values('tfidf', ascending=False)['words'].values
        words = [a for a in words if not a.isnumeric()][:no]
    except Exception:
        words = list()
    return words
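
Both word-ranking functions above read the `idf_` attribute of the fitted vectorizer, which holds one inverse document frequency value per term, and they pass a list of tokens so that each token is treated as its own document. The following is a simplified plain-Python sketch of that ranking, assuming scikit-learn's default smoothed formula `ln((1 + n) / (1 + df)) + 1`. It ignores the vectorizer's own tokenization, n-grams and stop word handling; the token list is invented and the helper name `smoothed_idf` is hypothetical.

```python
import math
from collections import Counter

# Invented token list; each token plays the role of one "document",
# mirroring how a list of tokens is handed to the vectorizer above.
tokens = ['bond', 'stock', 'bond', 'yield', 'bond', 'stock', 'etf']


def smoothed_idf(tokens, min_df=2):
    ''' Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1. '''
    n = len(tokens)        # number of "documents"
    df = Counter(tokens)   # document frequency of each term
    return {t: math.log((1 + n) / (1 + c)) + 1
            for t, c in df.items() if c >= min_df}


idf = smoothed_idf(tokens)
# Lowest IDF first: the most frequent terms come out on top.
ranked = sorted(idf, key=idf.get)
print(ranked)
```

Sorting in ascending order, as `generate_word_cloud` does, favors frequent terms, while sorting in descending order, as `generate_key_words` does, favors rarer ones.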
	{
	"cells": [
	{
	"cell_type": "markdown",
	"id": "158717b9-91f6-4fd9-b556-a45ad57a0964",
	"metadata": {
	"slideshow": {
	"slide_type": "slide"
	}
	},
	"source": [
	"<img src=\"https://hilpisch.com/tpq_logo.png\" alt=\"The Python Quants\" width=\"35%\" align=\"right\" border=\"0\"><br>"
	]
	},
	{
	"cell_type": "markdown",
	"id": "cd44fa75-f0e0-40aa-a994-cd7424924ae6",
	"metadata": {},
	"source": [
	"# Python and Data Science"
	]
	},
	{
	"cell_type": "markdown",
	"id": "cc344ce0-1069-4c8c-b354-215e91844069",
	"metadata": {},
	"source": [
	"### Getting Financial Data from APIs"
	]
	},
	{
	"cell_type": "markdown",
	"id": "336e3c6b-9502-4b9c-87a7-a51efad295dd",
	"metadata": {},
	"source": [
	"© Dr. Yves J. Hilpisch \| The Python Quants GmbH\n",
	"\n",
	"http://tpq.io \| [training@tpq.io](mailto:trainin@tpq.io) \| [@dyjh](http://twitter.com/dyjh)"
	]
	},
	{
	"cell_type": "markdown",
	"id": "1c049fe3",
	"metadata": {},
	"source": [
	"# EOD Historical Data"
	]
	},
	{
	"cell_type": "markdown",
	"id": "7d77b612-8f72-4258-a87f-ba1f9bb44960",
	"metadata": {},
	"source": [
	"See [company page](https://eodhistoricaldata.com/r/?ref=X8R79ISB)."
	]
	},
	{
	"cell_type": "markdown",
	"id": "ad1fe1e7",
	"metadata": {},
	"source": [
	"## Imports \n",
	"\n",
	"Installation of the packages via\n",
	"\n",
	" pip install eod"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "bbc4144d",
	"metadata": {},
	"outputs": [],
	"source": [
	"import requests\n",
	"import cufflinks\n",
	"import pandas as pd\n",
	"from io import StringIO\n",
	"from eod import EodHistoricalData\n",
	"cufflinks.set_config_file(offline=True)"
	]
	},
	{
	"cell_type": "markdown",
	"id": "df164f2f",
	"metadata": {},
	"source": [
	"## API Connection"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "1140932b",
	"metadata": {},
	"outputs": [],
	"source": [
	"%run ../creds.py"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "a71a4b48",
	"metadata": {},
	"outputs": [],
	"source": [
	"api = EodHistoricalData(api_key)"
	]
	},
	{
	"cell_type": "markdown",
	"id": "8cc2d362-3da2-4700-858c-3163a9bcd21c",
	"metadata": {},
	"source": [
	"## Stock Price Data"
	]
	},
	{
	"cell_type": "markdown",
	"id": "7bcb12ac-87c6-4e39-bdd5-f5410181db54",
	"metadata": {},
	"source": [
	"* `period = 'w'` for weekly\n",
	"* `period='m'` for monthly\n",
	"* `order='a'` for ascending\n",
	"* `order='d'` for descending"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "69145c19-3f63-480d-b07c-cfed3aa0046a",
	"metadata": {},
	"outputs": [],
	"source": [
	"symbol = 'AAPL.US'"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "05ba0b98-29ef-4a70-9bde-d51908da7ed4",
	"metadata": {},
	"outputs": [],
	"source": [
	"%time prices = api.get_prices_eod(symbol, period='d', order='a')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "28245bde",
	"metadata": {},
	"outputs": [],
	"source": [
	"price_df = pd.DataFrame.from_dict(prices).set_index('date')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "962f5635",
	"metadata": {},
	"outputs": [],
	"source": [
	"price_df.info()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "2e966942",
	"metadata": {},
	"outputs": [],
	"source": [
	"price_df.tail()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "1e2138f6-9f52-4d7f-ba96-c49af6e7c9b7",
	"metadata": {},
	"outputs": [],
	"source": [
	"price_df['adjusted_close'].iloc[-1000:].iplot(title=symbol)"
	]
	},
	{
	"cell_type": "markdown",
	"id": "feb75c04-9bcf-4852-9c53-fc451cfed42d",
	"metadata": {},
	"source": [
	"## ETF Data"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "07ebf590-1c7c-40f2-a99b-5a6b80f73e49",
	"metadata": {},
	"outputs": [],
	"source": [
	"sym = ['IQQW.XETRA', 'QDVE.XETRA', 'EXXT.XETRA', 'EGLN.LSE']"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "d25cf7d6-c787-4d08-878e-a1b2513cc3f0",
	"metadata": {},
	"outputs": [],
	"source": [
	"%time prices = api.get_prices_eod(sym[2], period='d', order='a')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "42ed75f5-8d1c-4ac8-929f-d9cfef1dfd61",
	"metadata": {},
	"outputs": [],
	"source": [
	"price_df = pd.DataFrame.from_dict(prices).set_index('date')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "a1f0bace-5d2b-42fe-836e-a1de5d60668e",
	"metadata": {},
	"outputs": [],
	"source": [
	"price_df.info()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "1a4c0084-6778-4113-9182-8b5a2ab8f750",
	"metadata": {},
	"outputs": [],
	"source": [
	"price_df.tail()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "ce77000e-7a74-4d97-b973-18a89f2f6095",
	"metadata": {},
	"outputs": [],
	"source": [
	"for s in sym:\n",
	" prices = api.get_prices_eod(s, period='d', order='a')\n",
	" price_df = pd.DataFrame.from_dict(prices).set_index('date')\n",
	" price_df.to_csv(s.split('.')[0] + '.csv')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "3b8f04bc-a3d9-43b2-8e65-97873a5da6c1",
	"metadata": {},
	"outputs": [],
	"source": [
	"raw = {}\n",
	"for s in sym:\n",
	" prices = api.get_prices_eod(s, period='d', order='a')\n",
	" price_df = pd.DataFrame.from_dict(prices).set_index('date')\n",
	" raw[s] = price_df"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "94dabf32-584e-4dcf-8b76-42067d34e670",
	"metadata": {},
	"outputs": [],
	"source": [
	"data = pd.DataFrame()\n",
	"for s in raw:\n",
	" data[s] = raw[s]['close']\n",
	"data.dropna(inplace=True)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "eca4639a-a192-43a1-ae15-2abffdb7526c",
	"metadata": {},
	"outputs": [],
	"source": [
	"data.head()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "65e5ef5d-9a1a-4365-8e9e-85c7d38eff91",
	"metadata": {},
	"outputs": [],
	"source": [
	"data.normalize().iplot();"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "ed36b9c9-9bd6-41b3-ac03-7cbb99f0dd5f",
	"metadata": {},
	"outputs": [],
	"source": [
	"data.corr()"
	]
	},
	{
	"cell_type": "markdown",
	"id": "d46bebdb",
	"metadata": {},
	"source": [
	"## Bond Data"
	]
	},
	{
	"cell_type": "markdown",
	"id": "3727b094-bd7a-4d28-a173-67ec778076f7",
	"metadata": {},
	"source": [
	"### Corporate Bonds"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "946091c0-487b-48b0-8e49-3c72a4dd92e1",
	"metadata": {},
	"outputs": [],
	"source": [
	"corporate_bond = 'US00213MAS35.BOND'"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "48f77af4-e00f-4e98-94d9-7216b07ee6f7",
	"metadata": {},
	"outputs": [],
	"source": [
	"bond_prices = api.get_prices_eod(corporate_bond, period='w', order='a')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "5ec2f013",
	"metadata": {},
	"outputs": [],
	"source": [
	"bond_df = pd.DataFrame.from_dict(bond_prices).set_index('date')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "2e363a80-1e7c-405b-9945-72f36360aa62",
	"metadata": {},
	"outputs": [],
	"source": [
	"bond_df.info()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "3a4fea76",
	"metadata": {},
	"outputs": [],
	"source": [
	"bond_df.tail()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "93849c41-5f40-4dcd-85f2-b7df9dbd9fcb",
	"metadata": {},
	"outputs": [],
	"source": [
	"bond_df['price'].iplot()"
	]
	},
	{
	"cell_type": "markdown",
	"id": "5698f2af-12c5-4d5f-b715-e03d716884a0",
	"metadata": {},
	"source": [
	"### Government Bond"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "3b03cf93-0cdc-4cf7-9ee3-c0e520906733",
	"metadata": {},
	"outputs": [],
	"source": [
	"government_bond = 'SW10Y.GBOND'"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "2e9f561f-38c6-4325-b95a-2f89d970d3e6",
	"metadata": {},
	"outputs": [],
	"source": [
	"bond_prices = api.get_prices_eod(government_bond, period='w', order='a')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "549d622a",
	"metadata": {},
	"outputs": [],
	"source": [
	"bond_df = pd.DataFrame.from_dict(bond_prices).set_index('date')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "20e41198-1331-4be0-a9a6-aae2cc52468d",
	"metadata": {},
	"outputs": [],
	"source": [
	"bond_df.info()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "7078178d",
	"metadata": {},
	"outputs": [],
	"source": [
	"bond_df.tail()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "e5041e3f-293c-4c72-9651-22b332aa25f4",
	"metadata": {},
	"outputs": [],
	"source": [
	"bond_df['close'].iplot()"
	]
	},
	{
	"cell_type": "markdown",
	"id": "f2516b55",
	"metadata": {},
	"source": [
	"## Exchanges"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "f45708ad",
	"metadata": {},
	"outputs": [],
	"source": [
	"all_exchanges = api.get_exchanges()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "d8b4d709",
	"metadata": {},
	"outputs": [],
	"source": [
	"ex_df = pd.DataFrame.from_dict(all_exchanges)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "a0befb51",
	"metadata": {},
	"outputs": [],
	"source": [
	"sorted(list(ex_df['Country'].unique()))"
	]
	},
	{
	"cell_type": "markdown",
	"id": "4ab5329d",
	"metadata": {},
	"source": [
	"## Fundamentals Data"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "102dd81d",
	"metadata": {},
	"outputs": [],
	"source": [
	"fundamental = api.get_fundamental_equity('600487.SHG')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "cd60485f-8e07-4cb6-bf3a-9e7b27dd68c6",
	"metadata": {},
	"outputs": [],
	"source": [
	"fundamental['General']"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "e91d8197",
	"metadata": {},
	"outputs": [],
	"source": [
	"fundamental = api.get_fundamental_equity('ZUARI.NSE')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "92ca543d-49f0-48cb-95a8-e66e9403c1d3",
	"metadata": {},
	"outputs": [],
	"source": [
	"fundamental['General']"
	]
	},
	{
	"cell_type": "markdown",
	"id": "f27bd41d",
	"metadata": {},
	"source": [
	"## Get Options data for a symbol"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "df7dfc34",
	"metadata": {},
	"outputs": [],
	"source": [
	"option_data = api.get_stock_options('AAPL.US')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "8d261037",
	"metadata": {},
	"outputs": [],
	"source": [
	"option_df = pd.DataFrame.from_dict(option_data)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "f3503aab",
	"metadata": {},
	"outputs": [],
	"source": [
	"option_df"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "68de2113-3d34-45ee-942e-6419bfe77cef",
	"metadata": {},
	"outputs": [],
	"source": [
	"opt_data = option_df.loc[0, 'data']"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "87ec1378-1cf2-435d-b6c5-059c9143e43d",
	"metadata": {},
	"outputs": [],
	"source": [
	"opt_data.keys()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"id": "d4e96b19-cca8-4a27-a240-905befe1d642",
	"metadata": {},
	"outputs": [],
	"source": [
	"for key in opt_data.keys():\n",
	"    if key != 'options':\n",
	"        print(key, opt_data[key])"
	]
	},
	{
	"cell_type": "markdown",
	"id": "d4bb470b-829c-4312-b041-18f5e7703546",
	"metadata": {},
	"source": [
	"<img src=\"http://hilpisch.com/tpq_logo.png\" alt=\"The Python Quants\" width=\"30%\" align=\"right\" border=\"0\"><br>\n",
	"\n",
	"<a href=\"http://tpq.io\" target=\"_blank\">http://tpq.io</a> | <a href=\"http://twitter.com/dyjh\" target=\"_blank\">@dyjh</a> | <a href=\"mailto:team@tpq.io\">team@tpq.io</a>"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3 (ipykernel)",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.9.7"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 5
	}
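The last code cell of the notebook prints every top-level field of one options record except the nested `options` entry. A minimal sketch of that pattern follows; the record and all of its field names other than `options` are hypothetical and do not reflect the actual API response.

```python
# hypothetical record; every field name except 'options' is made up
opt_data = {
    'expirationDate': '2022-04-29',
    'impliedVolatility': 31.5,
    'options': {'CALL': [], 'PUT': []},  # nested contracts (structure assumed)
}

# collect the summary fields and skip the nested contracts
summary = {key: value for key, value in opt_data.items()
           if key != 'options'}

for key, value in summary.items():
    print(key, value)
```

Building the filtered dictionary first keeps the summary fields available for later use, for example as one row of a DataFrame.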
	#
	# Mean-Variance Portfolio Class
	# Markowitz (1952)
	#
	# Python for Asset Management
	# (c) Dr. Yves J. Hilpisch
	# The Python Quants GmbH
	#
	import math
	import numpy as np
	import pandas as pd

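	def portfolio_return(weights, rets):
	    ''' Mean portfolio return, annualized with factor 252. '''
	    return np.dot(weights.T, rets.mean()) * 252

	def portfolio_variance(weights, rets):
	    ''' Portfolio variance, annualized with factor 252. '''
	    return np.dot(weights.T, np.dot(rets.cov(), weights)) * 252

	def portfolio_volatility(weights, rets):
	    ''' Portfolio volatility as the square root of the variance. '''
	    return math.sqrt(portfolio_variance(weights, rets))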
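The three functions above can be tried on a small data set. The sketch below repeats them so that it runs on its own and applies them to hypothetical daily returns of two made-up assets `A` and `B`; the numbers are illustrative, not market data. The factor 252 presumably stands for the number of trading days per year, so `rets` is assumed to hold daily returns.

```python
import math

import numpy as np
import pandas as pd


def portfolio_return(weights, rets):
    return np.dot(weights.T, rets.mean()) * 252


def portfolio_variance(weights, rets):
    return np.dot(weights.T, np.dot(rets.cov(), weights)) * 252


def portfolio_volatility(weights, rets):
    return math.sqrt(portfolio_variance(weights, rets))


# hypothetical daily returns for two made-up assets (not market data)
rets = pd.DataFrame({'A': [0.01, -0.01, 0.02, 0.00],
                     'B': [0.00, 0.01, -0.01, 0.02]})
weights = np.array([0.5, 0.5])  # equally weighted portfolio

mu = portfolio_return(weights, rets)
var = portfolio_variance(weights, rets)
vol = portfolio_volatility(weights, rets)

print(f'return {mu:.4f} | variance {var:.4f} | volatility {vol:.4f}')
```

Because the two return series are negatively correlated in this toy data, the portfolio variance is lower than the variance of either asset alone, which is the diversification effect that the mean-variance approach builds on.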
	#
	# NLP Helper Functions
	#
	# Artificial Intelligence in Finance
	# (c) Dr. Yves J. Hilpisch
	# The Python Quants GmbH
	#
	import re
	import nltk
	import string
	import pandas as pd
	from pylab import plt
	from wordcloud import WordCloud
	from nltk.corpus import stopwords
	from nltk.corpus import wordnet as wn
	from lxml.html.clean import Cleaner
	from sklearn.feature_extraction.text import TfidfVectorizer
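	try:
	    plt.style.use('seaborn-v0_8')  # style name as of Matplotlib 3.6
	except OSError:
	    plt.style.use('seaborn')  # name used by older Matplotlib versions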

	cleaner = Cleaner(style=True, links=True, allow_tags=[''],
	                  remove_unknown_tags=False)

	stop_words = stopwords.words('english')
	stop_words.extend(['new', 'old', 'pro', 'open', 'menu', 'close'])


	def remove_non_ascii(s):
	    ''' Removes all non-ASCII characters.
	    '''
	    return ''.join(i for i in s if ord(i) < 128)

	def clean_up_html(t):
	    ''' Strips HTML markup from a text and
	    normalizes its whitespace.
	    '''
	    t = cleaner.clean_html(t)
	    t = re.sub('[\n\t\r]', ' ', t)
	    t = re.sub(' +', ' ', t)
	    t = re.sub('<.*?>', '', t)
	    t = remove_non_ascii(t)
	    return t

	def clean_up_text(t, numbers=False, punctuation=False):
	    ''' Cleans up a text, e.g. an HTML document:
	    removes the HTML tags and normalizes the
	    text body.
	    '''
	    try:
	        t = clean_up_html(t)
	    except Exception:
	        # keep the raw text if the HTML cleaning fails
	        pass
	    t = t.lower()
	    # expand common English contractions
	    t = re.sub(r"what's", "what is ", t)
	    t = t.replace('(ap)', '')
	    t = re.sub(r"\'ve", " have ", t)
	    t = re.sub(r"can't", "cannot ", t)
	    t = re.sub(r"n't", " not ", t)
	    t = re.sub(r"i'm", "i am ", t)
	    t = re.sub(r"\'s", "", t)
	    t = re.sub(r"\'re", " are ", t)
	    t = re.sub(r"\'d", " would ", t)
	    t = re.sub(r"\'ll", " will ", t)
	    t = re.sub(r'\s+', ' ', t)
	    t = re.sub(r"\\", "", t)
	    t = re.sub(r"\'", "", t)
	    t = re.sub(r"\"", "", t)
	    if numbers:
	        # drop digits and all other characters
	        # except letters, spaces, '?' and '!'
	        t = re.sub('[^a-zA-Z ?!]+', '', t)
	    if punctuation:
	        # replace runs of non-word characters by a space
	        t = re.sub(r'\W+', ' ', t)
	    t = remove_non_ascii(t)
	    t = t.strip()
	    return t

	def nltk_lemma(word):
	    ''' If one exists, returns the lemma of a word,
	    i.e. its base or dictionary form.
	    '''
	    lemma = wn.morphy(word)
	    if lemma is None:
	        return word
	    else:
	        return lemma

	def tokenize(text, min_char=3, lemma=True, stop=True,
	             numbers=False):
	    ''' Tokenizes a text and applies some
	    transformations.
	    '''
	    tokens = nltk.word_tokenize(text)
	    tokens = [t for t in tokens if len(t) >= min_char]
	    if numbers:
	        # keep only tokens that start with an ASCII letter
	        tokens = [t for t in tokens if t[0].lower()
	                  in string.ascii_lowercase]
	    if stop:
	        tokens = [t for t in tokens if t not in stop_words]
	    if lemma:
	        tokens = [nltk_lemma(t) for t in tokens]
	    return tokens

	def generate_word_cloud(text, no, name=None, show=True):
	    ''' Generates a word cloud bitmap given a
	    text document (string).
	    It uses the Term Frequency (TF) and
	    Inverse Document Frequency (IDF)
	    vectorization approach to derive the
	    importance of a word -- represented
	    by the size of the word in the word cloud.

	    Parameters
	    ==========
	    text: str
	        text as the basis
	    no: int
	        number of words to be included
	    name: str
	        path to save the image to (optional)
	    show: bool
	        whether to show the generated image or not
	    '''
	    tokens = tokenize(text)
	    vec = TfidfVectorizer(min_df=2,
	                          analyzer='word',
	                          ngram_range=(1, 2),
	                          stop_words='english'
	                          )
	    # every token is treated as a separate document
	    vec.fit_transform(tokens)
	    # get_feature_names() was removed in scikit-learn 1.2
	    wc = pd.DataFrame({'words': vec.get_feature_names_out(),
	                       'tfidf': vec.idf_})
	    # lowest IDF values first, i.e. the most frequent terms
	    words = ' '.join(wc.sort_values('tfidf', ascending=True)['words'].head(no))
	    wordcloud = WordCloud(max_font_size=110,
	                          background_color='white',
	                          width=1024, height=768,
	                          margin=10, max_words=150).generate(words)
	    if show:
	        plt.figure(figsize=(10, 10))
	        plt.imshow(wordcloud, interpolation='bilinear')
	        plt.axis('off')
	        plt.show()
	    if name is not None:
	        wordcloud.to_file(name)

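	def generate_key_words(text, no):
	    ''' Returns up to no non-numeric key words of a text,
	    sorted by descending IDF value.
	    '''
	    try:
	        tokens = tokenize(text)
	        vec = TfidfVectorizer(min_df=2,
	                              analyzer='word',
	                              ngram_range=(1, 2),
	                              stop_words='english'
	                              )

	        vec.fit_transform(tokens)
	        # get_feature_names() was removed in scikit-learn 1.2
	        wc = pd.DataFrame({'words': vec.get_feature_names_out(),
	                           'tfidf': vec.idf_})
	        words = wc.sort_values('tfidf', ascending=False)['words'].values
	        words = [a for a in words if not a.isnumeric()][:no]
	    except Exception:
	        # e.g. no term reaches min_df; fall back to an empty list
	        words = list()
	    return words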
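Both `generate_word_cloud` and `generate_key_words` fit the vectorizer on the token list, so each token counts as one document and the `idf_` values depend only on how often a word occurs. The standard-library sketch below mimics that ranking under simplifying assumptions: every token is a single lowercase word, none is a stop word, and the smoothed IDF formula documented for scikit-learn (`smooth_idf=True`) applies. The token list is hypothetical.

```python
import math
from collections import Counter

# hypothetical token list, standing in for the output of tokenize()
tokens = ['market', 'risk', 'market', 'return', 'risk', 'market', 'alpha']

n_docs = len(tokens)   # every token counts as one document
df = Counter(tokens)   # document frequency = number of occurrences

# smoothed IDF: ln((1 + n) / (1 + df)) + 1; min_df=2 drops words seen once
idf = {word: math.log((1 + n_docs) / (1 + count)) + 1
       for word, count in df.items() if count >= 2}

# ascending order puts the most frequent words first
ranked = sorted(idf, key=idf.get)
print(ranked)
```

Sorting in ascending order, as `generate_word_cloud` does, therefore favors frequent words, while the descending order in `generate_key_words` favors the rarest words that still pass the `min_df=2` threshold.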