Created
September 10, 2020 02:32
-
-
Save yaojenkuo/3744db21543c99acac68c7f840112bf7 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# 指數是什麼?動手算一次就懂\n", | |
"\n", | |
"> 寫點 Python 程式計算道瓊工業平均指數與標準普爾 500 指數\n", | |
">\n", | |
"> 標籤:程式設計,獲取載入\n", | |
"\n", | |
"郭耀仁 <yaojenkuo@datainpoint.com> from [datainpoint](https://datainpoint.substack.com/about)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# 載入專案需要使用的套件\n", | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"from tqdm import tqdm" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## TL; DR\n", | |
"\n", | |
"在這個專案中,我們打算寫點 Python 程式,先將道瓊工業平均指數與標準普爾 500 指數的成分股代號由網頁中抓取出來,接著再取得這些成分股的股價以及流通股數,最後再套用加權計算的公式求解。透過這個專案,我們能夠暸解如何使用 Python 實作網頁資料擷取、數值計算以及陣列運算。" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 取得道瓊工業平均指數成分股\n", | |
"\n", | |
"道瓊工業平均指數成分股由 30 間在紐約證券交易所與 NASDAQ 上市的藍籌股(Blue Chip)公司所組成,可以由 <https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average> 取得這些公司的代號;使用 `pandas` 套件中的 `read_html()` 函式可以將網頁中的 `<table></table>` 標記擷取下來為資料框。" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"['MMM', 'AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CVX', 'CSCO', 'KO', 'DOW', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 'JPM', 'MCD', 'MRK', 'MSFT', 'NKE', 'PG', 'CRM', 'TRV', 'UNH', 'VZ', 'V', 'WBA', 'WMT', 'DIS']\n", | |
"30\n" | |
] | |
} | |
], | |
"source": [ | |
"request_url = \"https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average\"\n", | |
"html_tables = pd.read_html(request_url)\n", | |
"djia_symbols = [e.replace(\"\\xa0\", \"\").replace(\"NYSE:\", \"\") for e in html_tables[1]['Symbol'].values]\n", | |
"print(djia_symbols)\n", | |
"print(len(djia_symbols))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 取得標準普爾 500 指數成分股\n", | |
"\n", | |
"標準普爾 500 指數成分股由 505 間在紐約證券交易所、NASDAQ 與芝加哥期貨選擇權交易所上市的藍籌股(Blue Chip)公司所組成,可以由 <https://en.wikipedia.org/wiki/List_of_S%26P_500_companies> 取得這些公司的代號;使用 `pandas` 套件中的 `read_html()` 函式可以將網頁中的 `<table></table>` 標記擷取下來為資料框。" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"['MMM', 'ABT', 'ABBV', 'ABMD', 'ACN', 'ATVI', 'ADBE', 'AMD', 'AAP', 'AES']\n", | |
"505\n" | |
] | |
} | |
], | |
"source": [ | |
"request_url = \"https://en.wikipedia.org/wiki/List_of_S%26P_500_companies\"\n", | |
"html_tables = pd.read_html(request_url)\n", | |
"sp500_symbols = list(html_tables[0]['Symbol'].values)\n", | |
"print(sp500_symbols[:10]) # print the first 10 symbols\n", | |
"print(len(sp500_symbols))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 取得成分股的股價以及流通股數\n", | |
"\n", | |
"使用 `pandas` 套件取得成份股的股價以及流通股數(Float shares),我們撰寫了一個類別 `GetPriceFloatShares` 可以接受美股代號的輸入,回傳 [Yahoo! Finance](https://finance.yahoo.com/) 的資料,以蘋果電腦(美股代號 AAPL)為例:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"class GetPriceFloatShares:\n", | |
" def __init__(self, symbol):\n", | |
" self._symbol = symbol\n", | |
" query_str_params = {\n", | |
" \"p\": symbol\n", | |
" }\n", | |
" self._query_str_params = query_str_params\n", | |
" def get_price(self):\n", | |
" request_url = \"https://finance.yahoo.com/quote/{}/history\".format(self._symbol)\n", | |
" html_tables = pd.read_html(request_url)\n", | |
" price = html_tables[0].iloc[0, 4].replace(\",\", \"\")\n", | |
" return price\n", | |
" def get_float_shares(self):\n", | |
" request_url = \"https://finance.yahoo.com/quote/{}/key-statistics\".format(self._symbol)\n", | |
" html_tables = pd.read_html(request_url)\n", | |
" float_shares = html_tables[2].set_index(0).loc['Float', :].values[0]\n", | |
" return float_shares" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"117.32\n", | |
"17.09B\n" | |
] | |
} | |
], | |
"source": [ | |
"aapl = GetPriceFloatShares(\"AAPL\")\n", | |
"print(aapl.get_price())\n", | |
"print(aapl.get_float_shares())" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"html_tables = pd.read_html(\"https://finance.yahoo.com/quote/AAPL/history?p=AAPL\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'117.32'" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"html_tables[0].iloc[0, 4]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 計算道瓊工業平均指數\n", | |
"\n", | |
"我們寫一個 `calculate_djia()` 函式計算道瓊工業平均指數,依據股價加權計算公式,是將成分股每個公司的股價加總後再除以道瓊除數(Dow divisor):\n", | |
"\n", | |
"\\begin{equation}\n", | |
"\\text{Dow Jones Industrial Average} = \\frac{\\sum_i P_i}{\\text{Dow divisor}}\n", | |
"\\end{equation}\n", | |
"\n", | |
"道瓊除數在 2020-08-31 為 0.15198707565833(<https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average>)。" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"100%|██████████| 30/30 [00:36<00:00, 1.22s/it]\n" | |
] | |
} | |
], | |
"source": [ | |
"def calculate_djia(dow_divisor, djia_symbols):\n", | |
" sum_of_prices = 0\n", | |
" for symb in tqdm(djia_symbols):\n", | |
" gpfs = GetPriceFloatShares(symb)\n", | |
" price = gpfs.get_price()\n", | |
" sum_of_prices += float(price)\n", | |
" return sum_of_prices / dow_divisor\n", | |
"\n", | |
"djia = calculate_djia(0.15198707565833, djia_symbols)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"27940.46784311068\n" | |
] | |
} | |
], | |
"source": [ | |
"print(djia)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 計算標準普爾 500 指數\n", | |
"\n", | |
"我們寫一個 `calculate_sp500()` 函式計算標準普爾 500 指數,依據市值加權計算公式,將成分股每個公司的股價乘以市值加總後再除以 S&P 500 除數(S&P 500 divisor):\n", | |
"\n", | |
"\\begin{equation}\n", | |
"\\text{S&P 500} = \\frac{\\sum_i (P_i \\times Q_i)}{\\text{S&P 500 divisor}} \\\\\n", | |
"\\text{where} \\; Q_i = \\text{Float shares of stock i}\n", | |
"\\end{equation}\n", | |
"\n", | |
"S&P 500 除數目前約為 83 億美元。(<https://en.wikipedia.org/wiki/S%26P_500_Index>)。由於從 [Yahoo! Finance](https://finance.yahoo.com/) 擷取 500+ 個成分股資訊要一段時間,因此我們先擷取了 2020-09-09 的資訊,存在 `data/sp500.csv`,可以用 `pandas` 的 `read_csv()` 函式讀入成為資料框。" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>symbol</th>\n", | |
" <th>price</th>\n", | |
" <th>float_shares</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>MMM</td>\n", | |
" <td>163.17</td>\n", | |
" <td>575.35M</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>ABT</td>\n", | |
" <td>102.84</td>\n", | |
" <td>1.76B</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>ABBV</td>\n", | |
" <td>90.22</td>\n", | |
" <td>1.76B</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>ABMD</td>\n", | |
" <td>268.19</td>\n", | |
" <td>44.04M</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>ACN</td>\n", | |
" <td>232.65</td>\n", | |
" <td>635.19M</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" symbol price float_shares\n", | |
"0 MMM 163.17 575.35M\n", | |
"1 ABT 102.84 1.76B\n", | |
"2 ABBV 90.22 1.76B\n", | |
"3 ABMD 268.19 44.04M\n", | |
"4 ACN 232.65 635.19M" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sp500_df = pd.read_csv(\"data/sp500.csv\")\n", | |
"sp500_df.head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"其中在 `float_shares` 欄位我們可以觀察到流通股數資料是以 B 表示 Billion、M 表示 Million,這時我們可以定義一個函式 `unit_transform()` 判斷 B 或 M 之後進行單位的轉換,亦即乘以 1,000,000,000 或 1,000,000。" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def unit_transform(x):\n", | |
" if \"B\" in x:\n", | |
" x = x.replace(\"B\", \"\")\n", | |
" float_x = float(x)\n", | |
" return float_x * 1000000000\n", | |
" elif \"M\" in x:\n", | |
" x = x.replace(\"M\", \"\")\n", | |
" float_x = float(x)\n", | |
" return float_x * 1000000" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`unit_tranform()` 函式目前可以針對單一成分股的流通股數進行單位轉換。" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"575350000.0\n", | |
"1760000000.0\n" | |
] | |
} | |
], | |
"source": [ | |
"print(unit_transform(\"575.35M\"))\n", | |
"print(unit_transform(\"1.76B\"))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"若希望能夠將其應用到所有的成分股上,我們可以使用 `np.vectorize()` 將它改變成一個通用函式(Universal function)。" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"['575.35M' '1.76B' '1.76B' '44.04M' '635.19M' '763.22M' '477.4M' '1.17B'\n", | |
" '64.61M' '662.4M']\n", | |
"[5.7535e+08 1.7600e+09 1.7600e+09 4.4040e+07 6.3519e+08 7.6322e+08\n", | |
" 4.7740e+08 1.1700e+09 6.4610e+07 6.6240e+08]\n" | |
] | |
} | |
], | |
"source": [ | |
"unit_transform_ufunc = np.vectorize(unit_transform)\n", | |
"print(sp500_df['float_shares'].values[:10])\n", | |
"print(unit_transform_ufunc(sp500_df['float_shares'].values[:10]))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"最後依據市值加權計算公式定義 `calculate_sp500()` 函式計算標準普爾 500 指數。" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def calculate_sp500(sp500_divisor, sp500_df):\n", | |
" prices = sp500_df['price'].values\n", | |
" float_shares = unit_transform_ufunc(sp500_df['float_shares'].values)\n", | |
" return np.sum(prices * float_shares) / sp500_divisor\n", | |
"\n", | |
"sp500 = calculate_sp500(8.3e9, sp500_df)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"3397.1794004698795\n" | |
] | |
} | |
], | |
"source": [ | |
"print(sp500)" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python Data Analysis", | |
"language": "python", | |
"name": "pyda" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.8" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment