Skip to content

Instantly share code, notes, and snippets.

@yaojenkuo
Created September 10, 2020 02:32
Show Gist options
  • Save yaojenkuo/3744db21543c99acac68c7f840112bf7 to your computer and use it in GitHub Desktop.
Save yaojenkuo/3744db21543c99acac68c7f840112bf7 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 指數是什麼?動手算一次就懂\n",
"\n",
"> 寫點 Python 程式計算道瓊工業平均指數與標準普爾 500 指數\n",
">\n",
"> 標籤:程式設計,獲取載入\n",
"\n",
"郭耀仁 <yaojenkuo@datainpoint.com> from [datainpoint](https://datainpoint.substack.com/about)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# 載入專案需要使用的套件\n",
"import numpy as np\n",
"import pandas as pd\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TL; DR\n",
"\n",
"在這個專案中,我們打算寫點 Python 程式,先將道瓊工業平均指數與標準普爾 500 指數的成分股代號由網頁中抓取出來,接著再取得這些成分股的股價以及流通股數,最後再套用加權計算的公式求解。透過這個專案,我們能夠暸解如何使用 Python 實作網頁資料擷取、數值計算以及陣列運算。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 取得道瓊工業平均指數成分股\n",
"\n",
"道瓊工業平均指數成分股由 30 間在紐約證券交易所與 NASDAQ 上市的藍籌股(Blue Chip)公司所組成,可以由 <https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average> 取得這些公司的代號;使用 `pandas` 套件中的 `read_html()` 函式可以將網頁中的 `<table></table>` 標記擷取下來為資料框。"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['MMM', 'AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CVX', 'CSCO', 'KO', 'DOW', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 'JPM', 'MCD', 'MRK', 'MSFT', 'NKE', 'PG', 'CRM', 'TRV', 'UNH', 'VZ', 'V', 'WBA', 'WMT', 'DIS']\n",
"30\n"
]
}
],
"source": [
"request_url = \"https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average\"\n",
"html_tables = pd.read_html(request_url)\n",
"djia_symbols = [e.replace(\"\\xa0\", \"\").replace(\"NYSE:\", \"\") for e in html_tables[1]['Symbol'].values]\n",
"print(djia_symbols)\n",
"print(len(djia_symbols))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 取得標準普爾 500 指數成分股\n",
"\n",
"標準普爾 500 指數成分股由 505 間在紐約證券交易所、NASDAQ 與芝加哥期貨選擇權交易所上市的藍籌股(Blue Chip)公司所組成,可以由 <https://en.wikipedia.org/wiki/List_of_S%26P_500_companies> 取得這些公司的代號;使用 `pandas` 套件中的 `read_html()` 函式可以將網頁中的 `<table></table>` 標記擷取下來為資料框。"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['MMM', 'ABT', 'ABBV', 'ABMD', 'ACN', 'ATVI', 'ADBE', 'AMD', 'AAP', 'AES']\n",
"505\n"
]
}
],
"source": [
"request_url = \"https://en.wikipedia.org/wiki/List_of_S%26P_500_companies\"\n",
"html_tables = pd.read_html(request_url)\n",
"sp500_symbols = list(html_tables[0]['Symbol'].values)\n",
"print(sp500_symbols[:10]) # print the first 10 symbols\n",
"print(len(sp500_symbols))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 取得成分股的股價以及流通股數\n",
"\n",
"使用 `pandas` 套件取得成份股的股價以及流通股數(Float shares),我們撰寫了一個類別 `GetPriceFloatShares` 可以接受美股代號的輸入,回傳 [Yahoo! Finance](https://finance.yahoo.com/) 的資料,以蘋果電腦(美股代號 AAPL)為例:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"class GetPriceFloatShares:\n",
" def __init__(self, symbol):\n",
" self._symbol = symbol\n",
" query_str_params = {\n",
" \"p\": symbol\n",
" }\n",
" self._query_str_params = query_str_params\n",
" def get_price(self):\n",
" request_url = \"https://finance.yahoo.com/quote/{}/history\".format(self._symbol)\n",
" html_tables = pd.read_html(request_url)\n",
" price = html_tables[0].iloc[0, 4].replace(\",\", \"\")\n",
" return price\n",
" def get_float_shares(self):\n",
" request_url = \"https://finance.yahoo.com/quote/{}/key-statistics\".format(self._symbol)\n",
" html_tables = pd.read_html(request_url)\n",
" float_shares = html_tables[2].set_index(0).loc['Float', :].values[0]\n",
" return float_shares"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"117.32\n",
"17.09B\n"
]
}
],
"source": [
"aapl = GetPriceFloatShares(\"AAPL\")\n",
"print(aapl.get_price())\n",
"print(aapl.get_float_shares())"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"html_tables = pd.read_html(\"https://finance.yahoo.com/quote/AAPL/history?p=AAPL\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'117.32'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"html_tables[0].iloc[0, 4]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 計算道瓊工業平均指數\n",
"\n",
"我們寫一個 `calculate_djia()` 函式計算道瓊工業平均指數,依據股價加權計算公式,是將成分股每個公司的股價加總後再除以道瓊除數(Dow divisor):\n",
"\n",
"\\begin{equation}\n",
"\\text{Dow Jones Industrial Average} = \\frac{\\sum_i P_i}{\\text{Dow divisor}}\n",
"\\end{equation}\n",
"\n",
"道瓊除數在 2020-08-31 為 0.15198707565833(<https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average>)。"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 30/30 [00:36<00:00, 1.22s/it]\n"
]
}
],
"source": [
"def calculate_djia(dow_divisor, djia_symbols):\n",
" sum_of_prices = 0\n",
" for symb in tqdm(djia_symbols):\n",
" gpfs = GetPriceFloatShares(symb)\n",
" price = gpfs.get_price()\n",
" sum_of_prices += float(price)\n",
" return sum_of_prices / dow_divisor\n",
"\n",
"djia = calculate_djia(0.15198707565833, djia_symbols)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"27940.46784311068\n"
]
}
],
"source": [
"print(djia)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 計算標準普爾 500 指數\n",
"\n",
"我們寫一個 `calculate_sp500()` 函式計算標準普爾 500 指數,依據市值加權計算公式,將成分股每個公司的股價乘以市值加總後再除以 S&P 500 除數(S&P 500 divisor):\n",
"\n",
"\\begin{equation}\n",
"\\text{S&P 500} = \\frac{\\sum_i (P_i \\times Q_i)}{\\text{S&P 500 divisor}} \\\\\n",
"\\text{where} \\; Q_i = \\text{Float shares of stock i}\n",
"\\end{equation}\n",
"\n",
"S&P 500 除數目前約為 83 億美元。(<https://en.wikipedia.org/wiki/S%26P_500_Index>)。由於從 [Yahoo! Finance](https://finance.yahoo.com/) 擷取 500+ 個成分股資訊要一段時間,因此我們先擷取了 2020-09-09 的資訊,存在 `data/sp500.csv`,可以用 `pandas` 的 `read_csv()` 函式讀入成為資料框。"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>symbol</th>\n",
" <th>price</th>\n",
" <th>float_shares</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>MMM</td>\n",
" <td>163.17</td>\n",
" <td>575.35M</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ABT</td>\n",
" <td>102.84</td>\n",
" <td>1.76B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>ABBV</td>\n",
" <td>90.22</td>\n",
" <td>1.76B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ABMD</td>\n",
" <td>268.19</td>\n",
" <td>44.04M</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>ACN</td>\n",
" <td>232.65</td>\n",
" <td>635.19M</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" symbol price float_shares\n",
"0 MMM 163.17 575.35M\n",
"1 ABT 102.84 1.76B\n",
"2 ABBV 90.22 1.76B\n",
"3 ABMD 268.19 44.04M\n",
"4 ACN 232.65 635.19M"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sp500_df = pd.read_csv(\"data/sp500.csv\")\n",
"sp500_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"其中在 `float_shares` 欄位我們可以觀察到流通股數資料是以 B 表示 Billion、M 表示 Million,這時我們可以定義一個函式 `unit_transform()` 判斷 B 或 M 之後進行單位的轉換,亦即乘以 1,000,000,000 或 1,000,000。"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def unit_transform(x):\n",
" if \"B\" in x:\n",
" x = x.replace(\"B\", \"\")\n",
" float_x = float(x)\n",
" return float_x * 1000000000\n",
" elif \"M\" in x:\n",
" x = x.replace(\"M\", \"\")\n",
" float_x = float(x)\n",
" return float_x * 1000000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`unit_tranform()` 函式目前可以針對單一成分股的流通股數進行單位轉換。"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"575350000.0\n",
"1760000000.0\n"
]
}
],
"source": [
"print(unit_transform(\"575.35M\"))\n",
"print(unit_transform(\"1.76B\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"若希望能夠將其應用到所有的成分股上,我們可以使用 `np.vectorize()` 將它改變成一個通用函式(Universal function)。"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['575.35M' '1.76B' '1.76B' '44.04M' '635.19M' '763.22M' '477.4M' '1.17B'\n",
" '64.61M' '662.4M']\n",
"[5.7535e+08 1.7600e+09 1.7600e+09 4.4040e+07 6.3519e+08 7.6322e+08\n",
" 4.7740e+08 1.1700e+09 6.4610e+07 6.6240e+08]\n"
]
}
],
"source": [
"unit_transform_ufunc = np.vectorize(unit_transform)\n",
"print(sp500_df['float_shares'].values[:10])\n",
"print(unit_transform_ufunc(sp500_df['float_shares'].values[:10]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"最後依據市值加權計算公式定義 `calculate_sp500()` 函式計算標準普爾 500 指數。"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def calculate_sp500(sp500_divisor, sp500_df):\n",
" prices = sp500_df['price'].values\n",
" float_shares = unit_transform_ufunc(sp500_df['float_shares'].values)\n",
" return np.sum(prices * float_shares) / sp500_divisor\n",
"\n",
"sp500 = calculate_sp500(8.3e9, sp500_df)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3397.1794004698795\n"
]
}
],
"source": [
"print(sp500)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python Data Analysis",
"language": "python",
"name": "pyda"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment