Skip to content

Instantly share code, notes, and snippets.

@ccwang002
Last active December 31, 2015 05:49
Show Gist options
  • Save ccwang002/7943637 to your computer and use it in GitHub Desktop.
Save ccwang002/7943637 to your computer and use it in GitHub Desktop.
2013.12 貝氏應用統計期末報告,附錄程式碼

Readme

這些程式碼包含:

  • 資料前處理
  • 繪圖
  • Bayesian linear regression model 計算

三大部份的程式碼,以下將分別列述。 由於原始資料需經過一定處理轉換才能使用,但細節較多且複雜,另獨立列述。

如果要跳過資料前處理,可以使用使用 101_境外學生總計_國別_ConsumerIndex.csv link 作後續的繪圖與 model 計算。

所有的檔案可以一次下載,使用左方的 Download Gist 一連結。

原始資料來源

  • 101_大專校院境外學生在臺人數統計,教育部統計處 http://stats.moe.gov.tw/files/detail/101/101_ab101.xls 為求清楚、方便處理,重新命名並用新的 Excel 格式另存為 101_大專校院境外學生在臺(國別、校別).xlsx

  • Cost of Livings Index by Country, Numbeo 這邊使用 HTML 直接複製原始 102 個國家表格於 Excel 中,另存為 CSV 格式 living_cost.csv,見本 gist 後面檔案。 但由於許多國家的資料在 Numbeo 分別瀏覽時有,但在此表中沒有出現,故另行補上

以上的資料在處理時放在 dataset/ 資料夾中,程式碼的相對路徑皆需要根據使用者設定作修改。

資料前處理

包含兩檔案:

使用 IPython Notebook 撰寫,檔名後連結為一個線上觀看 notebook 格式的網站。 裡面所儲存的程式碼使用 Python 語言撰寫。如果需要在本機端開啟並重新執行,需要安裝 Python 3.3.2 以及相關的套件:

  • Pandas (提供向 R 的 data.frame 功能)
  • Numpy 1.7+
  • IPython 1.0+
  • pygeocoder
  • geopy

安裝方式

在非 Windows 系統上安裝,以 Ubuntu 舉例

# Python 3.x dependcies
sudo apt-get install build-essential git xz-utils \
  libncursesw5-dev libreadline-dev libssl-dev libgdbm-dev \
  libc6-dev libsqlite3-dev tk-dev

# Install Python 3.3.2
cd /tmp
wget http://python.org/ftp/python/3.3.2/Python-3.3.2.tar.xz
tar Jxvf Python-3.3.2.tar.xz
cd Python-3.3.2
./configure
make
sudo make install

# Install setuptools
wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
sudo python3 ez_setup.py
sudo easy_install-3.3 pip

# Install Numpy
sudo apt-get install liblapack-dev libatlas-base-dev gfortran
sudo pip-3.3 install numpy

# Install IPython
sudo easy_install-3.3 ipython[all]

其餘套件均可以使用 sudo pip-3.3 install <package_name> 方式安裝。 就可以使用 ipython notebook 執行相關的程式碼

繪圖

繪圖的程式碼較複雜,故另外獨立出來。主要有兩程式碼

  • display_distance.R : 距離相關的繪圖
  • display_livingcost.R : 消費指數相關的繪圖

需要安裝像 ggplot2, gridExtra, googleVis 等套件。

Model 計算

  • Full model (把 4 個 X 放入計算): full_linear_reg_model.R
  • Model selection: model_select.R
洲別 國別 境外學生總計 long_name short_name long lat tpe_distance ConsumerIndex
亞洲 馬來西亞 8530.0 Malaysia MY 101.975766 4.210483999999999 3118.823437829021 35.83
亞洲 澳門 4495.0 Macau MO 113.543873 22.198745 878.1294552393146 51.58
亞洲 日本 4457.0 Japan JP 138.252924 36.204824 2014.2363241703463 80.96
亞洲 越南 3915.0 Vietnam VN 108.277199 14.058324 1850.615977969412 32.44
亞洲 香港 3083.0 Hong Kong HK 114.109497 22.396428 815.9221793584004 82.98
亞洲 印尼 2799.0 Indonesia ID 113.921327 -0.789275 2979.4155215637797 32.43
亞洲 南韓 2531.0 South Korea KR 127.766922 35.90775700000001 1338.215024827551 55.58
亞洲 中國大陸 17454.0 China CN 119.332526 33.6219 969.9112097536964 44.51
亞洲 泰國 1250.0 Thailand TH 100.992541 15.870032 2371.9181834891187 37.15
亞洲 蒙古 715.0 Mongolia MN 103.846656 46.862496 2881.187420272893 42.23
亞洲 印度 575.0 India IN 78.96288 20.593684 4382.940751418016 19.61
亞洲 菲律賓 480.0 Philippines PH 121.774017 12.879721 1351.9257043939344 30.2
亞洲 緬甸 449.0 Myanmar (Burma) MM 95.956223 21.913965 2634.6861511999195 35.9
亞洲 俄羅斯 357.0 Russia RU 105.318756 61.52401 4230.195945056516 50.34
亞洲 新加坡 348.0 Singapore SG 103.819836 1.352083 3245.464098668557 104.26
亞洲 土耳其 112.0 Turkey TR 35.243322 38.963745 8001.540230806996 41.24
亞洲 約旦 48.0 Jordan JO 36.238414 30.585164 8218.490891015343 41.59
亞洲 伊朗 38.0 Iran IR 53.68804599999999 32.427908 6570.14020893806 41.33
亞洲 以色列 38.0 Israel IL 34.851612 31.046051 8325.35863781858 63.46
亞洲 尼泊爾 34.0 Nepal NP 84.12400799999998 28.394857 3727.326912962233 21.91
亞洲 孟加拉 25.0 Bangladesh BD 90.356331 23.684994 3162.2933290443293 27.05
亞洲 伊拉克 24.0 Iraq IQ 43.679291 33.223191 7459.0080881411195 38.39
亞洲 斯里蘭卡 17.0 Sri Lanka LK 80.77179699999998 7.873053999999999 4727.401610376703 33.27
亞洲 汶萊 14.0 Brunei BN 114.727669 4.535277 2389.106060855059 68.2
亞洲 哈薩克 10.0 Kazakhstan KZ 66.923684 48.019573 5372.722062490883 46.64
亞洲 巴基斯坦 9.0 Pakistan PK 69.34511599999999 30.375321000000003 5137.433418823234 20.31
亞洲 吉爾吉斯 7.0 Kyrgyzstan KG 74.766098 41.20438 4651.169586626263
亞洲 柬埔寨 6.0 Cambodia KH 104.990963 12.565679 2225.634027197917 33.44
亞洲 敘利亞 5.0 Syria SY 38.99681500000001 34.80207499999999 7823.325774788828 33.05
亞洲 葉門 5.0 Yemen YE 48.516388 15.552727 7616.2134244375075 29.45
亞洲 烏茲別克 4.0 Uzbekistan UZ 64.585262 41.377491 5502.687174983322 33.81
亞洲 黎巴嫩 3.0 Lebanon LB 35.862285 33.854721000000005 8130.186982761907 57.82
亞洲 阿拉伯聯合大公國 3.0 United Arab Emirates AE 53.847818 23.424076 6802.020765452293 69.63
亞洲 不丹 2.0 Bhutan BT 90.433601 27.514162 3111.7752094862294
亞洲 寮國 2.0 Laos LA 102.495496 19.85627 2043.5918984379093 36.39
亞洲 沙烏地阿拉伯 2.0 Saudi Arabia SA 45.079162 23.885942 7637.968608576841 45.08
亞洲 亞美尼亞 2.0 Armenia AM 45.038189 40.069099 7156.815784852356 31.19
亞洲 阿富汗 1.0 Afghanistan AF 67.709953 33.93911 5252.946440110646 42.53
亞洲 賽普勒斯 1.0 Cyprus CY 33.429859 35.126413 8293.15297957603 63.28
亞洲 巴勒斯坦 1.0 Palestine PS 35.233154 31.952162 8256.570747836091
亞洲 喬治亞 1.0 Georgia GE 43.35689199999999 42.315407 7242.313420137883 36.18
亞洲 土庫曼 1.0 Turkmenistan TM 59.556278000000006 38.969719 5940.200100629239 30.33
亞洲 亞塞拜然 1.0 Azerbaijan AZ 47.57692700000001 40.143105 6943.023509289576 60.81
亞洲 摩爾多瓦 1.0 Moldova MD 28.369885 47.411631 8227.240764670578 30.75
大洋洲 澳大利亞 272.0 Australia AU 133.775136 -25.274398 5725.629604163969 104.41
大洋洲 紐西蘭 94.0 New Zealand NZ 174.88597099999996 -40.900557 9148.454950049549 81.2
大洋洲 索羅門群島 63.0 Solomon Islands SB 160.156194 -9.64571 5682.537714929305
大洋洲 吉里巴斯 48.0 Zhongba Zhongba 84.03152999999998 29.770279 3730.3439703737617
大洋洲 吐瓦魯 41.0 Tuvalu TV 178.6799214 -7.478441900000001 7157.1014680679245
大洋洲 帛琉 30.0 Palau PW 134.58252 7.514979999999999 2387.627282411628
大洋洲 馬紹爾群島共和國 30.0 Marshall Islands MH 171.184478 7.131474000000001 5629.648147332142
大洋洲 諾魯 18.0 Nauru NR 166.931503 -0.522778 5644.739922547878
大洋洲 斐濟 3.0 Fiji FJ 178.065032 -17.713371 7740.713065621621 43.2
大洋洲 巴布亞紐幾內亞 1.0 Papua New Guinea PG 147.203979 -9.503244 4731.3917564677
非洲 甘比亞 227.0 The Gambia GM -15.310139 13.443182 13690.532050606767
非洲 南非 201.0 South Africa ZA 22.937506 -30.559482 12164.357954338682 41.57
非洲 史瓦濟蘭 113.0 Swaziland SZ 31.465866 -26.522503000000004 11226.550685019984
非洲 布吉納法索 108.0 Burkina Faso BF -1.561593 12.238333 12601.757200542106
非洲 聖多美普林西比 56.0 São Tomé and Príncipe ST 6.6130809999999975 0.18636 12507.06435517828
非洲 衣索比亞 50.0 Ethiopia ET 40.489673 9.145 8694.500801664088 42.77
非洲 肯亞 20.0 Kenya KE 37.906193 -0.023559 9376.872189422962 37.89
非洲 奈及利亞 20.0 Nigeria NG 8.675277000000001 9.081999 11835.504151044002 53.92
非洲 埃及 14.0 Egypt EG 30.802498 26.820553000000004 8865.563090345793 28.64
非洲 馬拉威 13.0 Malawi MW 34.301525 -13.254307999999998 10360.451569175682 71.01
非洲 模里西斯 10.0 Mauritius MU 57.55215200000001 -20.348404 8556.012256134067 39.54
非洲 剛果 8.0 Congo CG 15.827659 -0.228021 11609.873457221873
非洲 莫三比克 8.0 Mozambique MZ 35.529562 -18.665695 10493.38124682724
非洲 辛巴威 7.0 Zimbabwe ZW 29.154857 -19.015438 11122.945731279236 45.77
非洲 摩洛哥 6.0 Morocco MA -7.0926199999999975 31.791702 11685.778616781943 36.12
非洲 坦尚尼亞 6.0 Tanzania TZ 34.888822 -6.369028 9979.193743611075 35.22
非洲 象牙海岸 5.0 Côte d'Ivoire CI -5.547079999999999 7.5399889999999985 13256.02306270979
非洲 烏干達 5.0 Uganda UG 32.290275 1.373333 9877.862999352139 35.82
非洲 迦納 4.0 Ghana GH -1.0231940000000002 7.9465270000000015 12814.871280668965 54.85
非洲 蘇丹 3.0 Sudan SD 30.217636 12.862807 9550.179843567385 41.58
非洲 多哥 3.0 Togo TG 0.824782 8.619543 12603.081469500412
非洲 尚比亞 3.0 Zambia ZM 27.849332 -13.133897 10992.375216505665
非洲 喀麥隆 2.0 Cameroon CM 12.354722 7.3697219999999986 11572.09862005204 46.79
非洲 納米比亞 2.0 Namibia 18.49041 -22.95764 12314.143987669931 47.74
非洲 貝南 1.0 Borgou Borgou 2.7779813 9.5340864 12367.360714195125
非洲 查德 1.0 Chad TD 18.732207 15.454166 10537.80834438672
非洲 幾內亞 1.0 Guinea GN -9.696645 9.945587 13470.035002392971
非洲 利比亞 1.0 Libya LY 17.228331 26.3351 10106.067493299664
非洲 馬達加斯加 1.0 Madagascar MG 46.869107 -18.766947 9429.644573401949
非洲 盧安達 1.0 Rwanda RW 29.873888 -1.940278 10276.481267058167 32.85
非洲 塞內加爾 1.0 Senegal SN -14.452362 14.497401000000002 13543.399233254611
非洲 突尼西亞 1.0 Tunisia TN 9.537499 33.886917 10315.866079033523 31.74
歐洲 法國 1232.0 France FR 2.213749 46.227638 10031.944400408192 71.92
歐洲 德國 799.0 Germany DE 10.451526 51.165691 9221.326456774634 63.61
歐洲 英國 320.0 United Kingdom GB -3.435973 55.378051 9682.63682419338 72.89
歐洲 荷蘭 286.0 The Netherlands NL 5.291265999999999 52.132633 9459.462078628465 72.03
歐洲 波蘭 244.0 Poland PL 19.145136 51.919438 8651.421976027801 38.63
歐洲 西班牙 243.0 Spain ES -3.74922 40.46366700000001 10813.431494292065 57.2
歐洲 瑞典 213.0 Sweden SE 18.643501 60.12816100000001 8300.776539450033 76.86
歐洲 義大利 194.0 Italy IT 12.56738 41.87194 9620.365108763655 70.12
歐洲 捷克 189.0 Czech Republic CZ 15.472962 49.81749199999999 8987.391359201036 43.04
歐洲 奧地利 146.0 Austria AT 14.550072 47.516231 9169.886610140962 68.57
歐洲 比利時 102.0 Belgium BE 4.469936 50.503887 9611.932701687072 74.99
歐洲 瑞士 95.0 Switzerland CH 8.227511999999999 46.818188 9619.348671853724 112.95
歐洲 烏克蘭 81.0 Ukraine UA 31.16558 48.379433 7994.735911202741 32.33
歐洲 芬蘭 72.0 Finland FI 25.748151 61.92410999999999 7871.154060492467 75.02
歐洲 斯洛伐克 57.0 Slovakia SK 19.699024 48.669026 8770.361608547695 45.92
歐洲 匈牙利 53.0 Hungary HU 19.503304 47.162494 8855.989784415759 39.28
歐洲 丹麥 43.0 Denmark DK 9.501785 56.26392 8976.137483679468 82.98
歐洲 愛爾蘭 38.0 Ireland IE -8.243889999999999 53.41291 10059.04823706989 77.79
歐洲 挪威 36.0 Norway NO 8.468945999999999 60.47202399999999 8780.023564010653 123.52
歐洲 葡萄牙 30.0 Portugal PT -8.224454 39.39987199999999 11172.164610046912 50.84
歐洲 克羅埃西亞 28.0 Croatia HR 15.2 45.1 9256.666428357445 44.72
歐洲 斯洛維尼亞 21.0 Slovenia SI 14.506559 46.055484 9252.159159943876 55.38
歐洲 立陶宛 19.0 Lithuania LT 23.881275 55.169438 8220.955629412909 40.7
歐洲 白俄羅斯 18.0 Belarus BY 27.953389 53.709807 8026.485676032184 33.06
歐洲 羅馬尼亞 11.0 Romania RO 24.96676 45.943161 8528.902311921769 33.77
歐洲 愛沙尼亞 9.0 Estonia EE 25.013607 58.595272 8025.695683667203 44.86
歐洲 拉脫維亞 9.0 Latvia LV 24.603189 56.879635 8112.400183028664 42.67
歐洲 保加利亞 8.0 Bulgaria BG 25.48583 42.733883 8627.66236442726 33.35
歐洲 希臘 8.0 Greece GR 21.824312 39.074208 9075.516026366002 57.09
歐洲 冰島 6.0 Iceland IS -19.020835 64.96305100000001 9462.921194084893 77.33
歐洲 盧森堡 4.0 Luxembourg LU 6.1295829999999984 49.815273 9560.03731908462 97.36
歐洲 馬其頓 4.0 Macedonia (FYROM) MK 21.745275 41.608635 8960.812805257849 27.29
歐洲 塞爾維亞共和國 4.0 Serbia RS 21.005859 44.016521 8901.188270217013 33.61
歐洲 阿爾巴尼亞 1.0 Albania AL 20.168331 41.153332 9101.585827570592 30.5
歐洲 波士尼亞與赫塞哥維納 1.0 Bosnia and Herzegovina BA 18.433227 43.858297 9094.308031680404
美洲 美國 3582.0 United States US -95.712891 37.09024 12097.670372846202 59.83
美洲 加拿大 774.0 Canada CA -106.346771 56.130366 9938.83706395379 70.94
美洲 宏都拉斯 258.0 Luss Luss -4.6403349999999985 56.099985 9684.008125985983 32.32
美洲 巴拉圭 210.0 Paraguay PY -58.44383199999999 -23.442503 19821.331806363058
美洲 巴拿馬 193.0 Panama PA -80.782127 8.537981 15573.558765944248 41.32
美洲 尼加拉瓜 192.0 Nicaragua NI -85.207229 12.865416 14907.92407978164 53.03
美洲 瓜地馬拉 167.0 Guatemala GT -90.23075899999999 15.783471 14328.398457171854 52.8
美洲 薩爾瓦多 166.0 El Salvador SV -88.89653 13.794185 14590.47307316171 52.68
美洲 貝里斯 158.0 Belize BZ -88.49765 17.189877 14313.119071042649
美洲 墨西哥 137.0 Mexico MX -102.552784 23.634501 12827.70959032813 34.23
美洲 巴西 113.0 Brazil BR -51.92528 -14.235004 18629.22190257964 47.34
美洲 海地 91.0 Haiti HT -72.28521500000002 18.971187 14906.437145671194
美洲 多明尼加 76.0 Dominica DM -61.37097600000001 15.414999 15511.761303838635
美洲 秘魯 70.0 Peru PE -75.015152 -9.189967 17532.897397910026 32.81
美洲 厄瓜多 69.0 Ecuador EC -78.517342 -0.20118699999999998 16515.385774584855 30.44
美洲 聖露西亞 55.0 Saint Lucia LC -60.97889300000001 13.909444 15680.795462439299
美洲 阿根廷 51.0 Argentina AR -63.61667199999999 -38.416097 18450.18544286892 46.29
美洲 聖文森 48.0 Bois de Vincennes Bois de Vincennes 2.4341723 48.83294710000001 9841.091634308012
美洲 智利 40.0 Chile CL -71.542969 -35.675147 18298.909062214618 45.79
美洲 哥倫比亞 37.0 Colombia CO -74.297333 4.570868 16307.716388450075 37.12
美洲 哥斯大黎加 28.0 Costa Rica CR -83.753428 9.748917 15281.943397704172 44.66
美洲 聖克里斯多福 24.0 Saint Kitts and Nevis KN -62.782998 17.357822 15284.634334435408
美洲 玻利維亞 20.0 Bolivia BO -63.588653 -16.290154 18898.567274317982 27.57
美洲 委內瑞拉 9.0 Venezuela VE -66.58973 6.42375 16409.276472089605 77.64
美洲 牙買加 4.0 Jamaica JM -77.297508 18.109581 14810.733504137119 49.58
美洲 古巴 2.0 Cuba CU -77.78116700000002 21.521757 14443.510287777975
美洲 巴貝多 1.0 Barbados BB -59.612217 13.107567999999999 15776.290413547182 60.37
美洲 千里達 1.0 Republic of Trinidad and Tobago TT 14.4738541 61.11693639999999 8460.028397377986 40.09
美洲 烏拉圭 1.0 Uruguay UY -55.765835 -32.522779 19142.197458156974 55.01
##########################
# 留學生人數與距離的關係 #
##########################
# 資料前處理 ########
df.bycountry <- read.csv("dataset/101_境外學生總計_國別.csv", stringsAsFactors=FALSE)
colnames(df.bycountry) <- c("continent", "country_orig", "student_total",
colnames(df.bycountry)[-(1:3)])
# 去除 大陸、香港、澳門
# df.bycountry <- df.bycountry[-c(2, 5, 8), ]
# 去除 大陸
# df.bycountry <- df.bycountry[-c(8), ]
# 全部國家的繪圖 #######
library(ggplot2)
g.full <- ggplot(df.bycountry, aes(x=tpe_distance, y=student_total, color=continent))
# g + geom_point() + scale_y_log10() + scale_x_log10() # 使用 log-scale 作圖
g.full <- g.full + geom_point(size=5, alpha=0.6) + theme_bw() +
theme(text=element_text(family="Heiti TC Medium", size=20)) +
labs(x=NULL, y=NULL, color=NULL) +
annotate("text", x=5000, y=8500, label="馬來西亞", family="Heiti TC Medium", size=6) +
annotate("text", x=12000, y=4500, label="美國",family="Heiti TC Medium", size=6) +
annotate("text", x=3000, y=17500, label="中國大陸",family="Heiti TC Medium", size=6) +
theme(legend.position=c(1,1), legend.justification=c(1,1))
## 取前 50% 多的國家 #######
g.upper_median <- ggplot(df.bycountry[df.bycountry$student_total > 30, ],
aes(x=tpe_distance, y=student_total, color=continent))
g.upper_median <- g.upper_median + geom_point(size=5, alpha=0.6) + theme_bw() +
theme(text=element_text(family="Heiti TC Medium", size=20)) +
labs(x=NULL, y=NULL, color=NULL) +
theme(legend.position=c(1,1), legend.justification=c(1,1))
## 取另外一半 #######
g.lower_median <- ggplot(df.bycountry[df.bycountry$student_total <= 30, ],
aes(x=tpe_distance, y=student_total, color=continent))
g.lower_median <- g.lower_median + geom_point(size=5, alpha=0.6) + theme_bw() +
theme(text=element_text(family="Heiti TC Medium", size=20)) +
labs(x=NULL, y=NULL, color=NULL) +
theme(legend.position=c(1,1), legend.justification=c(1,1))
## 全部合併
library(gridExtra)
grid.arrange(
arrangeGrob(g.full, ncol=1),
arrangeGrob(g.upper_median, g.lower_median, ncol=1),
ncol=2,
sub=textGrob("與臺灣距離 (km)\n", gp=gpar(fontsize=20, fontfamily="Heiti TC Medium")),
left=textGrob("\n留學生人數", gp=gpar(fontsize=20, fontfamily="Heiti TC Medium"), rot=90)
)
#####################################
# 留學生人數與 ConsumerIndex 的關係 #
#####################################
# 資料前處理
df.living <- read.csv("dataset/101_境外學生總計_國別_ConsumerIndex.csv")
colnames(df.living) <- c("continent", "country_orig", "student_total", "long_name", "short_name", "longitude", "latitude", colnames(df.living)[-(1:7)])
df.living <- na.omit(df.living) # 去除沒有完整資訊的國家
## 作圖於 google Map 上,這是互動網頁,會需要另存成 png
library(googleVis)
plot(gvisGeoMap(df.living, locationvar="long_name", numvar="ConsumerIndex",
options=list(width=1200, height=800, dataMode='regions',
colors='[0xFDFF73, 0xFF7573, 0xDC0300]')
))
# 全部國家 #######
g.full <- ggplot(df.living, aes(x=ConsumerIndex, y=student_total, color=continent))
g.full <- g.full + geom_point(size=5, alpha=0.6) + theme_bw() +
theme(text=element_text(family="Heiti TC Medium", size=20)) +
labs(x=NULL, y=NULL, color=NULL) +
theme(legend.position=c(1,1), legend.justification=c(1,1)) +
geom_vline(aes(xintercept=x, color=continent), data=data.frame(x=39.86, continent="亞洲"),
size=1, alpha=0.6, linetype="dashed")
g.full
# 比台灣 CI 高的國家 #######
g.upperCI <- ggplot(df.living[df.living$ConsumerIndex > 40, ],
aes(x=ConsumerIndex, y=student_total, color=continent))
g.upperCI <- g.upperCI + geom_point(size=5, alpha=0.6) + theme_bw() +
theme(text=element_text(family="Heiti TC Medium", size=20)) +
labs(x=NULL, y=NULL, color=NULL) +
theme(legend.position=c(1,1), legend.justification=c(1,1)) +
geom_vline(aes(xintercept=x, color=continent), data=data.frame(x=39.86, continent="亞洲"),
size=1, alpha=0.6, linetype="dashed") + scale_y_log10()
g.upperCI
# 比台灣 CI 低的國家 #######
g.lowerCI <- ggplot(df.living[df.living$ConsumerIndex <= 40, ],
aes(x=ConsumerIndex, y=student_total, color=continent))
g.lowerCI <- g.lowerCI + geom_point(size=5, alpha=0.6) + theme_bw() +
theme(text=element_text(family="Heiti TC Medium", size=20)) +
labs(x=NULL, y=NULL, color=NULL) +
theme(legend.position=c(1,1), legend.justification=c(1,1)) +
geom_vline(aes(xintercept=x, color=continent), data=data.frame(x=39.86, continent="亞洲"),
size=1, alpha=0.6, linetype="dashed") + scale_y_log10()
g.lowerCI
## 合併繪圖
library(gridExtra)
grid.arrange(
arrangeGrob(g.full, ncol=1),
arrangeGrob(g.upperCI, g.lowerCI, ncol=1),
ncol=2,
sub=textGrob("消費含租屋指數(NYC=100, TW=40)\n", gp=gpar(fontsize=20, fontfamily="Heiti TC Medium")),
left=textGrob("\n留學生人數", gp=gpar(fontsize=20, fontfamily="Heiti TC Medium"), rot=90)
)
#########################################################
# Bayesian Linear Normal Regression Model using g-prior #
#########################################################
# Data Preprocessing ########
df.living <- read.csv("dataset/101_境外學生總計_國別_ConsumerIndex.csv", stringsAsFactors=FALSE)
colnames(df.living) <- c("continent", "country_orig", "student_total", colnames(df.living)[-(1:3)])
df.needed <- df.living[, c("student_total", "tpe_distance", "ConsumerIndex")]
df.needed <- na.omit(df.needed) # Trim any with NAs
names(df.needed) <- c("y", "x_dist", "x_ci")
row.names(df.needed) <- NULL
# Construct all Xs ########
## Here we have
## x_const=1, x_dist, x_ci, x_interact
## four regressors. For simplicity, we don't consider further classification
## based on the preliminary view of data.
df.needed$x_const <- 1
df.needed$x_interact <- df.needed$x_dist * df.needed$x_ci
Y <- df.needed$y
X <- as.matrix(df.needed[, -c(1)])
# Sketch ########
panel.cor <- function(x, y, digits=2, prefix="", cex.cor) {
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r = (cor(x, y))
txt <- format(c(r, 0.123456789), digits=digits)[1]
txt <- paste(prefix, txt, sep="")
if(missing(cex.cor)) cex <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = 5) # `cex` or `cex * abs(r)`
}
pairs(~ x_dist + x_ci + x_interact, upper.panel=panel.cor, data=df.needed)
# Model Computation ########
## using OLS, x_const will be dropped for singularity
model <- lm(y ~ x_dist + x_ci + x_const + x_interact, data=df.needed)
beta_ols <- model$coefficients[c(2, 3, 1, 5)] # notice the coeff order should match
dim(beta_ols) <- c(1, 4)
# by definition, simga_ols^2 = sum[(Y - predict_y)^2] / (N-p)
simga_ols <- sqrt(sum((Y - beta_ols %*% t(X))^2) / (length(Y) - dim(X)[2]))
## using g-prior with least prior information
S <- 100000
g <- length(Y); nu0 <- 1; sq2_0 <- simga_ols^2 # prior parameters
n <- length(Y); p <- dim(X)[2] # some useful info about data
inv_Xsq <- solve(t(X) %*% X)
Hg <- ( g/(g+1) ) *
X %*% inv_Xsq %*% t(X) # beta_ols = inv_Xsq %*% t(X) %*% Y
SSRg <- t(Y) %*% (diag(1, nrow=n) - Hg) %*% Y
sq2_vec <- 1 / rgamma(S, (nu0+n)/2, (nu0*sq2_0+SSRg)/2)
Vb <- g * inv_Xsq / (g+1)
Eb <- Vb %*% t(X) %*% Y
E <- matrix(rnorm(S*p, 0, sqrt(sq2_0)), S, p)
beta <- t(t(E %*% chol(Vb)) + c(Eb))
# Plot ########
par(mfrow=c(2,2))
hist(beta[, "x_const"], main=expression(beta[const]))
hist(beta[, "x_dist"], main=expression(beta[dist]))
hist(beta[, "x_ci"], main=expression(beta[ci]))
hist(beta[, "x_interact"], main=expression(beta[interact]))
Country Consumer Price Index Rent Index Consumer Price Plus Rent Index Groceries Index Restaurant Price Index Local Purchasing Power Index
Norway 173.85 68.99 123.52 162.53 183.3 95.91
Switzerland 151.77 70.9 112.95 153.05 142.36 137.26
Australia 133.66 72.72 104.41 128.69 116.39 103.02
Singapore 105.39 103.04 104.26 91.19 63.28 59.94
Luxembourg 124.76 67.68 97.36 111.78 130.8 121.39
Qatar 82.36 84.11 83.2 75 78.79 136.65
Denmark 119.95 42.93 82.98 103.97 130.5 102.15
Hong Kong 77.81 88.57 82.98 79.82 54.48 80.3
New Zealand 113.63 46.06 81.2 111.68 93.48 80.65
Japan 115.24 43.81 80.96 112.83 70.01 85.99
Ireland 112.33 40.37 77.79 105.55 102.99 95.06
Venezuela 111.01 41.48 77.64 126.37 93.65 16.64
Iceland 112.43 39.3 77.33 118.81 99.72 67.91
Sweden 114.47 36.11 76.86 105.57 110.14 106.57
Netherlands 103.9 45.71 75.97 81.11 107.68 92.88
Finland 106.99 40.37 75.02 101.29 100.24 94.06
Belgium 108.04 39.18 74.99 90.14 115.6 94.09
Bahrain 113.49 29.58 73.21 162.91 71.01 59.29
United Kingdom 102.24 41.1 72.89 93.06 94.28 89.07
France 103.24 37.99 71.92 97.75 100.45 98.11
Canada 99.65 39.85 70.94 101.24 85.58 109.56
Italy 101.42 36.21 70.12 87.31 107.38 70.44
United Arab Emirates 75.48 63.29 69.63 64.99 69.9 121.24
Austria 97.88 36.81 68.57 92.75 81.43 87.47
Germany 91.64 33.24 63.61 80.74 79.68 117.58
Israel 92.24 32.28 63.46 76.92 90.09 72.55
Cyprus 98.61 24.99 63.28 89.35 101.17 72.48
Azerbaijan 78.25 41.92 60.81 67.88 64.83 20.48
United States 80.54 37.39 59.83 80.74 67.87 136.5
Kuwait 80.74 34.55 58.57 78.99 78.65 84.61
Lebanon 75.63 38.51 57.82 58.06 65.2 42.46
Spain 83.14 29.09 57.2 66.55 80.97 84.04
Greece 92.57 18.64 57.09 74.67 95.52 49.44
South Korea 80.44 28.66 55.58 94.12 46.5 102.13
Slovenia 82.06 26.48 55.38 70.49 63.24 59.25
Uruguay 81.66 26.14 55.01 70.53 75.66 37.58
Puerto Rico 82.56 25 54.93 83.45 65.03 94.08
Malta 85.61 20.6 54.4 73.68 77.57 77.82
Nigeria 87.18 17.89 53.92 92.93 62.45 32.73
Portugal 75.31 24.34 50.84 61.25 59.09 54.12
Russia 69.69 29.38 50.34 56.88 73.64 42.96
Brazil 70 22.79 47.34 53.97 51.81 41.26
Trinidad And Tobago 61.6 30.56 46.7 55.8 48.42 44.27
Kazakhstan 65.16 26.56 46.64 51.34 67.36 32.21
Argentina 70.44 20.12 46.29 64.52 65.43 50.24
Slovakia 65.72 24.46 45.92 56.16 44.54 53.78
Chile 68.38 21.3 45.79 58.15 56.28 56.43
Taiwan 67.83 20.52 45.12 79.96 31.75 73.02
Saudi Arabia 72.17 15.73 45.08 74.47 36.51 114.63
Estonia 70.75 16.8 44.86 60.94 53.36 53.09
Croatia 72.24 14.9 44.72 64.47 55.98 51.58
Costa Rica 66.62 20.87 44.66 71.79 54.7 45.59
China 61.75 25.82 44.51 66.43 36.53 37.27
Czech Republic 63.35 21.04 43.04 54.69 40.61 60.65
Ethiopia 59.86 24.26 42.77 62.51 26.1 11.48
Latvia 67.96 15.26 42.67 55.36 58.45 41.34
Dominican Republic 66.17 15.44 41.82 66.12 41.54 27.99
Jordan 65.55 15.63 41.59 56.78 56.37 36.18
South Africa 60.2 21.39 41.57 52.86 49.76 105.33
Oman 54.56 27.5 41.57 51.37 39.81 129.2
Iran 57.04 24.3 41.33 59.28 42.76 37.51
Panama 53.38 28.26 41.32 58.83 36.26 37.52
Turkey 64.39 16.16 41.24 50.04 46.33 56.44
Lithuania 63.33 16.18 40.7 55.36 41.98 42.24
Montenegro 63.18 15.2 40.15 56.02 55.2 43.94
Mauritius 54.48 23.36 39.54 43.46 53.48 46.08
Hungary 63.9 12.6 39.28 50.87 47.33 42.97
Poland 55.64 20.2 38.63 46.58 43.08 54.72
Iraq 57.16 18.04 38.39 50.89 46.23 33.66
Kenya 56.99 17.19 37.89 60.96 33.82 38.93
Thailand 51.78 21.3 37.15 57.93 25.48 35.01
Colombia 58.44 14.01 37.12 52.83 38.31 29.71
Georgia 53.8 17.09 36.18 45.54 47.48 28.56
Morocco 54.04 16.7 36.12 44.71 42.31 26.38
Malaysia 54.82 15.25 35.83 51.46 27.37 69.64
Tanzania 54.88 13.91 35.22 59.2 24.94 66.03
Mexico 53.56 13.29 34.23 52.48 37.4 59.77
Romania 51.89 14.14 33.77 47.31 37.2 35.81
Serbia 53.59 11.96 33.61 43.53 42.78 36.58
Cambodia 55.84 9.18 33.44 64.35 24.45 15.45
Bulgaria 52.68 12.39 33.35 46.89 38.82 37.71
Sri Lanka 50.58 14.52 33.27 55 29.73 27.18
Belarus 51.32 13.28 33.06 41.89 58.09 23.45
Syria 47.3 17.61 33.05 36.7 40.65 21.89
Peru 48.82 15.47 32.81 42.62 31.18 45.28
Vietnam 43.21 20.77 32.44 41.77 23.74 31.86
Indonesia 46.14 17.58 32.43 51 23.71 24.62
Ukraine 48.56 14.74 32.33 41.48 43.47 31.97
Bosnia And Herzegovina 53.4 8.65 31.92 47.43 38.07 46.42
Tunisia 50.32 11.61 31.74 44.39 36.83 34.18
Moldova 47.24 12.89 30.75 38.28 37.97 24.52
Albania 48.64 10.84 30.5 40.23 31.83 35.28
Ecuador 46.61 12.91 30.44 48.18 26.45 25.47
Philippines 48.01 10.91 30.2 51.23 26.89 25.66
Macedonia 46.48 10.72 29.32 38.44 33.22 34.9
Egypt 44.82 11.1 28.64 40.33 37.18 30.32
Bolivia 41.5 12.48 27.57 36.42 21.48 46.39
Bangladesh 44.83 7.79 27.05 38.51 26.43 33.26
Algeria 41.49 6.06 24.48 43.76 24.84 32.71
Nepal 38.74 3.66 21.91 36.73 26.19 29.49
Pakistan 33.41 6.12 20.31 32.09 25.34 26.03
India 30.92 7.36 19.61 32.51 18.09 61.92
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
# data preprocessing
df.living <- read.csv("dataset/101_境外學生總計_國別_ConsumerIndex.csv", stringsAsFactors=FALSE)
colnames(df.living) <- c("continent", "country_orig", "student_total", colnames(df.living)[-(1:3)])
# df.living <- df.living[-c(8), ] # 去除 大陸
df.needed <- df.living[, c("student_total", "tpe_distance", "ConsumerIndex")]
df.needed <- na.omit(df.needed) # Trim any with NAs
names(df.needed) <- c("y", "x_dist", "x_ci")
row.names(df.needed) <- NULL
# Construct all Xs
df.needed$x_const <- 1
df.needed$x_interact <- df.needed$x_dist * df.needed$x_ci
Y <- df.needed$y
X <- as.matrix(df.needed[, -c(1)])
p <- 4
tf <- c(TRUE, FALSE)
models <- expand.grid(replicate(p,tf,simplify=FALSE))
names(models) <- NULL
models <- as.matrix(models)
models <- models[-dim(models)[1],]
models <- models[models[, 3] == TRUE, ]
## model selection
model_select <- function(model) {
# Ex. model = [T, F, F, T]
pz <- sum(model)
Xz <- X[, model]
g <- n <- length(Y)
inv_Xsqz <- solve(t(Xz) %*% Xz)
beta_z_ols <- t(inv_Xsqz %*% t(Xz) %*% Y)
sq2z_0 <- sum((Y - beta_z_ols %*% t(Xz))^2) / (length(Y) - dim(Xz)[2])
Hgz <- ( g/(g+1) ) * Xz %*% inv_Xsqz %*% t(Xz)
SSRgz <- t(Y) %*% (diag(1, nrow=n) - Hgz) %*% Y
- pz/2 * log(1+n) - log(sq2_0 + SSRgz) * (n+1)/2
}
lml.all=apply(models,1,model_select)
results=cbind(lml.all, models)
order=sort(results[,1],index=TRUE,decreasing=TRUE)
results[order$ix,]
df.results <- data.frame(results[order$ix,])
results[order$ix,1] <- results[order$ix,1] - results[order$ix[1],1]
results[order$ix,1] <- exp(results[order$ix,1])
nconstant <- sum(results[,1])
results[,1] <- results[,1]/nconstant
postprob <- results[order$ix,]
postprob
df.results$postr_z <- postprob[, 1]
df.results <- df.results[, c(2:5, 1, 6)]
names(df.results) <- c("x_dist", "x_ci", "x_const", "x_interact", "log_ml", "postr_z")
df.results <- df.results[, c(3, 1:2, 4:6)]
df.results[, 5:6] <- round(df.results[, 5:6], 2)
write.csv(df.results, file="model_selection.csv", row.names=FALSE)
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment