Created February 18, 2017 16:34
#! /usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 script (urllib2, print statement).
import re
import urllib2
import csv
import pandas as pd


def GetHtmlcode(ID):
    # Get the webpage's source html code
    source = 'http://goodinfo.tw/StockInfo/StockDetail.asp?STOCK_ID='
    url = source + ID
    #print url
    # Header
    headers = { 'User-Agent' : 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36',
                'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Accept-Charset' : 'Big5,utf-8;q=0.7,*;q=0.3',
                #'Accept-Encoding' : 'gzip,deflate,sdch',
                'Accept-Language' : 'zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4,ja;q=0.2' ,
                'Cache-Control' : 'max-age=0',
                'Connection' : 'keep-alive',
                'Cookie' : '427 bytes were stripped',
                'Host' : 'www.goodinfo.tw',
                'Referer' : url }
    # Connect to the page and fetch the data.
    # Passing "" as the data argument makes urllib2 send a POST request.
    req = urllib2.Request(url, "", headers)
    response = urllib2.urlopen(req)
    result = response.read().decode('utf-8')
    #print result
    return result


def main():
    page = GetHtmlcode('2103')
    # pd.read_html returns a list of DataFrames, one per <table> in the page
    df = pd.read_html(page)
    df2 = pd.DataFrame(df[41])
    print df2


if __name__ == "__main__":
    main()
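
The script above targets Python 2 (`urllib2`, the `print` statement). As a rough sketch of how the same request might be built on Python 3 with `urllib.request`, assuming the same URL and only a subset of the original headers: `SOURCE`, `build_request` and `fetch_html` are hypothetical names, the `Cookie` and `Host` headers are left out on the assumption that the server does not require them, and the current behaviour of the site has not been checked.

```python
import urllib.request

# Same base URL as in the script above.
SOURCE = 'http://goodinfo.tw/StockInfo/StockDetail.asp?STOCK_ID='


def build_request(stock_id):
    """Build (but do not send) a request for one stock's detail page."""
    url = SOURCE + stock_id
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4,ja;q=0.2',
        'Referer': url,
    }
    # No data argument, so this is a GET; the original passes "" and therefore POSTs.
    return urllib.request.Request(url, headers=headers)


def fetch_html(stock_id):
    """Send the request and decode the body as UTF-8, as the original does."""
    with urllib.request.urlopen(build_request(stock_id)) as response:
        return response.read().decode('utf-8')
```

Calling `fetch_html('2103')` would play the role of `GetHtmlcode('2103')`. Recent pandas releases expect the page to be wrapped in `io.StringIO` before it is handed to `pd.read_html`, rather than passed as a bare string.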
0 獲 利 狀 況      (/) NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN
2 年/季 營收(億) 稅後淨利(億) 毛利(%) 營益(%) 稅後淨利(%) ROE(%) EPS(元)
3 16Q3(累季) 198 9.31 15.8 7.77 5.25 7.87(年估) 1.13
4 2015 260 5.29 13 5.38 2.31 3.32 0.64
5 2014 319 11.4 12.4 6.09 3.9 6.97 1.38
6 2013 344 15 12.4 6 4.98 10.3 1.9
7 2012 171 25.7 13.6 7.14 15.1 15.6 3.27
8 2011 206 57.4 21.9 16.2 27.9 38.1 8.03
9 2010 153 32.8 17.2 11.4 21.5 26.1 5.05
10 2009 103 23.3 20.7 14.5 22.7 19.4 3.59
11 2008 174 27.7 18.7 13.9 15.9 23.4 4.26
12 2007 132 32.6 17.3 10.7 24.7 29.5 5.02
13 2006 116 20.4 14 8 17.6 20.8 3.14
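
In the printed table the real column names sit in row 2, and one cell (`7.87(年估)`) mixes a number with an annotation, so the values cannot be used as numbers directly. The following is a minimal sketch of one way to clean such rows, using only the standard library and Python 3. The header and the two sample rows are copied from the output above; `to_float` and `clean_row` are hypothetical helpers, and the English glosses in the comments are approximate translations.

```python
import re

# Column names from row 2 of the output. Approximate glosses:
# year/quarter, revenue (100M), net income after tax (100M), gross margin (%),
# operating margin (%), net margin after tax (%), ROE (%), EPS (dollars).
HEADER = ["年/季", "營收(億)", "稅後淨利(億)", "毛利(%)",
          "營益(%)", "稅後淨利(%)", "ROE(%)", "EPS(元)"]

# Rows 3 and 4 of the output, without the index column.
# "累季" is roughly "cumulative quarters"; "年估" is roughly "annualized estimate".
RAW_ROWS = [
    ["16Q3(累季)", "198", "9.31", "15.8", "7.77", "5.25", "7.87(年估)", "1.13"],
    ["2015", "260", "5.29", "13", "5.38", "2.31", "3.32", "0.64"],
]

# A number at the start of a cell, optionally negative, optionally with decimals.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def to_float(cell):
    """Return the leading number in a cell as a float, or None if there is none."""
    match = _NUMBER.match(cell.strip())
    return float(match.group()) if match else None


def clean_row(row):
    """Keep the period label as text and convert the remaining cells to floats."""
    record = {HEADER[0]: row[0]}
    for name, cell in zip(HEADER[1:], row[1:]):
        record[name] = to_float(cell)
    return record


records = [clean_row(row) for row in RAW_ROWS]
```

The period label is kept as text because values such as `16Q3(累季)` are not plain years. Cells with no leading number, such as `NaN`, come back as `None`.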