@lanfon72
Last active August 29, 2015 14:04
parse tsmh live ER status board
#!/usr/bin/env python
#coding:UTF-8
import requests, re, json, os
from datetime import datetime

# Interpret the board's timestamp as Taiwan local time ('ROC' aliases Asia/Taipei).
os.environ['TZ'] = 'ROC'

html = requests.get('http://www.tsmh.org.tw/~webapp/b/web_dg/er_status/er_show_db.php')
html.encoding = 'big5'  # decode the page as Big5

# Waiting counts, parsed like ['1', '0', '4', '0']; a missing number yields ''.
pending = re.findall(u': (.*?) 人', html.text)
full_reported = re.findall(u'>(.+)119', html.text)[0]
update_time = re.findall(u'時間: (.+) ]</', html.text)[0]

# Prefixing '0' lets int() turn an empty string into 0.
values = [int('0' + ele) for ele in pending]
keys = ['pending_doctor', 'pending_bed', 'pending_ward', 'pending_icu']

report = dict(zip(keys, values))
report["hospital_sn"] = '0943030019'
# u'未' ("not yet") in the status text means "full" has not been reported.
report['full_reported'] = u'未' not in full_reported
# Note: '%s' (epoch seconds) is a platform-specific strftime extension.
report["update_time"] = datetime.strptime(update_time, '%Y/%m/%d %H:%M').strftime('%s')

print(json.dumps(report, ensure_ascii=False))
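
The script above builds update_time with strftime('%s'), which is not a documented Python format code and depends on the platform's C library and on the TZ environment variable. A minimal sketch of a more portable conversion follows, assuming Python 3.3+ and that the board always shows Taiwan local time (UTC+8, no daylight saving). The names TAIPEI and to_epoch are hypothetical and not part of the gist, and the function returns an int rather than the string that strftime('%s') produces.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps on the board are Taiwan local time (fixed UTC+8).
TAIPEI = timezone(timedelta(hours=8))

def to_epoch(update_time):
    """Convert a 'YYYY/MM/DD HH:MM' string in Taiwan time to epoch seconds."""
    local = datetime.strptime(update_time, '%Y/%m/%d %H:%M')
    # Attach the fixed offset so the result does not depend on the TZ variable.
    return int(local.replace(tzinfo=TAIPEI).timestamp())

print(to_epoch('2014/08/11 12:00'))  # prints 1407729600
```

With this helper, report["update_time"] could be set to str(to_epoch(update_time)) if the string form has to be preserved.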
@viirya

viirya commented Aug 11, 2014

The HTML page sometimes contains a malformed entry such as '等待住院人數: 人' (the "waiting for admission" count with no number). Based on your code, I modified it slightly as follows:

#!/usr/bin/env python
#coding:UTF-8
import requests, re, json, os
from datetime import datetime
os.environ['TZ'] = 'ROC'
html = requests.get('http://www.tsmh.org.tw/~webapp/b/web_dg/er_status/er_show_db.php')
html.encoding='big5'

pending = re.findall(u': (.*?) 人',html.text)
# parse like ['1', '0', '4', '0']
full_reported = re.findall(u'>(.+)119',html.text)[0]
update_time = re.findall(u'時間: (.+) ]</',html.text)[0]

values = [ int('0' + ele) for ele in pending ]
keys = ['pending_doctor','pending_bed', 'pending_ward', 'pending_icu']

report = { key:value for value, key in zip(values, keys) }
report["hospital_sn"] = '0943030019'
report['full_reported'] = False if u'未' in full_reported else True
report["update_time"] = datetime.strptime(update_time, '%Y/%m/%d %H:%M').strftime('%s')

print ( json.dumps(report, ensure_ascii=False) )

@lanfon72
Author

Oh, thanks for fixing it.

@viirya

viirya commented Aug 12, 2014

Another fix in my previous comment that you may not have noticed: re.findall(u': (.+?) 人',html.text) was changed to re.findall(u': (.*?) 人',html.text). Because (.+?) requires at least one character, it runs on to the "人" in the next line when the number is missing. (.*?) can match the empty string, so it captures the missing value correctly. Thanks.
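
The difference between the two patterns can be reproduced on a small sample. This is only a sketch: the sample string below is invented for illustration (hypothetical labels A to D joined by <br>), and the real page markup may differ. It also shows why the script prefixes '0' before calling int().

```python
import re

# Invented sample: entry B has no number between ': ' and ' 人'.
sample = 'A: 1 人<br>B:  人<br>C: 4 人<br>D: 0 人'

# (.+?) needs at least one character, so at B it swallows the next entry.
strict = re.findall(': (.+?) 人', sample)

# (.*?) may match the empty string, so B is captured as ''.
lenient = re.findall(': (.*?) 人', sample)

# '0' + '' is '0', so a missing number becomes 0 instead of raising ValueError.
counts = [int('0' + ele) for ele in lenient]

print(len(strict))  # 3 entries: one value is lost and the rest shift
print(lenient)      # ['1', '', '4', '0']
print(counts)       # [1, 0, 4, 0]
```

With the strict pattern the list has three items, so zip() with the four keys would pair values with the wrong keys and silently drop pending_icu.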

@lanfon72
Author

Sorry about that... QQ
And actually, I'm Taiwanese, so you can write in Chinese XD

(Sorry for the slightly late reply... gist doesn't send notifications 囧>)
