@eugene87222
Last active May 7, 2020 12:51
20200507 ccca web-scraping club class Python demo
{
"cells": [
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'title': '【有獎徵答】HyRead 世界書香日電子書活動', 'content': '<p>活動時間:2020年4月20日至5月31日</br>活動對象:HyRead電子書大專院校與高中職讀者(含本校教職員生)</br>活動網址:<a href=\"https://hyread.cc/202004school\" target=\"_blank\" title=\"https://hyread.cc/202004school\">https://hyread.cc/202004school</a></br>活動內容:</br>為推廣HyRead電子書,特舉辦線上活動,並配合防疫主題,鼓勵讀者踴躍借閱。</br><ol><li>活動期間至所屬圖書館之HyRead電子書平台,借閱電子書或電子雜誌達指定冊數即可參加抽獎。</li><li>活動期間完成FB指定任務即可參加抽獎。</li></ol>活動獎項:</br><ol><li>Hyread Gaze Note 電子子閱讀器</li><li>Arlink 健康氣炸鍋</li><li>Seagate Backup Plus Slim 2TB 行動電源</li><li>羅技 K380+M350藍芽無線鍵鼠組</li><li>病毒在跳舞Virus桌遊</li><li>3M淨呼吸個人隨身型空氣FA-C20PT</li><li>全家禮物卡200元</li></ol>詳情請見活動網站。</p><a href=\"https://www.lib.nctu.edu.tw/news/do-event/cid-14/1\" target=\"_blank\" title=\"更多有獎徵答..\">更多有獎徵答..</a></br></br><a href=\"https://hyread.cc/202004school\" target=\"_blank\" title=\"\"><img src=\"https://www.lib.nctu.edu.tw/attach/download/id-6389\" class=\"img-responsive\" /></a>', 'eventcate': '有獎徵答', 'dateline': '2020-05-05 16:50:25', 'views': '30', 'img_url': 'https://www.lib.nctu.edu.tw/thumbnails/210/7f9080823d38c3e86bddec36fb2e9812.jpg', 'background_width': 358, 'background_height': 210}\n",
"<html>\n",
" <body>\n",
" <p>\n",
" 活動時間:2020年4月20日至5月31日活動對象:HyRead電子書大專院校與高中職讀者(含本校教職員生)活動網址:\n",
" <a href=\"https://hyread.cc/202004school\" target=\"_blank\" title=\"https://hyread.cc/202004school\">\n",
" https://hyread.cc/202004school\n",
" </a>\n",
" 活動內容:為推廣HyRead電子書,特舉辦線上活動,並配合防疫主題,鼓勵讀者踴躍借閱。\n",
" </p>\n",
" <ol>\n",
" <li>\n",
" 活動期間至所屬圖書館之HyRead電子書平台,借閱電子書或電子雜誌達指定冊數即可參加抽獎。\n",
" </li>\n",
" <li>\n",
" 活動期間完成FB指定任務即可參加抽獎。\n",
" </li>\n",
" </ol>\n",
" 活動獎項:\n",
" <ol>\n",
" <li>\n",
" Hyread Gaze Note 電子子閱讀器\n",
" </li>\n",
" <li>\n",
" Arlink 健康氣炸鍋\n",
" </li>\n",
" <li>\n",
" Seagate Backup Plus Slim 2TB 行動電源\n",
" </li>\n",
" <li>\n",
" 羅技 K380+M350藍芽無線鍵鼠組\n",
" </li>\n",
" <li>\n",
" 病毒在跳舞Virus桌遊\n",
" </li>\n",
" <li>\n",
" 3M淨呼吸個人隨身型空氣FA-C20PT\n",
" </li>\n",
" <li>\n",
" 全家禮物卡200元\n",
" </li>\n",
" </ol>\n",
" 詳情請見活動網站。\n",
" <a href=\"https://www.lib.nctu.edu.tw/news/do-event/cid-14/1\" target=\"_blank\" title=\"更多有獎徵答..\">\n",
" 更多有獎徵答..\n",
" </a>\n",
" <a href=\"https://hyread.cc/202004school\" target=\"_blank\" title=\"\">\n",
" <img class=\"img-responsive\" src=\"https://www.lib.nctu.edu.tw/attach/download/id-6389\"/>\n",
" </a>\n",
" </body>\n",
"</html>\n"
]
}
],
"source": [
"# Library announcements\n",
"import json\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"\n",
"url = 'https://www.lib.nctu.edu.tw/api/news/do-event/amount-20/offset-1'\n",
"switch_to_zh_TW = 'https://www.lib.nctu.edu.tw/api/switch-lang/zh-TW'\n",
"headers = {\n",
" 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0'\n",
"}\n",
"\n",
"# The library site is a special case: a request must be sent first to switch the language of the returned data, so a session is needed to keep that setting across requests\n",
"# The requests go to the API URL rather than the original announcement page, so the response is JSON and can be parsed directly with the json module\n",
"session = requests.session()\n",
"session.get(switch_to_zh_TW, headers=headers)\n",
"# Fetch the announcement list and take the id of the first entry\n",
"res = session.get(url, headers=headers)\n",
"json_data = json.loads(res.text)\n",
"newsid = json_data[0]['newsid']\n",
"url = f'https://www.lib.nctu.edu.tw/api/news/do-event/id-{newsid}'\n",
"\n",
"# Fetch that announcement; its 'content' field is an HTML fragment\n",
"res = session.get(url, headers=headers)\n",
"post = json.loads(res.text)\n",
"print(post)\n",
"soup = BeautifulSoup(post['content'], 'lxml')\n",
"print(soup.prettify())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.7.5 64-bit",
"language": "python",
"name": "python37564bit9c3aa344ed5e49b1b7f5168b1a98152a"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
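The cell above prints the whole announcement and its prettified HTML. As a possible follow-up, the sketch below pulls only the link targets out of a `content` string. It is a minimal illustration and not part of the original demo: `LinkCollector`, `extract_links`, and the shortened `sample` string are hypothetical names and data, and it uses the standard-library `html.parser` instead of BeautifulSoup so that it runs without extra packages or network access.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == 'a':
            href = dict(attrs).get('href')
            # Keep first-seen order and skip repeated targets
            if href and href not in self.links:
                self.links.append(href)


def extract_links(content_html):
    """Return the unique link targets found in an HTML fragment."""
    parser = LinkCollector()
    parser.feed(content_html)
    parser.close()
    return parser.links


# Shortened sample modelled on the 'content' field printed by the cell above
sample = (
    '<p>活動網址:<a href="https://hyread.cc/202004school" target="_blank">'
    'https://hyread.cc/202004school</a></p>'
    '<a href="https://www.lib.nctu.edu.tw/news/do-event/cid-14/1">more</a>'
    '<a href="https://hyread.cc/202004school"><img src="x.jpg" /></a>'
)
print(extract_links(sample))
```

With a live response, the same helper could be called as `extract_links(post['content'])`, assuming the API still returns a `content` field shaped like the output shown in the notebook.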