Created
October 13, 2017 20:05
Start parsing [ Gossiping ]....
https://www.ptt.cc/bbs/Gossiping/index25665.html
PTT (批踢踢實業坊)
This website has been handled in accordance with the website content rating regulations.
Warning: the board you are about to enter may be viewed only by those aged 18 or over.
If you are under 18, please click Leave. If you are 18 or over, you must also not distribute, circulate, sell, rent, hand over, or lend the content of this area to anyone under 18, nor show, play, or screen this website's content to such a person.
I agree, I am 18 or over. Enter
I am under 18 or do not agree to these terms. Leave
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-32365737-1', {
cookieDomain: 'ptt.cc',
legacyCookieDomain: 'ptt.cc'
});
ga('send', 'pageview');
https://www.ptt.cc/bbs/Gossiping/index25664.html
PTT (批踢踢實業坊)
This website has been handled in accordance with the website content rating regulations.
Warning: the board you are about to enter may be viewed only by those aged 18 or over.
If you are under 18, please click Leave. If you are 18 or over, you must also not distribute, circulate, sell, rent, hand over, or lend the content of this area to anyone under 18, nor show, play, or screen this website's content to such a person.
I agree, I am 18 or over. Enter
I am under 18 or do not agree to these terms. Leave
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-32365737-1', {
cookieDomain: 'ptt.cc',
legacyCookieDomain: 'ptt.cc'
});
ga('send', 'pageview');
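# -*- coding: utf-8 -*-
from selenium import webdriver
from bs4 import BeautifulSoup

# PhantomJS must be installed and on PATH for this driver to start.
driver = webdriver.PhantomJS()
driver.get("https://www.ptt.cc/bbs/Gossiping/index25664.html")
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()  # release the browser process once the HTML is captured

# Dump the page's full text. In the recorded output this is the
# over-18 notice, not the article listing.
print(soup.text)

# Print the title of each article link in the listing (if any).
for article in soup.select('.r-list-container .r-ent .title a'):
    title = article.string
    print(title)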
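The recorded output contains only the age-confirmation notice, so the title loop has nothing to match. Below is a minimal sketch of one possible way around that, not the gist author's method. It assumes the site accepts a cookie named `over18` with value `1` as confirmation (an assumption to verify against the live site), and it swaps Selenium and BeautifulSoup for the standard library so it runs without extra dependencies. The helper names `build_request`, `TitleParser`, `parse_titles`, and `fetch_titles` are hypothetical, and the parser only approximates the `.r-list-container .r-ent .title a` selector by collecting links inside any `<div class="title">`.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def build_request(url):
    """Build a request carrying the age-confirmation cookie.

    Assumption: the site treats the cookie over18=1 as consent.
    """
    headers = {"Cookie": "over18=1", "User-Agent": "Mozilla/5.0"}
    return Request(url, headers=headers)


class TitleParser(HTMLParser):
    """Collect the text of <a> tags found inside <div class="title">."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title_div = False
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "title" in classes:
            self._in_title_div = True
        elif tag == "a" and self._in_title_div:
            self._in_link = True
            self.titles.append("")  # start a new title entry

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "div":
            self._in_title_div = False

    def handle_data(self, data):
        if self._in_link:
            self.titles[-1] += data


def parse_titles(html):
    """Return the stripped title strings found in an index page's HTML."""
    parser = TitleParser()
    parser.feed(html)
    return [title.strip() for title in parser.titles]


def fetch_titles(url):
    """Download an index page (network access required) and parse titles."""
    with urlopen(build_request(url)) as response:
        return parse_titles(response.read().decode("utf-8"))
```

Calling `fetch_titles("https://www.ptt.cc/bbs/Gossiping/index25664.html")` would then return the list of titles, provided the cookie assumption holds. With Selenium, the equivalent step would be to load the domain once, call `driver.add_cookie({"name": "over18", "value": "1"})`, and reload the page.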