@aetherwu
Created January 10, 2018 01:17
V2EX post filter
#!/usr/bin/python
#coding:UTF-8
# Print V2EX topics from the /go/cv node whose titles mention 上海 (Shanghai) or Android.
# Python 2 script (urllib2, print statement); requires beautifulsoup4.
import urllib2, re
from bs4 import BeautifulSoup

site_url = 'https://www.v2ex.com'
headers = { 'User-Agent' : 'Mozilla/5.0' }

# Scan pages 1 through 9 of the node.
for page_number in range(1, 10):
    link = 'https://www.v2ex.com/go/cv?p=%s' % page_number
    req = urllib2.Request(link, None, headers)
    page = urllib2.urlopen(req).read()
    soup = BeautifulSoup(page, 'html.parser')
    # The topic list is looked up under <div id="TopicsNode">.
    nodes = soup.find_all("div", {"id": "TopicsNode"})
    for node in nodes:
        titles = node.find_all("span", {"class": "item_title"})
        for title in titles:
            # Case-insensitive match on either keyword.
            if re.search(u'上海|android', title.text, re.IGNORECASE):
                print "%s - %s%s" % (title.text, site_url, title.find('a')['href'])
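
The script above targets Python 2. As a rough sketch of how the same title filter might look on Python 3, the block below runs the keyword match against an inline HTML sample instead of the live site, so it can be tried offline. It swaps BeautifulSoup for the standard-library `html.parser`, and the names `TitleParser`, `filter_topics`, and `SAMPLE` are hypothetical, introduced only for illustration. Unlike the original, it does not restrict the search to `div#TopicsNode`, and it assumes no nested `<span>` inside a title.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

SITE_URL = 'https://www.v2ex.com'
KEYWORDS = re.compile(r'上海|android', re.IGNORECASE)


class TitleParser(HTMLParser):
    """Collect (title, href) pairs from <span class="item_title"><a href=...>."""

    def __init__(self):
        super().__init__()
        self.topics = []
        self._in_title = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'span' and 'item_title' in classes:
            self._in_title = True
        elif tag == 'a' and self._in_title and self._href is None:
            # Remember the first link inside the title span.
            self._href = attrs.get('href')

    def handle_data(self, data):
        if self._in_title:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: titles contain no nested <span>, so the first
        # closing </span> ends the title.
        if tag == 'span' and self._in_title:
            self.topics.append((''.join(self._text).strip(), self._href))
            self._in_title, self._href, self._text = False, None, []


def filter_topics(html):
    """Return (title, absolute_url) for titles matching KEYWORDS."""
    parser = TitleParser()
    parser.feed(html)
    return [(title, urljoin(SITE_URL, href))
            for title, href in parser.topics
            if href and KEYWORDS.search(title)]


# Made-up sample markup, shaped like the structure the script expects.
SAMPLE = '''
<div id="TopicsNode">
  <span class="item_title"><a href="/t/1">Android developer, remote</a></span>
  <span class="item_title"><a href="/t/2">上海 backend engineer</a></span>
  <span class="item_title"><a href="/t/3">Designer in Berlin</a></span>
</div>
'''

for title, url in filter_topics(SAMPLE):
    print('%s - %s' % (title, url))
```

To run it against the real site, the HTML passed to `filter_topics` would come from `urllib.request.urlopen` with the same `User-Agent` header the script sets.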