@ccjeng
Created August 28, 2016 09:37
Taipei Public Library: new books added this month http://book.tpml.edu.tw/webpac/webpacIndex.jsp
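import requests
from bs4 import BeautifulSoup

# Base URL of the Taipei Public Library WebPAC catalogue
urlRoot = 'http://book.tpml.edu.tw/webpac/'
# New-arrival date range: start and end dates joined by a URL-encoded comma (%2C)
dateRange = '2016-08-01%2C2016-08-31'
# Number of results to request on a single page
rows = '1000'
link = urlRoot + 'newbooknotftdList.do?sortfield=MarcInsertDate&sorttype=1&showtuple=' + rows + '&newbookdatebet='+ dateRange +'&pubyear=all&language=chi&classtype=all&classno=all&classno10=all&keepsiteid=all&type=&categoraycode=-1&bookstatus=all&collection=all&viewmode=0'

res = requests.get(link)
res.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
soup = BeautifulSoup(res.text, 'html.parser')
# The script treats every <h4> that wraps a link as one book entry
books = soup.find_all('h4')

# Write UTF-8 explicitly so Chinese titles are saved correctly on any platform
with open("books.txt", "w", encoding="utf-8") as outfile:
    for book in books:
        if book.a is None:
            continue  # skip headings that do not contain a link
        name = book.a.get_text(strip=True)
        bookLink = urlRoot + book.a['href']
        outfile.write(name + ' ' + bookLink + '\n')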
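
The script hard-codes `dateRange` for August 2016, so it has to be edited by hand each month. A minimal sketch of computing that value for any given date is shown below. The helper name `month_date_range` is hypothetical and not part of the original gist, and the sketch assumes the catalogue keeps accepting the same `YYYY-MM-DD%2CYYYY-MM-DD` format used above.

```python
import calendar
from datetime import date
from urllib.parse import quote


def month_date_range(today):
    """Return the first and last day of today's month as 'start%2Cend'.

    Hypothetical helper: mirrors the format of the hard-coded dateRange.
    """
    # monthrange returns (weekday of first day, number of days in month)
    last_day = calendar.monthrange(today.year, today.month)[1]
    first = today.replace(day=1)
    last = today.replace(day=last_day)
    # safe="" makes quote() encode the comma as %2C
    return quote(first.isoformat() + "," + last.isoformat(), safe="")


# Reproduces the value hard-coded in the script for its creation month
print(month_date_range(date(2016, 8, 28)))
```

In the script, `dateRange = month_date_range(date.today())` could then replace the literal string.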