Created January 31, 2019 12:24 (gist 01x01/a2fa7df89218f9e16e13785a59cbc42c)
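# coding: utf-8
import requests
from bs4 import BeautifulSoup
url = "https://example.com"  # hypothetical placeholder: put the page you want to scrape here
res = requests.get(url)
res.raise_for_status()  # fail early on 4xx/5xx responses instead of parsing an error page
html = res.text
# Initialize, naming the parser explicitly. If no parser is given, Beautiful Soup picks
# the best one installed (lxml, then html5lib, then the built-in html.parser) and warns.
soup = BeautifulSoup(html, "html.parser")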
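Naming the parser matters because parsers repair invalid markup differently. A minimal sketch using the `<a></p>` fragment from the Beautiful Soup documentation; `parse_with` is a hypothetical helper, and `lxml` and `html5lib` are optional third-party parsers that may not be installed, in which case the helper reports `None`.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_with(markup, parser):
    """Serialize markup as parsed by `parser`; return None if that parser is not installed."""
    try:
        return str(BeautifulSoup(markup, parser))
    except FeatureNotFound:
        return None


broken = "<a></p>"  # a stray closing </p> with no matching opening tag
for name in ("html.parser", "lxml", "html5lib"):
    print(name, parse_with(broken, name))
# html.parser simply drops the stray </p>, so its line reads "html.parser <a></a>"
```

According to the Beautiful Soup docs, lxml wraps the same fragment in `<html><body>` and html5lib turns the stray tag into an empty `<p></p>`, so passing `"html.parser"` explicitly keeps the tree identical no matter which extra parsers are installed.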
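# find_all / find
# find_all() returns a list of every matching tag; find() returns only the first match (or None)
soup.find_all("p", "title")  # e.g. on the docs' sample page: [<p class="title"><b>The Dormouse's story</b></p>]
css_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")
css_soup.find_all("p", class_="body strikeout")  # [<p class="body strikeout"></p>]
data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', "html.parser")
data_soup.find_all(attrs={"data-foo": "value"})  # [<div data-foo="value">foo!</div>]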
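The three `find_all` calls above depend on already-parsed `soup`, `css_soup` and `data_soup` objects. A self-contained sketch that rebuilds them from sample documents modelled on the Beautiful Soup docs (the HTML strings are assumptions, not part of the original gist) and contrasts `find_all()` with `find()`:

```python
from bs4 import BeautifulSoup

# Sample documents modelled on the Beautiful Soup documentation (assumed inputs).
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")
css_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")
data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', "html.parser")

# A second positional string argument filters on the CSS class.
title_paragraphs = soup.find_all("p", "title")
# class_ with a multi-word string matches the exact attribute value.
strikeouts = css_soup.find_all("p", class_="body strikeout")
# Attribute names that are not valid Python keywords (data-foo) go in attrs={...}.
data_divs = data_soup.find_all(attrs={"data-foo": "value"})

# find() returns the first matching tag, or None when nothing matches.
first_link = soup.find("a")
missing = soup.find("table")

print(title_paragraphs)
print(strikeouts)
print(data_divs)
print(first_link["id"], missing)  # link1 None
```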
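# --- text parameter ---
# The text argument searches the document's strings instead of its tags. Like the name
# argument, it accepts a string, a regular expression, a list, or True.
# (Since Beautiful Soup 4.4.0 it is called string; text still works as the older name.)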
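A minimal sketch of the four value types listed above, assuming a small inline document (not from the original gist). It uses the newer keyword `string=`, which behaves like `text=`.

```python
import re
from bs4 import BeautifulSoup

html_doc = """<p class="story">
<a class="sister" id="link1">Elsie</a>,
<a class="sister" id="link2">Lacie</a> and
<a class="sister" id="link3">Tillie</a>.</p>"""
soup = BeautifulSoup(html_doc, "html.parser")

# A plain string matches text nodes exactly.
exact = soup.find_all(string="Elsie")
# A list matches any of its items; results come back in document order.
several = soup.find_all(string=["Tillie", "Elsie"])
# A regular expression is applied with search(), so "ie$" matches names ending in "ie".
pattern = soup.find_all(string=re.compile("ie$"))
# True matches every string, including whitespace-only ones, so drop those here.
non_blank = [s.strip() for s in soup.find_all(string=True) if s.strip()]
# Combined with a tag name, string filters tags by their text.
lacie_links = soup.find_all("a", string="Lacie")

print(exact, several, pattern, non_blank, lacie_links, sep="\n")
```

The matches are `NavigableString` objects, which subclass `str`, so they compare equal to plain strings.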
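# --- limit parameter ---
# find_all() returns every match, which can be slow on a large document tree. If you do
# not need all the results, pass limit to cap how many are returned. It works like SQL's
# LIMIT keyword: once the number of matches reaches limit, the search stops and returns.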
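A short sketch of `limit`, assuming the same kind of three-link sample document as the Beautiful Soup docs (an assumed input). It also shows that `find()` acts like `find_all(..., limit=1)` but returns the tag itself rather than a list.

```python
from bs4 import BeautifulSoup

html_doc = """<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>"""
soup = BeautifulSoup(html_doc, "html.parser")

all_links = soup.find_all("a")            # scans the whole tree
first_two = soup.find_all("a", limit=2)   # stops after the second match
print([a["id"] for a in first_two])       # ['link1', 'link2']

# find() is the limit=1 case, returning the single tag (or None) instead of a list.
print(soup.find("a")["id"])               # link1
```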
# --- recursive parameter ---
# By default find_all() searches all descendants of a tag. To search only the tag's
# direct children, pass recursive=False.
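A minimal sketch of `recursive=False`, assuming a small inline document without whitespace between tags (an assumed input). `<title>` lives inside `<head>`, so it is a grandchild of `<html>`, not a direct child.

```python
from bs4 import BeautifulSoup

html_doc = ("<html><head><title>The Dormouse's story</title></head>"
            "<body><p class=\"title\"><b>The Dormouse's story</b></p></body></html>")
soup = BeautifulSoup(html_doc, "html.parser")

# Default: every descendant of <html> is searched, so <title> is found.
print(soup.html.find_all("title"))                   # [<title>The Dormouse's story</title>]
# recursive=False: only <html>'s direct children (<head>, <body>) are examined.
print(soup.html.find_all("title", recursive=False))  # []
print([t.name for t in soup.html.find_all(True, recursive=False)])  # ['head', 'body']
```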