Created
April 28, 2024 08:36
Automated web content scraping: a Python crawler that uses the BeautifulSoup and requests libraries to fetch the information you need from a web page.
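import requests
from bs4 import BeautifulSoup


def fetch_web_content(url):
    """Fetch a page and return its <title> text, or an error message on failure."""
    # A timeout keeps the call from hanging on an unresponsive server.
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        # Example: extract the page title
        title_tag = soup.find('title')
        # Guard against pages that have no <title> element.
        return title_tag.text if title_tag else ""
    else:
        return "Unable to fetch page content"


# Usage example:
url = 'https://example.com'
web_title = fetch_web_content(url)
print("Page title:", web_title)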
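The same two libraries can pull out more than the title. As a minimal sketch, the helper below collects every link on a page and resolves relative URLs against the page address. The function name `extract_links` and the `sample_html` string are hypothetical and not part of the original gist. The helper takes an HTML string rather than a URL so it can be tried without a network request.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that carry no href attribute at all.
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs (e.g. "/about") against the page URL.
        links.append(urljoin(base_url, anchor["href"]))
    return links


# Hypothetical sample page used only for illustration.
sample_html = """
<html><head><title>Demo</title></head>
<body>
  <a href="/about">About</a>
  <a href="https://example.org/docs">Docs</a>
  <a>No href here</a>
</body></html>
"""

for link in extract_links(sample_html, "https://example.com"):
    print(link)
```

To use it on a live page, pass `response.text` from a successful `requests.get` call as `html` and the requested address as `base_url`.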