@xjxckk
Last active December 16, 2021 12:17
Beautiful soup (Python)
from bs4 import BeautifulSoup
import requests
url = 'https://example.com' # Placeholder URL (assumption), replace with the page to scrape
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
elements = soup.select('div.class')
element = soup.select_one('div#id')
print(element['href']) # Element link
print(element.text.strip()) # Element text
print(element.decode_contents().strip()) # Element inner HTML
print(element.contents) # List of nested elements
print(soup.body) # Page body
def get_text(selector):
    # Return the stripped text of the first element matching the CSS selector
    element = soup.select_one(selector)
    element = element.text.strip()
    return element
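The `get_text` helper above raises an `AttributeError` when the selector matches nothing, because `select_one` returns `None` in that case. Below is a minimal sketch of a more defensive variant, run against an inline HTML string so it works without a network request. The sample markup, the `get_text_or_default` name, and its `default` parameter are all hypothetical and only serve as illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for a fetched page
HTML = """
<html><body>
  <div id="main"><a class="link" href="/docs">  Docs  </a></div>
  <div class="item">First</div>
  <div class="item">Second</div>
</body></html>
"""

soup = BeautifulSoup(HTML, 'html.parser')

def get_text_or_default(selector, default=None):
    """Return the stripped text of the first match, or `default` if nothing matches."""
    element = soup.select_one(selector)
    if element is None:
        return default
    return element.get_text(strip=True)

link = soup.select_one('div#main a.link')
print(link.get('href'))  # .get() returns None instead of raising if the attribute is missing
print(get_text_or_default('div#main a'))
print(get_text_or_default('div#missing', 'n/a'))
print([div.get_text(strip=True) for div in soup.select('div.item')])
```

Using `.get('href')` rather than `element['href']` is one way to avoid a `KeyError` on elements that have no such attribute; whether a default value or an exception is preferable depends on how the scraper should react to missing data.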