@masterial
Created November 25, 2023 05:25
from bs4 import BeautifulSoup
# Your HTML snippet
html = """
<!DOCTYPE html>
<!-- (Your HTML here) -->
</html>
"""
# Parse HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# Extract text from h2 elements with "id" attribute
extracted_text = ""
for h2_tag in soup.find_all('h2', {'id': True}):
    extracted_text += h2_tag.get_text() + "\n"
# Process the extracted text (word count in this case)
word_count = len(extracted_text.split())
# Output the results
print("Extracted Text:")
print(extracted_text)
print("\nWord Count:", word_count)
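A self-contained variant of the script above, wrapped in functions so it can be reused and tested. This is a sketch, not the gist author's code: the sample HTML, `SAMPLE_HTML`, `extract_h2_text` and `word_count` are hypothetical names introduced for illustration. It filters with `id=True`, which BeautifulSoup treats like `{'id': True}` (the `<h2>` must have an `id` attribute, whatever its value), and calls `get_text(strip=True)` to drop surrounding whitespace from each heading.

```python
from __future__ import annotations

from bs4 import BeautifulSoup

# Hypothetical sample page (an assumption for illustration); replace with your own HTML.
SAMPLE_HTML = """
<!DOCTYPE html>
<html>
<body>
  <h2 id="intro">Getting started</h2>
  <h2>No id here</h2>
  <h2 id="usage">Basic usage examples</h2>
</body>
</html>
"""


def extract_h2_text(html: str) -> list[str]:
    """Return the stripped text of every <h2> that has an id attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # id=True requires the attribute to be present but accepts any value.
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2", id=True)]


def word_count(lines: list[str]) -> int:
    """Count whitespace-separated words across all extracted headings."""
    return sum(len(line.split()) for line in lines)


if __name__ == "__main__":
    headings = extract_h2_text(SAMPLE_HTML)
    print("Extracted Text:")
    print("\n".join(headings))
    print("\nWord Count:", word_count(headings))
```

On the sample page this keeps the two headings that carry an `id` ("Getting started" and "Basic usage examples"), skips the `<h2>` without one, and reports a word count of 5. Returning a list instead of concatenating into one string keeps each heading separate, which makes further processing easier.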