William Brach williambrach

williambrach / example.py
Created October 15, 2025 11:12
cleaning html example
# %%
from bs4 import BeautifulSoup
from markitdown import MarkItDown
from readability import Document
from trafilatura import extract
# %%
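# Load the raw HTML to be cleaned; an explicit encoding avoids platform-dependent defaults.
with open("page.html", encoding="utf-8") as f:
    html_content = f.read()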