Created July 11, 2023 14:06
Gist: queencitycyber/89e5c16750d91eda8f8a32d39607cfc4
URL -> Markdown
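### Turn HTML page into Markdown (.md)
import requests
import html2text


def download_html(url):
    # Fetch the page; fail fast on network stalls and on HTTP errors (4xx/5xx)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # requests assumes ISO-8859-1 for text/* responses that declare no charset,
    # so fall back to the encoding detected from the body in that case
    if "charset" not in response.headers.get("Content-Type", "").lower():
        response.encoding = response.apparent_encoding
    return response.text


def convert_to_markdown(html):
    converter = html2text.HTML2Text()
    converter.body_width = 0  # Disable line wrapping
    markdown = converter.handle(html)
    return markdown


# Example usage
if __name__ == "__main__":
    url = "YOUR URL"  # Replace with the page to convert
    html = download_html(url)
    markdown = convert_to_markdown(html)
    print(markdown)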