@lorey
Last active July 18, 2024 05:53
Markdown to Plaintext in Python
from bs4 import BeautifulSoup
from markdown import markdown
import re


def markdown_to_text(markdown_string):
    """Converts a markdown string to plaintext."""

    # md -> html -> text since BeautifulSoup can extract text cleanly
    html = markdown(markdown_string)

    # remove code snippets; re.DOTALL lets '.' match the newlines
    # inside multi-line code blocks
    html = re.sub(r'<pre>(.*?)</pre>', ' ', html, flags=re.DOTALL)
    html = re.sub(r'<code>(.*?)</code>', ' ', html, flags=re.DOTALL)

    # extract text
    soup = BeautifulSoup(html, "html.parser")
    text = ''.join(soup.find_all(string=True))

    return text
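
One detail of the code-stripping step that may be worth checking is how `.` in the two regexes treats newlines. The sketch below runs the same two patterns on a hand-written HTML string, with and without `re.DOTALL`, using only the standard library. The sample HTML is an assumption about what a Markdown renderer emits for a multi-line code block, and `strip_code` is a hypothetical helper introduced only for this illustration.

```python
import re

# Hand-written HTML resembling rendered Markdown with a multi-line code
# block followed by a paragraph (an assumed sample, not real renderer output).
html = "<pre><code>x = 1\ny = 2\n</code></pre>\n<p>Some text</p>"


def strip_code(html, flags=0):
    """Hypothetical helper: apply the two code-removal regexes with the given flags."""
    html = re.sub(r'<pre>(.*?)</pre>', ' ', html, flags=flags)
    html = re.sub(r'<code>(.*?)</code>', ' ', html, flags=flags)
    return html


# Without re.DOTALL, '.' stops at newlines, so a block spanning lines never matches.
without_dotall = strip_code(html)

# With re.DOTALL, '.' also matches newlines and the whole <pre> block is replaced.
with_dotall = strip_code(html, flags=re.DOTALL)

print(repr(without_dotall))
print(repr(with_dotall))
```

If the renderer puts a code block on a single line, both variants behave the same; the difference only shows up once the block contains a newline.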
@hemikak

hemikak commented Aug 19, 2018

 html = re.sub(r'<code>(.*?)</code>', ' ', html)

@rmln

rmln commented Nov 16, 2022

import re

@lorey
Author

lorey commented Nov 16, 2022

@rmln great catch, thanks. Added.
