@seozed
Last active May 11, 2020 06:07
Filtering HTML elegantly
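from typing import Optional

from w3lib.html import remove_tags, strip_html5_whitespace

# Sample input, for illustration only
text = '<p>Hello <img src="a.png"> <b>world</b></p>'

# The keep parameter lists the tag names to preserve
remove_tags(text, keep=('img',))


# Remove HTML tags, then strip leading and trailing whitespace
def clean_tags(text, which_ones=(), keep=(), encoding=None) -> Optional[str]:
    # Pass at most one of which_ones and keep; w3lib rejects both together
    if not text:
        return None
    content = remove_tags(text, which_ones, keep, encoding)
    content = strip_html5_whitespace(content)
    return content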
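
To make the `keep` idea concrete without installing anything, here is a rough standard-library sketch of the same technique: drop every tag except those named in `keep`, then trim HTML5 whitespace. It is not w3lib's implementation. The names `_TagFilter` and `strip_tags_sketch` are hypothetical, and it relies on `html.parser`, so edge cases (comments, malformed markup) may behave differently from `remove_tags`.

```python
from html.parser import HTMLParser

# HTML5 "ASCII whitespace": space, tab, LF, CR, FF
HTML5_WHITESPACE = " \t\n\r\x0c"


class _TagFilter(HTMLParser):
    """Hypothetical helper: collects text and re-emits only tags in `keep`."""

    def __init__(self, keep=()):
        # Keep entities such as &amp; as written instead of decoding them
        super().__init__(convert_charrefs=False)
        self.keep = {name.lower() for name in keep}
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.keep:
            # Re-emit the start tag exactly as it appeared in the input
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <img ... />
        if tag in self.keep:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.keep:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_tags_sketch(text, keep=()):
    """Remove all tags except those in `keep`, then trim HTML5 whitespace."""
    parser = _TagFilter(keep)
    parser.feed(text)
    parser.close()
    return "".join(parser.parts).strip(HTML5_WHITESPACE)


sample = '  <div><p>Hello <img src="a.png"> <b>world</b></p></div>\n'
print(strip_tags_sketch(sample, keep=("img",)))  # Hello <img src="a.png"> world
```

Only the tags are discarded; the text between them is kept as-is, which is why dropping a tag that sat between two spaces leaves both spaces in the output.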