@lambdadog
Last active November 28, 2019 04:11
Python HTMLParser example for @Cesese
# This is free and unencumbered software released into the public domain.
#
# Anyone is free to copy, modify, publish, use, compile, sell, or
# distribute this software, either in source code form or as a compiled
# binary, for any purpose, commercial or non-commercial, and by any
# means.
#
# In jurisdictions that recognize copyright laws, the author or authors
# of this software dedicate any and all copyright interest in the
# software to the public domain. We make this dedication for the benefit
# of the public at large and to the detriment of our heirs and
# successors. We intend this dedication to be an overt act of
# relinquishment in perpetuity of all present and future rights to this
# software under copyright law.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
#
# For more information, please refer to <http://unlicense.org/>
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """ Parser for mastodon posts """

    # the text-only content of the post
    postContent = ""

    # accessor
    def get_result_destructively(self):
        postContent = self.postContent
        self.postContent = ""
        return postContent

    def handle_starttag(self, tag, attrs):
        pass

    def handle_data(self, data):
        self.postContent += data

    def handle_endtag(self, tag):
        pass


parser = PostParser()
parser.feed('<span class="h-card"><a class="u-url mention" href="https://niu.moe/@cesese" rel="nofollow noopener" target="_blank">@<span>cesese</span></a></span> what part are you having trouble with?')
print(parser.get_result_destructively())
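
The parser above drops every tag and concatenates the text, so a post with several paragraphs or `<br>` tags would come out as one run of text. The following is a minimal sketch of one way to keep line breaks, assuming posts use `<p>` and `<br>` for layout. `LineBreakPostParser` and `post_to_text` are hypothetical names introduced for illustration and are not part of the original gist.

```python
from html.parser import HTMLParser


class LineBreakPostParser(HTMLParser):
    """Hypothetical variant of PostParser that keeps line breaks."""

    def __init__(self):
        super().__init__()
        # Text fragments collected so far; joined when the result is requested.
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: a <br> tag stands for a single newline.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        # Assumption: the end of a paragraph stands for a blank line.
        if tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)

    def get_result_destructively(self):
        # Return the accumulated text and reset the buffer for the next post.
        result = "".join(self.parts).strip()
        self.parts = []
        return result


def post_to_text(html):
    """Convert one post's HTML to plain text using a fresh parser."""
    parser = LineBreakPostParser()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return parser.get_result_destructively()


print(post_to_text("<p>first line<br>second line</p><p>new paragraph</p>"))
```

Collecting fragments in a list stored on the instance, rather than in a class-level string, keeps each parser's state separate. Using a fresh parser per post avoids leftover input from an earlier `feed` call.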