Last active May 14, 2021 19:13
mahmoud/237eb20108b5805aed5f: hashtag regex in Python
import re
# The first group is non-capturing; it just ensures we're at the beginning of
# the string or have whitespace before the hashtag, so URL anchors such as
# page.html#section aren't captured.
# Without the fullwidth hashmark (＃, U+FF03, written below as \uFF03),
# hashtags in Asian languages would be hard to catch.
hashtag_re = re.compile(r"(?:^|\s)[#\uFF03](\w+)", re.UNICODE)
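A quick check of the hashtag pattern, assuming Python 3, where `str` patterns are Unicode-aware by default (so `re.UNICODE` is left out here). The sample string is made up for illustration; it mixes an ASCII tag, a URL anchor, a fullwidth-hash CJK tag, and an accented tag.

```python
import re

# Hashtag pattern from above, as a raw string. \uFF03 is the fullwidth
# number sign; re.UNICODE is omitted because str patterns are Unicode-aware
# by default in Python 3.
hashtag_re = re.compile(r"(?:^|\s)[#\uFF03](\w+)")

# Hypothetical sample: the "#regex" anchor follows "l", not whitespace,
# so it is skipped; CJK ideographs and "é" both count as \w.
sample = "#python tips: see docs.html#regex and \uFF03日本語 or #café"

print(hashtag_re.findall(sample))  # ['python', '日本語', 'café']
```

Because the leading group requires whitespace or the start of the string, a tag glued to punctuation, such as `(#python)`, is skipped as well; widen that group if such tags should count.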
# Similar to the hashtag regex, this one finds username mentions. It uses a very
# permissive pattern, suitable for MediaWiki/Wikipedia usernames, which can
# include Unicode symbols and punctuation (almost anything but whitespace and a
# few punctuation marks: # < > [ ] | { }).
# \uFF20 is the fullwidth commercial at (＠).
mention_re = re.compile(r"(?:^|\s)[@\uFF20]([^\s#<>\[\]|{}]+)", re.UNICODE)
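A similar sketch for the mention pattern, again assuming Python 3 and using hypothetical usernames that contain punctuation and non-ASCII letters. It shows a capture ending at `#`, and an e-mail address being ignored because its `@` is not preceded by whitespace.

```python
import re

# Mention pattern from above, as a raw string; \uFF20 is the fullwidth at sign.
mention_re = re.compile(r"(?:^|\s)[@\uFF20]([^\s#<>\[\]|{}]+)")

# Hypothetical usernames; "Some-User#Talk" stops at the excluded "#".
sample = "thanks @Alice_B and \uFF20Jörg.Müller (see @Some-User#Talk), not user@example.org"

print(mention_re.findall(sample))  # ['Alice_B', 'Jörg.Müller', 'Some-User']
```

The flip side of that permissiveness: punctuation written directly after a name, as in `@Alice_B,`, becomes part of the capture (`'Alice_B,'`), so callers may want to strip trailing punctuation or check candidates against a real user list.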

@madogan Well, if there are spaces between the # and the next word, those aren't really hashtags.
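To illustrate the point in the reply above with the hashtag pattern (the sample sentence is made up): `\w+` has to start immediately after the hashmark, so a hash followed by a space never matches.

```python
import re

hashtag_re = re.compile(r"(?:^|\s)[#\uFF03](\w+)")

# "# not" fails because a space, not a word character, follows the hashmark;
# "#this" is accepted.
print(hashtag_re.findall("# not a tag, but #this is"))  # ['this']
```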
