Last active May 14, 2021 19:13
hashtag regex in python
import re
# The first group is non-capturing; it only ensures we are at the beginning of
# the string or have whitespace before the hashtag (we don't want to capture
# anchors such as the "#section" part of a URL). Without the fullwidth
# hashmark "＃" (U+FF03), hashtags in Asian languages would be hard to match.
hashtag_re = re.compile(r"(?:^|\s)[#＃](\w+)", re.UNICODE)
import re
# Similar to the hashtag regex, this one finds username mentions with a very
# permissive pattern, suitable for MediaWiki/Wikipedia usernames, which can
# include Unicode symbols and punctuation (almost anything but whitespace and
# a few punctuation marks). The class "[@＠]" accepts both the ASCII and the
# fullwidth at sign (U+FF20).
mention_re = re.compile(r"(?:^|\s)[@＠]([^\s#<>\[\]|{}]+)", re.UNICODE)
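A quick usage sketch for the mention pattern, assuming the second character in the class is the fullwidth at sign. The helper name `find_mentions` and the sample strings are made up for illustration and are not part of the gist.

```python
import re

# Mention pattern as a raw string; "＠" is assumed to be the fullwidth at sign (U+FF20).
mention_re = re.compile(r"(?:^|\s)[@＠]([^\s#<>\[\]|{}]+)", re.UNICODE)


def find_mentions(text):
    """Hypothetical helper: return mentioned names without the leading @."""
    return mention_re.findall(text)


if __name__ == "__main__":
    for sample in (
        "thanks @Jimbo_Wales and @Ævar!",   # trailing "!" is kept, the class is permissive
        "mail me at user@example.com",      # "@" not preceded by whitespace, so no match
        "@user#section",                    # "#" is excluded, so the name stops before it
    ):
        print(sample, "->", find_mentions(sample))
```

Because the pattern accepts almost any punctuation, trailing characters such as "!" or "," end up inside the captured name; callers that care would probably need to strip them afterwards.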
@madogan well, if there are spaces between the # and the next word those aren't really hashtags.
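To illustrate the point about spaces, here is a minimal sketch of how the hashtag pattern behaves on a few inputs. It assumes the character class holds the ASCII and the fullwidth hashmark; `find_hashtags` and the sample strings are hypothetical names used only for this example.

```python
import re

# Hashtag pattern as a raw string; "＃" is assumed to be the fullwidth hashmark (U+FF03).
hashtag_re = re.compile(r"(?:^|\s)[#＃](\w+)", re.UNICODE)


def find_hashtags(text):
    """Hypothetical helper: return hashtag bodies without the leading mark."""
    return hashtag_re.findall(text)


if __name__ == "__main__":
    for sample in (
        "#python is fun, see also #regex",   # two ordinary hashtags
        "a # spaced word is not a tag",      # "#" followed by a space: no match
        "see page.html#anchor for details",  # "#" inside a URL: no match
        "fullwidth ＃日本語 works too",        # fullwidth mark with a non-ASCII body
    ):
        print(sample, "->", find_hashtags(sample))
```

A "#" followed by whitespace never matches, because `\w+` requires at least one word character directly after the mark.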

