Last active May 14, 2021
hashtag regex in python
import re
# the first group is non-capturing and just ensures we're at the beginning of
# the string or have whitespace before the hashtag (we don't want to capture
# URL anchors such as example.com/#section)
# without the fullwidth hashmark (＃, U+FF03), hashtags in Asian languages
# would be tough
hashtag_re = re.compile(r"(?:^|\s)[＃#](\w+)", re.UNICODE)
import re
# similar to the hashtag regex, this one finds username mentions with a very
# permissive algorithm, suitable for MediaWiki/Wikipedia usernames, which can
# include unicode symbols and punctuation (almost anything but whitespace and
# a few punctuation marks); the fullwidth at sign (＠, U+FF20) is accepted too
mention_re = re.compile(r"(?:^|\s)[＠@]([^\s#<>\[\]|{}]+)", re.UNICODE)
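A quick usage sketch of the two patterns, assuming Python 3 (where `str` patterns are Unicode-aware by default). The helper names `find_hashtags` and `find_mentions` and the sample string are illustrative and not part of the original gist.

```python
import re

# Same patterns as in the gist, written as raw strings.
# The fullwidth forms are U+FF03 (＃) and U+FF20 (＠).
hashtag_re = re.compile(r"(?:^|\s)[＃#](\w+)")
mention_re = re.compile(r"(?:^|\s)[＠@]([^\s#<>\[\]|{}]+)")


def find_hashtags(text):
    """Return the hashtag names (without the hash mark) found in text."""
    return hashtag_re.findall(text)


def find_mentions(text):
    """Return the mentioned usernames (without the at sign) found in text."""
    return mention_re.findall(text)


sample = "#python is fun, see http://example.com/#anchor and ＃日本語 cc @Jimbo_Wales"

# The URL fragment is skipped because its '#' is not preceded by whitespace.
print(find_hashtags(sample))
print(find_mentions(sample))

# A hash mark followed by a space is not treated as a hashtag.
print(find_hashtags("# spaced"))
```

Because the mention pattern is deliberately permissive, trailing punctuation such as a comma directly after a username is captured as part of the name; callers who need stricter matching would have to trim it themselves.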
luca-ucsc commented Dec 13, 2019

@madogan well, if there are spaces between the # and the next word those aren't really hashtags.