@mahmoud
Last active May 14, 2021
hashtag regex in python
# hashtag.py
import re
# The first group is non-capturing and just ensures we're at the beginning of
# the string or have whitespace before the hashtag (we don't want to capture
# URL anchors such as page#section).
# Without the fullwidth hash mark (＃, U+FF03), hashtags in Asian languages
# would be tough.
hashtag_re = re.compile(r"(?:^|\s)[#\uFF03](\w+)", re.UNICODE)
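A quick way to sanity-check the pattern is to run it through `re.findall`, which returns only the captured group (the word after the mark). This is a sketch, not part of the original gist: `find_hashtags` is a hypothetical helper, the pattern is written as a raw string (so `\s` and `\w` don't trigger the invalid-escape warning that Python 3.12+ emits for normal strings), and the fullwidth mark is spelled as the escape `\uFF03`, assuming FULLWIDTH NUMBER SIGN is the character the original comment refers to.

```python
import re

# Raw-string version of hashtag_re. re.UNICODE is omitted because it is
# already the default for str patterns in Python 3.
hashtag_re = re.compile(r"(?:^|\s)[#\uFF03](\w+)")


def find_hashtags(text):
    """Return the words that follow a hash mark, without the mark itself."""
    return hashtag_re.findall(text)


print(find_hashtags("#python regex for #hashtags"))           # ['python', 'hashtags']
print(find_hashtags("see https://example.com/page#section"))  # [] (no space before '#')
print(find_hashtags("日本語 \uff03タグ"))                      # ['タグ']
```

The second case shows why the leading `(?:^|\s)` group matters: a `#` glued to the preceding text, as in a URL fragment, is not treated as a hashtag.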
import re
# Similar to hashtag.py, this regex finds username mentions with a very
# permissive algorithm, suitable for MediaWiki/Wikipedia usernames, which can
# include Unicode symbols and punctuation (almost anything but whitespace and
# a few punctuation marks: # < > [ ] | { }). Both the ASCII @ and the
# fullwidth ＠ (U+FF20) are accepted.
mention_re = re.compile(r"(?:^|\s)[@\uFF20]([^\s#<>[\]|{}]+)", re.UNICODE)
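The same kind of check for `mention_re` shows how permissive the username class is. Again a sketch: `find_mentions` is a hypothetical helper, the fullwidth at sign is assumed to be U+FF20 (written as `\uFF20`), and the character class contains no space, because a space there would let an ordinary double space (as in `"two  spaces"`) produce a bogus mention.

```python
import re

# Raw-string version of mention_re: the mark is an ASCII "@" or a fullwidth
# "＠" (U+FF20); the username is any run of characters other than whitespace
# and # < > [ ] | { }.
mention_re = re.compile(r"(?:^|\s)[@\uFF20]([^\s#<>[\]|{}]+)")


def find_mentions(text):
    """Return mentioned usernames, without the leading @."""
    return mention_re.findall(text)


print(find_mentions("ping @Alice, @Bob.Smith and mail me@example.com"))
# ['Alice,', 'Bob.Smith']  (the trailing comma is kept: the class is permissive)
print(find_mentions("\uff20Alice"))  # ['Alice']
print(find_mentions("two  spaces"))  # []
```

Note that the email address is skipped for the same reason URL anchors are skipped by the hashtag pattern: the `@` is not preceded by whitespace or the start of the string.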

@madogan commented on Jul 23, 2018


I tried it, but it did not work?


@luca-ucsc commented on Dec 13, 2019

@madogan Well, if there are spaces between the # and the next word, those aren't really hashtags.
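This is easy to confirm: the pattern requires the word to start immediately after the hash mark, so a space in between produces no match. A minimal check, using a raw-string copy of the pattern (illustrative only, not code from the thread):

```python
import re

# Same pattern as hashtag_re, written as a raw string.
hashtag_re = re.compile(r"(?:^|\s)[#\uFF03](\w+)")

print(hashtag_re.findall("#python"))        # ['python']
print(hashtag_re.findall("# python"))       # [] (space right after '#')
print(hashtag_re.findall("I like # tags"))  # []
```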
