miyagawa/gist:52e8422175f25d982fd9 Secret

## gistfile1.txt
We have a list of keywords that we want to match against a series of texts.
We use the CPAN module Regexp::Trie to compile the list into a single regular
expression. For instance, given the keywords "mac", "apple" and "android", we get

  $re = qr/(?-xism:(?:a(?:ndroid|pple)|mac))/;

Because "mac" is one of the keywords and we don't want it to match inside
"machines", we wrap the pattern in \b, the zero-width word boundary assertion,
when matching against a text. That works well, at least with ASCII word chars.

  @match = $text =~ /\b($re)\b/g;

So far, so good.

However, when a keyword we add to the trie begins or ends with a punctuation
character, like "#android", this \b gets in the way, since '#' is not a word
character.

  my $rt = Regexp::Trie->new;
  $rt->add($_) for qw( #android #apple mac );
  $re = $rt->regexp; # qr/(?-xism:(?:\#a(?:ndroid|pple)|mac))/
  $text = "I love #android";
  my @tags = $text =~ /\b($re)\b/g; # @tags is empty
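
To make the failure concrete, here is a minimal sketch that can be run without Regexp::Trie. The pattern in $re is hand-written to mirror the one shown above (an assumption, not the module's literal output on every Perl version), and @plain is a name introduced only for this illustration.

```perl
use strict;
use warnings;

# Hand-written stand-in for the trie-generated pattern, so this sketch
# has no CPAN dependency.
my $re   = qr/(?:\#a(?:ndroid|pple)|mac)/;
my $text = "I love #android";

# \b only matches between a word character and a non-word character.
# The space before '#' and the '#' itself are both non-word characters,
# so there is no boundary in front of the hash tag.
my @tags = $text =~ /\b($re)\b/g;
print scalar(@tags), "\n";    # 0

# A keyword made only of word characters is still found.
my @plain = "I love my mac" =~ /\b($re)\b/g;
print "@plain\n";             # mac
```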

The result is that hash tags are not matched when they appear after
whitespace (which is most of the time) or at the beginning of the text.

I'm trying to work around this by changing the matcher to this:

  @match = $text =~ /(?:^|\b|\s)($re)(?:$|\b|\s)/g;
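
A small sketch to compare how this matcher behaves on a few inputs. The second pattern, $lookaround, is not from the original text: it is one possible alternative (an assumption offered for comparison) that replaces the boundary groups with the negative lookarounds (?<!\w) and (?!\w), so nothing around the keyword is consumed. The pattern in $re is again a hand-written stand-in for the trie output, and the sample strings are made up.

```perl
use strict;
use warnings;

# Hand-written stand-in for the trie-generated pattern.
my $re = qr/(?:\#a(?:ndroid|pple)|mac)/;

# The workaround quoted above.
my $loose = qr/(?:^|\b|\s)($re)(?:$|\b|\s)/;

# Hypothetical alternative: require that no word character sits
# directly before or after the keyword.
my $lookaround = qr/(?<!\w)($re)(?!\w)/;

my (%loose_hits, %lookaround_hits);
for my $text ("I love #android", "#apple and mac", "machines", "foo#android") {
    $loose_hits{$text}      = [ $text =~ /$loose/g ];
    $lookaround_hits{$text} = [ $text =~ /$lookaround/g ];
    printf "%-16s loose=[%s] lookaround=[%s]\n",
        $text, "@{ $loose_hits{$text} }", "@{ $lookaround_hits{$text} }";
}
```

Both patterns find the tags in the first two strings and skip "machines". They differ on "foo#android": the workaround matches there because \b holds between "o" and "#", while the lookaround version does not. Which of the two is wanted depends on whether a hash tag glued to a preceding word should count.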

As ugly as it looks, it seems to work. Any suggestions to make it look and work better?