@mittenchops
Last active December 20, 2015 03:19
Regex-based helper for pulling tags out of strings. You define a list, ARTS, of the tags you want to match, then follow the syntax below to get either a dict of per-tag counts or a list of the tags with nonzero counts. I'm actually pretty pleased with the performance of this: it keeps working fairly quickly even as the string size approach…
import re
ARTS = ['dance','photography','art therapy']
def string_found(string1, string2):
    if re.search(r"\b" + re.escape(string1) + r"\b", string2):
        return 1
    return 0
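def string_count(string1, string2):
    # Count non-overlapping whole-word occurrences of string1 in string2.
    pos = 0
    count = 0
    while pos < len(string2):
        # Search only the part of string2 that has not been consumed yet.
        v = re.search(r"\b" + re.escape(string1) + r"\b", string2[pos:])
        if v:
            count += 1
            # v.end() is relative to the slice, so advance pos by that amount.
            pos += v.end()
        else:
            break
    return count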
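# Only the input string is lowercased, so the tags themselves should already be lowercase.
makecounts = lambda mystr, tags: {k: string_count(k, mystr.lower()) for k in tags}
# list() makes this return a real list on Python 3, where filter() is lazy.
gettags = lambda mydict: list(filter(None, [m for m, vv in mydict.items() if vv > 0]))
tagger = lambda mystr, tags: gettags(makecounts(mystr, tags))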
x = "A wonderful string about art therapy, photography, and architecture!"
makecounts(x, ARTS)
# {'dance': 0, 'photography': 1, 'art therapy': 1}
tagger(x, ARTS)
# ['photography', 'art therapy']
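
Since the description mentions performance on long strings, here is a rough sketch of a variant that scans the text once for all tags, instead of once per tag. It is not part of the original gist: the names `count_tags_single_pass` and `tags_present` are made up for illustration, and nothing here has been benchmarked against the code above. One behavioral difference to be aware of: because regex alternation matches do not overlap, a tag that is contained in a longer tag (say `art` alongside `art therapy`) is not counted again inside the longer match, whereas the per-tag loop above would count both.

```python
import re
from collections import Counter


def count_tags_single_pass(text, tags):
    """Count whole-word occurrences of every tag in one scan of the text.

    Hypothetical helper, not from the original gist. Like makecounts, it
    lowercases the text only, so tags are assumed to be lowercase already.
    """
    # Try longer tags first so a long tag wins over a shorter tag it starts with.
    ordered = sorted(tags, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    )
    found = Counter(m.group(0) for m in pattern.finditer(text.lower()))
    # Report every requested tag, including those that never matched.
    return {t: found.get(t, 0) for t in tags}


def tags_present(text, tags):
    """Return the tags with a nonzero count, in the order they were given."""
    counts = count_tags_single_pass(text, tags)
    return [t for t in tags if counts[t] > 0]


ARTS = ['dance', 'photography', 'art therapy']
x = "A wonderful string about art therapy, photography, and architecture!"
print(count_tags_single_pass(x, ARTS))
# {'dance': 0, 'photography': 1, 'art therapy': 1}
print(tags_present(x, ARTS))
# ['photography', 'art therapy']
```

For the sample string this gives the same results as `makecounts` and `tagger`. Whether it is actually faster would depend on the number of tags and the length of the text, so it is worth timing on real data before switching.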