Last active
December 20, 2015 03:19
Regex-based helper for matching tags in strings. You define a list, ARTS, of the tags you want to match, then follow the syntax below to return either a dict of counts or a list of the tags with nonzero counts. I'm actually pretty pleased with the performance: it keeps working fairly quickly even as the string size approach…
import re

# Tags to look for; matching is done against lowercased text.
ARTS = ['dance', 'photography', 'art therapy']

def string_found(string1, string2):
    # Return 1 if string1 occurs in string2 as a whole word or phrase, else 0.
    if re.search(r"\b" + re.escape(string1) + r"\b", string2):
        return 1
    return 0

def string_count(string1, string2):
    # Count non-overlapping whole-word occurrences of string1 in string2.
    if not string1:
        return 0  # an empty pattern would never advance pos
    pattern = r"\b" + re.escape(string1) + r"\b"
    pos = 0
    count = 0
    while pos < len(string2):
        v = re.search(pattern, string2[pos:])
        if v:
            count += 1
            pos += v.end()
        else:
            break
    return count

# Dict of tag -> count
makecounts = lambda mystr, tags: {k: string_count(k, mystr.lower()) for k in tags}
# List of tags with nonzero counts (a real list, so it prints the same on Python 3)
gettags = lambda mydict: [m for m, vv in mydict.items() if m and vv > 0]
tagger = lambda mystr, tags: gettags(makecounts(mystr, tags))

x = "A wonderful string about art therapy, photography, and architecture!"
makecounts(x, ARTS)
# {'dance': 0, 'photography': 1, 'art therapy': 1}
tagger(x, ARTS)
# ['photography', 'art therapy']
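
The `\b` word-boundary anchors are what stop a short tag from matching inside a longer word. A minimal sketch of the same counting idea using `re.findall`, assuming non-empty tags made of ordinary words; `count_tag` and `count_tags` are illustrative names, not part of the gist, and the sample sentence is made up for the demonstration:

```python
import re

def count_tag(tag, text):
    """Count whole-word, non-overlapping occurrences of tag in text."""
    pattern = r"\b" + re.escape(tag) + r"\b"
    return len(re.findall(pattern, text))

def count_tags(text, tags):
    """Map each tag to its count in the lowercased text."""
    lowered = text.lower()
    return {tag: count_tag(tag, lowered) for tag in tags}

sample = "Art therapy and dance; more dance at the party."
counts = count_tags(sample, ["dance", "art", "art therapy"])
print(counts)  # "art" is counted once: "party" does not match \bart\b
```

For ordinary word tags this should give the same counts as the loop in `string_count`, while scanning the string once per tag instead of re-slicing it after every match.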