Created
November 3, 2023 09:15
Simple string filtering algorithm
# During data processing you will often need to filter data based on
# string comparison (e.g. process only the file names that pass some
# condition). Things get more complicated if the condition is meant to
# be provided by the user or as a CLI argument.
#
# Some of the popular methods are the following:
# - wildcard matching (e.g. "*.jpg", which returns only JPEG files)
# - regex (e.g. ".*\.jpg")
# - code (e.g. filename.substr(-4) == ".jpg")
#
# Wildcard matching is easy and efficient, but not suitable for disjunction
# (e.g. match all image types); regex is very flexible but difficult to write;
# custom code can handle any situation but cannot be passed as a CLI
# argument.
#
# With all these pros and cons in mind I have decided to design this
# simple yet flexible filtering algorithm. A filtering expression is
# composed of keywords, symbols and variables: the keywords are the logic
# operators (and, or, not) and the constants True and False, the symbols
# are parentheses for grouping, and anything else is considered a variable.
# Each variable is substituted with True or False depending on whether
# it appears as a space-separated word in the tested string.
#
# For example "sunny and not windy" will match all strings which contain
# the word "sunny" but do not contain "windy". In general, we can pass any
# Python logic expression; each variable turns True if its name is
# present in the input string.
def check(filter, tags):
    # Pad parentheses with spaces so they split into separate tokens.
    filter = filter.replace("(", " ( ")
    filter = filter.replace(")", " ) ")
    # Words of the tested string; a variable is True if it is one of them.
    current = tags.split()
    test = []
    # split() without arguments also collapses repeated whitespace.
    for token in filter.split():
        if token in ["True", "False", "or", "and", "not", "(", ")"]:
            # Keywords and parentheses are passed through unchanged.
            test.append(token)
        elif token in current:
            test.append("True")
        else:
            test.append("False")
    # Only the whitelisted tokens above ever reach eval().
    return eval(" ".join(test))
# simple Python expression matcher
print(check("day", "sunny day"))  # True
print(check("night", "sunny day"))  # False
print(check("sunny or cloudy", "sunny day"))  # True
print(check("rainy or windy", "sunny day"))  # False
print(check("sunny and not windy", "sunny day"))  # True
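
The matcher above delegates the final evaluation to `eval()`. If you would rather not call `eval()` on anything derived from user input, the same token rules can be evaluated by a small recursive-descent parser. The following is only a sketch of that alternative, not part of the original gist: the names `tokenize` and `check_no_eval` are hypothetical, and it assumes the same grammar as above (lowercase `and`, `or`, `not`, parentheses, the constants `True` and `False`, and whole-word variables).

```python
def tokenize(expression):
    """Split a filter expression into tokens, isolating parentheses."""
    return expression.replace("(", " ( ").replace(")", " ) ").split()


def check_no_eval(expression, tags):
    """Evaluate `expression` against the space-separated words in `tags`.

    Hypothetical eval-free variant; precedence follows Python:
    not binds tighter than and, which binds tighter than or.
    """
    words = set(tags.split())
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        result = parse_and()
        while peek() == "or":
            advance()
            right = parse_and()  # always parse the right side, then combine
            result = result or right
        return result

    def parse_and():
        result = parse_not()
        while peek() == "and":
            advance()
            right = parse_not()
            result = result and right
        return result

    def parse_not():
        if peek() == "not":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = advance()
        if token is None:
            raise ValueError("unexpected end of expression")
        if token == "(":
            result = parse_or()
            if advance() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        if token in (")", "and", "or", "not"):
            raise ValueError(f"unexpected token: {token!r}")
        if token in ("True", "False"):
            return token == "True"
        # Anything else is a variable: True if it is a word in `tags`.
        return token in words

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result


print(check_no_eval("sunny and not windy", "sunny day"))  # True
print(check_no_eval("(rainy or sunny) and not (windy or night)", "sunny day"))  # True
print(check_no_eval("rainy or windy", "sunny day"))  # False
```

One design difference worth noting: in this sketch a malformed expression such as "sunny and" raises `ValueError`, whereas the `eval()`-based version raises `SyntaxError`.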