@gabonator
Created November 3, 2023 09:15
Simple string filtering algorithm
# During data processing you will often need to filter data based on
# string comparison (e.g. process only the file names which pass some
# condition). Things get more complicated if the condition has to be
# provided by the user or as a CLI argument.
#
# Some popular methods are the following:
# - wildcard matching (e.g. "*.jpg" which will return only jpeg files)
# - regex (e.g. ".*\.jpg")
# - code (e.g. filename.substr(-4) == ".jpg")
#
# Wildcard matching is easy and efficient, but not suitable for disjunction
# (e.g. matching all image types). Regex is very flexible but difficult to
# write. Custom code can handle any situation, but cannot be passed as a
# CLI argument.
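#
# A small sketch of the first two methods, using Python's standard fnmatch
# and re modules. The file names are made up for illustration; the regex
# alternation shows the kind of disjunction that a single wildcard pattern
# cannot express.

```python
import fnmatch
import re

# Hypothetical file names, for illustration only.
names = ["a.jpg", "b.png", "notes.txt"]

# Wildcard matching: one pattern, one extension.
jpegs = [n for n in names if fnmatch.fnmatch(n, "*.jpg")]

# Regex: alternation covers several image types at once.
images = [n for n in names if re.fullmatch(r".*\.(jpg|png)", n)]

print(jpegs)   # ['a.jpg']
print(images)  # ['a.jpg', 'b.png']
```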
#
# With these pros and cons in mind, I decided to design this simple yet
# flexible filtering algorithm. A filtering expression is composed of
# keywords, symbols and variables: the logic operators (and, or, not) are
# keywords, parentheses can be used to form more complex expressions, and
# anything else is treated as a variable. Each variable is substituted
# with True or False depending on whether it appears as a word in the
# tested string or not.
#
# For example, "sunny and not windy" will match all strings which contain
# the word "sunny" but do not contain "windy". In general, we can pass any
# Python logic expression; each variable turns True if its name is
# present in the input string.
def check(filter, tags):
    # Pad parentheses with spaces so they split off as separate tokens.
    filter = filter.replace("(", " ( ").replace(")", " ) ")
    current = tags.split()
    test = []
    # split() without arguments also discards empty tokens from repeated spaces.
    for token in filter.split():
        if token in ["True", "False", "or", "and", "not", "(", ")"]:
            test.append(token)
        elif token in current:
            test.append("True")
        else:
            test.append("False")
    # Only the whitelisted tokens above ever reach eval().
    return eval(" ".join(test))
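
# To see what is actually handed to eval(), the substitution step can be
# run on its own. This is a minimal sketch; substitute() and KEYWORDS are
# hypothetical names that are not part of the original code.

```python
KEYWORDS = {"True", "False", "or", "and", "not", "(", ")"}

def substitute(expression, tags):
    # Pad parentheses so they become separate tokens.
    expression = expression.replace("(", " ( ").replace(")", " ) ")
    present = set(tags.split())
    out = []
    for token in expression.split():
        if token in KEYWORDS:
            out.append(token)
        else:
            # Any other token is a variable: True if it is one of the tags.
            out.append(str(token in present))
    return " ".join(out)

print(substitute("sunny and not windy", "sunny day"))       # True and not False
print(substitute("(rainy or sunny) and day", "sunny day"))  # ( False or True ) and True
```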
# simple python expression matcher
print(check("day", "sunny day")) # True
print(check("night", "sunny day")) # False
print(check("sunny or cloudy", "sunny day")) # True
print(check("rainy or windy", "sunny day")) # False
print(check("sunny and not windy", "sunny day")) # True
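
# The examples above do not use parentheses. A short sketch of a grouped
# expression applied to a list of strings, assuming a compact version of
# the same check() function; the tag strings are made up for illustration.

```python
def check(filter, tags):
    # Known tokens pass through; everything else becomes True or False
    # depending on whether it is one of the space-separated tags.
    keywords = ["True", "False", "or", "and", "not", "(", ")"]
    current = tags.split()
    tokens = filter.replace("(", " ( ").replace(")", " ) ").split()
    test = [t if t in keywords else str(t in current) for t in tokens]
    return eval(" ".join(test))

# Hypothetical tag strings, for illustration only.
days = ["sunny day", "windy night", "cloudy windy day", "rainy night"]

expression = "(sunny or cloudy) and not night"
print([d for d in days if check(expression, d)])
# ['sunny day', 'cloudy windy day']
```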