Last active
June 14, 2022 11:34
def despamify(text: str, threshold=5) -> tuple[bool, str]:
    last_char = ""
    repeat_count = 0
    despammed = False
    output = ""
    for char in text:
        if char == last_char:
            repeat_count += 1
        else:
            repeat_count = 0
        if repeat_count < threshold:
            output += char
        else:
            despammed = True
        last_char = char
    return despammed, output
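For comparison, here is a rough sketch of the same idea built on `itertools.groupby`, which hands back each run of identical characters as a group. The name `despamify_groupby` is made up for illustration and is not part of the gist. It is meant to behave like the loop above, capping every run at `threshold` characters.

```python
from itertools import groupby


def despamify_groupby(text: str, threshold: int = 5) -> tuple[bool, str]:
    """Cap every run of identical characters at `threshold` characters."""
    despammed = False
    pieces = []
    # groupby yields (character, iterator over that run of the character)
    for char, run in groupby(text):
        run_length = sum(1 for _ in run)
        if run_length > threshold:
            despammed = True
        # Keep at most `threshold` copies of the character
        pieces.append(char * min(run_length, threshold))
    return despammed, "".join(pieces)


print(despamify_groupby("aaaaaabaaaaaa"))  # (True, 'aaaaabaaaaa')
print(despamify_groupby("hello"))          # (False, 'hello')
```

Whether this reads better than the explicit loop is mostly a matter of taste; the loop version makes the counting easier to follow step by step.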
If you really want to use regular expressions though, you can, with something like this:
re.sub(r'(.)\1{5,}', r'\1\1\1\1\1', 'aaaaaabaaaaaa') # 'aaaaabaaaaa'
may or may not have spent like 10 minutes debugging to save a few seconds of reading the documentation while making that... :D
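If you'd rather not hard-code the five `\1`s, something like the sketch below might work. `despamify_regex` is a hypothetical name, and matching the `(bool, str)` return shape of `despamify` is my own assumption about what you'd want. It assumes `threshold` is a non-negative integer, and it uses `re.DOTALL` so that runs of newlines are capped too (a plain `.` skips them).

```python
import re


def despamify_regex(text: str, threshold: int = 5) -> tuple[bool, str]:
    # (.) captures one character; \1{threshold,} then requires at least
    # `threshold` more copies, i.e. a run longer than `threshold`.
    pattern = re.compile(r"(.)\1{%d,}" % threshold, re.DOTALL)
    # subn returns the new string and how many runs were replaced
    output, replacements = pattern.subn(lambda m: m.group(1) * threshold, text)
    return replacements > 0, output


print(despamify_regex("aaaaaabaaaaaa"))  # (True, 'aaaaabaaaaa')
```

`re.subn` is used instead of `re.sub` only because it also reports the number of substitutions, which gives the "was anything despammed" flag for free.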
Some examples: