@blu3r4y
Created May 13, 2021 10:54
Convert the locale of WhatsApp exports from DE to EN
import re
import sys

# Locale-specific pattern for attachment lines; the pattern expects a
# left-to-right mark (\u200E) directly before the file name.
REGEX_ATTACHMENT = {"de": r"\u200E(.*?) \(Datei angehängt\)"}
# English form of the attachment marker; \g<1> is the captured file name.
SUB_ATTACHMENT = r"<attached: \g<1>>"


def convert(file: str, locale="de"):
    """
    Convert the locale of attachments to the English one.
    The output is written to a file in the same location.

    :param file: The input file name that ends in .txt
    :param locale: The locale of the input file, defaults to "de"
    """
    output = re.sub(r"\.txt$", ".en.txt", file)
    if output == file:
        # Without a .txt suffix the output would overwrite the input.
        raise ValueError("the input file name must end in .txt")
    pattern = re.compile(REGEX_ATTACHMENT[locale])

    with open(file, "r", encoding="utf-8") as fread:
        with open(output, "w", encoding="utf-8") as fwrit:
            for line in fread:
                result = pattern.sub(SUB_ATTACHMENT, line.strip())
                fwrit.write(result + "\n")

    print(f"saved converted version to '{output}'")


if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("please specify the file name as an argument")
        sys.exit(1)

    convert(sys.argv[1])
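
To see what the substitution does without touching any files, the following minimal sketch applies the same pattern and replacement to a single line. The sample line (timestamp, sender name, and file name) is made up for illustration and only assumes that the export marks attachments the way the pattern expects; real exports may differ in their surrounding format.

```python
import re

# Same pattern and replacement strings as in the script above.
pattern = re.compile(r"\u200E(.*?) \(Datei angehängt\)")
replacement = r"<attached: \g<1>>"

# Hypothetical export line: timestamp, sender and file name are invented.
# The left-to-right mark (\u200E) precedes the file name, as the pattern requires.
sample = "[13.05.21, 10:54:00] Alice: \u200EIMG-0001.jpg (Datei angehängt)"

# The mark and the German suffix are replaced by the English marker.
converted = pattern.sub(replacement, sample)
print(converted)
```

Lines that contain no attachment marker pass through unchanged, since `pattern.sub` only rewrites matches.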