@Abuton
Last active December 28, 2021 20:01
Remove system generated messages like "added", "removed", "left", "joined using this", "message was deleted"
def remove_system_generated_msgs(uploaded_file: str) -> list:
    """Remove system generated messages like:

    1. +234 was added
    2. +234 left etc

    uploaded_file: str: path of dataset

    Returns:
    --------
    list of chat lines that contain no system generated marker
    """
    # Read the export one line at a time, skipping blank lines (UTF-8 assumed).
    with open(uploaded_file, encoding="utf-8") as f:
        chats = [line.rstrip("\n") for line in f if line.strip()]

    system_generated_msgs = ["added", "removed", "left", "joined using this", "message was deleted"]
    chat_msgs = []
    for chat in chats:
        # Match by substring: a whole chat line never equals a bare marker such as "left".
        if not any(marker in chat for marker in system_generated_msgs):
            chat_msgs.append(chat)
    return chat_msgs
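To see the filtering idea in isolation, here is a minimal sketch that applies the same marker list to a few in-memory lines instead of a file. The helper name `is_system_message` and the sample lines are hypothetical, and the line format is an assumption rather than something taken from a real export.

```python
# Markers taken from the gist; everything else here is illustrative.
SYSTEM_MARKERS = ["added", "removed", "left", "joined using this", "message was deleted"]


def is_system_message(line: str) -> bool:
    """Return True if the line contains any system-message marker (hypothetical helper)."""
    return any(marker in line for marker in SYSTEM_MARKERS)


# Invented chat lines; the timestamp/sender layout is assumed, not from a real export.
sample_lines = [
    "12/01/2021, 09:15 - +234 801 000 0000 was added",
    "12/01/2021, 09:16 - Ada: Good morning everyone",
    "12/01/2021, 09:20 - +234 802 000 0000 left",
    "12/01/2021, 09:21 - Bola: This message was deleted",
    "12/01/2021, 09:25 - Ada: See you at the meeting",
]

# Keep only the lines that contain none of the markers.
kept = [line for line in sample_lines if not is_system_message(line)]

for line in kept:
    print(line)
```

Substring matching is deliberately broad, so an ordinary message that happens to contain a marker word (for example "I left my keys at home") would be dropped as well. Anchoring the markers to the end of the line or to the part after the timestamp would be one way to tighten this, depending on the export format.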