@angus-lherrou
Forked from slowkow/remove-emoji.py
Last active August 16, 2020 00:45
Remove all traces of emoji from a text file.
#!/usr/bin/env python
"""
Remove emoji from a text file and print the result to stdout.

Note: doesn't catch certain things that are counted as emoji by e.g. Facebook,
such as keycap sequences like 2⃣.

Usage
-----
python remove-emoji.py input.txt > output.txt
"""
import re
import sys
# https://stackoverflow.com/a/49146722/330558
def remove_emoji(string):
    emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002500-\U00002BEF"  # box drawing through miscellaneous symbols and arrows
        u"\U00002702-\U000027B0"  # dingbats
        u"\U000024C2-\U0001F251"  # wide range; also spans CJK, Hangul, and other non-emoji blocks
        u"\U0001f926-\U0001f937"
        u"\U00010000-\U0010ffff"  # all supplementary planes
        u"\u2640-\u2642"
        u"\u2600-\u2B55"
        u"\u200d"  # zero-width joiner
        u"\u23cf"
        u"\u23e9"
        u"\u231a"
        u"\ufe0f"  # variation selector-16
        u"\u3030"
        "]+", flags=re.UNICODE)
    stripped = emoji_pattern.sub(r'', string)
    # Drop whitespace that sits between a word character and a following
    # non-word character (for example, the gap left before punctuation).
    return re.sub(r'([\w\d])\s+(\W)', r'\1\2', stripped)
if __name__ == '__main__':
    # Read the whole input file, closing the handle when done.
    with open(sys.argv[1]) as f:
        text = f.read()
    text = remove_emoji(text)
    print(text)
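
The docstring notes that keycap sequences such as 2⃣ are not caught. The following is a minimal sketch of one way to strip them, assuming a keycap sequence is a digit, `#`, or `*`, optionally followed by U+FE0F, and then U+20E3. The names `KEYCAP_PATTERN` and `remove_keycaps` are hypothetical and are not part of the original gist.

```python
import re

# Hypothetical helper, not part of the original gist.
# Assumed shape of a keycap sequence: a base character (0-9, '#', '*'),
# an optional variation selector-16 (U+FE0F), then the combining
# enclosing keycap (U+20E3).
KEYCAP_PATTERN = re.compile("[0-9#*]\ufe0f?\u20e3")


def remove_keycaps(text):
    """Remove keycap emoji sequences from text, leaving plain digits alone."""
    return KEYCAP_PATTERN.sub("", text)


if __name__ == "__main__":
    sample = "Press 2\u20e3 or #\ufe0f\u20e3 to continue, 2 items left."
    print(remove_keycaps(sample))
```

One possible use is to call `remove_keycaps` on the text before passing it to `remove_emoji`. The base character is removed together with the keycap mark, while a digit that is not followed by U+20E3 is left untouched.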