Remove all traces of emoji from a text file.
#!/usr/bin/env python
"""
Remove emoji from a text file and print it to stdout.

Usage
-----
    python remove-emoji.py input.txt > output.txt
"""
import re
import sys


# https://stackoverflow.com/a/49146722/330558
def remove_emoji(string):
    emoji_pattern = re.compile("["
                               u"\U0001F600-\U0001F64F"  # emoticons
                               u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                               u"\U0001F680-\U0001F6FF"  # transport & map symbols
                               u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                               u"\U00002702-\U000027B0"
                               u"\U000024C2-\U0001F251"
                               "]+", flags=re.UNICODE)
    return emoji_pattern.sub(r'', string)


if __name__ == '__main__':
    text = open(sys.argv[1]).read()
    text = remove_emoji(text)
    print(text)

@Demonstrandum Demonstrandum commented Feb 25, 2020

do you hate fun?

@joshuaxex joshuaxex commented Mar 6, 2020

very useful!! thank you

@saaranshM saaranshM commented May 23, 2020

Here is the updated one:

def remove_emoji(string):
    emoji_pattern = re.compile("["
                               u"\U0001F600-\U0001F64F"  # emoticons
                               u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                               u"\U0001F680-\U0001F6FF"  # transport & map symbols
                               u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                               u"\U00002500-\U00002BEF"  # box drawing through misc. symbols and arrows (not Chinese characters)
                               u"\U00002702-\U000027B0"  # dingbats
                               u"\U000024C2-\U0001F251"  # very broad: also spans CJK, kana, and Hangul text
                               u"\U0001f926-\U0001f937"
                               u"\U00010000-\U0010ffff"  # every code point outside the BMP
                               u"\u2640-\u2642"
                               u"\u2600-\u2B55"
                               u"\u200d"  # zero-width joiner
                               u"\u23cf"
                               u"\u23e9"
                               u"\u231a"
                               u"\ufe0f"  # variation selector-16
                               u"\u3030"
                               "]+", flags=re.UNICODE)
    return emoji_pattern.sub(r'', string)
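
The \U000024C2-\U0001F251 range in the pattern above covers far more than emoji, including Chinese characters. The following is only a sketch, not part of the gist: it compares that broad range with a narrower and deliberately incomplete set of ranges. The names BROAD, NARROW, and sample are made up for illustration.

```python
import re

# Broad class: the emoticon block plus the wide range used in the pattern above.
BROAD = re.compile("[\U0001F600-\U0001F64F\U000024C2-\U0001F251]+")

# Narrower class (an illustrative assumption, not a complete emoji list).
NARROW = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\u200d\ufe0f"           # zero-width joiner, variation selector-16
    "]+"
)

sample = "\u4f60\u597d world \U0001F602"  # two Chinese characters, "world", one emoji

broad_result = BROAD.sub("", sample)    # the Chinese characters are stripped too
narrow_result = NARROW.sub("", sample)  # only the emoji is stripped

print(repr(broad_result))
print(repr(narrow_result))
```

If the input may contain CJK text, a narrower set of ranges like this is probably the safer starting point, at the cost of missing some emoji.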

@VpkPrasanna VpkPrasanna commented Jun 2, 2020

Thanks a lot for providing the function. It helped me a lot and saved a lot of time.
Thanks

@aderounmu aderounmu commented Jun 19, 2020

Thanks a lot, this helps very much.

@nmschorr nmschorr commented Jun 30, 2020


@ezequias ezequias commented Jul 11, 2020

Dear @nmschorr

Some emojis your code didn't catch (could you :

#⃣
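
The #⃣ above is a keycap: an ordinary # followed by the combining enclosing keycap U+20E3, sometimes with U+FE0F in between. Neither # nor U+20E3 falls in any of the ranges quoted in this thread. A minimal sketch of a separate pass for keycap sequences, assuming the base characters are the digits, # and * (KEYCAP and remove_keycaps are hypothetical names):

```python
import re

# A keycap sequence: base character, optional variation selector-16,
# then COMBINING ENCLOSING KEYCAP (U+20E3).
KEYCAP = re.compile("[0-9#*]\ufe0f?\u20e3")


def remove_keycaps(text):
    """Remove keycap sequences while leaving plain digits, '#' and '*' alone."""
    return KEYCAP.sub("", text)


print(repr(remove_keycaps("press #\u20e3 then 1\ufe0f\u20e3, ticket #1")))
```

Matching the whole sequence keeps a plain # or digit untouched, which simply adding U+20E3 to the character class would not guarantee for the base character.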

@wsebastiangroves wsebastiangroves commented Aug 26, 2020

Thank you!!

@AmbiTyga AmbiTyga commented Sep 9, 2020

Use this link to get the Unicode code points of every emoji

@sathya018 sathya018 commented Nov 10, 2020

useful!!

@parigulak parigulak commented Dec 12, 2020

What am I doing wrong?

import re
import csv, os, sys
import pandas as pd

def demoji(text):
    emoji_pattern = re.compile("["
                               u"\U0001F600-\U0001F64F"  # emoticons
                               u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                               u"\U0001F680-\U0001F6FF"  # transport & map symbols
                               u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                               u"\U00002702-\U000027B0"
                               u"\U000024C2-\U0001F251"
                               u"\U00010000-\U0010ffff"
                               "]+", flags=re.UNICODE)
    return emoji_pattern.sub(r'', text)

data = pd.read_csv('/content/englishcomments.csv', encoding='utf-8', sep='\t')  # read TSV file
data['header'] = data['header'].astype(str)
data['header'] = data['header'].apply(lambda x: demoji(x))
data.to_csv('output.csv', index=False, encoding='utf-8')

@parigulak parigulak commented Dec 12, 2020

I am getting this error:
The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
2 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
-> 2900 raise KeyError(key) from err
2901
2902 if tolerance is not None:

KeyError: 'header'

@sathya018 sathya018 commented Dec 13, 2020

@parigulak parigulak commented Dec 13, 2020

@sathya018 sathya018 commented Dec 14, 2020

Leave a space: return(emoji_pattern.sub(r'_', text))

@parigulak parigulak commented Dec 14, 2020

I am getting the error on the
df['content'] = df['content']... etc. line,
even though content is a column in my CSV file.
In the error above I used the column name header,
but it is the same issue.
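
A KeyError on a column name means that name is not among the columns that were actually parsed. One possible cause, which is only a guess here since the file itself is not shown, is a separator mismatch: a comma-separated file read with a tab separator ends up with a single fused column. The sketch below reproduces that effect with the standard csv module; the sample contents and the helper header_columns are made up for illustration.

```python
import csv
import io


def header_columns(text, delimiter):
    """Return the column names found on the first row of `text`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [name.strip() for name in next(reader)]


# Hypothetical file contents: comma-separated, with two columns.
sample = "header,content\nhi,some text\n"

tab_cols = header_columns(sample, "\t")   # wrong separator: one fused column
comma_cols = header_columns(sample, ",")  # right separator: two columns

print(tab_cols)
print(comma_cols)
```

With pandas, printing data.columns.tolist() right after read_csv shows the same information and makes it clear whether 'header' or 'content' really exists, including any stray whitespace in the names.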

@boushra boushra commented Mar 24, 2021

When I try to execute the code, I face this problem:
pr

@KorayUlusan KorayUlusan commented Jun 11, 2021

When I try to execute the code, I face this problem:

sys.argv[1] does not exist because no argument was passed. Add the tweet.txt file name after main.py on the command line.
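
As a sketch of how the script could guard against a missing argument, the check below exits with a usage message instead of an IndexError. The helper name input_path is hypothetical; the usage text is taken from the gist's docstring.

```python
import sys


def input_path(argv):
    """Return the input file path from argv, or exit with a usage message."""
    if len(argv) < 2:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit("usage: python remove-emoji.py input.txt > output.txt")
    return argv[1]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(input_path(sys.argv))
```

In the original script, text = open(input_path(sys.argv)).read() would replace the direct use of sys.argv[1].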

@Mylloon Mylloon commented Aug 5, 2021

Use this link to get the Unicode code points of every emoji

I just made something to get all the emojis from the link you gave (I use an updated one).

I forked the Gist to make it easier to understand the discussion.

Result:

String: hello 😂
Without Emojis: hello

I tried to comment the code to make it understandable; it is not very complicated anyway.
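
The fork itself is not reproduced here. As a rough sketch of the general idea of matching against an explicit list of emoji rather than ranges, the code below builds one alternation from a list of sequences. EMOJI_SEQUENCES is a tiny hypothetical stand-in for a full list, and the function names are made up.

```python
import re

# Hypothetical stand-in for a complete emoji list obtained elsewhere.
EMOJI_SEQUENCES = [
    "\U0001F602",    # face with tears of joy
    "\u2764\ufe0f",  # red heart with variation selector-16
    "\u2764",        # heart without the selector
    "#\u20e3",       # keycap number sign
]


def build_emoji_pattern(sequences):
    """Compile an alternation that tries longer sequences first."""
    ordered = sorted(set(sequences), key=len, reverse=True)
    return re.compile("|".join(re.escape(seq) for seq in ordered))


EMOJI_PATTERN = build_emoji_pattern(EMOJI_SEQUENCES)


def remove_listed_emoji(text):
    """Remove every sequence from the list, leaving other text untouched."""
    return EMOJI_PATTERN.sub("", text)


print(repr(remove_listed_emoji("hello \U0001F602")))
```

Sorting by length first means a multi-code-point emoji is removed as a whole instead of leaving a stray selector or joiner behind.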

@TrieuLe0801 TrieuLe0801 commented Aug 25, 2021

It cannot remove ❤️
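
To see why a given character survives, it can help to list the code points it is made of. ❤️ is normally two code points, U+2764 followed by the variation selector U+FE0F, so a pattern has to cover both for nothing to be left behind. A small sketch using the standard unicodedata module (the helper name describe is made up here):

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


heart = "\u2764\ufe0f"  # the red heart emoji as two code points

for code_point, name in describe(heart):
    print(code_point, name)
```

Running describe on the output of remove_emoji shows exactly which code points were left in place, which is usually enough to tell which range is missing.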
