Strip emoji
# Scrubs emoji sequences from a string. I think it covers all of them.
def strip_emoji(str)
  # Work on a UTF-8 copy so the caller's string is not modified.
  str = str.dup.force_encoding('utf-8').encode
  # emoticons 1F600 - 1F64F
  regex = /[\u{1f600}-\u{1f64f}]/
  clean_text = str.gsub regex, ''
  # dingbats 2702 - 27B0
  regex = /[\u{2702}-\u{27b0}]/
  clean_text = clean_text.gsub regex, ''
  # transport/map symbols 1F680 - 1F6FF
  regex = /[\u{1f680}-\u{1f6ff}]/
  clean_text = clean_text.gsub regex, ''
  # enclosed chars 24C2 - 1F251
  # NOTE: this range also spans the CJK blocks, so Chinese, Japanese,
  # and Korean text is removed as well.
  regex = /[\u{24C2}-\u{1F251}]/
  clean_text = clean_text.gsub regex, ''
  # symbols & pics 1F300 - 1F5FF
  regex = /[\u{1f300}-\u{1f5ff}]/
  clean_text = clean_text.gsub regex, ''
  clean_text
end

def test_strip_emoji
  # Expects a file named emoji.txt in the current directory.
  File.open("emoji.txt", "r") do |f|
    f.each_line do |line|
      # The original called strip_emoji_full, which is not defined here.
      puts strip_emoji(line)
    end
  end
end

yoniamir commented Jul 14, 2014


Grafikart commented Nov 21, 2014

Saved my migrations. UTF8 can be a pain sometimes :(

franklsf95 commented Mar 18, 2015

This also strips out Chinese characters.

tigerjj commented Jul 20, 2015

Be careful, this also strips out CJK (Chinese, Japanese, Korean)
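
The cause is the "enclosed chars" range, which runs from U+24C2 all the way to U+1F251 and so contains the CJK blocks. A minimal sketch of the effect follows; the narrower range is only an assumption about what the author intended (U+24C2 on its own, plus U+1F170 through U+1F251), and the sample text and variable names are made up for illustration.

```ruby
# Range copied from the gist's "enclosed chars" step.
wide_range = /[\u{24C2}-\u{1F251}]/

# Hypothetical narrower alternative: U+24C2 by itself, plus the
# enclosed-character code points from U+1F170 up to U+1F251.
narrow_range = /[\u{24C2}\u{1F170}-\u{1F251}]/

# Chinese, Japanese, and Korean samples followed by two enclosed symbols.
text = "\u{4E2D}\u{6587} \u{3042} \u{D55C} \u{24C2} \u{1F251}"

# The wide range deletes the CJK characters along with the symbols.
puts text.gsub(wide_range, "").inspect

# The narrow range deletes only the two enclosed symbols.
puts text.gsub(narrow_range, "").inspect
```

With the narrower range, ordinary Chinese, Japanese, and Korean text passes through untouched.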

philipgiuliani commented Nov 18, 2015

strip_emoji_full method is missing!

64kramsystem commented May 24, 2016

This caught my attention because a colleague of mine used it as reference.

If the objective is to remove the 4-byte characters from a UTF-8 string (a widespread problem with MySQL installations that use the default utf8 character set), then this is a more standard solution:

scrubbed_utf8_mb3_string = string.each_char.select { |char| char.bytesize < 4 }.join

Note that his code is taken from
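
A minimal sketch of that byte-size filter wrapped in a method, assuming the input is already a valid UTF-8 string. The method name and the sample string are hypothetical. Code points above U+FFFF take four bytes in UTF-8, so most emoji are dropped while CJK text (three bytes per character) is kept. Emoji below U+10000, such as the dingbat U+2702, are also kept.

```ruby
# Hypothetical helper name, used only for this illustration.
def strip_four_byte_chars(string)
  # Keep only characters whose UTF-8 encoding is at most 3 bytes,
  # that is, code points up to U+FFFF.
  string.each_char.select { |char| char.bytesize < 4 }.join
end

# Accented Latin, Chinese, a 4-byte emoji (U+1F600), and a 3-byte dingbat (U+2702).
sample = "caf\u{E9} \u{4E2D}\u{6587} \u{1F600} \u{2702}"
puts strip_four_byte_chars(sample)
```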

juanroldan1989 commented Jan 4, 2017

Thanks for this method!

The comment above worked like a charm too :)

loicginoux commented Jul 9, 2017

This does not work for all emojis.
See the complete list here.
Examples of unfiltered emojis:

The comment from @saveriomiroddi is better:
scrubbed_utf8_mb3_string = string.each_char.select { |char| char.bytesize < 4 }.join

reducm commented Oct 18, 2017

It removes Chinese as well...

guanting112 commented Dec 15, 2017

Try this:

(It does not remove any Chinese characters; it only strips all emoji according to the standard.)
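
One standards-based way to get that behavior is the Unicode emoji property classes in Ruby's regex engine. The sketch below is not the snippet from this comment. It assumes Ruby 2.5 or later, and the method name is hypothetical. It uses `\p{Emoji_Presentation}`, which matches characters that render as emoji by default, so digits and CJK text are left alone, but so are text-style symbols such as U+2702.

```ruby
# Hypothetical helper name, used only for this illustration.
def strip_emoji_by_property(string)
  # Remove every character that has the Unicode Emoji_Presentation property.
  string.gsub(/\p{Emoji_Presentation}/, "")
end

# Chinese text, digits, and one emoji (U+1F600).
puts strip_emoji_by_property("\u{4E2D}\u{6587} 123 \u{1F600}")
```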

