@siqin
Created December 4, 2012 07:57
Remove Emoji in NSString
// Xcode 4.2.1
@implementation NSString (EmojiExtension)
- (NSString *)removeEmoji {
    NSMutableString *temp = [NSMutableString string];
    [self enumerateSubstringsInRange:NSMakeRange(0, [self length])
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        const unichar hs = [substring characterAtIndex:0];
        // Surrogate pair: decode the two UTF-16 units into one code point.
        // The length check guards against a lone high surrogate.
        if (0xd800 <= hs && hs <= 0xdbff && [substring length] > 1) {
            const unichar ls = [substring characterAtIndex:1];
            const int uc = ((hs - 0xd800) * 0x400) + (ls - 0xdc00) + 0x10000;
            [temp appendString:(0x1d000 <= uc && uc <= 0x1f77f) ? @"" : substring]; // U+1D000-1F77F
        // Non-surrogate (BMP) character
        } else {
            [temp appendString:(0x2100 <= hs && hs <= 0x26ff) ? @"" : substring]; // U+2100-26FF
        }
    }];
    return temp;
}
@end
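
The surrogate-pair arithmetic in the gist can be checked in isolation. Below is a minimal Swift sketch of the same formula; the function name `decodeSurrogatePair` is made up for illustration and is not part of the gist.

```swift
// Hypothetical helper: combine a UTF-16 high/low surrogate pair into one
// code point, using the same formula as the gist above.
func decodeSurrogatePair(high: UInt16, low: UInt16) -> UInt32? {
    // Reject anything that is not a well-formed pair.
    guard (0xD800...0xDBFF).contains(high), (0xDC00...0xDFFF).contains(low) else {
        return nil
    }
    return (UInt32(high) - 0xD800) * 0x400 + (UInt32(low) - 0xDC00) + 0x10000
}

// U+1F600 is stored in UTF-16 as the pair D83D DE00.
let units = Array("\u{1F600}".utf16)
if let codePoint = decodeSurrogatePair(high: units[0], low: units[1]) {
    print(String(codePoint, radix: 16, uppercase: true)) // 1F600
}
```

Worked by hand: (0xD83D - 0xD800) * 0x400 = 0xF400, plus (0xDE00 - 0xDC00) = 0x200, plus 0x10000 gives 0x1F600, which falls inside the U+1D000-1F77F range that the gist strips.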
@DHowett

DHowett commented Aug 1, 2014

Why muck about with surrogate pairs and blocks (and function call overhead) if you don't have to?

#include <stdlib.h>
#include <unicode/utf8.h>
@implementation NSString (EmojiExtension)
- (NSString *)stringByRemovingEmoji {
    NSData *d = [self dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:NO];
    if (!d) return nil;
    const char *buf = d.bytes;
    NSUInteger len = [d length];
    // The output never exceeds the input, since code points are only dropped.
    char *s = (char *)malloc(len > 0 ? len : 1);
    if (!s) return nil;
    NSUInteger ii = 0, oi = 0; // in index, out index
    UChar32 uc;
    while (ii < len) {
        U8_NEXT_UNSAFE(buf, ii, uc);
        if (0x2100 <= uc && uc <= 0x26ff) continue;
        if (0x1d000 <= uc && uc <= 0x1f77f) continue;
        U8_APPEND_UNSAFE(s, oi, uc);
    }
    // Manual reference counting; drop the autorelease under ARC.
    return [[[NSString alloc] initWithBytesNoCopy:s length:oi encoding:NSUTF8StringEncoding freeWhenDone:YES] autorelease];
}
@end

(Gist'd here)

@nschum

nschum commented May 1, 2015

Swift version:

extension Character {
    func isEmoji() -> Bool {
        return Character(UnicodeScalar(0x1d000)) <= self && self <= Character(UnicodeScalar(0x1f77f))
            || Character(UnicodeScalar(0x2100)) <= self && self <= Character(UnicodeScalar(0x26ff))
    }
}

extension String {
    func stringByRemovingEmoji() -> String {
        return String(filter(self, {c in !c.isEmoji()}))
    }
}

@deya-eldeen

deya-eldeen commented Jun 19, 2016

Thanks, nschum, but this does not work in Swift 2.2; there is a problem in
c.isEmoji()

@crazypepper

crazypepper commented Jul 12, 2016

For Swift 2.2:

extension Character {
    func isEmoji() -> Bool {
        return Character(UnicodeScalar(0x1d000)) <= self && self <= Character(UnicodeScalar(0x1f77f))
            || Character(UnicodeScalar(0x2100)) <= self && self <= Character(UnicodeScalar(0x26ff))
    }
}

extension String {
    func stringByRemovingEmoji() -> String {
        return String(self.characters.filter{!$0.isEmoji()})
    }
}

@foffer

foffer commented Dec 9, 2016

Swift 3.0

extension Character {
    fileprivate func isEmoji() -> Bool {
        return Character(UnicodeScalar(UInt32(0x1d000))!) <= self && self <= Character(UnicodeScalar(UInt32(0x1f77f))!) 
            || Character(UnicodeScalar(UInt32(0x2100))!) <= self && self <= Character(UnicodeScalar(UInt32(0x26ff))!)
    }
}

extension String {
    func stringByRemovingEmoji() -> String {
        return String(self.characters.filter { !$0.isEmoji() })
    }
}

@deya-eldeen

In Xcode 8.3.2 ...

func stringByRemovingEmoji() -> String {
    return String(self.characters.filter { !$0.isEmoji() })
}

no longer works.

@maira786

maira786 commented Nov 26, 2017

Swift 4.0

extension Character {
    fileprivate func isEmoji() -> Bool {
        return Character(UnicodeScalar(UInt32(0x1d000))!) <= self && self <= Character(UnicodeScalar(UInt32(0x1f77f))!) 
            || Character(UnicodeScalar(UInt32(0x2100))!) <= self && self <= Character(UnicodeScalar(UInt32(0x26ff))!)
    }
}

extension String {
    func stringByRemovingEmoji() -> String {
        return String(self.filter { !$0.isEmoji() })
    }
}
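
The Swift versions above compare whole `Character` values against range endpoints. As an alternative sketch, assuming Swift 5 or later, the same two ranges can be applied per Unicode scalar, which mirrors what the Objective-C code does. The method name `removingScalarsInEmojiRanges` is made up for illustration.

```swift
extension String {
    /// Hypothetical name. Drops every Unicode scalar that falls inside the
    /// two ranges used throughout this thread; everything else is kept.
    func removingScalarsInEmojiRanges() -> String {
        let ranges: [ClosedRange<UInt32>] = [0x2100...0x26FF, 0x1D000...0x1F77F]
        var kept = String.UnicodeScalarView()
        for scalar in unicodeScalars
        where !ranges.contains(where: { $0.contains(scalar.value) }) {
            kept.append(scalar)
        }
        return String(kept)
    }
}

// U+1F600 and U+2600 both fall inside the ranges and are dropped.
let cleaned = "Hi \u{1F600}\u{2600}!".removingScalarsInEmojiRanges()
print(cleaned) // Hi !
```

This inherits the limits of the hard-coded ranges: any emoji outside U+2100-26FF and U+1D000-1F77F passes through untouched.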

@simon9211

The heart emoji ❤️ does not work!
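
One likely reason, sketched in Swift: the red heart is normally encoded as U+2764 followed by the variation selector U+FE0F, and neither scalar lies inside the two ranges that the versions above test. The snippet only inspects code points; it makes no claim about how any particular version above behaves at runtime.

```swift
// The red heart as it is usually typed: U+2764 plus variation selector-16.
let heart = "\u{2764}\u{FE0F}"
let scalarValues = heart.unicodeScalars.map { $0.value }

// The two ranges used throughout this thread.
let emojiRanges: [ClosedRange<UInt32>] = [0x2100...0x26FF, 0x1D000...0x1F77F]
let covered = scalarValues.map { value in
    emojiRanges.contains { $0.contains(value) }
}

print(scalarValues.map { String($0, radix: 16, uppercase: true) }) // ["2764", "FE0F"]
print(covered) // [false, false]
```

U+2764 sits just above the upper bound 0x26FF, so widening the first range would be needed to catch it, and U+FE0F would still have to be handled separately.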

@Anticro

Anticro commented Jul 1, 2020

'Measuring length of a string' in the Apple docs https://developer.apple.com/documentation/swift/string led me to another solution that needs no knowledge of the Unicode ranges. I just want letters to remain in the string and to skip everything that is an icon:

#include <string.h>

inline static NSString* _Nonnull nsstring_remove_emoji_v2(NSString* const _Nonnull origString) {
    NSMutableString* const result = [NSMutableString stringWithCapacity:0];
    NSUInteger const len = origString.length;
    NSString* subStr;
    for (NSUInteger index = 0; index < len; index++) {
        subStr = [origString substringWithRange:NSMakeRange(index, 1)];
        const char* utf8Rep = subStr.UTF8String;  // will return NULL for icons that consist of 2 chars
        if (utf8Rep != NULL) {
            unsigned long const length = strlen(utf8Rep);
            if (length <= 2) {
                [result appendString:subStr];
            }
        }
    }
    return result.copy;
}

I have no idea what this does to Chinese or Japanese text, but it works for all German letters.

@ninjitaru

"I have no idea what this does to Chinese or Japanese text, but it works for all German letters."

I came across this gist and happen to have strings with Chinese text plus emoji. This code removes all Chinese characters, because their strlen is 3 :)
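
The strlen observation can be reproduced with a small Swift sketch. It approximates the "keep only what encodes to at most 2 UTF-8 bytes" rule per Unicode scalar rather than per UTF-16 unit, so it is an illustration of the effect, not a port of the Objective-C function; the name `keepShortUTF8` is made up.

```swift
// Hypothetical helper: keep only scalars whose UTF-8 encoding is 1 or 2 bytes,
// i.e. code points below U+0800.
func keepShortUTF8(_ s: String) -> String {
    var out = String.UnicodeScalarView()
    for scalar in s.unicodeScalars where String(scalar).utf8.count <= 2 {
        out.append(scalar)
    }
    return String(out)
}

// UTF-8 byte counts: ASCII 1, German letters 2, CJK 3, most emoji 4.
for sample in ["a", "\u{DF}", "\u{4E2D}", "\u{1F600}"] {
    print(sample, sample.utf8.count)
}

// "Grüße 中文 😀": the CJK characters are dropped together with the emoji.
let filtered = keepShortUTF8("Gr\u{FC}\u{DF}e \u{4E2D}\u{6587} \u{1F600}")
print(filtered)
```

Everything from U+0800 upward needs three or more UTF-8 bytes, so the rule also drops symbols such as the euro sign, not only CJK text.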

@smorr

smorr commented Nov 16, 2022

A much simpler way is to use a string transform. This removes all emoji code points and preserves non-Latin characters, accents, etc.

E.g.
[@"🤯!!! ক❤️testé᏷🧡💚💛せぬ❤️‍🔥👩🏿‍🦰" stringByApplyingTransform: @"[:emoji:] remove" reverse:NO]

returns
!!! ক️testé᏷せぬ️‍‍

@smorr

smorr commented Nov 18, 2022

Just to follow up: apparently the [:emoji:] property used in the ICU transform includes digits, some punctuation, and other things not generally thought to be emoji.

I am finding that this method, in an NSString category, works better:

- (NSString *)stringByRemovingEmoji {
    static NSRegularExpression * regex = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        // remove all emoji less those that are digits, punctuation, letters, latin 1 supplement or letter like symbols
        // or BIDI Non-Spacing Mark
        NSError * error = nil;
        regex = [NSRegularExpression regularExpressionWithPattern:@"([[:emoji:]--[:digit:]--[:punctuation:]--[:letter:]--[:block=Latin-1_sup:]--[:block=letter-like-symbols:]]|\\uFE0F)" options: 0 error:&error];
        // Check the return value rather than the error pointer, per Cocoa convention.
        if (regex == nil) {
            NSLog(@"Error forming regex: %@", error);
        }
    });
    
    return [regex stringByReplacingMatchesInString:self options:0 range:NSMakeRange(0, self.length) withTemplate:@""];
}
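
The note above about [:emoji:] covering digits and punctuation can be inspected from Swift as well. This is a minimal sketch, assuming Swift 5 or later, where `Unicode.Scalar.Properties` exposes `isEmoji` and `isEmojiPresentation`; it only prints the properties and is not a replacement for the regex.

```swift
// Probe a digit, a punctuation mark, a Latin-1 symbol, the heart (U+2764),
// and a face (U+1F600).
let probes: [Unicode.Scalar] = ["7", "#", "\u{A9}", "\u{2764}", "\u{1F600}"]

for p in probes {
    let hex = String(p.value, radix: 16, uppercase: true)
    // isEmoji is the broad Emoji property; isEmojiPresentation is true only
    // for scalars that render as emoji by default.
    print("U+\(hex) emoji=\(p.properties.isEmoji) presentation=\(p.properties.isEmojiPresentation)")
}
```

The digit and `#` should report the broad Emoji property while not having emoji presentation, which matches the reason the regex subtracts digits, punctuation, and letters from the set.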
