@kssreeram
Created May 6, 2021 09:11

Grapheme clustering in Swift and Obj-C produces different output!

This Swift program:

let s = "வணக்கம்" // This is a word in the Tamil language.
var n = 0
for cluster in Array(s) {
    print("cluster \(n) = '\(cluster)'")
    n += 1
}
print("number of clusters = \(n)")

Produces this output:

cluster 0 = 'வ'
cluster 1 = 'ண'
cluster 2 = 'க்'
cluster 3 = 'க'
cluster 4 = 'ம்'
number of clusters = 5

This is the equivalent program in Obj-C:

#import <Foundation/Foundation.h>

int main() {
    @autoreleasepool {
        NSString *s = @"வணக்கம்"; // This is a word in the Tamil language.
        NSUInteger i = 0;
        int n = 0;
        while (i < s.length) {
            NSRange r = [s rangeOfComposedCharacterSequenceAtIndex:i];
            NSString *cluster = [s substringWithRange:r];
            printf("cluster %d = '%s'\n", n, cluster.UTF8String);
            i = r.location + r.length;
            n += 1;
        }
        printf("number of clusters = %d\n", n);
    }
    return 0;
}

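But it produces different output, which looks incorrect: 'க்' and 'க' are merged into a single cluster, giving 4 clusters instead of 5: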

cluster 0 = 'வ'
cluster 1 = 'ண'
cluster 2 = 'க்க'
cluster 3 = 'ம்'
number of clusters = 4
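
One way to see where the two results diverge is to print the code points inside each cluster. The sketch below is not part of the original programs; `codePoint` and `scalarBreakdown` are hypothetical helper names introduced only for illustration. The word contains a virama (pulli, U+0BCD) after the first 'க'. Judging from the outputs above, Swift ends the cluster right after the virama, while `rangeOfComposedCharacterSequenceAtIndex:` keeps the following consonant in the same cluster. The exact clusters Swift reports may depend on the Swift and Unicode version in use.

```swift
// Hypothetical helper (not from the original programs):
// formats a Unicode scalar as "U+XXXX", zero-padded to at least 4 hex digits.
func codePoint(_ scalar: Unicode.Scalar) -> String {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    let padding = String(repeating: "0", count: max(0, 4 - hex.count))
    return "U+" + padding + hex
}

// Hypothetical helper: for each Character (grapheme cluster as Swift sees it),
// list the code points that make it up.
func scalarBreakdown(_ s: String) -> [[String]] {
    return s.map { cluster in cluster.unicodeScalars.map(codePoint) }
}

let word = "வணக்கம்" // The same Tamil word as in the programs above.
for (n, codePoints) in scalarBreakdown(word).enumerated() {
    print("cluster \(n) = \(codePoints.joined(separator: " "))")
}
print("number of scalars = \(word.unicodeScalars.count)")
```

The cluster boundaries printed here are whatever the Swift runtime decides, so the same breakdown can be compared across toolchain versions without changing the code.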