kamipo/gist:61ee662ee0b1127f0989

## gistfile1.md

      
MySQL's Unicode character sets mainly provide the following collations:

- xxx_bin: compares all characters using their code points as weights.
- xxx_general_ci: compares most characters using (roughly) their code points as weights, case-insensitively.
- xxx_unicode_ci: compares all characters using their collation weights from the Unicode Collation Algorithm (UCA).
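
A toy Python model may make the "weight" idea more concrete. It is a sketch under stated assumptions, not MySQL's real weight tables: bin_weight uses the code point directly, and general_ci_weight (a hypothetical name) approximates case-insensitivity with str.upper(), ignoring accent folding and other details of the real table. The UCA weights behind xxx_unicode_ci are not modeled.

```python
# Toy model of two MySQL collation families (NOT MySQL's real weight tables).
# A collation maps each character to a weight; two strings compare equal
# when their sequences of weights are equal.


def bin_weight(ch: str) -> int:
    """xxx_bin: the weight is the character's code point."""
    return ord(ch)


def general_ci_weight(ch: str) -> int:
    """Rough stand-in for xxx_general_ci (hypothetical helper).

    Case-insensitivity is approximated with str.upper(); when upper-casing
    does not give exactly one character (e.g. 'ß' -> 'SS'), the code point
    is kept. Accent folding and other table details are ignored.
    """
    upper = ch.upper()
    return ord(upper) if len(upper) == 1 else ord(ch)


def equal_under(weight, a: str, b: str) -> bool:
    """Compare two strings by their per-character weight sequences."""
    return [weight(c) for c in a] == [weight(c) for c in b]


print(equal_under(bin_weight, "MySQL", "mysql"))         # False (case-sensitive)
print(equal_under(general_ci_weight, "MySQL", "mysql"))  # True (case-insensitive)
```

The only difference between the two families in this model is the weight function; the comparison itself is identical.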

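ref. http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-sets.html

When xxx is utf8, only BMP characters can be handled, because MySQL's utf8 stores at most 3 bytes per character. When xxx is utf8mb4 (up to 4 bytes per character), SMP characters can be handled as well.
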
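
A small Python sketch of the BMP/SMP boundary, assuming only that MySQL's utf8 allows at most 3 bytes per character and utf8mb4 at most 4. In UTF-8, a character needs at most 3 bytes exactly when its code point is at most U+FFFF. fits_mysql_utf8 is a hypothetical helper:

```python
# Sketch: which strings fit MySQL's utf8 (max 3 bytes per character, BMP only)
# versus utf8mb4 (max 4 bytes, so supplementary characters too).

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def is_bmp(ch: str) -> bool:
    return ord(ch) <= BMP_MAX


def fits_mysql_utf8(text: str) -> bool:
    """True if every character is in the BMP, i.e. needs at most 3 UTF-8 bytes."""
    return all(is_bmp(ch) for ch in text)


for sample in ["abc", "寿司", "🍣"]:
    # UTF-8 byte length of each character in the sample
    sizes = [len(ch.encode("utf-8")) for ch in sample]
    print(sample, sizes, "utf8 OK" if fits_mysql_utf8(sample) else "needs utf8mb4")
```

Here "abc" uses 1 byte per character and "寿司" 3 bytes per character, so both fit utf8, while "🍣" (U+1F363) needs 4 bytes and therefore utf8mb4.
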
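Currently, the way utf8mb4_general_ci and utf8mb4_unicode_ci collate SMP characters is known as the Sushi-Beer issue: both collations treat all SMP characters as equal to each other, so 🍣 (SUSHI, U+1F363) compares equal to 🍺 (BEER MUG, U+1F37A). See: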

- http://bugs.mysql.com/bug.php?id=76553
- http://blog.kamipo.net/entry/2015/03/23/093052 (in Japanese)
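
The issue can be reproduced in a toy Python model. Assumption: both _ci collations give every SMP character one shared weight, modeled here as the weight of U+FFFD (REPLACEMENT CHARACTER), while _bin keeps the code point. The function names are hypothetical and BMP weights are simplified.

```python
# Toy model of the Sushi-Beer issue (assumption: the _ci collations collapse
# every supplementary character to a single weight, modeled as U+FFFD's).

REPLACEMENT = 0xFFFD


def bin_weight(ch: str) -> int:
    """utf8mb4_bin: the weight is the code point, so SMP characters stay distinct."""
    return ord(ch)


def ci_weight_smp(ch: str) -> int:
    """Toy _ci weight: all SMP characters share one weight; BMP chars are upper-cased."""
    cp = ord(ch)
    if cp > 0xFFFF:
        return REPLACEMENT
    upper = ch.upper()
    return ord(upper) if len(upper) == 1 else cp


def same(weight, a: str, b: str) -> bool:
    """Compare two strings by their per-character weight sequences."""
    return [weight(c) for c in a] == [weight(c) for c in b]


SUSHI, BEER = "\U0001F363", "\U0001F37A"  # 🍣, 🍺
print("_ci :", same(ci_weight_smp, SUSHI, BEER))  # True: Sushi = Beer
print("_bin:", same(bin_weight, SUSHI, BEER))     # False: distinguished
```

In this model a WHERE clause matching 🍣 under a _ci collation would also match 🍺, which is why utf8mb4_bin is needed when SMP characters must be told apart.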

In my opinion, the following rules of thumb work well:

If you need to handle SMP characters, use utf8mb4_general_ci (the default collation of utf8mb4).

If you need to compare SMP characters (i.e. tell them apart), use utf8mb4_bin.


If you only need to handle BMP characters, use utf8_general_ci (the default collation of utf8).

If you need case-sensitive comparison, use utf8_bin.


If you only need to handle ASCII characters, use ascii_general_ci (the default collation of ascii).

If you need case-sensitive comparison, use ascii_bin.
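
The rule of thumb above can be written down as a small Python function. suggest_collation is a hypothetical helper that only inspects sample strings; "exact" stands for case-sensitive comparison (and, for utf8mb4, telling SMP characters apart), which the rules map to the _bin collation.

```python
from typing import Iterable


def suggest_collation(samples: Iterable[str], exact: bool = False) -> str:
    """Hypothetical helper encoding the rule of thumb above.

    Picks the smallest charset that can hold every sample (ascii, then utf8,
    then utf8mb4), then its default _general_ci collation, or _bin when
    exact (case-sensitive / SMP-distinguishing) comparison is required.
    """
    code_points = [ord(ch) for s in samples for ch in s]
    if any(cp > 0xFFFF for cp in code_points):
        charset = "utf8mb4"  # SMP characters present
    elif any(cp > 0x7F for cp in code_points):
        charset = "utf8"     # non-ASCII, but BMP only
    else:
        charset = "ascii"    # ASCII only
    return f"{charset}_{'bin' if exact else 'general_ci'}"


print(suggest_collation(["hello"]))                  # ascii_general_ci
print(suggest_collation(["寿司"], exact=True))        # utf8_bin
print(suggest_collation(["寿司", "\U0001F363"]))      # utf8mb4_general_ci
```

Choosing the smallest charset that holds the data follows the same order as the rules: ascii, then utf8, then utf8mb4, each with _general_ci by default and _bin for exact comparison.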