<collation name="utf8_arabic" id="100">
<rules>
<reset>\u0627</reset>
<i>\u0622</i>
<i>\u0623</i>
<i>\u0625</i>
</rules>
<rules>
<reset>\u0647</reset>
<i>\u0629</i>
</rules>
<rules>
<reset>\u0649</reset>
<i>\u064a</i>
</rules>
</collation>
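To see what these rules do, the Python sketch below folds each `<i>` character onto the character in its `<reset>` element and then compares strings. It is not MySQL code, and the names `FOLD`, `fold_arabic` and `collation_equal` are hypothetical. It only models which characters the collation treats as equal. MySQL computes full collation weights itself, so treat this as a rough model of the equalities only.

```python
# Each <i> character in the gist's rules is folded onto its <reset> character.
FOLD = str.maketrans({
    "\u0622": "\u0627",  # Alef with Madda Above -> Alef
    "\u0623": "\u0627",  # Alef with Hamza Above -> Alef
    "\u0625": "\u0627",  # Alef with Hamza Below -> Alef
    "\u0629": "\u0647",  # Teh Marbuta -> Heh
    "\u064a": "\u0649",  # Yeh -> Alef Maksura
})


def fold_arabic(text: str) -> str:
    """Replace every variant with the character it is declared equal to."""
    return text.translate(FOLD)


def collation_equal(a: str, b: str) -> bool:
    """Rough model of '=' under the collation: equal after folding."""
    return fold_arabic(a) == fold_arabic(b)


print(collation_equal("\u0623\u062d\u0645\u062f", "\u0627\u062d\u0645\u062f"))  # True (أحمد vs احمد)
print(collation_equal("\u0645\u062f\u0631\u0633\u0629", "\u0645\u062f\u0631\u0633\u0647"))  # True (مدرسة vs مدرسه)
print(collation_equal("\u0639\u0644\u0649", "\u0639\u0644\u064a"))  # True (على vs علي)
```

Folding onto the `<reset>` character mirrors how each rule declares its `<i>` characters equal to that character. Characters outside the rules pass through unchanged.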
@mzeidhassan You most likely worked this out months ago; I am just adding the answer to your question in case someone comes here looking for a solution.
You need to add the collation to a file called 'Index.xml'. Its location varies from system to system; you can find the directory that contains it with the following statement:
SHOW VARIABLES LIKE 'character_sets_dir';
Back up the file, find the <charset name="utf8"> element, and add the collation inside it. It will look like the following:
<charset name="utf8">
<!-- ... existing utf8 collations ... -->
<collation name="utf8_arabic_ci" id="1029">
<rules>
<reset>\u0627</reset> <!-- Alef 'ا' -->
<i>\u0623</i> <!-- Alef With Hamza Above 'أ' -->
<i>\u0625</i> <!-- Alef With Hamza Below 'إ' -->
<i>\u0622</i> <!-- Alef With Madda Above 'آ' -->
</rules>
<rules>
<reset>\u0629</reset> <!-- Teh Marbuta 'ة' -->
<i>\u0647</i> <!-- Heh 'ه' -->
</rules>
<rules>
<reset>\u0000</reset> <!-- Ignore Tashkil -->
<i>\u064E</i> <!-- Fatha 'َ' -->
<i>\u064F</i> <!-- Damma 'ُ' -->
<i>\u0650</i> <!-- Kasra 'ِ' -->
<i>\u0651</i> <!-- Shadda 'ّ' -->
<i>\u0652</i> <!-- Sukun 'ْ' -->
<i>\u064B</i> <!-- Fathatan 'ً' -->
<i>\u064C</i> <!-- Dammatan 'ٌ' -->
<i>\u064D</i> <!-- Kasratan 'ٍ' -->
</rules>
</collation>
</charset>
My collation here, named 'utf8_arabic_ci', is based on Ahmed Nasir's; I just added the rules that ignore tashkil. Restart MySQL, then change the collation of the column with a statement like:
ALTER TABLE persons MODIFY name VARCHAR(50) CHARACTER SET 'utf8' COLLATE 'utf8_arabic_ci';
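The tashkil rules are meant to make the vowel marks ignorable when comparing. Assuming that is their only effect, the minimal Python sketch below models the resulting behavior by dropping every mark from Fathatan (U+064B) through Sukun (U+0652) before comparing. The helper names are hypothetical, and this is an illustration of the intended comparison, not how MySQL implements it.

```python
# Fathatan (U+064B) through Sukun (U+0652): the tashkil marks the rules list.
TASHKIL = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def strip_tashkil(text: str) -> str:
    """Remove tashkil marks, modeling them as ignorable (an assumption)."""
    return "".join(ch for ch in text if ch not in TASHKIL)


def equal_ignoring_tashkil(a: str, b: str) -> bool:
    """True when the strings differ only in tashkil marks."""
    return strip_tashkil(a) == strip_tashkil(b)


vocalized = "\u0645\u064f\u062d\u064e\u0645\u064e\u0651\u062f"  # مُحَمَّد
plain = "\u0645\u062d\u0645\u062f"  # محمد
print(strip_tashkil(vocalized) == plain)  # True
print(equal_ignoring_tashkil(vocalized, plain))  # True
```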
For more information, see my blog post on this subject, arabic-case-insensitive-in-database-systems, and the MySQL documentation on adding a new collation.
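The Index.xml edit described in this reply can also be scripted. Below is a minimal Python sketch using only the standard library. The function name `add_collation`, the `.bak` backup name and the `ARABIC_CI_RULES` list (copied from the utf8_arabic_ci collation above, with Sukun written as U+0652) are assumptions for illustration. It assumes Index.xml parses with ElementTree. Comments inside the root element are kept (Python 3.8+), but comments outside it and the original whitespace may not survive, which is why it backs the file up first.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# (reset, equivalents) pairs from the utf8_arabic_ci collation above.
# Double backslashes keep the literal \uXXXX escapes that the file uses.
ARABIC_CI_RULES = [
    ("\\u0627", ["\\u0623", "\\u0625", "\\u0622"]),  # Alef variants
    ("\\u0629", ["\\u0647"]),  # Teh Marbuta and Heh
    ("\\u0000", ["\\u064E", "\\u064F", "\\u0650", "\\u0651",  # tashkil;
                 "\\u0652", "\\u064B", "\\u064C", "\\u064D"]),  # Sukun is U+0652
]


def add_collation(index_path, name, coll_id, rules, charset="utf8"):
    """Append a <collation> to <charset name=charset> in Index.xml.

    The original file is copied to Index.xml.bak before it is overwritten.
    Returns the path of the backup.
    """
    path = Path(index_path)
    # insert_comments=True (Python 3.8+) keeps comments inside the root element.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(path, parser=parser)

    cs = tree.getroot().find(f".//charset[@name='{charset}']")
    if cs is None:
        raise ValueError(f"no <charset name={charset!r}> in {path}")
    if cs.find(f"collation[@name='{name}']") is not None:
        raise ValueError(f"collation {name!r} already exists in {path}")

    coll = ET.SubElement(cs, "collation", name=name, id=str(coll_id))
    for reset, equivalents in rules:
        block = ET.SubElement(coll, "rules")
        ET.SubElement(block, "reset").text = reset
        for ch in equivalents:
            ET.SubElement(block, "i").text = ch

    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # back up only after validation succeeded
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return backup
```

Point it at Index.xml in the directory reported by SHOW VARIABLES LIKE 'character_sets_dir', run it with permission to write there, and restart MySQL afterwards. Validating before copying means a failed re-run does not overwrite a good backup.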
Thanks for this gist.
Maybe you should add a rule for Ya2 as well, treating ي and ى as equal:
<rules>
<reset>\u064A</reset>
<i>\u0649</i>
</rules>
as-salamo Alaikom Ahmed,
I have been following your question at https://stackoverflow.com/questions/23272518/normalize-arabic-text-mysql
but I don't know how to use this index file in MySQL. Could you please provide instructions?
Also, can I use the same approach to ignore Arabic diacritics (tashkeel) during search?
Thanks in advance for your help!
Mohamed