@ulcuber
Created March 23, 2019 05:53
Using Cyrillic in PHP regexp
<?php
/*
* about the task
* strings like "Artist1 & Artist2" or "Artist1 feat. Artist2" or "Исполнитель1 и Исполнитель2"
* are split into ["Artist1", "Artist2"]
*/
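/*
* using modifier u (PCRE_UTF8) to include Cyrillic in the regexp
* @see http://php.net/manual/en/reference.pcre.pattern.modifiers.php
* this attempt did not give the expected result
* NOTE: with u both the pattern and the subject are treated as UTF-8,
* so multibyte symbols as such are supported by preg_split;
* a likely cause is that `и` (like `vs`) is also matched inside words,
* e.g. the `и` in "Исполнитель"
* NOTICE: `/` around regexp
*/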
$artists = preg_split("/\s*(&|vs|((F|f)(ea)?t\.?)|и)\s*/u", $trimedArtist);
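/*
* a minimal sketch of a preg_split variant, not the original code
* assumes this file and the input are UTF-8; splitArtists() is a hypothetical helper name
* with the u modifier `и` can stay a literal in the pattern
* word separators (vs, feat., ft., и) must have whitespace on both sides,
* so the `и` inside "Исполнитель" is not treated as a delimiter
*/
function splitArtists(string $artist): array
{
    $pattern = '/\s*&\s*|\s+(?:vs\.?|[Ff](?:ea)?t\.?|и)\s+/u';
    // PREG_SPLIT_NO_EMPTY drops empty pieces, e.g. when the string starts with "&"
    $parts = preg_split($pattern, trim($artist), -1, PREG_SPLIT_NO_EMPTY);
    // preg_split returns false on error (e.g. invalid UTF-8 in the subject)
    return $parts === false ? [] : $parts;
}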
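/*
* using mb_* functions for multibyte symbols
* NOTICE: no `/` around regexp
* so there is no place for modifiers like the ones that follow the closing `/`
* therefore the Cyrillic symbol is written as a hex escape sequence
* @see http://php.net/manual/en/regexp.reference.escape.php
* 'd0b8' is the UTF-8 byte sequence of 'и' (U+0438)
* NOTE: how \x{...} is interpreted depends on mb_regex_encoding();
* if it is read as a code point under UTF-8, 'и' would be \x{0438}
*/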
$artists = mb_split("\s*(&|vs|((F|f)(ea)?t\.?)|\x{d0b8})\s*", $trimedArtist);
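/*
* a minimal sketch of an mb_split variant, not the original code
* assumes the mbstring extension, and that this file and the input are UTF-8
* splitArtistsMb() is a hypothetical helper name
* with mb_regex_encoding('UTF-8') set explicitly, `и` can be written literally
* word separators need whitespace on both sides, so `и` inside a name is not a delimiter
*/
function splitArtistsMb(string $artist): array
{
    mb_regex_encoding('UTF-8');
    $parts = mb_split('\s*&\s*|\s+(?:vs\.?|[Ff](?:ea)?t\.?|и)\s+', trim($artist));
    if ($parts === false) {
        return [];
    }
    // mb_split has no "no empty" flag, so empty pieces are filtered out by hand
    return array_values(array_filter($parts, function ($part) {
        return $part !== '';
    }));
}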