Created December 29, 2019 16:53
Strip non-BMP characters from a string in Mathematica <12.
surrogateQ[ch_] := # >= 55296 && # < 57344 &@First@ToCharacterCode[ch];
surrogateQ::usage =
  "Tests whether the given character is a surrogate, i.e., in the \
range U+D800 to U+DFFF.";
stripNonBMPCharacters[s_] :=
  StringJoin[Select[Characters[s], ! surrogateQ[#] &]];
stripNonBMPCharacters::usage =
  "Strips the given string of Unicode code points outside of the \
Basic Multilingual Plane (BMP), i.e., characters beyond U+FFFF, by \
removing UTF-16 surrogate pairs.";
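The functions above rely on non-BMP characters appearing as separate UTF-16 surrogate code units, which is the behavior the title attributes to Mathematica versions before 12. As a minimal sketch that is not part of the original gist, the same filtering can be done on character codes instead of characters. The names `bmpCodeQ` and `stripNonBMPCodes` are hypothetical. It is an assumption here that later versions report a non-BMP character as a single code above U+FFFF, which is why the predicate rejects both surrogates and codes beyond U+FFFF.

```mathematica
(* Hypothetical helper: True for a BMP code that is not a surrogate,
   i.e. at most U+FFFF and outside the range U+D800 to U+DFFF. *)
bmpCodeQ[c_Integer] := c <= 16^^FFFF && ! (16^^D800 <= c <= 16^^DFFF);

(* Hypothetical variant of stripNonBMPCharacters that filters the
   integer character codes and rebuilds the string from what is left. *)
stripNonBMPCodes[s_String] :=
  FromCharacterCode[Select[ToCharacterCode[s], bmpCodeQ]];

(* A string made only of BMP characters passes through unchanged. *)
stripNonBMPCodes["abc"]
```

Working on integer codes avoids splitting the string into one-character strings first, and the hexadecimal literals make the U+D800 to U+DFFF range easier to check against the usage message.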