@zmwangx
Created December 29, 2019 16:53
Strip non-BMP characters from a string in Mathematica <12.
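(* In Mathematica <12, a character beyond U+FFFF appears in a string as two
   UTF-16 surrogates, so Characters returns each surrogate separately. *)
surrogateQ[ch_] := 16^^D800 <= First@ToCharacterCode[ch] <= 16^^DFFF;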
surrogateQ::usage =
"Tests whether the given character is a surrogate, i.e., in the \
range U+D800 to U+DFFF.";
stripNonBMPCharacters[s_] :=
StringJoin[Select[Characters[s], ! surrogateQ[#] &]];
stripNonBMPCharacters::usage =
"Strips the given string of Unicode code points outside of the \
Basic Multilingual Plane (BMP), i.e., characters beyond U+FFFF, by \
removing UTF-16 surrogate pairs.";
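(* Usage sketch, not part of the original gist. It assumes Mathematica <12,
   where FromCharacterCode accepts surrogate code units directly. The name
   `sample` is hypothetical: it holds "a", the surrogate pair 55357, 56832
   (the UTF-16 encoding of U+1F600), and "b". Under that assumption both
   surrogates are dropped and only the BMP characters remain. *)
sample = FromCharacterCode[{97, 55357, 56832, 98}];
stripNonBMPCharacters[sample]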