Created
February 7, 2019 15:37
def escapeNonASCII(str: String): String = {
  val result = new java.lang.StringBuilder
  var i = 0
  while (i < str.length) {
    // Read one full code point; it spans one or two UTF-16 code units.
    val cp: Int = Character.codePointAt(str, i)
    if (cp < 128)
      result.appendCodePoint(cp)
    else
      // Emit one zero-padded \uXXXX escape per UTF-16 code unit, so a
      // supplementary code point is written as its surrogate pair.
      Character.toChars(cp).foreach(c => result.append("\\u%04x".format(c.toInt)))
    // Advance past every code unit consumed by this code point.
    i += Character.charCount(cp)
  }
  result.toString
}
Java (and by tradition, Scala) represents strings in UTF-16, a legacy of the time when Unicode was a 16-bit character set.
A quick approach to the matter is:
- A code point may take one or two values of the char (or Char) data type.
- The length of a String counts char values, so it is not simply counting how many code points it has.

You can generate test data here: http://generator.lorem-ipsum.info/
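The difference between char count and code point count can be checked with a small sketch. The object name CodePointDemo and the sample string are made up for illustration; only standard java.lang.String and java.lang.Character methods are used.

```scala
object CodePointDemo {
  // Hypothetical sample: 'a' (U+0061), 'é' (U+00E9) and U+1F600, which lies
  // outside the Basic Multilingual Plane and needs a surrogate pair in UTF-16.
  val sample: String = "a" + 0xE9.toChar + new String(Character.toChars(0x1F600))

  // Number of UTF-16 code units, which is what String.length reports.
  val units: Int = sample.length

  // Number of Unicode code points in the same string.
  val codePoints: Int = sample.codePointCount(0, sample.length)

  def main(args: Array[String]): Unit = {
    println(s"code units:  $units")      // 4
    println(s"code points: $codePoints") // 3
  }
}
```

This is why the loop in escapeNonASCII reads a whole code point at each step with Character.codePointAt instead of indexing the string one char at a time.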