@frgomes
Created February 7, 2019 15:37
// Escapes every non-ASCII character as a \uXXXX sequence, one per UTF-16 code unit.
def escapeNonASCII(str: String): String = {
  val result = new java.lang.StringBuilder(str.length)
  var i = 0
  while (i < str.length) {
    val ch: Char = str.charAt(i)
    if (ch < 128)
      result.append(ch)
    else
      // Zero-pad to four hex digits so the escape is well formed. A code point
      // above U+FFFF is emitted as two escapes, one per surrogate.
      result.append("\\u%04x".format(ch.toInt))
    i += 1
  }
  result.toString
}

frgomes commented Feb 7, 2019

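Java (and therefore Scala) represents strings as UTF-16, an encoding of Unicode in which each Char is a 16-bit code unit. Code points above U+FFFF do not fit in one Char and are stored as two code units, a surrogate pair.
A quick approach to the matter is to replace every character outside the ASCII range with a \u escape, as in the function above.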
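
To make the difference between code units and code points concrete, here is a minimal sketch. The object name Utf16Demo and its helpers are made up for illustration and are not part of the gist; only standard java.lang calls are used.

```scala
object Utf16Demo {
  // U+1F600 lies outside the Basic Multilingual Plane,
  // so UTF-16 needs two code units (a surrogate pair) to store it.
  val grinning: String = new String(Character.toChars(0x1F600))

  // Number of UTF-16 code units, which is what String.length reports.
  def codeUnits(s: String): Int = s.length

  // Number of Unicode code points in the string.
  def codePoints(s: String): Int = s.codePointCount(0, s.length)

  def main(args: Array[String]): Unit = {
    println(codeUnits(grinning))                            // 2
    println(codePoints(grinning))                           // 1
    println(Character.isHighSurrogate(grinning.charAt(0)))  // true
    println(Character.isLowSurrogate(grinning.charAt(1)))   // true
  }
}
```

This is why an escaper has to decide whether it works per Char or per code point: the two counts only agree for text that stays within U+0000 to U+FFFF.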

You can generate test data here: http://generator.lorem-ipsum.info/
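
Generated text can then be run through a few sanity checks. The sketch below is only an assumption about what is worth checking; escapeSketch is a compact, hypothetical stand-in that escapes per UTF-16 code unit with four hex digits, so the block runs on its own.

```scala
object EscapeCheck {
  // Hypothetical stand-in for the escaping function: one \uXXXX per non-ASCII code unit.
  def escapeSketch(s: String): String =
    s.map(c => if (c < 128) c.toString else "\\u%04x".format(c.toInt)).mkString

  // True when every character of s is 7-bit ASCII.
  def isAscii(s: String): Boolean = s.forall(_ < 128)

  def main(args: Array[String]): Unit = {
    val plain    = "Lorem ipsum dolor sit amet"
    val accented = "caf" + 0xE9.toChar                       // "café"
    val emoji    = new String(Character.toChars(0x1F600))    // one code point, two Chars

    println(escapeSketch(plain) == plain)   // true: ASCII passes through unchanged
    println(escapeSketch(accented))         // caf\u00e9
    println(escapeSketch(emoji))            // \ud83d\ude00
    println(isAscii(escapeSketch(accented + emoji)))  // true
  }
}
```

Escaping per code unit keeps every escape at exactly four hex digits, which is the form Java and Scala string literals and .properties files accept.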
