@panzi
Last active April 25, 2018 22:52
Truncate a string to a given length in UTF-8 encoded bytes with JavaScript. Many APIs specify their length limits in UTF-8 bytes rather than in characters.
// Truncate a string so that its UTF-8 encoding is at most byteLength bytes long.
// Many APIs specify their length limits in UTF-8 bytes rather than in characters.
function truncateUtf8Bytes(string, byteLength) {
    const charLength = string.length;
    let curByteLength = 0;
    for (let i = 0; i < charLength; ++i) {
        // Index of the first UTF-16 code unit of the current code point.
        const start = i;
        const w1 = string.charCodeAt(i);
        let cp = w1;
        // Combine a high surrogate followed by a low surrogate into one code point.
        if ((w1 & 0xfc00) === 0xd800) {
            const w2 = string.charCodeAt(i + 1);
            if ((w2 & 0xfc00) === 0xdc00) {
                const hi = w1 & 0x3ff;
                const lo = w2 & 0x3ff;
                cp = ((hi << 10) | lo) + 0x10000;
                ++i;
            }
        }
        // Number of bytes this code point occupies in UTF-8.
        curByteLength += (
            cp >= 0x10000 ? 4 :
            cp >= 0x800 ? 3 :
            cp >= 0x80 ? 2 :
            1);
        if (curByteLength === byteLength) {
            // The current code point fits exactly; keep it.
            return string.slice(0, i + 1);
        }
        else if (curByteLength > byteLength) {
            // The current code point does not fit; cut before it.
            return string.slice(0, start);
        }
    }
    return string;
}
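
For comparison, here is a minimal sketch of the same idea built on the standard TextEncoder and TextDecoder APIs instead of manual surrogate handling. It is not part of the gist: the name truncateViaTextEncoder is hypothetical, and it assumes a runtime where both classes are available as globals (current browsers and recent Node.js). It encodes the whole string, backs up from the byte limit until it reaches the start of a UTF-8 sequence, and decodes the remaining prefix.

```javascript
// Hypothetical alternative, not from the gist: truncate via TextEncoder/TextDecoder.
function truncateViaTextEncoder(string, byteLength) {
    const bytes = new TextEncoder().encode(string);
    if (bytes.length <= byteLength) {
        return string;
    }
    let end = Math.max(0, Math.floor(byteLength));
    // UTF-8 continuation bytes have the form 10xxxxxx. If the first excluded
    // byte is one of them, the cut falls inside a character, so step back
    // until `end` points at the first byte of a sequence.
    while (end > 0 && (bytes[end] & 0xc0) === 0x80) {
        --end;
    }
    // ignoreBOM: true keeps a leading U+FEFF in the output instead of stripping it.
    return new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes.subarray(0, end));
}

console.log(truncateViaTextEncoder("aé", 2));   // "a"  ("é" needs 2 bytes and does not fit)
console.log(truncateViaTextEncoder("a😀b", 5)); // "a😀" (1 + 4 bytes)
```

This version allocates a byte array for the whole input, so the loop in truncateUtf8Bytes may be preferable for very long strings. It also differs on malformed input: when truncation happens, any unpaired surrogate in the kept prefix comes back as U+FFFD, because TextEncoder replaces it during encoding.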