@mathiasbynens
Forked from 140bytes/LICENSE.txt
Last active October 5, 2022 10:38
UTF-8 byte counter in 49 bytes
function(string) {
  return unescape(     // convert each `%xx` escape into a single character
    encodeURI(string)  // URL-encode the string (this uses UTF-8); every escaped byte becomes one `%xx`
  ).length;            // read out the length (one character per UTF-8 byte)
}
// Note: this throws a URIError for input that contains lone surrogates.
// Use http://mths.be/utf8js if you need something more robust.
function(s){return unescape(encodeURI(s)).length}
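The trick above can be traced one step at a time. The following is only an illustrative sketch: the name `byteSize` is borrowed from the demo further down, and the sample strings are arbitrary. Note that `unescape` is a legacy function, so this assumes an environment that still provides it (browsers and Node.js do).

```javascript
// Same one-liner as above, given a name so it can be called.
var byteSize = function(s){return unescape(encodeURI(s)).length};

// Step 1: encodeURI percent-encodes the UTF-8 bytes of non-ASCII characters.
var encoded = encodeURI('é');   // '%C3%A9' (two UTF-8 bytes)

// Step 2: unescape turns each `%xx` back into ONE character (U+00C3, U+00A9),
// so the resulting string has exactly one character per byte.
var perByte = unescape(encoded);

console.log(encoded, perByte.length);
console.log(byteSize('a'), byteSize('é'), byteSize('€'), byteSize('\uD83D\uDCA9'));
```

ASCII characters that `encodeURI` leaves unescaped already count as one character each, which is why the final `.length` equals the byte count rather than the number of escapes.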
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2011 Mathias Bynens <http://mathiasbynens.be/>
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
{
"name": "byteSize",
"description": "This function will return the byte size of any UTF-8 string you pass to it.",
"keywords": [
"utf-8",
"utf8",
"byte",
"byte-size"
]
}
<!DOCTYPE html>
<!-- online demo: http://mothereff.in/byte-counter -->
<meta charset=utf-8>
<title>Get the byte size of any UTF-8 string</title>
<input autofocus>
<p>Byte size: <span></span>
<script>
var byteSize = function(s){return unescape(encodeURI(s)).length};
var el = document.getElementsByTagName('span')[0];
document.getElementsByTagName('input')[0].oninput = function() {
el.innerHTML = byteSize(this.value);
};
</script>
@atk

atk commented Sep 20, 2012

encodeURI and encodeURIComponent will throw "URI malformed" errors (a URIError) on certain strings in Google Chrome.

@mathiasbynens
Author

@atk Yeah, if the input contains lone surrogates.
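A minimal sketch of that failure mode, for anyone who wants to reproduce it. The wrapper name `tryByteSize` and the choice to return `null` are assumptions made for illustration; they are not part of the gist.

```javascript
var byteSize = function(s){return unescape(encodeURI(s)).length};

// Hypothetical wrapper: returns null instead of throwing when the input
// contains a lone surrogate (which makes encodeURI throw a URIError).
function tryByteSize(s) {
  try {
    return byteSize(s);
  } catch (e) {
    if (e instanceof URIError) return null;
    throw e; // anything else is unexpected, so rethrow
  }
}

console.log(tryByteSize('\uD83D\uDCA9')); // well-formed surrogate pair
console.log(tryByteSize('\uD800'));       // lone high surrogate
console.log(tryByteSize('\uDC00'));       // lone low surrogate
```

Whether to return `null`, rethrow, or substitute a replacement character depends on the caller; the wrapper only shows where the error originates.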

@fuweichin

//count UTF-8 bytes of a string
function byteLengthOf(s){
    //assuming the string is a sequence of UTF-16 code units (as JavaScript strings are)
    var n=0;
    for(var i=0,l=s.length; i<l; i++){
        var hi=s.charCodeAt(i);
        if(hi<0x0080){ //[0x0000, 0x007F]
            n+=1;
        }else if(hi<0x0800){ //[0x0080, 0x07FF]
            n+=2;
        }else if(hi<0xD800){ //[0x0800, 0xD7FF]
            n+=3;
        }else if(hi<0xDC00){ //[0xD800, 0xDBFF]
            var lo=s.charCodeAt(++i);
            if(i<l&&lo>=0xDC00&&lo<=0xDFFF){ //followed by [0xDC00, 0xDFFF]
                n+=4;
            }else{
                throw new Error("UCS-2 String malformed");
            }
        }else if(hi<0xE000){ //[0xDC00, 0xDFFF]
            throw new Error("UCS-2 String malformed");
        }else{ //[0xE000, 0xFFFF]
            n+=3;
        }
    }
    return n;
}
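As a cross-check for a loop like the one above, the standard `TextEncoder` API can be used where it is available (modern browsers and recent Node.js versions). This is an alternative that does not appear in the gist, and the function name `utf8Length` is made up for this sketch. One behavioral difference to keep in mind: `TextEncoder` does not throw on lone surrogates but replaces each one with U+FFFD, which encodes to 3 bytes.

```javascript
// Hypothetical helper: count UTF-8 bytes by actually encoding the string.
function utf8Length(s) {
  return new TextEncoder().encode(s).length;
}

console.log(utf8Length('a'));            // 1-byte range
console.log(utf8Length('é'));            // 2-byte range
console.log(utf8Length('€'));            // 3-byte range
console.log(utf8Length('\uD83D\uDCA9')); // surrogate pair, 4 bytes
console.log(utf8Length('\uD800'));       // lone surrogate, replaced by U+FFFD
```

Because it allocates the encoded bytes, this is likely less economical than a counting loop for very large strings, but it is convenient for verifying the thresholds used in each branch.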