@elextr

elextr/utf8.cpp Secret

Created September 20, 2016 03:14
unchecked utf-8 decoding
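typedef unsigned int Codepoint; // any unsigned integer type of at least 21 bits

// Decodes one code point from the NUL-terminated UTF-8 string at c and advances c past it.
// Unchecked: the input must be valid UTF-8; malformed or truncated sequences are not detected.
// Returns 0 at the terminating NUL without advancing c.
Codepoint getcp(const char* &c){
    if(*c == 0)return 0;
    Codepoint cp = static_cast<Codepoint>(*c++) & 0xff; // works for signed or unsigned chars
    if(cp < 0x80)goto onebyte;                          // 0xxxxxxx: ASCII, no continuation bytes
    if(cp < 0xe0){ cp &= 0x1f; goto twobyte; }          // 110xxxxx: keep 5 payload bits
    if(cp < 0xf0){ cp &= 0x0f; goto threebyte; }        // 1110xxxx: keep 4 payload bits
    cp &= 0x07;                                         // 11110xxx: keep 3 payload bits
    cp = (cp << 6) + (*c++ & 0x3f);                     // each continuation byte adds 6 bits
threebyte:
    cp = (cp << 6) + (*c++ & 0x3f);
twobyte:
    cp = (cp << 6) + (*c++ & 0x3f);
onebyte:
    return cp;
}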
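
A minimal usage sketch, assuming the input is valid, NUL-terminated UTF-8 (the decoder does no validation). The helper name `decode_all` is hypothetical and not part of the gist; `getcp` is repeated only so that the block compiles on its own.

```cpp
#include <string>
#include <vector>

typedef unsigned int Codepoint; // any unsigned integer type of at least 21 bits

// Same unchecked decoder as in the gist, repeated so this sketch is self-contained.
Codepoint getcp(const char* &c){
    if(*c == 0)return 0;
    Codepoint cp = static_cast<Codepoint>(*c++) & 0xff;
    if(cp < 0x80)goto onebyte;
    if(cp < 0xe0){ cp &= 0x1f; goto twobyte; }
    if(cp < 0xf0){ cp &= 0x0f; goto threebyte; }
    cp &= 0x07;
    cp = (cp << 6) + (*c++ & 0x3f);
threebyte:
    cp = (cp << 6) + (*c++ & 0x3f);
twobyte:
    cp = (cp << 6) + (*c++ & 0x3f);
onebyte:
    return cp;
}

// Hypothetical helper (not in the gist): collects every code point up to the
// first NUL byte. Assumes s holds valid UTF-8; nothing is checked.
std::vector<Codepoint> decode_all(const std::string& s){
    std::vector<Codepoint> out;
    const char* p = s.c_str();       // c_str() guarantees a terminating NUL
    while(Codepoint cp = getcp(p)){  // getcp returns 0 at the terminator
        out.push_back(cp);
    }
    return out;
}
```

Because `getcp` returns 0 at the terminator, an embedded NUL inside the string also ends the loop; a caller that needs to decode past embedded NULs would have to track the length separately.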