w-vi / utf-8-truncate.c
Last active May 17, 2019 00:04
Correctly truncate a UTF-8 encoded string
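The header comment of this file describes how the first byte of a multi-byte sequence tells you how long the sequence is. As a rough illustration of that idea, here is a minimal sketch and not necessarily the gist's own implementation. The names `example_utf8_seq_len` and `example_utf8_truncate` are hypothetical, and the truncation helper assumes its input is valid, NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence that starts
 * with lead byte b, judged only from the high bits of b. Returns 0 for a
 * continuation byte (10xxxxxx) or for 0xF8..0xFF. This is a length check
 * only; it does not validate the rest of the sequence. */
static int example_utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII, single byte */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte sequence */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte sequence */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte sequence */
    return 0;                          /* not a lead byte */
}

/* Hypothetical helper: shorten s in place to at most max_bytes bytes
 * without splitting a multi-byte character. Assumes s is valid,
 * NUL-terminated UTF-8. Returns the new length in bytes. */
static size_t example_utf8_truncate(char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    if (len <= max_bytes)
        return len;                    /* already short enough */

    size_t cut = max_bytes;
    /* If the byte at the cut point is a continuation byte, the cut would
     * land in the middle of a character, so step back to its lead byte. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;

    s[cut] = '\0';
    return cut;
}
```

For example, truncating the 4-byte string `"a\xC3\xA9z"` ("aéz") to at most 2 bytes leaves `"a"`, because a cut at byte 2 would split the two-byte `é`.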
#include <stdio.h>
#include <string.h>
/* A UTF-8 character may be encoded in up to 4 bytes according to RFC
* 3629, so truncating a string correctly requires care with such
* characters. ASCII characters fit in a single byte using 7 bits (max is
* 127), so any multi-byte character has a first byte higher than
* 127. Because the first byte of a multi-byte sequence encodes how many
* bytes the sequence has, we can find that first byte in the
* sequence and see how many more bytes to truncate.