Skip to content

Instantly share code, notes, and snippets.

@chrisveness
chrisveness / utf8-regex.js
Last active May 25, 2023 01:53
Utf8 string encode/decode using regular expressions
/**
* Encodes multi-byte Unicode string into utf-8 multiple single-byte characters
* (BMP / basic multilingual plane only).
*
* Chars in range U+0080 - U+07FF are encoded in 2 chars, U+0800 - U+FFFF in 3 chars.
*
* Can be achieved in JavaScript by unescape(encodeURIComponent(str)),
* but this approach may be useful in other languages.
*
* @param {string} unicodeString - Unicode string to be encoded as UTF-8.