@AlttiRi
Last active December 26, 2022 10:48
Binary strings, ByteString [JavaScript]

Binary strings

JavaScript strings are UTF-16 encoded strings. This means that each code unit requires two bytes of memory and can hold 65,536 distinct values (0 through 65535); code points above U+FFFF need a pair of code units. A subset of these strings consists of UTF-16 strings containing only ASCII characters (i.e., characters whose code point does not exceed 127). For instance, the string "Hello world!" belongs to the ASCII subset, while the string "ÀÈÌÒÙ" does not. A binary string is a concept similar to the ASCII subset, but instead of limiting the range to 127, it allows code points up to 255. Its purpose, however, is not to represent characters but binary data: each code unit stands for one byte. The data represented this way takes twice as much memory as it would in a plain byte format, but this is not visible to the end user, because the length of a JavaScript string is counted in UTF-16 code units, so a binary string's length equals the number of bytes it holds.
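The distinction can be checked code unit by code unit. The sketch below uses two hypothetical helpers, isAsciiString() and isBinaryString() (not built-in functions), which test whether every UTF-16 code unit is at most 127 or at most 255:

```javascript
// Hypothetical helper: true if every UTF-16 code unit is in the ASCII range (0-127).
function isAsciiString(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 127) return false;
  }
  return true;
}

// Hypothetical helper: true if every UTF-16 code unit fits in one byte (0-255),
// i.e. the string qualifies as a binary string.
function isBinaryString(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 255) return false;
  }
  return true;
}

const accented = "\u00C0\u00C8\u00CC\u00D2\u00D9"; // "ÀÈÌÒÙ"

console.log(isAsciiString("Hello world!")); // true
console.log(isAsciiString(accented));       // false: "À" is U+00C0 (192)
console.log(isBinaryString(accented));      // true: every code unit is <= 255
console.log(isBinaryString("\u20AC"));      // false: "€" is U+20AC (8364)

// length counts code units, so a binary string reports one unit per byte.
console.log(String.fromCharCode(0x00, 0x7f, 0xff).length); // 3
```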

Binary strings are not part of the JavaScript language design. However, at least one built-in function, btoa(), requires a binary string as its input: invoking it on a string that contains code points greater than 255 throws an "InvalidCharacterError" DOMException.
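A short sketch of that behaviour, assuming an environment with a global btoa() (browsers and recent Node.js versions). tryBtoa() is a hypothetical wrapper that returns either the Base64 output or the name of the thrown error:

```javascript
// Hypothetical wrapper: returns the Base64 output, or the error name if btoa() throws.
function tryBtoa(str) {
  try {
    return btoa(str);
  } catch (e) {
    return e.name;
  }
}

console.log(tryBtoa("\u00C0\u00C8")); // "wMg="  (bytes 0xC0 0xC8)
console.log(tryBtoa("\u20AC"));       // "InvalidCharacterError"  ("€" is U+20AC > 255)
```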

UTF-16 code units came to be used as placeholders for uint8 values because, as web applications became more and more powerful (adding features such as audio and video manipulation, access to raw data using WebSockets, and so forth), it became clear that there are times when it would be helpful for JavaScript code to be able to quickly and easily manipulate raw binary data.

In the past, this had to be simulated by treating the raw data as a string and using the charCodeAt() method to read the bytes from the data buffer (i.e., using binary strings). However, this is slow and error-prone, due to the need for multiple conversions (especially if the binary data is not actually byte-format data, but, for example, 32-bit integers or floats).
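For illustration, this is roughly what the charCodeAt() approach looks like when the bytes encode a 32-bit integer rather than single bytes. readUint32LE() is a hypothetical helper, and little-endian byte order is an assumption of the example:

```javascript
// Hypothetical helper: reads an unsigned 32-bit little-endian integer
// starting at `offset` in a binary string, one charCodeAt() call per byte.
function readUint32LE(binStr, offset) {
  const b0 = binStr.charCodeAt(offset);
  const b1 = binStr.charCodeAt(offset + 1);
  const b2 = binStr.charCodeAt(offset + 2);
  const b3 = binStr.charCodeAt(offset + 3);
  // Bitwise operators work on signed 32-bit ints; ">>> 0" converts back to unsigned.
  return (b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)) >>> 0;
}

const data = String.fromCharCode(0x78, 0x56, 0x34, 0x12, 0xff, 0xff, 0xff, 0xff);
console.log(readUint32LE(data, 0).toString(16)); // "12345678"
console.log(readUint32LE(data, 4));              // 4294967295
```

Each step is a chance for a silent mistake: the byte order, the `>>> 0` needed for an unsigned result, and the lack of any check that a code unit really is a byte (charCodeAt() past the end of the string returns NaN, which the bitwise OR quietly turns into 0).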

JavaScript typed arrays provide a mechanism for accessing raw binary data much more efficiently.

Source: https://developer.mozilla.org/en-US/docs/Web/API/DOMString/Binary
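With typed arrays, the same data can be copied into a Uint8Array once and then read through a DataView, which takes care of byte order and float decoding. binaryStringToBytes() and bytesToBinaryString() are hypothetical helpers for interoperating with code that still produces or expects binary strings:

```javascript
// Hypothetical helper: copies a binary string into a Uint8Array.
function binaryStringToBytes(binStr) {
  const bytes = new Uint8Array(binStr.length);
  for (let i = 0; i < binStr.length; i++) {
    const code = binStr.charCodeAt(i);
    // Uint8Array stores values modulo 256, so reject non-bytes instead of truncating them.
    if (code > 255) throw new RangeError(`Code unit ${code} at index ${i} is not a byte`);
    bytes[i] = code;
  }
  return bytes;
}

// Hypothetical helper: the reverse mapping, one code unit per byte.
function bytesToBinaryString(bytes) {
  let result = "";
  for (const byte of bytes) result += String.fromCharCode(byte);
  return result;
}

// Bytes 0-3: uint32 0x12345678 (little-endian); bytes 4-7: float32 1.0 (little-endian).
const binStr = String.fromCharCode(0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x80, 0x3f);
const bytes = binaryStringToBytes(binStr);
const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

console.log(view.getUint32(0, true).toString(16));   // "12345678"
console.log(view.getFloat32(4, true));               // 1
console.log(bytesToBinaryString(bytes) === binStr);  // true
```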

ByteString

ByteString is a Web IDL string type that corresponds to the set of all possible sequences of bytes; it is not necessarily UTF-8. ByteString maps to a String when returned in JavaScript, with each byte represented by one code unit in the range 0 to 255; generally, it's only used when interfacing with protocols that use bytes and strings interchangeably, such as HTTP.

Source: https://developer.mozilla.org/en-US/docs/Web/API/ByteString
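Going the other way, Web IDL converts a JavaScript value to a ByteString by converting it to a string and throwing a TypeError if any code unit exceeds 255. The hypothetical toByteString() below is a simplified model of that rule, not a browser API; in browsers the real conversion happens inside APIs that declare ByteString parameters, such as the Fetch Headers interface.

```javascript
// Hypothetical helper: a simplified model of the Web IDL conversion to ByteString.
function toByteString(value) {
  const str = `${value}`; // template literal applies ToString, the spec's first step
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 255) {
      throw new TypeError(`Code unit ${code} at index ${i} does not fit in a byte`);
    }
  }
  return str; // every code unit now stands for exactly one byte
}

console.log(toByteString("text/plain")); // "text/plain"
try {
  toByteString("\u20AC");
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```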

2.13.18. ByteString

The ByteString type corresponds to the set of all possible sequences of bytes. Such sequences might be interpreted as UTF-8 encoded strings [RFC3629] or strings in some other 8-bit-per-code-unit encoding, although this is not required.

There is no way to represent a constant ByteString value in IDL, although ByteString dictionary member default values and operation optional argument default values can be set to the value of a string literal.

Specifications should only use ByteString for interfacing with protocols that use bytes and strings interchangeably, such as HTTP. In general, strings should be represented with DOMString values, even if it is expected that values of the string will always be in ASCII or some 8-bit character encoding. Sequences or frozen arrays with octet or byte elements, Uint8Array, or Int8Array should be used for holding 8-bit data rather than ByteString.

Source: https://webidl.spec.whatwg.org/#idl-ByteString
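When a ByteString actually carries UTF-8 bytes (one of the interpretations the specification mentions), the readable text can be recovered by copying the code units into a Uint8Array and decoding them. decodeByteStringAsUtf8() is a hypothetical helper and assumes TextDecoder is available:

```javascript
// Hypothetical helper: treats each code unit of a ByteString as one byte
// and decodes the resulting byte sequence as UTF-8 (TextDecoder's default).
function decodeByteStringAsUtf8(byteStr) {
  const bytes = new Uint8Array(byteStr.length);
  for (let i = 0; i < byteStr.length; i++) bytes[i] = byteStr.charCodeAt(i);
  return new TextDecoder().decode(bytes);
}

// "€" (U+20AC) is encoded in UTF-8 as the three bytes E2 82 AC.
const raw = String.fromCharCode(0xe2, 0x82, 0xac);
console.log(raw.length);                  // 3
console.log(decodeByteStringAsUtf8(raw)); // "€"
```

If the bytes are not valid UTF-8, TextDecoder in its default (non-fatal) mode substitutes U+FFFD rather than throwing.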

btoa()

The btoa() method creates a Base64-encoded ASCII string from a binary string (i.e., a string in which each character in the string is treated as a byte of binary data).

You can use this method to encode data which may otherwise cause communication problems, transmit it, then use the atob() method to decode the data again. For example, you can encode control characters such as ASCII values 0 through 31.

...

Source: https://developer.mozilla.org/en-US/docs/Web/API/btoa
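Because btoa() accepts only binary strings, arbitrary Unicode text is commonly encoded to bytes first. The sketch assumes UTF-8 as the byte encoding (via TextEncoder); utf8ToBase64() is a hypothetical helper name:

```javascript
// Hypothetical helper: UTF-8 encode the text, turn each byte into one code unit
// (producing a binary string), then Base64-encode it with btoa().
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  let binStr = "";
  for (const byte of bytes) binStr += String.fromCharCode(byte);
  return btoa(binStr);
}

console.log(utf8ToBase64("Hello world!")); // "SGVsbG8gd29ybGQh"
console.log(utf8ToBase64("\u20AC"));       // "4oKs" (bytes E2 82 AC)
```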


See also: atob(): https://developer.mozilla.org/en-US/docs/Web/API/atob
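A round trip through btoa() and atob() with the control characters 0 through 31 mentioned above might look like this; the encoded form contains only Base64 alphabet characters, and atob() returns the original binary string:

```javascript
// A binary string holding the control characters 0 through 31 (NUL, TAB, LF, ...).
let controls = "";
for (let code = 0; code <= 31; code++) controls += String.fromCharCode(code);

const encoded = btoa(controls); // only A-Z, a-z, 0-9, "+", "/" and "=" padding
const decoded = atob(encoded);  // back to a binary string

console.log(/^[A-Za-z0-9+/=]+$/.test(encoded)); // true
console.log(encoded.length);                    // 44 (32 bytes -> 11 groups of 4)
console.log(decoded === controls);              // true
```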
