@nyamsprod
Last active April 19, 2024 06:42
base32 proposal

PHP RFC proposal: Add base32_decode and base32_encode functions

Introduction

To improve interoperability between PHP and other programming languages, and to simplify base32 usage in PHP, we propose adding native support to the core language for encoding and decoding data with base32.

Currently, this functionality is provided by userland implementations in third-party packages. These implementations have various downsides, discussed below.

Downsides of Common Userland Approaches

First and foremost, at the time of writing there are dozens of encoding/decoding schemes, if not more, that all go by the "generic" name of base32. Adding to the difficulty of finding the base32 variant that suits your use case, the many PHP userland packages that claim to support base32 encoding/decoding do not all implement the same algorithm. The situation becomes critical when an application relies on such an encoding to handle data generated by other systems or in other programming languages. All of this makes using base32 in PHP more complex than it should be.

The goal of this RFC is to add the basic base32 encoding/decoding functionality described in RFC 4648 to the PHP standard library. This would simplify base32 usage in PHP while providing a baseline implementation that improves interoperability with other programming languages.

Proposal

Add two new basic functions:

base32_encode(string $decoded, string $alphabet = PHP_BASE32_ASCII, string $padding = '='): string
base32_decode(string $encoded, string $alphabet = PHP_BASE32_ASCII, string $padding = '=', bool $strict = false): string|false

And two new global constants:

PHP_BASE32_ASCII = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
PHP_BASE32_HEX = '0123456789ABCDEFGHIJKLMNOPQRSTUV';

Parameters:

  • $decoded : the data to encode with base32_encode
  • $encoded : the data to decode with base32_decode
  • $alphabet : the base32 alphabet; both functions default to the PHP_BASE32_ASCII constant
  • $padding : the padding character; both functions default to the = character
  • $strict : base32_decode only; when true, any input that does not follow the expected format makes decoding fail; defaults to false

Supported algorithms:

  • If $alphabet is PHP_BASE32_ASCII and the padding character is =, conversion follows the standard base32 encoding of RFC 4648 (section 6).
  • If $alphabet is PHP_BASE32_HEX and the padding character is =, conversion follows the base32hex encoding ("Extended Hex Alphabet") of RFC 4648 (section 7).
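
The conversion described above can be illustrated with a minimal userland sketch; it is not the proposed C implementation. It consumes the input 8 bits at a time, emits one alphabet character for every 5 bits, left-aligns any leftover bits into a final character, then pads the output to a multiple of 8 characters. The function name base32_encode_sketch and the SKETCH_BASE32_* constants are hypothetical, chosen so the code runs on current PHP 8 versions that lack the proposed functions.

```php
<?php
// Hypothetical userland sketch of RFC 4648 base32 encoding; the names are
// illustrative stand-ins for base32_encode() and the PHP_BASE32_* constants.
const SKETCH_BASE32_ASCII = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
const SKETCH_BASE32_HEX = '0123456789ABCDEFGHIJKLMNOPQRSTUV';

function base32_encode_sketch(
    string $decoded,
    string $alphabet = SKETCH_BASE32_ASCII,
    string $padding = '='
): string {
    $encoded = '';
    $buffer = 0;   // pending bits, most significant first
    $bitCount = 0; // number of pending bits in $buffer

    for ($i = 0, $n = strlen($decoded); $i < $n; $i++) {
        $buffer = ($buffer << 8) | ord($decoded[$i]);
        $bitCount += 8;
        // Emit one character for each complete 5-bit group.
        while ($bitCount >= 5) {
            $bitCount -= 5;
            $encoded .= $alphabet[($buffer >> $bitCount) & 0x1F];
        }
        $buffer &= (1 << $bitCount) - 1; // keep only the unconsumed bits
    }

    if ($bitCount > 0) {
        // Left-align the remaining bits, filling the low bits with zeros.
        $encoded .= $alphabet[($buffer << (5 - $bitCount)) & 0x1F];
    }

    // Pad the output to a multiple of 8 characters.
    $remainder = strlen($encoded) % 8;
    if ($remainder !== 0) {
        $encoded .= str_repeat($padding, 8 - $remainder);
    }

    return $encoded;
}
```

Both RFC variants share this routine: only the alphabet passed in changes, so base32_encode_sketch('Bangui', SKETCH_BASE32_HEX, '*') produces the base32hex form with a custom padding character.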

In case of errors:

  • for the $alphabet or $padding parameters, a ValueError is thrown if their values are invalid
  • if decoding fails, false is returned rather than an exception being thrown, since PHP functions conventionally do not throw on invalid input data (base64_decode() in strict mode behaves the same way)

base32_encode('Bangui');                                      // returns 'IJQW4Z3VNE======'
base32_decode('IJQW4Z3VNE======');                            // returns 'Bangui'
base32_decode('IJQW4Z083VNE======');                          // returns 'Bangui'
base32_decode('IJQW4Z083VNE======', PHP_BASE32_ASCII, '=', true); // returns false
base32_encode('Bangui', PHP_BASE32_HEX, '*');                 // returns '89GMSPRLD4******'
base32_decode('89GMSPRLD4******', PHP_BASE32_HEX, '*', true); // returns 'Bangui'

When strict decoding is disabled (the default), decoding tolerates:

  • characters outside the alphabet in use (they are ignored)
  • lowercase characters (each is converted to its uppercase form if that form is in the alphabet, otherwise it is ignored)
  • an invalid padding length
  • padding characters inside the encoded data (they are ignored)

In strict mode, any of these conditions makes base32_decode return false.
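
The tolerant and strict decoding rules can be sketched in userland as follows. This is an illustration under stated assumptions, not the proposed implementation: base32_decode_sketch and the SKETCH_BASE32_* constants are hypothetical names, and cases the proposal does not spell out are guesses. In particular, tolerant mode silently drops leftover bits that cannot form a full byte, and strict mode does not reject non-zero trailing bits.

```php
<?php
// Hypothetical userland sketch of the proposed decoding rules; names and
// unspecified edge cases are assumptions, not part of the proposal.
const SKETCH_BASE32_ASCII = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
const SKETCH_BASE32_HEX = '0123456789ABCDEFGHIJKLMNOPQRSTUV';

function base32_decode_sketch(
    string $encoded,
    string $alphabet = SKETCH_BASE32_ASCII,
    string $padding = '=',
    bool $strict = false
): string|false {
    // \r, \t, \n and the space character are ignored in both modes.
    $encoded = str_replace(["\r", "\t", "\n", ' '], '', $encoded);

    if ($strict) {
        // Accept only alphabet characters followed by a trailing run of
        // padding that exactly completes the last 8-character block.
        $data = rtrim($encoded, $padding);
        $dataLength = strlen($data);
        if (!in_array($dataLength % 8, [0, 2, 4, 5, 7], true)
            || intdiv($dataLength + 7, 8) * 8 !== strlen($encoded)
            || strspn($data, $alphabet) !== $dataLength
        ) {
            return false;
        }
        $encoded = $data;
    } else {
        // Keep alphabet characters, try the uppercase form of the others,
        // and ignore everything else (the padding character included).
        $filtered = '';
        for ($i = 0, $n = strlen($encoded); $i < $n; $i++) {
            $char = $encoded[$i];
            if (str_contains($alphabet, $char)) {
                $filtered .= $char;
            } elseif (str_contains($alphabet, strtoupper($char))) {
                $filtered .= strtoupper($char);
            }
        }
        $encoded = $filtered;
    }

    $values = array_flip(str_split($alphabet)); // character => 5-bit value
    $decoded = '';
    $buffer = 0;   // pending bits, most significant first
    $bitCount = 0; // number of pending bits in $buffer
    for ($i = 0, $n = strlen($encoded); $i < $n; $i++) {
        $buffer = ($buffer << 5) | $values[$encoded[$i]];
        $bitCount += 5;
        if ($bitCount >= 8) {
            $bitCount -= 8;
            $decoded .= chr(($buffer >> $bitCount) & 0xFF);
            $buffer &= (1 << $bitCount) - 1;
        }
    }
    // Fewer than 8 leftover bits cannot form a byte and are discarded.
    return $decoded;
}
```

Checking the exact byte before its uppercase form keeps custom alphabets that contain lowercase letters working as expected, since such characters are matched directly and never uppercased.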

The current API allows:

  • changing the alphabet by submitting your own alphabet via the $alphabet parameter;
  • changing the padding by submitting your own padding character via the $padding parameter;

with the following restrictions:

  • The padding character must be a single byte other than \r, \t, \n and the space character.
  • The alphabet must be a 32-byte string of unique byte values, and must not contain the padding character, \r, \t, \n or the space character.
  • The alphabet is treated as a sequence of byte values without any special treatment for multi-byte UTF-8.
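
A minimal sketch of how these restrictions could be checked, throwing ValueError as the proposal describes. The helper name base32_validate_options_sketch and its error messages are hypothetical; the proposal does not say how or in what order the checks are performed.

```php
<?php
// Hypothetical helper illustrating the $alphabet and $padding restrictions;
// the function name and error messages are assumptions.
function base32_validate_options_sketch(string $alphabet, string $padding): void
{
    $ignored = ["\r", "\t", "\n", ' ']; // always skipped while decoding

    if (strlen($padding) !== 1 || in_array($padding, $ignored, true)) {
        throw new ValueError('The padding must be a single byte other than \r, \t, \n or space');
    }
    if (strlen($alphabet) !== 32) {
        throw new ValueError('The alphabet must be exactly 32 bytes long');
    }
    // Bytes are compared one by one, without any UTF-8 awareness.
    if (count(array_unique(str_split($alphabet))) !== 32) {
        throw new ValueError('The alphabet must contain 32 unique bytes');
    }
    foreach ([$padding, ...$ignored] as $byte) {
        if (str_contains($alphabet, $byte)) {
            throw new ValueError('The alphabet must not contain the padding character, \r, \t, \n or space');
        }
    }
}
```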

The \r, \t, \n and space characters are always ignored during decoding, regardless of the value of the $strict parameter.

Backward Incompatible Changes

There are no backward incompatible changes in PHP itself.

Userland code that already declares global functions named base32_encode() or base32_decode() would conflict with the new functions. Such conflicts would be noticed quickly, since redeclaring an existing function is a fatal error and such global functions are usually declared early in the application boot process.
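
Userland libraries can sidestep this kind of conflict by declaring their function only when it does not already exist, a common polyfill pattern built on function_exists() and defined(). The sketch below assumes that a native base32_encode() would keep the signature and defaults proposed here; its fallback body is a minimal, unoptimised encoder written for illustration only.

```php
<?php
// Hypothetical polyfill guard: the userland fallback is declared only when
// the engine does not already provide base32_encode(), which avoids the
// "Cannot redeclare" fatal error. The fallback assumes the proposed
// signature and defaults; it is a minimal sketch, not the RFC implementation.
if (!defined('PHP_BASE32_ASCII')) {
    define('PHP_BASE32_ASCII', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567');
}

if (!function_exists('base32_encode')) {
    function base32_encode(string $decoded, string $alphabet = PHP_BASE32_ASCII, string $padding = '='): string
    {
        if ($decoded === '') {
            return '';
        }
        // Expand each byte into 8 binary digits, then regroup them by 5.
        $bits = '';
        for ($i = 0, $n = strlen($decoded); $i < $n; $i++) {
            $bits .= sprintf('%08b', ord($decoded[$i]));
        }
        $encoded = '';
        foreach (str_split($bits, 5) as $group) {
            // A short final group is right-padded with zero bits.
            $encoded .= $alphabet[bindec(str_pad($group, 5, '0'))];
        }
        // Pad the output to a multiple of 8 characters.
        return str_pad($encoded, intdiv(strlen($encoded) + 7, 8) * 8, $padding);
    }
}
```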
