To improve interoperability between PHP and other programming languages, and to simplify base32 usage in PHP, we propose adding the ability for the core language to encode and decode data using the base32 algorithm.
Currently this feature is done using userland implementations via third party packages. These userland implementations have various downsides, discussed further below.
First and foremost, at the time of writing there are dozens of encoding/decoding algorithms, if not more, that all go by the "generic" name of base32. Finding the base32 algorithm that suits your use case is difficult enough, and it is made harder by the diversity of PHP userland packages, which all claim to support base32 encoding/decoding without relying on the same algorithm. The situation becomes critical if your application relies on that encoding to handle data generated by other systems or by other programming languages. As a result, using base32 in PHP is more complex than it should be.
The goal of this RFC is to add the basic base32 encoding/decoding functionality described in RFC4648 to the PHP standard library. Doing so would simplify base32 usage in PHP while providing a baseline that improves interoperability with other programming languages.
Add two new basic functions:
base32_encode(string $decoded, string $alphabet = PHP_BASE32_ASCII, string $padding = '='): string
base32_decode(string $encoded, string $alphabet = PHP_BASE32_ASCII, string $padding = '=', bool $strict = false): string|false
And two new global constants:
PHP_BASE32_ASCII = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
PHP_BASE32_HEX = '0123456789ABCDEFGHIJKLMNOPQRSTUV';
- $decoded: the data to encode using base32_encode
- $encoded: the data to decode using base32_decode
- $alphabet: the base32 alphabet; by default the PHP_BASE32_ASCII constant is used with both functions
- $padding: the padding character; by default the = character is used with both functions
Supported algorithms:
- If $alphabet is PHP_BASE32_ASCII and the padding character is =, conversion is performed per the RFC4648 US-ASCII standard.
- If $alphabet is PHP_BASE32_HEX and the padding character is =, conversion is performed per the RFC4648 HEX standard.
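To make the two supported conversions concrete, here is a minimal userland sketch of RFC4648 base32 encoding. It is not the proposed engine implementation: the names sketch_base32_encode, SKETCH_BASE32_ASCII and SKETCH_BASE32_HEX are hypothetical stand-ins chosen so that they cannot collide with the proposed functions and constants, and no validation of $alphabet or $padding is performed.

```php
<?php

// Hypothetical stand-ins for the proposed PHP_BASE32_ASCII / PHP_BASE32_HEX constants.
const SKETCH_BASE32_ASCII = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
const SKETCH_BASE32_HEX = '0123456789ABCDEFGHIJKLMNOPQRSTUV';

// Hypothetical userland sketch; assumes $alphabet and $padding are already valid.
function sketch_base32_encode(
    string $decoded,
    string $alphabet = SKETCH_BASE32_ASCII,
    string $padding = '='
): string {
    if ($decoded === '') {
        return '';
    }

    // Turn every input byte into its 8-bit binary representation.
    $bits = '';
    foreach (unpack('C*', $decoded) as $byte) {
        $bits .= str_pad(decbin($byte), 8, '0', STR_PAD_LEFT);
    }

    // Split into 5-bit groups; a short final group is right-padded with zero bits.
    $encoded = '';
    foreach (str_split($bits, 5) as $group) {
        $encoded .= $alphabet[bindec(str_pad($group, 5, '0'))];
    }

    // Pad the output with the padding character up to a multiple of 8 characters.
    $remainder = strlen($encoded) % 8;
    if ($remainder !== 0) {
        $encoded .= str_repeat($padding, 8 - $remainder);
    }

    return $encoded;
}
```

With the default arguments the sketch follows the US-ASCII alphabet; passing SKETCH_BASE32_HEX switches to the HEX alphabet while keeping the same bit grouping and padding rules.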
In case of errors:
- for the $alphabet or $padding parameters, a ValueError exception is thrown if their values are invalid.
- during decoding, false is returned on failure because, in PHP, functions do not throw exceptions under normal conditions.
base32_encode('Bangui'); // returns 'IJQW4Z3VNE======'
base32_decode('IJQW4Z3VNE======'); // returns 'Bangui'
base32_decode('IJQW4Z083VNE======'); // returns 'Bangui'
base32_decode('IJQW4Z083VNE======', PHP_BASE32_ASCII, '=', true); // returns false
base32_encode('Bangui', PHP_BASE32_HEX, '*'); // returns '89GMSPRLD4******'
base32_decode('89GMSPRLD4******', PHP_BASE32_HEX, '*', true); // returns 'Bangui'
When strict decoding is not used (the default behaviour), decoding will allow:
- the presence of characters outside the defined RFC alphabet (the characters are ignored)
- lowercase characters (each is converted to its uppercase value if that value is found in the alphabet, otherwise it is ignored)
- invalid padding length
- padding character inside the encoded data (padding character is ignored inside the encoded data)
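The lenient rules listed above can be sketched in userland as follows. This is only one possible reading of those rules, not the proposed implementation: the function name sketch_base32_decode_lenient is hypothetical, strict mode is not modelled, and the sketch assumes that any byte it cannot map to the alphabet (padding included) is simply skipped.

```php
<?php

// Hypothetical userland sketch of non-strict decoding; strict mode is not modelled.
function sketch_base32_decode_lenient(
    string $encoded,
    string $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'
): string {
    // Map each alphabet character to its 5-bit value.
    $values = array_flip(str_split($alphabet));

    $bits = '';
    foreach (str_split($encoded) as $char) {
        // Assumption: a character missing from the alphabet is retried in uppercase.
        if (!isset($values[$char])) {
            $char = strtoupper($char);
        }
        // Padding, whitespace and any other unknown character are skipped.
        if (isset($values[$char])) {
            $bits .= str_pad(decbin($values[$char]), 5, '0', STR_PAD_LEFT);
        }
    }

    // Keep only complete bytes; leftover bits from the last group are discarded.
    $decoded = '';
    $usable = strlen($bits) - (strlen($bits) % 8);
    for ($i = 0; $i < $usable; $i += 8) {
        $decoded .= chr(bindec(substr($bits, $i, 8)));
    }

    return $decoded;
}
```

Because unknown characters are dropped before the bits are regrouped, an input with stray digits, lowercase letters, misplaced padding or a wrong padding length still decodes to the same bytes as its clean form.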
The current API allows:
- changing the alphabet by submitting your own alphabet via the $alphabet parameter;
- changing the padding by submitting your own padding character via the $padding parameter;
with the following restrictions:
- The padding character can be any single-byte character except \r, \t, \n and the space character.
- The alphabet must be a 32-byte string of unique byte values that contains none of the following: the padding character, \r, \t, \n or the space character.
- The alphabet is treated as a sequence of byte values without any special treatment for multi-byte UTF-8.
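The restrictions above could be checked along the following lines. This is a sketch under assumptions: the helper name sketch_assert_base32_options and the exception messages are hypothetical, and the RFC does not say in which order the checks are performed.

```php
<?php

// Hypothetical helper mirroring the documented restrictions; messages are illustrative.
function sketch_assert_base32_options(string $alphabet, string $padding): void
{
    $reserved = ["\r", "\t", "\n", ' '];

    // The padding must be exactly one byte and not a reserved whitespace character.
    if (strlen($padding) !== 1 || in_array($padding, $reserved, true)) {
        throw new ValueError('The padding must be a single byte other than CR, TAB, LF or space.');
    }

    // The alphabet is measured in bytes, with no multi-byte UTF-8 handling.
    if (strlen($alphabet) !== 32) {
        throw new ValueError('The alphabet must be exactly 32 bytes long.');
    }

    // count_chars() in mode 3 returns every distinct byte exactly once.
    if (strlen(count_chars($alphabet, 3)) !== 32) {
        throw new ValueError('The alphabet must contain unique byte values.');
    }

    // The alphabet may not contain the padding or a reserved whitespace character.
    if (strpbrk($alphabet, $padding . "\r\t\n ") !== false) {
        throw new ValueError('The alphabet must not contain the padding, CR, TAB, LF or space.');
    }
}
```

Working on bytes rather than characters means a UTF-8 alphabet of 32 multi-byte characters would be rejected for its length, which matches the byte-sequence rule stated above.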
The following characters: \r, \t, \n and the space character are all ignored during decoding regardless of the value assigned to the $strict parameter.
No backwards incompatible changes inside PHP itself.
There might be incompatibilities if functions with these names are already implemented in userland code. The developer would notice this quickly, however, as such global functions are usually declared early in the application boot process.
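Userland code that declares its own global base32_encode function can avoid a redeclaration error by guarding the declaration with function_exists. The sketch below only illustrates that guard; the fallback body is a placeholder, and its simplified one-parameter signature is an assumption, not something this RFC prescribes.

```php
<?php

// Declare the userland fallback only when the engine does not already provide it.
if (!function_exists('base32_encode')) {
    function base32_encode(string $decoded): string
    {
        // Placeholder: a real package would call its own implementation here.
        throw new LogicException('Userland base32_encode() fallback is not implemented in this sketch.');
    }
}
```

Note that a guarded fallback may still behave differently from the native function if the two do not implement the same base32 variant.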