szktty/clockwork-base32.md

## clockwork-base32.md

      
Clockwork Base32

Clockwork Base32 is a simple variant of Base32 inspired by Crockford's Base32.
See also a blog post (in Japanese).
Table of Contents


Specification Version
Last updated
Features
Difference Between Clockwork Base32 and Other Specifications
Symbols
Algorithm

Encoding
Decoding
Notes


Implementations

Reference Implementations
Third-Party Implementations


Examples
License
Author
Acknowledgements
Uses
Links
Specification Revision History
Document Revision History


Specification Version

2020.2 (Updated: 2020-07-27)

Last updated

2021-06-06

Features


Human readable
Octet-aligned binary support
No padding character at end of encoded text
Easy to implement (using a bitstring library is recommended)


Difference Between Clockwork Base32 and Other Specifications


| | RFC 4648 | Crockford's Base32 | Clockwork Base32 |
|---|---|---|---|
| Human readable | Not needed | Needed | Needed |
| Input data | Octet sequence | Integer | Octet sequence (byte array) |
| Encoded representation | ASCII character sequence | Symbol sequence | ASCII character sequence |
| Symbols | 32 alphanum + 1 sign characters | 32 alphanum + 5 sign characters (optional) | 32 alphanum characters |
| Padding of encoded data | Used | None | None |
| Ignored characters in decoding | Non-alphabet characters (optional) | Hyphen | None |
| Checksum | None | 1 character (optional) | None |


Symbols


The Clockwork Base32 symbol set is the same as the Crockford's Base32 symbol set, excluding the 5 symbols (*~$=U) that Crockford's Base32 uses for checksums.
Each symbol is 1 ASCII character.
Symbols are case-insensitive.


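| Value | Decode | Encode |
|---|---|---|
| 0 | `0` `O` `o` | `0` |
| 1 | `1` `I` `i` `L` `l` | `1` |
| 2 | `2` | `2` |
| 3 | `3` | `3` |
| 4 | `4` | `4` |
| 5 | `5` | `5` |
| 6 | `6` | `6` |
| 7 | `7` | `7` |
| 8 | `8` | `8` |
| 9 | `9` | `9` |
| 10 | `A` `a` | `A` `a` |
| 11 | `B` `b` | `B` `b` |
| 12 | `C` `c` | `C` `c` |
| 13 | `D` `d` | `D` `d` |
| 14 | `E` `e` | `E` `e` |
| 15 | `F` `f` | `F` `f` |
| 16 | `G` `g` | `G` `g` |
| 17 | `H` `h` | `H` `h` |
| 18 | `J` `j` | `J` `j` |
| 19 | `K` `k` | `K` `k` |
| 20 | `M` `m` | `M` `m` |
| 21 | `N` `n` | `N` `n` |
| 22 | `P` `p` | `P` `p` |
| 23 | `Q` `q` | `Q` `q` |
| 24 | `R` `r` | `R` `r` |
| 25 | `S` `s` | `S` `s` |
| 26 | `T` `t` | `T` `t` |
| 27 | `V` `v` | `V` `v` |
| 28 | `W` `w` | `W` `w` |
| 29 | `X` `x` | `X` `x` |
| 30 | `Y` `y` | `Y` `y` |
| 31 | `Z` `z` | `Z` `z` |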


Excluded Characters: U
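
The symbol table can be expressed as two lookup structures: a string for encoding and a dictionary for decoding. The following is a minimal Python sketch; the names `ALPHABET` and `DECODE_MAP` are illustrative and are not defined by this specification.

```python
# Encoding alphabet: the index is the 5-bit value, the character is the symbol.
# I, L, O and U do not appear in it.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Decoding map: accepts both cases of every symbol,
# plus the aliases O/o -> 0 and I/i/L/l -> 1 from the table.
DECODE_MAP = {}
for value, symbol in enumerate(ALPHABET):
    DECODE_MAP[symbol] = value
    DECODE_MAP[symbol.lower()] = value
for alias in "Oo":
    DECODE_MAP[alias] = 0
for alias in "IiLl":
    DECODE_MAP[alias] = 1
```

Because `U` and `u` are never added to the map, a lookup for them fails, which is how a decoder built on this sketch would detect the excluded character.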

Algorithm


Encoding


Proceeding from left to right, split the input data into 5-bit blocks and map each block to a symbol character (upper case is recommended).
If the rightmost block is shorter than 5 bits, pad it on the right with zero bits.
Combine the symbol characters into a sequence.
Return the sequence.

The bit length of the encoded data must be greater than or equal to the bit length of the plain data.
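
The encoding steps can be sketched in Python as follows. This is an illustrative sketch, not one of the reference implementations; the function name `encode` and the bit-buffer approach are assumptions made for the example (the specification itself suggests a bitstring library).

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def encode(data: bytes) -> str:
    """Encode an octet sequence into upper-case Clockwork Base32 symbols."""
    symbols = []
    buffer = 0  # bits that have not been emitted yet
    bits = 0    # number of valid bits currently held in buffer
    for byte in data:
        buffer = (buffer << 8) | byte
        bits += 8
        # Emit one symbol for every complete 5-bit block, left to right.
        while bits >= 5:
            bits -= 5
            symbols.append(ALPHABET[(buffer >> bits) & 0x1F])
        buffer &= (1 << bits) - 1  # keep only the leftover bits
    if bits > 0:
        # The rightmost block is shorter than 5 bits: pad with zero bits.
        symbols.append(ALPHABET[(buffer << (5 - bits)) & 0x1F])
    return "".join(symbols)

print(encode(b"foobar"))  # CSQPYRK1E8
```

No padding character is appended, so the output length depends only on the number of input octets.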

Decoding


Proceeding from left to right, map each character of the input data to its 5-bit representation.
Combine the 5-bit blocks into an octet sequence.
If the total bit length of the blocks is not divisible by 8, truncate the rightmost bits. The number of truncated bits equals the remainder of the total bit length divided by 8.
Return the octet sequence.
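
The decoding steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the names `decode` and `DECODE_MAP` are hypothetical, invalid characters are reported with `ValueError`, and a 1-character input returns an empty octet sequence (the specification also allows reporting an error in that case).

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Map every accepted character (both cases and the O/I/L aliases) to its value.
DECODE_MAP = {}
for value, symbol in enumerate(ALPHABET):
    DECODE_MAP[symbol] = value
    DECODE_MAP[symbol.lower()] = value
for alias in "Oo":
    DECODE_MAP[alias] = 0
for alias in "IiLl":
    DECODE_MAP[alias] = 1

def decode(text: str) -> bytes:
    """Decode Clockwork Base32 text into an octet sequence."""
    octets = bytearray()
    buffer = 0  # bits that have not been emitted yet
    bits = 0    # number of valid bits currently held in buffer
    for char in text:
        if char not in DECODE_MAP:
            raise ValueError(f"invalid character: {char!r}")
        buffer = (buffer << 5) | DECODE_MAP[char]
        bits += 5
        if bits >= 8:
            # A complete octet is available: emit it and keep the remainder.
            bits -= 8
            octets.append((buffer >> bits) & 0xFF)
            buffer &= (1 << bits) - 1
    # Any bits left in the buffer (fewer than 8) are truncated.
    return bytes(octets)

print(decode("CSQPYRK1E8"))  # b'foobar'
```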

Some error cases:

The input includes invalid characters, e.g. `u`, `U`, `*`, `=`.

Some corner cases:

If the input data is 1 character (e.g. `0`), the decoder may return an empty octet sequence or report an error.
The padding length may be 5 bits or more (5 to 7 bits). For example, if the input data is 3 characters, it represents 15 bits: 1 decoded character (8 bits) followed by 7 padding bits. Both `CR0` and `CR` can therefore be decoded as `f`.
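
The arithmetic behind these corner cases can be checked with a short Python sketch. The helper name `split_bits` is hypothetical; it only applies the truncation rule from the decoding steps (total bits divided by 8, remainder truncated).

```python
def split_bits(num_chars: int):
    """Return (decoded octets, truncated bits) for an input of num_chars symbols."""
    total_bits = 5 * num_chars
    return total_bits // 8, total_bits % 8

for n in range(1, 9):
    octets, truncated = split_bits(n)
    print(f"{n} chars -> {octets} octets, {truncated} bits truncated")
```

Under this arithmetic, 1 character yields no octets (5 bits truncated) and 3 characters yield 1 octet with 7 bits truncated, which is the `CR0` case. Eight characters are the only length in this range that decodes with no truncation.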


Notes


Encoded data does not contain error detection information. Use this algorithm together with any other error detection algorithm to detect errors.
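
As one possible way to follow this note, a checksum can be attached to the octet sequence before encoding and verified after decoding. The sketch below uses CRC-32 from Python's standard `zlib` module purely as an example; the specification does not prescribe any particular error detection algorithm, and the function names `attach_crc32` and `verify_crc32` are hypothetical.

```python
import zlib

def attach_crc32(payload: bytes) -> bytes:
    """Append a 4-octet big-endian CRC-32 of the payload (do this before encoding)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def verify_crc32(framed: bytes) -> bytes:
    """Check the trailing CRC-32 (do this after decoding) and return the payload."""
    if len(framed) < 4:
        raise ValueError("too short to contain a CRC-32")
    payload, received = framed[:-4], framed[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != received:
        raise ValueError("CRC-32 mismatch")
    return payload
```

The framed octets would then be passed to a Clockwork Base32 encoder as ordinary input data, at the cost of 4 extra octets per message.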


Examples


| Input | Encoded |
|---|---|
| (empty) | (empty) |
| `f` | `CR` or `CR0` |
| `foobar` | `CSQPYRK1E8` |
| `Hello, world!` | `91JPRV3F5GG7EVVJDHJ22` |
| `The quick brown fox jumps over the lazy dog.` | `AHM6A83HENMP6TS0C9S6YXVE41K6YY10D9TPTW3K41QQCSBJ41T6GS90DHGQMY90CHQPEBG` |
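
Since every symbol carries 5 bits and no padding character is appended, the encoded length for n input octets is the ceiling of 8n / 5. The Python sketch below checks that relation against the examples in this section; the helper name `encoded_length` is hypothetical, and `CR` is used for the input `f` because it is the form without the extra padding symbol.

```python
def encoded_length(num_octets: int) -> int:
    """Number of symbols needed for num_octets octets: ceil(8 * n / 5)."""
    return (8 * num_octets + 4) // 5

# Inputs and encoded forms copied from the examples in this section.
EXAMPLES = {
    b"": "",
    b"f": "CR",
    b"foobar": "CSQPYRK1E8",
    b"Hello, world!": "91JPRV3F5GG7EVVJDHJ22",
    b"The quick brown fox jumps over the lazy dog.":
        "AHM6A83HENMP6TS0C9S6YXVE41K6YY10D9TPTW3K41QQCSBJ41T6GS90DHGQMY90CHQPEBG",
}

for plain, encoded in EXAMPLES.items():
    assert len(encoded) == encoded_length(len(plain))
```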


Implementations


Reference Implementations

These reference implementations are intended mainly as an aid to understanding and implementing the specification.
You should not expect performance improvements, a polished API, or continuous maintenance.

C: szktty/c-clockwork-base32
Erlang: shiguredo/erlang-base32
Go: szktty/go-clockwork-base32
Swift: szktty/swift-clockwork-base32


Third-Party Implementations


C++: wx257osn2/clockwork_base32_cxx
C++: objectx/cpp-clockwork-base32
JavaScript: mganeko/js_clockwork_base32
AssemblyScript: mganeko/as_clockwork_base32
Rust: woxtu/rust-clockwork-base32
Rust: hnakamur/rs-clockwork-base32
Go: shogo82148/go-clockwork-base32
TypeScript: niyari/base32-ts


License

This document is distributed under CC BY-ND 4.0.

Author

SUZUKI Tetsuya

Acknowledgements

Shiguredo Inc.

Uses


WebRTC SFU Sora by Shiguredo Inc.

Used for encoding and decoding UUIDs to make them readable and shorter (32 characters -> 26 characters).
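
The 32 -> 26 figure follows from the sizes involved: a UUID is 128 bits, which is 32 hexadecimal characters, and 128 bits padded with 2 zero bits is 130 bits, or 26 five-bit symbols. The Python sketch below illustrates this with an integer-based encoder for the fixed 128-bit case. It is an assumption-laden illustration, not the code used by Sora, and the function name `encode_uuid` is hypothetical.

```python
import uuid

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def encode_uuid(value: uuid.UUID) -> str:
    """Encode the 128 bits of a UUID as 26 Clockwork Base32 symbols."""
    # Pad on the right with 2 zero bits: 128 + 2 = 130 bits = 26 blocks of 5 bits.
    padded = value.int << 2
    # Read the 5-bit blocks from the most significant end (left to right).
    return "".join(ALPHABET[(padded >> shift) & 0x1F] for shift in range(125, -1, -5))

example = uuid.UUID("00000000-0000-0000-0000-000000000000")
print(len(example.hex), "->", len(encode_uuid(example)))  # 32 -> 26
```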


Links


Crockford's Base32
Formal Verification of Encoding and Decoding

A sample script (Isabelle/HOL)
PDF
Thanks @pirapira!


Specification Revision History

2020.2 (2020-07-27)

[CHANGE] Added decoding specification for some corner cases. Thanks @pirapira!
[CHANGE] Changed decoding 1 character from invalid to valid.

2020.1 (2020-07-20)

First release.


Document Revision History

2021-06-06

[CHANGE] Added a third-party implementation.

niyari/base32-ts


2021-02-11

[CHANGE] Added third-party implementations.

shogo82148/go-clockwork-base32
mganeko/as_clockwork_base32
hnakamur/rs-clockwork-base32


2020-08-11

[CHANGE] Added "Uses" section.

2020-08-01

[CHANGE] Added a reference implementation.

szktty/swift-clockwork-base32


[CHANGE] Added some links.

2020-07-30

[CHANGE] Added a reference implementation.

szktty/c-clockwork-base32


2020-07-27

Released 2020.2.
[CHANGE] Added a table of contents.

2020-07-26

[CHANGE] Added a third-party implementation.

woxtu/rust-clockwork-base32


2020-07-25

[CHANGE] Added third-party implementations.

wx257osn2/clockwork_base32_cxx
objectx/cpp-clockwork-base32
mganeko/js_clockwork_base32


2020-07-22

[CHANGE] Added RFC 4648 to the comparison table of specification.
[FIX] The input `Hello, world` was incorrect; `Hello, world!` is correct. Thanks @mganeko!

2020-07-20

First release.