
@szktty
Last active March 4, 2024 04:35
Clockwork Base32: A variant of Base32 inspired by Crockford's Base32

Clockwork Base32

Clockwork Base32 is a simple variant of Base32 inspired by Crockford's Base32.

See also a blog post (in Japanese).


Specification Version

2020.2 (Updated: 2020-07-27)

Last updated

2021-06-06

Features

  • Human readable
  • Octet-aligned binary support
  • No padding character at end of encoded text
  • Easy to implement (using a bitstring library is recommended)

Differences Between Clockwork Base32 and Other Specifications

| | RFC 4648 | Crockford's Base32 | Clockwork Base32 |
|---|---|---|---|
| Human readable | Not needed | Needed | Needed |
| Input data | Octet sequence | Integer | Octet sequence (byte array) |
| Encoded representation | ASCII character sequence | Symbol sequence | ASCII character sequence |
| Symbols | 32 alphanumeric + 1 sign character (=) | 32 alphanumeric + 5 check symbols *~$=U (optional) | 32 alphanumeric characters |
| Padding of encoded data | Used | None | None |
| Ignored characters in decoding | Non-alphabet characters (optional) | Hyphen | None |
| Checksum | None | 1 character (optional) | None |
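
One way to make the RFC 4648 and Clockwork columns concrete: both split the octets into 5-bit blocks starting from the most significant bit and fill the last block with zero bits on the right. As an observation (not something this specification states), a Clockwork Base32 string can therefore be obtained from Python's standard base64.b32encode by stripping the = padding and translating between the two alphabets. The function name clockwork_via_rfc4648 is hypothetical; this is a minimal sketch, not a reference implementation.

```python
import base64

# RFC 4648 Base32 alphabet and the Clockwork Base32 alphabet, value for value.
RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
CLOCKWORK_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
RFC_TO_CLOCKWORK = str.maketrans(RFC4648_ALPHABET, CLOCKWORK_ALPHABET)


def clockwork_via_rfc4648(data: bytes) -> str:
    """Encode by reusing RFC 4648 Base32: drop the '=' padding, then swap alphabets."""
    rfc = base64.b32encode(data).decode("ascii")
    return rfc.rstrip("=").translate(RFC_TO_CLOCKWORK)
```

For example, base64.b32encode(b"foobar") yields MZXW6YTBOI======; removing the six = signs and translating each symbol gives CSQPYRK1E8, the value listed in the Examples section. The only differences visible here are the alphabet and the absence of padding.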

Symbols

  • The Clockwork Base32 symbol set is the same as Crockford's Base32, excluding the 5 checksum symbols (*~$=U).
  • Each symbol is a single ASCII character.
  • Symbols are case-insensitive.
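| Value | Decode | Encode |
|---|---|---|
| 0 | 0 O o | 0 |
| 1 | 1 I i L l | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
| 7 | 7 | 7 |
| 8 | 8 | 8 |
| 9 | 9 | 9 |
| 10 | A a | A a |
| 11 | B b | B b |
| 12 | C c | C c |
| 13 | D d | D d |
| 14 | E e | E e |
| 15 | F f | F f |
| 16 | G g | G g |
| 17 | H h | H h |
| 18 | J j | J j |
| 19 | K k | K k |
| 20 | M m | M m |
| 21 | N n | N n |
| 22 | P p | P p |
| 23 | Q q | Q q |
| 24 | R r | R r |
| 25 | S s | S s |
| 26 | T t | T t |
| 27 | V v | V v |
| 28 | W w | W w |
| 29 | X x | X x |
| 30 | Y y | Y y |
| 31 | Z z | Z z |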

Excluded Characters: U
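
The symbol table maps directly onto lookup structures. The sketch below, with the hypothetical names ENCODE_SYMBOLS and build_decode_table, assumes the encoder emits upper case (as recommended) while the decoder accepts both cases plus the aliases O/o for 0 and I/i/L/l for 1, and has no entry for U.

```python
# Encoding alphabet: index = 5-bit value, character = upper-case symbol.
ENCODE_SYMBOLS = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def build_decode_table() -> dict[str, int]:
    """Map every character the decoder accepts to its 5-bit value."""
    table: dict[str, int] = {}
    for value, symbol in enumerate(ENCODE_SYMBOLS):
        table[symbol] = value          # upper case (digits are unaffected)
        table[symbol.lower()] = value  # lower case, since symbols are case-insensitive
    # Aliases that are accepted when decoding but never produced by the encoder.
    for alias in "Oo":
        table[alias] = 0
    for alias in "IiLl":
        table[alias] = 1
    return table


DECODE_TABLE = build_decode_table()
```

Any character missing from DECODE_TABLE (such as U, u, *, or =) is invalid input.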

Algorithm

Encoding

  1. Proceeding from left to right, split the input data into 5-bit blocks and map each block to a symbol character (upper case is recommended). If the rightmost block is shorter than 5 bits, fill it with zero bits on the right.
  2. Combine the symbol characters into a sequence.
  3. Return the sequence.

The bit length of the encoded data (5 bits per character) must be greater than or equal to the bit length of the plain data.
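
A minimal Python sketch of the encoding steps, assuming a plain integer bit buffer instead of a bitstring library; the function name encode is hypothetical. A symbol is emitted whenever at least 5 bits are pending, and a final partial block is filled with zero bits on the right.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode(data: bytes) -> str:
    """Clockwork Base32 encoding sketch: 5-bit blocks, MSB first, no padding character."""
    out = []
    buffer = 0  # pending bits, right-aligned
    bits = 0    # number of pending bits in buffer
    for byte in data:
        buffer = (buffer << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append(ALPHABET[(buffer >> bits) & 0x1F])
        buffer &= (1 << bits) - 1  # keep only the bits not yet emitted
    if bits > 0:
        # The rightmost block is shorter than 5 bits: fill it with zero bits.
        out.append(ALPHABET[(buffer << (5 - bits)) & 0x1F])
    return "".join(out)
```

encode(b"f") returns CR and encode(b"foobar") returns CSQPYRK1E8, matching the Examples section. For n octets the output has ceil(8n/5) characters, so the encoded bit length always covers the plain data.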

Decoding

  1. Proceeding from left to right, map each character of the input data to its 5-bit value.
  2. Concatenate the 5-bit blocks into an octet sequence. If the total bit length is not divisible by 8, discard the rightmost bits; their number equals the total bit length modulo 8.
  3. Return the octet sequence.

Some error cases:

  • The input contains invalid characters, e.g. u, U, *, or =.
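
The decoding steps and the invalid-character error case can be sketched the same way. This hypothetical decode function accepts lower case and the O/I/L aliases, raises ValueError for any other character (including U, *, and =), and silently discards leftover padding bits. Discarding them is the lenient choice the corner cases permit; a stricter decoder could report an error instead.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
# Accepted input characters -> 5-bit values (case-insensitive, plus O/I/L aliases).
DECODE = {c: v for v, s in enumerate(ALPHABET) for c in (s, s.lower())}
DECODE.update({"O": 0, "o": 0, "I": 1, "i": 1, "L": 1, "l": 1})


def decode(text: str) -> bytes:
    """Clockwork Base32 decoding sketch; raises ValueError on invalid characters."""
    out = bytearray()
    buffer = 0  # pending bits, right-aligned
    bits = 0    # number of pending bits in buffer
    for ch in text:
        value = DECODE.get(ch)
        if value is None:
            raise ValueError(f"invalid character: {ch!r}")
        buffer = (buffer << 5) | value
        bits += 5
        if bits >= 8:
            bits -= 8
            out.append((buffer >> bits) & 0xFF)
            buffer &= (1 << bits) - 1  # keep only the bits not yet emitted
    # Any remaining bits (total bit length mod 8) are padding and are dropped.
    return bytes(out)
```

With this sketch, decode("CR") and decode("CR0") both return b"f", and decode("0") returns an empty byte string.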

Some corner cases:

  • If the input is a single character (e.g. 0), the decoder may either return an empty octet sequence or report an error.
  • The trailing padding may be 5 to 7 bits long, which is longer than an encoder ever produces. For example, a 3-character input represents 15 bits: 1 octet followed by 7 padding bits. Both CR0 and CR therefore decode to f. A decoder may also report such input as an error.
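
These corner cases follow from simple length arithmetic: n symbols carry 5n bits, which yield floor(5n/8) octets and leave 5n mod 8 padding bits. The hypothetical helpers below use only integer arithmetic. They show that a length is one an encoder could emit if and only if it leaves fewer than 5 padding bits, which is why 1- and 3-character inputs never come out of an encoder.

```python
def decoded_length(num_chars: int) -> int:
    """Octets produced from num_chars symbols (5 bits each, remainder discarded)."""
    return num_chars * 5 // 8


def padding_bits(num_chars: int) -> int:
    """Bits discarded at the end of decoding: total bit length mod 8."""
    return num_chars * 5 % 8


def encoded_length(num_octets: int) -> int:
    """Characters an encoder emits for num_octets octets: ceil(8 * n / 5)."""
    return (num_octets * 8 + 4) // 5


def is_encoder_length(num_chars: int) -> bool:
    """True if an encoder could have produced exactly num_chars characters."""
    return encoded_length(decoded_length(num_chars)) == num_chars
```

For example, decoded_length(3) is 1 and padding_bits(3) is 7, while the encoder emits only 2 characters for 1 octet, so is_encoder_length(3) is False.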

Notes

  • Encoded data contains no error-detection information. To detect errors, combine this encoding with a separate error-detection mechanism.
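
Since Clockwork Base32 drops Crockford's check symbol, error detection has to be layered on top. One possible arrangement, assumed here and not defined by this specification, is to append a 4-byte big-endian CRC-32 (zlib.crc32 from Python's standard library) to the octets before encoding and to verify it after decoding. The helper names add_crc32 and strip_crc32 are hypothetical.

```python
import struct
import zlib


def add_crc32(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload (hypothetical framing)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def strip_crc32(frame: bytes) -> bytes:
    """Verify and remove the trailing CRC-32; raise ValueError on mismatch."""
    if len(frame) < 4:
        raise ValueError("frame too short")
    payload = frame[:-4]
    (expected,) = struct.unpack(">I", frame[-4:])
    if zlib.crc32(payload) != expected:
        raise ValueError("CRC-32 mismatch")
    return payload
```

Encode the output of add_crc32 with any Clockwork Base32 encoder, and pass the decoded octets to strip_crc32. The 4 extra octets add 6 or 7 symbols to the encoded text, depending on alignment.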

Examples

| Input | Encoded |
|---|---|
| (empty) | (empty) |
| f | CR or CR0 |
| foobar | CSQPYRK1E8 |
| Hello, world! | 91JPRV3F5GG7EVVJDHJ22 |
| The quick brown fox jumps over the lazy dog. | AHM6A83HENMP6TS0C9S6YXVE41K6YY10D9TPTW3K41QQCSBJ41T6GS90DHGQMY90CHQPEBG |

Implementations

Reference Implementations

These reference implementations are intended mainly to help with understanding and implementing the specification. Do not expect high performance, a polished API, or continuous maintenance.

Third-Party Implementations

License

This document is distributed under CC BY-ND 4.0.

Author

SUZUKI Tetsuya

Acknowledgements

Shiguredo Inc.

Uses

  • WebRTC SFU Sora by Shiguredo Inc.
    • Used to encode and decode UUIDs into a shorter, human-readable form (32 characters -> 26 characters).
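
The 32 -> 26 figure follows from the bit count: a UUID is 128 bits, written as 32 hexadecimal digits at 4 bits each, whereas Clockwork Base32 carries 5 bits per symbol, so ceil(128 / 5) = 26 symbols. The sketch below uses Python's uuid module; the helper names are hypothetical and this is not Sora's actual code. Because a UUID has a fixed width, it treats the value as a 128-bit integer and shifts in the 2 zero bits needed to fill 26 × 5 = 130 bits, which gives the same result as encoding the UUID's 16 octets.

```python
import uuid

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_uuid(u: uuid.UUID) -> str:
    """Encode a 128-bit UUID as 26 Clockwork Base32 symbols."""
    n = u.int << 2  # 26 symbols carry 130 bits: append 2 zero bits on the right
    return "".join(ALPHABET[(n >> (5 * (25 - i))) & 0x1F] for i in range(26))


def decode_uuid(text: str) -> uuid.UUID:
    """Inverse of encode_uuid for canonical upper-case input (no aliases)."""
    if len(text) != 26:
        raise ValueError("expected 26 characters")
    n = 0
    for ch in text:
        n = (n << 5) | ALPHABET.index(ch)  # raises ValueError for unknown symbols
    return uuid.UUID(int=n >> 2)  # drop the 2 padding bits
```

A round trip such as decode_uuid(encode_uuid(u)) == u holds for any UUID u, and the encoded form is always 26 characters versus 32 for u.hex.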

Links

Specification Revision History

2020.2 (2020-07-27)

  • [CHANGE] Added decoding specification for some corner cases. Thanks @pirapira!
  • [CHANGE] Changed decoding of 1-character input from invalid to valid.

2020.1 (2020-07-20)

  • First release.

Document Revision History

2021-06-06

  • [CHANGE] Added a third-party implementation.
    • niyari/base32-ts

2021-02-11

  • [CHANGE] Added third-party implementations.
    • shogo82148/go-clockwork-base32
    • mganeko/as_clockwork_base32
    • hnakamur/rs-clockwork-base32

2020-08-11

  • [CHANGE] Added "Uses" section.

2020-08-01

  • [CHANGE] Added a reference implementation.
    • szktty/swift-clockwork-base32
  • [CHANGE] Added some links.

2020-07-30

  • [CHANGE] Added a reference implementation.
    • szktty/c-clockwork-base32

2020-07-27

  • Released 2020.2.
  • [CHANGE] Added a table of contents.

2020-07-26

  • [CHANGE] Added a third-party implementation.
    • woxtu/rust-clockwork-base32

2020-07-25

  • [CHANGE] Added third-party implementations.
    • wx257osn2/clockwork_base32_cxx
    • objectx/cpp-clockwork-base32
    • mganeko/js_clockwork_base32

2020-07-22

  • [CHANGE] Added RFC 4648 to the specification comparison table.
  • [FIX] Corrected the example input Hello, world to Hello, world! Thanks @mganeko!

2020-07-20

  • First release.
@szktty
Author

szktty commented Jul 21, 2020

@mganeko Fixed. Thanks for the report!

@pirapira

pirapira commented Jul 25, 2020

When the decoder sees three characters, should it report an error?

Three-character encoding is as weird as one-character encoding. If the original byte sequence is one byte long (= 8 bits), the encoding should be two characters. If the original byte sequence is two bytes long (= 16 bits), the encoding should be four characters.

There is no problem with letting the implementors choose.

@szktty
Author

szktty commented Jul 26, 2020

@pirapira The encoder never generates 3-character data (15 bits), but the decoder accepts such data as 1 octet (8 bits) followed by 7 bits of padding. For example, both CR and CR0 decode to f. A decoder may also report this case as an error. I have updated the specification accordingly. Existing implementations probably do not need to be changed.

Thanks for the comment!

@cmplstofB

May the encoder output and the decoder input contain newline characters?
RFC 4648 lists wrapping the encoder output with line feeds as an implementation discrepancy, and this specification does not seem to say anything about newline characters in the input data.

@szktty
Author

szktty commented Jun 6, 2021

@cmplstofB Newline characters must not appear in either the encoded data or the decoder input. Both must be ASCII character sequences consisting only of the 32 alphanumeric symbols.
