@Artoria2e5
Last active November 14, 2016 13:30
baseIdeo encoding

The baseIdeo encoding encodes an arbitrary stream of bytes to Unicode code points in the "CJK Unified Ideographs" block (U+4E00–U+9FFF). Each code point represents a maximum of 14 bits.

The baseIdeo encoding is inspired by pnck's basecjk.

Codepoints

  • U+6000–U+9FFF are used for normative encoding.
  • U+4E00–U+4E0D are dedicated to padding.
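
As a minimal sketch of the two ranges, the Python snippet below classifies a single character. The constant names and the `classify` helper are hypothetical and chosen only for illustration. The data range spans exactly 2^14 code points, which is why one code point can carry up to 14 bits.

```python
# Hypothetical names; only the numeric ranges come from the list above.
DATA_BASE = 0x6000  # first code point used for normative encoding
DATA_LAST = 0x9FFF  # last code point used for normative encoding
PAD_BASE = 0x4E00   # first code point dedicated to padding
PAD_LAST = 0x4E0D   # last code point dedicated to padding


def classify(ch: str) -> str:
    """Return 'data', 'padding', or 'other' for a single character."""
    cp = ord(ch)
    if DATA_BASE <= cp <= DATA_LAST:
        return "data"
    if PAD_BASE <= cp <= PAD_LAST:
        return "padding"
    return "other"
```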

Stream

A baseIdeo

Concat-Var baseIdeo

baseIdeo's padding scheme allows for easy lossless interpretation of padding lengths. This property can be utilized to concatenate streams without re-interpretation[1], given the following modification to the definition of a stream:

Note that under this variant, the same bitstream, depending on how it is segmented, can be encoded as different Concat-Var baseIdeo streams.

Encoder

A baseIdeo encoder has an associated bit-stream sb, and a stream-length property l. Its handler runs the following operations:

  1. Let baseOffset be U+6000.
  2. Let padBase be U+4E00.
  3. Let remaining be l.
  4. While remaining is no less than 14:
  5. Read 14 bits from sb as integer b14.
  6. Decrement remaining by 14.
  7. Emit the codepoint b14 + baseOffset.
  8. If remaining is greater than 0:
  9. Read remaining bits from sb as integer b14.
  10. Bitwise shift b14 left by 14 - remaining bits.
  11. Emit the codepoint b14 + baseOffset.
  12. Emit the codepoint remaining + padBase.
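
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the bit-stream is modelled as a string of `'0'`/`'1'` characters, the first bit read is treated as the most significant bit of b14, and steps 9 to 12 are all taken to belong to the "remaining is greater than 0" branch. The function name `encode_bits` is hypothetical.

```python
BASE_OFFSET = 0x6000  # baseOffset in the steps above
PAD_BASE = 0x4E00     # padBase in the steps above


def encode_bits(bits: str) -> str:
    """Encode a string of '0'/'1' characters into baseIdeo code points."""
    out = []
    pos = 0
    remaining = len(bits)
    # Full 14-bit groups map directly onto U+6000..U+9FFF.
    while remaining >= 14:
        b14 = int(bits[pos:pos + 14], 2)
        pos += 14
        remaining -= 14
        out.append(chr(b14 + BASE_OFFSET))
    if remaining > 0:
        # Left-align the final partial group inside a 14-bit value.
        b14 = int(bits[pos:], 2) << (14 - remaining)
        out.append(chr(b14 + BASE_OFFSET))
        # Assumption: the padding marker records how many bits of the
        # previous code point are meaningful, and is only emitted here.
        out.append(chr(remaining + PAD_BASE))
    return "".join(out)
```

For example, a single `1` bit becomes U+8000 followed by the padding code point U+4E01, while exactly 14 bits produce one code point and no padding.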

A more realistic byte-oriented encoder will be discussed later in BYTES.md.

Decoder

A baseIdeo decoder has an associated code point string sc, which has a length property l. Its handler performs the following operations to restore the original bit-stream sb:

  1. Let i be 0.
  2. While l is greater than 0:
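
One way to invert the steps from the Encoder section can be sketched in Python. This is only a guess at a compatible inverse, not the procedure the list above defines. It assumes that a padding code point follows the data code point it trims, that its offset from U+4E00 is the number of meaningful leading bits (1 to 13), and that bits are ordered most significant first. The name `decode_to_bits` is hypothetical.

```python
def decode_to_bits(text: str) -> str:
    """Turn baseIdeo code points back into a string of '0'/'1' characters."""
    groups = []
    for ch in text:
        cp = ord(ch)
        if 0x6000 <= cp <= 0x9FFF:
            # Data code point: recover its 14-bit payload.
            groups.append(format(cp - 0x6000, "014b"))
        elif 0x4E01 <= cp <= 0x4E0D:
            if not groups:
                raise ValueError("padding code point without preceding data")
            # Keep only the meaningful leading bits of the previous group.
            groups[-1] = groups[-1][:cp - 0x4E00]
        else:
            # Assumption: U+4E00 is rejected too, because the encoder steps
            # as read here never emit a padding length of zero.
            raise ValueError(f"unexpected code point U+{cp:04X}")
    return "".join(groups)
```

Because each padding code point only affects the group directly before it, this sketch also accepts padding in the middle of the input.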

Byte-Oriented Procedures for baseIdeo Handling

o bbbbbbbb
----------
0 01234567
1 89ABCD
1       01
2 23456789
3 ABCD
3     0123
4 456789AB
5 CD
5   012345
6 6789ABCD
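
The layout above shows that 7 bytes (56 bits) fill exactly four 14-bit groups, so every 7-byte block maps to 4 code points with no padding. A byte-oriented encoder might therefore look like the sketch below. It assumes that bit 0 in the table is the most significant bit of each byte and of each group, and that a trailing partial group is padded as in the Encoder section. The name `encode_bytes` is hypothetical.

```python
def encode_bytes(data: bytes) -> str:
    """Encode bytes as baseIdeo code points, 14 bits per code point."""
    total_bits = len(data) * 8
    full, remaining = divmod(total_bits, 14)
    # Treat the input as one big-endian integer, then append zero bits
    # on the right so the length becomes a multiple of 14.
    value = int.from_bytes(data, "big") << ((14 - remaining) % 14)
    groups = full + (1 if remaining else 0)
    out = []
    for i in range(groups):
        shift = 14 * (groups - 1 - i)
        out.append(chr(((value >> shift) & 0x3FFF) + 0x6000))
    if remaining:
        # Padding code point: number of meaningful bits in the last group.
        out.append(chr(remaining + 0x4E00))
    return "".join(out)
```

With this sketch, `encode_bytes(b"\xff")` yields U+9FC0 followed by U+4E08, since only 8 of the 14 bits in the single group are meaningful.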