@bjartur
Last active October 26, 2019 12:19
Encode a ByteString to a braille string. Feel each bit in a textualization of an octet stream more compact than a hexdump.
This module exports functions for encoding lazy bytestrings as
lazy Unicode bytestrings of braille letters. Each braille letter
corresponds to one byte (octet) in such a way that each dot corresponds
to one input bit. The dots are arranged in two columns so that
the four least significant bits correspond to dots in the left column,
and each dot corresponds to a more significant bit than the dot above.
Five encodings of Unicode are implemented below,
UTF-8, which is invariant under endianness,
UTF-16, in little-endian and big-endian varieties, and
equivalently, UCS-2, in little-endian and big-endian varieties.
Since UTF-16 and UCS-2 agree on the encoding of characters
in the Basic Multilingual Plane, of which Braille characters are part,
only one implementation is defined for either endianness. In UTF-8,
each Braille letter is represented by three consecutive octets,
whereas the other encodings represent each letter by a 16-bit number,
0x2800–0x28FF, which then has to be serialized into two octets.
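
To make the range 0x2800–0x28FF concrete, the following minimal sketch
maps an octet straight to the braille character at that offset.
It deliberately skips the bit rearrangement described further down, and
the names brailleChar and brailleString are hypothetical, not part of this module.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: the braille pattern whose low octet is the given byte.
-- No bit rearrangement is applied here.
brailleChar :: Word8 -> Char
brailleChar octet = chr (0x2800 + fromIntegral octet)

-- Render a list of octets as a String of braille patterns.
brailleString :: [Word8] -> String
brailleString = map brailleChar
```

Printing brailleString [0 .. 255] with putStrLn on a Unicode-capable terminal
should show all 256 patterns in codepoint order.
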
This is a literate Haskell source file. Run it using
stack runhaskell --package uuid Braille.lhs
or use it as a library depending on binary and bytestring.
> module Braille (toUTF8braille, Endianness(..), utf16, utf8) where
> import Data.Binary (encode, Word8)
> import Data.Bits ((.&.), (.|.), bit, complement, shiftR, shiftL, testBit, zeroBits)
> import Data.ByteString.Lazy (ByteString, cons, hPut, intersperse, map, pack, snoc, unpack)
> import Data.Function ((&))
> import Data.Functor ((<&>))
> import Data.UUID.V4 (nextRandom)
> import Prelude (IO, (++), (*), flip, putStrLn, repeat, reverse, sum, zipWith)
> import System.IO (stdout)
> utf16 :: Endianness-> ByteString-> ByteString
UTF-16LE is the encoding used internally by .NET and Java
on Intel processors and ARM processors in little-endian mode.
Since Braille letters are in the Basic Multilingual Plane,
their UTF-16 representation is a single 16-bit value.
Thus the UTF-16LE representation is two octets.
In UTF-16LE, the latter octet of a braille letter is 40 (0x28).
The former octet will contain the octet to be braille-encoded.
> utf16 LittleEndian = map cycle6543 <&> intersperse 40 <&> flip snoc 40
UTF-16BE is in "network byte-order." Big endianness is
the standard for data interchange, if endianness is required at all.
In UTF-16BE, the former octet of a braille letter is 40 (0x28).
The latter octet will contain the octet to be text-encoded.
> utf16 BigEndian = map cycle6543 <&> intersperse 40 <&> cons 40
While bit order within a byte is completely abstracted away,
and code points are always encoded in order, care must be taken to
vary the order of octets encoding each code point by endianness.
Thankfully, only two orders are common, little-endian and big-endian.
> data Endianness = LittleEndian | BigEndian
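
The difference between the two byte orders can be sketched on plain lists
of octets. This is an illustration only, not the module's definition:
ByteOrder, serialize, codeUnit and utf16Sketch are hypothetical names, and the
input octets are assumed to have been rearranged already.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8, Word16)

-- Hypothetical stand-in for the Endianness type.
data ByteOrder = Little | Big

-- Split one 16-bit code unit into two octets in the requested order.
serialize :: ByteOrder -> Word16 -> [Word8]
serialize order unit =
  let high = fromIntegral (unit `shiftR` 8)
      low  = fromIntegral (unit .&. 0xFF)
  in case order of
       Little -> [low, high]
       Big    -> [high, low]

-- Code unit of the braille pattern for an (already rearranged) octet.
codeUnit :: Word8 -> Word16
codeUnit octet = 0x2800 + fromIntegral octet

-- Serialize every octet as one two-octet code unit.
utf16Sketch :: ByteOrder -> [Word8] -> [Word8]
utf16Sketch order = concatMap (serialize order . codeUnit)
```

For the octets 0x01 and 0xFF, the little-endian result is 01 28 FF 28 and
the big-endian result is 28 01 28 FF. One difference in behaviour: this sketch
returns nothing for empty input, whereas the intersperse-based definitions
above still append or prepend a single 40.
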
Conventionally, Braille uses six dots in two columns
to represent up to 64 patterns (2^6, counting the blank cell).
Each dot is shown below by the number of the bit that raises it, next to
the codepoint of the pattern with only that dot raised.

  Bit       Codepoint
  ---       ---------------
  0 3       U+2801   U+2808
  1 4       U+2802   U+2810
  2 5       U+2804   U+2820

But to represent all eight bits of an octet, we need eight dots.
Fortunately, Unicode has a couple of extra codepoints for an extra dot
in each column.

  Bit       Codepoint
  ---       ---------------
  0 3       U+2801   U+2808
  1 4       U+2802   U+2810
  2 5       U+2804   U+2820
  6 7       U+2840   U+2880

Unfortunately, that means that adding 0x2800 to an octet,
while resulting in a braille codepoint, gives an unmemorable and
unintuitive dot-bit correspondence. To make the less significant four bits
correspond to the left column of dots, with dots ordered from top to bottom
in order of the significance of the corresponding bit, we need to rearrange
the bits so that bit 3 gets the value of bit 4,
which gets the value of bit 5, which gets the value of bit 6,
which in turn gets the value of bit 3: 6 -> 5 -> 4 -> 3 -> 6
That way, we get the more orderly dot-bit correspondence.

  Bit       Dot codepoint
  ---       ---------------
  0 4       U+2801   U+2808
  1 5       U+2802   U+2810
  2 6       U+2804   U+2820
  3 7       U+2840   U+2880

> cycle6543 :: Word8-> Word8
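> cycle6543 octet = let rotate = flip shiftR 1 in
>       (bit 0 .|. bit 1 .|. bit 2) .&. octet
>   .|. (bit 3 .|. bit 4 .|. bit 5) .&. rotate octet
>   -- The conditional is parenthesized because an unparenthesized else branch
>   -- extends to the end of the expression and would swallow the bit 7 term.
>   .|. (if testBit octet 3 then bit 6 else zeroBits)
>   .|. bit 7 .&. octet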
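
The same rearrangement can also be written as an explicit table from input
bit to output bit, which makes the cycle 6 -> 5 -> 4 -> 3 -> 6 easy to check
by eye. The sketch below is an alternative formulation, not this module's
definition; target and permute are hypothetical names.

```haskell
import Data.Bits (setBit, testBit)
import Data.List (foldl')
import Data.Word (Word8)

-- Where each input bit ends up: bits 0-2 and 7 stay put,
-- bits 4, 5 and 6 move down by one, and bit 3 moves up to bit 6.
target :: Int -> Int
target 3 = 6
target 4 = 3
target 5 = 4
target 6 = 5
target n = n

-- Move every set bit of the octet to its target position.
permute :: Word8 -> Word8
permute octet = foldl' setBit 0 [target n | n <- [0 .. 7], testBit octet n]
```

For example, permute 0x0F is 0x47: the four low bits raise the whole left
column, that is, the dots of U+2801, U+2802, U+2804 and U+2840. Likewise
permute 0x88 is 0xC0, since bit 3 moves to bit 6 and bit 7 stays in place.
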
| UTF-8 is described in RFC 3629.
> utf8 :: ByteString-> ByteString
> utf8 = unpack <&> toUTF8braille <&> pack
> toUTF8braille :: [Word8]-> [Word8]
> toUTF8braille octets = octets <&> cycle6543 & \adjusted->
>   interleave
>     (repeat octet1)
>     (adjusted <&> octet2)
>     (adjusted <&> octet3)
| In UTF-8 the 16 bits from above are split into three octets as follows:
The first octet needed to encode one braille letter in UTF-8 has
the three most significant bits set, and the fourth cleared,
to indicate that this codepoint is represented by three octets.
Yes, multi-octet sequence length is encoded in zero-terminated unary.
The less significant four bits encode the four most significant bits
of the UCS-2 representation of the character.
Since the most significant octet of the UCS-2 representation is
40 = 0x28 = 0x20 .|. 0x08 = 32 .|. 8
= 0b0010_1000,
the less significant four bits of the first octet are 0,0,1,0.
> octet1 :: Word8
> octet1 = toBits [1,1,1,0] .|. shiftR 40 4
The second and third octets have the most significant bit set, and
the second most significant bit cleared, to indicate that they
continue a multi-octet sequence. In the second octet, these are followed
by the less significant four bits of 40, 1,0,0,0, and finally by the
two most significant bits of the octet being braille-encoded.
> octet2 :: Word8-> Word8
> octet2 lessSignificantOctet = toBits [1,0] .|. (shiftL 40 4 & flip shiftR 2) .|. shiftR lessSignificantOctet 6
Finally, the third octet carries the six least significant bits of the octet being braille-encoded.
> octet3 :: Word8-> Word8
> octet3 lessSignificantOctet = toBits [1, 0] .|. (shiftL lessSignificantOctet 2 & flip shiftR 2)
> bits :: [Word8]
> bits = [7,6..0] <&> bit
> toBits :: [Word8]-> Word8
> toBits = zipWith (*) bits <&> sum
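
As a cross-check of the three octet definitions above, here is a minimal
sketch of the general three-octet UTF-8 layout from RFC 3629
(1110xxxx 10xxxxxx 10xxxxxx) applied to a 16-bit codepoint. The name
utf8Three is hypothetical; the sketch assumes a codepoint in the range
0x0800 to 0xFFFF and does not reject surrogates.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8, Word16)

-- Encode one codepoint from 0x0800..0xFFFF as three UTF-8 octets.
utf8Three :: Word16 -> [Word8]
utf8Three cp =
  [ 0xE0 .|. fromIntegral (cp `shiftR` 12)            -- 1110 + top 4 bits
  , 0x80 .|. fromIntegral ((cp `shiftR` 6) .&. 0x3F)  -- 10 + middle 6 bits
  , 0x80 .|. fromIntegral (cp .&. 0x3F)               -- 10 + low 6 bits
  ]
```

utf8Three 0x28FF yields E2 A3 BF, which agrees with octet1 together with
octet2 and octet3 applied to 0xFF, and utf8Three 0x2800 yields E2 A0 80.
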
This interleave function assumes that the second and third lists are finite, non-empty and of equal length.
> interleave xs ys zs = interleave_ xs ys zs [] & reverse
> interleave_ :: [a]-> [a]-> [a]-> [a]-> [a]
> interleave_ (z:[]) _ [] acc = z:acc
> interleave_ (x:xs) ys zs acc = interleave_ ys zs xs (x:acc)
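
A lazier alternative, sketched here under the assumption that the three
lists line up element by element, is to zip them and flatten each triple.
The name interleave3 is hypothetical. Unlike the accumulator version above,
it returns an empty list for empty input, stops at the shortest list, and
can produce output before the whole input has been read.

```haskell
-- Take one element from each list in turn; stops at the shortest list.
interleave3 :: [a] -> [a] -> [a] -> [a]
interleave3 xs ys zs = concat (zipWith3 (\x y z -> [x, y, z]) xs ys zs)
```

With an infinite first list, interleave3 (repeat 0) [1,2] [3,4] gives
[0,1,3,0,2,4].
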
> main :: IO ()
> main = do
>   uuid <- nextRandom <&> encode
>   let littleEndian = uuid & utf16 LittleEndian
>   let bigEndian = uuid & utf16 BigEndian
>   hPut stdout ([0..5] ++ [15,25] & toUTF8braille & pack)
>   putStrLn ""
>   hPut stdout ([complement (bit 0 .|. bit 3), 3, 4, 16, 44, 87, 42] & pack & utf8)
>   putStrLn ""
>   hPut stdout (uuid & utf8)
>   putStrLn ""