mildsunrise/crc.md

## crc.md

      
    Raw
  

              crc.md
            
          
    Notes about the CRC

Note: this uses the functions of my polynomial_arithmetic library.
mathematical definition

CRC in its core is (I * x^N) mod P, where

I are the input bytes converted to a polynomial (bytes 12 53 become 0x1253 aka x^12 + x^9 + x^6 + x^4 + x + 1).
P is a polynomial that is a characteristic of the algorithm
N is the degree of P (and thus, the number of bits resulting from the CRC operation)
the resulting polynomial is returned as an N-bit integer

so it can be understood as:

two operations: polynomial multiplication by x^N, and polynomial modulus by P

because that multiplication is just a shift, people usually say CRC is just the modulus


one operation (multiplication by x^N) in the Z/2Z[x] / P ring

practical algorithms

popular, optimized CRC implementations aren't "just" the operation(s) explained above.

conceptually they do some preprocessing & postprocessing to the input, for various reasons:

to make implementations faster (by memoizing results, etc.)
historical reasons
to prevent leading 0 bytes in the input from mapping to the same CRC

in particular, the variations may be the following:

pre-conditioning: the register is initialized to 0xFFFFFFFF before processing each input byte.
this is another way of saying "we add some prefix A to the input so that the CRC of empty input is 0xFFFFFFFF"
post-conditioning: after processing each byte, we invert the register.
together with pre-conditioning, this results in a double-inversion that makes the CRC of empty input be 0
(but due to the prefix, zero bytes at the start do alter the result).
bit reversal: the input bytes have their bits reversed, and the output 32-bit number also has its bits reversed.

implementations may vary in these details even if they share the same polynomial.

so, to accurately define a CRC algorithm you not only need its polynomial P, but which of these variations it uses.
an example: CRC-32

most CRC-32 algorithms do all three of those.

an (inefficient) implementation of Zlib's CRC-32 that makes mathematical interpretation more explicit is:
# functions to reverse the bits of an integer
rev32 = lambda x: int('{:032b}'.format(x)[::-1], 2)
rev8 = lambda x: int('{:08b}'.format(x)[::-1], 2)

# in optimized implementations the 32th bit is omitted,
# and sometimes the reversed and/or reciprocal is written instead
POLYNOMIAL = 0x104C11DB7

# calculate the appropriate prefix so that we get an 0xFFFFFFFF modulus for an empty input polynomial
INV = p_mod_inv(1 << 32, POLYNOMIAL)
PREFIX = p_mod_mul(0xFFFFFFFF, INV, POLYNOMIAL)  # 0x46af6449

# this function should be equivalent to zlib.crc32
def crc32(input: bytes) -> int:
    # bytes have their bits reversed
    input = bytes(map(rev8, input))
    # convert the bytes to a polynomial, and prepend the prefix
    input = int.from_bytes(input, 'big') ^ (PREFIX << (len(input) * 8))
    # the polynomial modulus!
    crc = p_mod(input, POLYNOMIAL)
    # post-processing: invert and reverse the bits
    return rev32(crc ^ 0xFFFFFFFF)
inverting the CRC-32

a toy example that demonstrates an operation we may want to do is invert the CRC-32.
since the CRC-32 uses a prime polynomial, Z/2Z[x] / P is a field.

this means multiplication is always invertible (and has exactly one solution).

since the preprocessing & postprocessing are also bijections, the whole algorithm is a bijection of elements in the field.
possible 4-byte inputs correspond to distinct elements of the field, therefore each CRC value corresponds to 1 (and only 1) 4-byte input.
the implementation, generalized to n-byte inputs (for n <= 4), is as follows:
def reverse_crc32(crc: int, size: int) -> bytes:
    assert size <= 4

    # undo post-processing
    crc = rev32(crc ^ 0xFFFFFFFF)
    # multiply by the inverse to undo CRC and get input polynomial
    input = p_mod_mul(residual, INV, POLYNOMIAL)
    # subtract the prefix that was added to the input polynomial
    input ^= p_mod(PREFIX << (size * 8), POLYNOMIAL)
    # convert input polynomial to bytes
    return bytes(map(rev8, block.to_bytes(size, 'big')))