Skip to content

Instantly share code, notes, and snippets.

@mildsunrise
Created May 3, 2022 17:43
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mildsunrise/387e908ac7b18b8b2314768d4be72e5d to your computer and use it in GitHub Desktop.
Save mildsunrise/387e908ac7b18b8b2314768d4be72e5d to your computer and use it in GitHub Desktop.
Notes about the CRC

Notes about the CRC

Note: this uses the functions of my polynomial_arithmetic library.

mathematical definition

CRC in its core is (I * x^N) mod P, where

  • I are the input bytes converted to a polynomial (bytes 12 53 become 0x1253 aka x^12 + x^9 + x^6 + x^4 + x + 1).
  • P is a polynomial that is a characteristic of the algorithm
  • N is the degree of P (and thus, the number of bits resulting from the CRC operation)
  • the resulting polynomial is returned as an N-bit integer

so it can be understood as:

  • two operations: polynomial multiplication by x^N, and polynomial modulus by P
    • because that multiplication is just a shift, people usually say CRC is just the modulus
  • one operation (multiplication by x^N) in the Z/2Z[x] / P ring

practical algorithms

popular, optimized CRC implementations aren't "just" the operation(s) explained above.
conceptually they do some preprocessing & postprocessing to the input, for various reasons:

  • to make implementations faster (by memoizing results, etc.)
  • historical reasons
  • to prevent leading 0 bytes in the input from mapping to the same CRC

in particular, the variations may be the following:

  • pre-conditioning: the register is initialized to 0xFFFFFFFF before processing each input byte. this is another way of saying "we add some prefix A to the input so that the CRC of empty input is 0xFFFFFFFF"
  • post-conditioning: after processing each byte, we invert the register. together with pre-conditioning, this results in a double-inversion that makes the CRC of empty input be 0 (but due to the prefix, zero bytes at the start do alter the result).
  • bit reversal: the input bytes have their bits reversed, and the output 32-bit number also has its bits reversed.

implementations may vary in these details even if they share the same polynomial.
so, to accurately define a CRC algorithm you not only need its polynomial P, but which of these variations it uses.

an example: CRC-32

most CRC-32 algorithms do all three of those.
an (inefficient) implementation of Zlib's CRC-32 that makes mathematical interpretation more explicit is:

# functions to reverse the bits of an integer
rev32 = lambda x: int('{:032b}'.format(x)[::-1], 2)
rev8 = lambda x: int('{:08b}'.format(x)[::-1], 2)

# in optimized implementations the 32th bit is omitted,
# and sometimes the reversed and/or reciprocal is written instead
POLYNOMIAL = 0x104C11DB7

# calculate the appropriate prefix so that we get an 0xFFFFFFFF modulus for an empty input polynomial
INV = p_mod_inv(1 << 32, POLYNOMIAL)
PREFIX = p_mod_mul(0xFFFFFFFF, INV, POLYNOMIAL)  # 0x46af6449

# this function should be equivalent to zlib.crc32
def crc32(input: bytes) -> int:
    # bytes have their bits reversed
    input = bytes(map(rev8, input))
    # convert the bytes to a polynomial, and prepend the prefix
    input = int.from_bytes(input, 'big') ^ (PREFIX << (len(input) * 8))
    # the polynomial modulus!
    crc = p_mod(input, POLYNOMIAL)
    # post-processing: invert and reverse the bits
    return rev32(crc ^ 0xFFFFFFFF)

inverting the CRC-32

a toy example that demonstrates an operation we may want to do is invert the CRC-32.

since the CRC-32 uses a prime polynomial, Z/2Z[x] / P is a field.
this means multiplication is always invertible (and has exactly one solution).
since the preprocessing & postprocessing are also bijections, the whole algorithm is a bijection of elements in the field.

possible 4-byte inputs correspond to distinct elements of the field, therefore each CRC value corresponds to 1 (and only 1) 4-byte input.

the implementation, generalized to n-byte inputs (for n <= 4), is as follows:

def reverse_crc32(crc: int, size: int) -> bytes:
    assert size <= 4

    # undo post-processing
    crc = rev32(crc ^ 0xFFFFFFFF)
    # multiply by the inverse to undo CRC and get input polynomial
    input = p_mod_mul(residual, INV, POLYNOMIAL)
    # subtract the prefix that was added to the input polynomial
    input ^= p_mod(PREFIX << (size * 8), POLYNOMIAL)
    # convert input polynomial to bytes
    return bytes(map(rev8, block.to_bytes(size, 'big')))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment