Note: this uses the functions of my polynomial_arithmetic
library.
CRC in its core is (I * x^N) mod P
, where
I
are the input bytes converted to a polynomial (bytes 12 53 become 0x1253 aka x^12 + x^9 + x^6 + x^4 + x + 1).P
is a polynomial that is a characteristic of the algorithmN
is the degree ofP
(and thus, the number of bits resulting from the CRC operation)- the resulting polynomial is returned as an N-bit integer
so it can be understood as:
- two operations: polynomial multiplication by
x^N
, and polynomial modulus byP
- because that multiplication is just a shift, people usually say CRC is just the modulus
- one operation (multiplication by
x^N
) in theZ/2Z[x] / P
ring
popular, optimized CRC implementations aren't "just" the operation(s) explained above.
conceptually they do some preprocessing & postprocessing to the input, for various reasons:
- to make implementations faster (by memoizing results, etc.)
- historical reasons
- to prevent leading 0 bytes in the input from mapping to the same CRC
in particular, the variations may be the following:
- pre-conditioning: the register is initialized to 0xFFFFFFFF before processing each input byte. this is another way of saying "we add some prefix A to the input so that the CRC of empty input is 0xFFFFFFFF"
- post-conditioning: after processing each byte, we invert the register. together with pre-conditioning, this results in a double-inversion that makes the CRC of empty input be 0 (but due to the prefix, zero bytes at the start do alter the result).
- bit reversal: the input bytes have their bits reversed, and the output 32-bit number also has its bits reversed.
implementations may vary in these details even if they share the same polynomial.
so, to accurately define a CRC algorithm you not only need its polynomial P, but which of these variations it uses.
most CRC-32 algorithms do all three of those.
an (inefficient) implementation of Zlib's CRC-32 that makes mathematical interpretation more explicit is:
# functions to reverse the bits of an integer
rev32 = lambda x: int('{:032b}'.format(x)[::-1], 2)
rev8 = lambda x: int('{:08b}'.format(x)[::-1], 2)
# in optimized implementations the 32th bit is omitted,
# and sometimes the reversed and/or reciprocal is written instead
POLYNOMIAL = 0x104C11DB7
# calculate the appropriate prefix so that we get an 0xFFFFFFFF modulus for an empty input polynomial
INV = p_mod_inv(1 << 32, POLYNOMIAL)
PREFIX = p_mod_mul(0xFFFFFFFF, INV, POLYNOMIAL) # 0x46af6449
# this function should be equivalent to zlib.crc32
def crc32(input: bytes) -> int:
# bytes have their bits reversed
input = bytes(map(rev8, input))
# convert the bytes to a polynomial, and prepend the prefix
input = int.from_bytes(input, 'big') ^ (PREFIX << (len(input) * 8))
# the polynomial modulus!
crc = p_mod(input, POLYNOMIAL)
# post-processing: invert and reverse the bits
return rev32(crc ^ 0xFFFFFFFF)
a toy example that demonstrates an operation we may want to do is invert the CRC-32.
since the CRC-32 uses a prime polynomial, Z/2Z[x] / P
is a field.
this means multiplication is always invertible (and has exactly one solution).
since the preprocessing & postprocessing are also bijections, the whole algorithm is a bijection of elements in the field.
possible 4-byte inputs correspond to distinct elements of the field, therefore each CRC value corresponds to 1 (and only 1) 4-byte input.
the implementation, generalized to n-byte inputs (for n <= 4), is as follows:
def reverse_crc32(crc: int, size: int) -> bytes:
assert size <= 4
# undo post-processing
crc = rev32(crc ^ 0xFFFFFFFF)
# multiply by the inverse to undo CRC and get input polynomial
input = p_mod_mul(residual, INV, POLYNOMIAL)
# subtract the prefix that was added to the input polynomial
input ^= p_mod(PREFIX << (size * 8), POLYNOMIAL)
# convert input polynomial to bytes
return bytes(map(rev8, block.to_bytes(size, 'big')))