Let's study the similarities between two kinds of entropy coders.

ZPAQ-style binary arithmetic coder (taken from Fabian's mini_arith repository):
int decode(uint32_t prob)
{
    int bit;
    // Midpoint of active probability interval subdivided via prob
    uint32_t x = lo + ((uint64_t(hi - lo) * prob) >> kProbBits);
    if (code <= x)
    {
        hi = x;
        bit = 1;
    }
    else
    {
        lo = x + 1;
        bit = 0;
    }
    // Renormalize
    while ((lo ^ hi) < (1u << 24))
    {
        code = (code << 8) | bytes[read_pos++];
        lo <<= 8;
        hi = (hi << 8) | 0xff;
    }
    return bit;
}
Binary range coder (taken from Fabian's kkrunchy depacker):
static sBool DecodeBit(sInt index,sInt move)
{
    sU32 bound;
    sBool result;
    // decode
    bound = (st.Range >> 11) * st.Model[index];
    if(st.Code < bound)
    {
        st.Range = bound;
        st.Model[index] += (2048 - st.Model[index]) >> move;
        result = sFALSE;
    }
    else
    {
        st.Code -= bound;
        st.Range -= bound;
        st.Model[index] -= st.Model[index] >> move;
        result = sTRUE;
    }
    // renormalize
    if(st.Range < 0x01000000U)
    {
        st.Code = (st.Code << 8) | *st.Src++;
        st.Range <<= 8;
    }
    return result;
}
(Fabian didn't invent either of these algorithms, but his implementations are clean, practical examples.)
These have evident similarities, but let's rewrite them step by step to reveal the exact relationship.
As a first step, we'll just remove the model indexing and model update (these are factored out in mini_arith):
static sBool DecodeBit(sU32 prob)
{
    sU32 bound;
    sBool result;
    // decode
    bound = (st.Range >> 11) * prob;
    if(st.Code < bound)
    {
        st.Range = bound;
        result = sFALSE;
    }
    else
    {
        st.Code -= bound;
        st.Range -= bound;
        result = sTRUE;
    }
    // renormalize
    if(st.Range < 0x01000000U)
    {
        st.Code = (st.Code << 8) | *st.Src++;
        st.Range <<= 8;
    }
    return result;
}
With that done, take a look at mini_arith's decode() again:

uint32_t x = lo + ((uint64_t(hi - lo) * prob) >> kProbBits);
if (code <= x)
{
    hi = x;
    bit = 1;
}
else
{
    lo = x + 1;
    bit = 0;
}
Let's "change basis" from hi and lo to lo and range = hi - lo:
uint32_t range2 = ((uint64_t)range * prob) >> kProbBits;
if (code <= lo + range2)
{
    range = range2;
    bit = 1;
}
else
{
    lo += range2 + 1;
    range -= range2 + 1; // lo grew by range2 + 1 while hi stayed put
    bit = 0;
}

Note that code <= lo + range2 is equivalent to code - lo <= range2, so by making an additional change of
basis, we can absorb the lo subtraction into code itself, eliminating lo entirely:

uint32_t range2 = ((uint64_t)range * prob) >> kProbBits;
if (code <= range2)
{
    range = range2;
    bit = 1;
}
else
{
    code -= range2 + 1;
    range -= range2 + 1;
    bit = 0;
}
At this point the only notable differences are <= vs < and the + 1s, but code <= range2 is equivalent to
code < range2 + 1, which is exactly the range coder's test with bound = range2 + 1; equivalently, tracking
the interval size hi - lo + 1 instead of hi - lo turns the branches into the range coder's exact shape.
While this demonstrates an isomorphism between their decode logic, the renormalization logic is not
invariant under the change of basis. In particular, in the arithmetic coder, it's possible for hi and lo
not to have a common most significant byte even though range = hi - lo is quite small (as small as 1).
When that happens, the coder temporarily loses some bits of precision, degenerating in the extreme case
where hi - lo = 1 to a uniform model where 0 and 1 are equiprobable, even though your model might want
to make a much more skewed prediction.
So from this perspective, the range coder looks strictly superior, though the loss of precision with the
binary arithmetic coder is rare enough, and short-lived enough, that it probably doesn't matter in practice.
But what, if anything, am I missing here? Why isn't the range coder variant always used? One thing that
comes to mind is that the range coder's assumption that it can subtract from code makes it harder to
interleave with other coders, since code no longer represents a raw prefix of the bitstream.
I'd love for compression experts to weigh in and clarify this relationship for me!
Note: I didn't prove the program transformations did not introduce integer overflow or wraparound. So let's
check: first, range > range2, since range2 is a proper fraction of range (prob < 1 << kProbBits), and
code > range2, since the else branch runs under the negation of the code <= range2 condition. Therefore
the else branch's subtractions are safe from wraparound.