@ahaldane · Last active June 15, 2019
Using C-style type coercion rules in Numpy

This document explores using C-like type-coercion for + - * // in numpy.

Motivation: Currently, when two dtypes are involved in a binary operation, numpy's principle is that "the output dtype's range covers the range of both input dtypes", and when a single dtype is involved there is never any cast. One often-surprising consequence of this is that np.uint64 + np.int64 gives np.float64, since no integer dtype covers both input ranges. This is different from C-style coercion. The current numpy coercion rules lead to unexpected behaviors like this one, which we often get questions about on github and the mailing list. See the issues collected in numpy/numpy#12525
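This behavior is easy to check directly (assuming a numpy version from around the time of writing; np.result_type exposes the promotion table):

```python
import numpy as np

# No integer dtype covers both the uint64 and int64 ranges, so current
# numpy promotes the pair to float64.
res = np.result_type(np.uint64, np.int64)
print(res)  # float64
```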

Why switch to C-style coercion specifically? Because numpy is written in C and is designed around low-level C types like uint8, uint32, float64, etc., and the C language has already defined coercion rules for these types. C-style coercion has gone through 60 years of trial by fire, it is well known and widely discussed, and the C standard was well debated. The C abstract machine's behavior largely mirrors typical hardware behavior for these types, so using C-coercion should be efficient. Also, C-style coercion is associative, unlike our current coercion rules.

A description of C coercion is here: https://www.oreilly.com/library/view/c-in-a/0596006977/ch04.html

Basic Coercion

Below are proposed C-style conversion rules for a binary operation, like a*b, where a and b can be numpy types or python types. Note that numpy implements types and operations that do not exist in C, such as the half (float16) and bool types, and the power (**) and float-divide (/ in python3) operations, which is why I call this "C-like coercion" rather than just "C coercion".

If both a and b are numpy types

We define a ranking bool < integer < floating. Within each rank there are sub-ranks depending on the size of the type (in bytes), where larger sizes have larger sub-rank. Additionally, integer types can be either signed or unsigned, and complex types are treated as "qualified" float types (see below): float32 and complex64 (made of 2x float32) have the same rank and sub-rank, and complex64 counts as float32 with the "complex" qualifier. The rules are then:

  1. If a and b are at different rank or sub-rank, use the higher one's rank and sub-rank.
  2. For floating: If either operand has the complex qualifier, the output has the complex qualifier.
  3. For integers: Rule 1 does not cover integer types of the same size (sub-rank) but of different signedness. If the signed type is large enough to contain all values of the unsigned type, use the signed type. Otherwise use the unsigned type.

The full proposed ranking is then:

bool < int8 < uint8 < int16 < uint16 < int32 < uint32 < int64 < uint64 < float16/complex32 < float32/complex64 < float64/complex128 < float128/complex256
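The ranking and rules above can be sketched in Python. This is only an illustrative mock-up of the proposal, not existing numpy behavior: the function name c_coerce and the table layout are made up, and float128/complex256 are omitted since they are platform-dependent.

```python
import numpy as np

# Total order encoding rules 1 and 3: within a size, the unsigned type
# outranks the signed one, since the signed type cannot hold its values.
RANK = {np.dtype(t): i for i, t in enumerate([
    np.bool_,
    np.int8, np.uint8, np.int16, np.uint16,
    np.int32, np.uint32, np.int64, np.uint64,
    np.float16, np.float32, np.float64,
])}

def c_coerce(a, b):
    """Hypothetical result dtype for a binary op between numpy dtypes a, b."""
    a, b = np.dtype(a), np.dtype(b)
    # Treat complex as a qualifier on the float of half the itemsize.
    is_complex = a.kind == 'c' or b.kind == 'c'
    def base(d):
        return np.dtype('f%d' % (d.itemsize // 2)) if d.kind == 'c' else d
    a, b = base(a), base(b)
    out = a if RANK[a] >= RANK[b] else b
    if is_complex and out.kind == 'f':
        out = np.dtype('c%d' % (out.itemsize * 2))   # rule 2
    return out

print(c_coerce(np.uint64, np.int64))   # uint64 (current numpy: float64)
```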

Potential unexpected behavior: float16 + uint64 -> float16, even though float16 cannot represent all values of uint64 as finite values.

The complex qualifier: In C99 there is a 'complex' type qualifier which works such that (float complex) * (double) will cast the arguments to (double complex). So 'complex' acts as a qualifier to the float types, rather than a separate type, and this qualifier gets added to both operands if present.
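For this particular case numpy's current promotion already agrees with C99:

```python
import numpy as np

# (float complex) * (double) -> (double complex): complex64 is float32
# with the complex qualifier, so pairing it with float64 gives complex128.
res = np.result_type(np.complex64, np.float64)
print(res)  # complex128
```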

If a is a numpy type and b is a python scalar

This is for cases like np.float32(4.) * 2.0.

This situation does not really have an analogue in C, so we have more freedom to make up our own rules. (Possibly, the python scalar might be viewed as playing a role similar to a literal in C, but in C literals have suffixes specifying their type which is not the case in python.)

One scenario which would be nice to have "work" is porting low-level C code to numpy. An example is a CRC check function. Compare the "correct" version in current numpy (first), vs. what works in C and would be nice to just cut-n-paste (second):

    def CRC(crc: uint16, onech: uint16):
        ans = uint32(crc ^ onech << uint16(8))
        for ind in range(8):
            if ans & uint32(0x8000):
                ans <<= uint32(1)
                ans = ans ^ uint32(4129)
            else:
                ans <<= uint32(1)
        return ans

    def CRC(crc: uint16, onech: uint16):
        ans = uint32(crc ^ onech << 8)
        for ind in range(8):
            if ans & 0x8000:
                ans <<= 1
                ans = ans ^ 4129
            else:
                ans <<= 1
        return ans

Strategy 1

One possible behavior is that python types would always map to a particular numpy type, e.g. python floats are always float64, ints are always int64, etc.

Downside is it forces users to write a lot of code like the left CRC above.

Strategy 2

A more interesting possibility is to allow numpy's types to override python types, so that for example np.float32(2)*1. gives a float and not a double. Here are the potential rules:

The python types follow the same ranking of bool < int < float/complex, and there are no sub-ranks. Then:

  1. If b's type is at a higher rank, use the "default" numpy equivalent type, which is defined to be bool, int64, and float64/complex128 respectively for each rank.
  2. If b is at the same or lower rank than a, cast it to a's type, no matter the sub-ranks.
  3. Add the complex qualifier if b is complex.
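These three rules can be sketched as follows; this is a hypothetical mock-up of Strategy 2 (the function name coerce_scalar and the rank tables are made up for illustration):

```python
import numpy as np

# Rank 0 = bool, 1 = integer, 2 = floating; python types have no sub-rank.
PY_RANK = {bool: 0, int: 1, float: 2, complex: 2}
NP_RANK = {'b': 0, 'i': 1, 'u': 1, 'f': 2, 'c': 2}
DEFAULT = {0: np.dtype(np.bool_), 1: np.dtype(np.int64),
           2: np.dtype(np.float64)}

def coerce_scalar(a, b):
    """Hypothetical result dtype for numpy-dtype a op python-scalar b."""
    a = np.dtype(a)
    if PY_RANK[type(b)] > NP_RANK[a.kind]:
        out = DEFAULT[PY_RANK[type(b)]]   # rule 1: default type of b's rank
    else:
        out = a                           # rule 2: cast b to a's type
    if isinstance(b, complex) and out.kind == 'f':
        out = np.dtype('c%d' % (out.itemsize * 2))   # rule 3
    return out

print(coerce_scalar(np.float32, 1.0))   # float32, not float64
```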

A downside of this strategy is that it is not associative: np.float32(1) + 1.0 + np.uint32(4) gives float32, while np.float32(1) + (1.0 + np.uint32(4)) gives float64.

Other Details

Inplace Operations

For operations like a += b, we will continue to raise TypeErrors if casting is needed, as numpy currently does.

Scalars vs arrays

In current numpy, ndarray + python-scalar seems to coerce differently than numpy-scalar + python-scalar. I propose to make coercion identical for ndarrays and numpy scalars.

Power operation

Python's and numpy's ** operator does not exist in C. How should it work?

This needs more thought and discussion, as we have had multiple long mailing-list threads on this topic. There are issues with negative powers of integers. One suggestion was that int**int might cast to float by default, since numpy ints are not arbitrary precision, in which case we would want a separate np.int_power function for integer output.
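For reference, current numpy rejects negative integer powers outright rather than switching to float:

```python
import numpy as np

# int ** negative int has no exact integer result, so numpy raises
# rather than silently producing a float or a wrong integer.
try:
    np.array([2], dtype=np.int64) ** -1
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```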

Transcendental functions and /

Transcendental functions cast ints to float, as does python3's / binary operator. In these cases the "default" float type of float64 is used for integer/bool inputs.
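A quick check of the current defaults (using int64 inputs, which do not fit any smaller float):

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
# Transcendental ufuncs and true division promote these integers to float64.
print(np.sqrt(a).dtype)  # float64
print((a / a).dtype)     # float64
```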

Undefined behavior with signed integers

Incidentally, +, - and * can overflow, as can shift operations and abs(INT_MIN). Signed overflow is undefined behavior in C. Ideally, a C program should never cause undefined behavior, since the compiler is free to do whatever it wants. Yet numpy allows signed overflow to occur (causing bugs #2449 and #5018, for example), which might be considered a bug. GCC occasionally does pretty aggressive optimizations on the assumption that signed overflow never occurs, and infamously will even optimize out branches, such as branches meant to check for overflow:

http://www.airs.com/blog/archives/120 https://gcc.gnu.org/ml/gcc-help/2011-07/msg00219.html

I think the inner loops of numpy ufuncs are probably not going to be mangled by most compilers on most platforms, but in principle we shouldn't be relying on undefined behavior.

GCC has some advice:

https://www.gnu.org/software/autoconf/manual/autoconf-2.63/html_node/Signed-Overflow-Advice.html

We could protect from signed add/mul overflow by using unsigned arithmetic, for which overflow is defined behavior. That is, "view" the ints as uints using pointers and then do the operation. The result, viewed as a signed int, is correct. I.e., the int32 inner loop should use the uint32 inner loop. The shift overflow and abs behavior might need a range check (assuming we use C's abs?).
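The semantics of the unsigned-view trick can be demonstrated from Python, where a zero-copy .view() plays the role of the pointer cast. This only illustrates the idea; the real fix would live in the C inner loops:

```python
import numpy as np

a = np.array([2**31 - 1], dtype=np.int32)   # INT32_MAX
b = np.array([1], dtype=np.int32)
# Do the add in uint32, where wraparound is well defined, then
# reinterpret the result bits as int32 again.
wrapped = (a.view(np.uint32) + b.view(np.uint32)).view(np.int32)
print(wrapped[0])  # -2147483648, i.e. two's-complement wraparound
```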
