Sebastian's Philosophy of the Array Object Types and DTypes

This document is solely for me to organize my thoughts; it is probably a horrific thing in the eyes of a theoretical computer scientist and gets a lot of the terminology wrong. "Type" here usually means a theoretical type, which I think is defined as the set of all possible instances.

Many of these concepts have probably been written down before by smarter people, so...

The Array object

The array object at its core is a Python type which signals that operations are generalized elementwise operations (array-programming), unlike normal Python operations, which are scalar in nature.

We can thus argue that the NumPy array serves as the most common:

HomogeneousElementwiseContainer(ElementwiseContainer)

abstract type (the elementwise container could also be called an ensemble, see the next paragraph). It is an object that, due to its array-programming nature, breaks some typical qualities of (scalar) Python objects. For example, it cannot define bool(), and == must return a new array of bools.
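
A minimal illustration of this broken scalar contract (assuming NumPy is imported as np):

import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 0, 3])
a == b          # array([ True, False,  True]) -- one answer per element, not a single bool
# bool(a == b)  # raises ValueError: the truth value of an array with more than one element is ambiguous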

It may be prudent to think of a NumPy array as an ensemble of scalars/elements. For a classical container, the programmer thinks of a container object which contains elements. The array programmer thinks the opposite way: the elements are the fundamental objects, and the "container" is just a name given to the ensemble of elements. The array programmer looks from the inside to the outside: they work on an ensemble of elements (without knowledge of how many there are or how they are organized/structured!) most of the time, and only look outside to the container when the organization of the container is interesting. This ensemble is similar, for example, to a random variable: it is possible to do math with a random variable (the ensemble of all possible realizations of the random variable) without thinking of any specific drawn random number. Ensembles of realizations in (statistical) physics are a similar concept.

The actual behaviour of the NumPy array is that of an N-dimensional, homogeneously shaped, integer indexed container, which further defines certain operations (such as reductions). It also adds an additional property, a possible refinement of the above definition:

BroadcastingContainer(HomogeneousElementwiseContainer)

in that NumPy arrays will broadcast during elementwise operations. The way NumPy broadcasts follows specific rules which could be refined in their own abstract container class. Broadcasting itself probably requires a few small theoretical rules to define, but they are not of much consequence here: NumPy broadcasting is well accepted currently, and generalizes easily, for example, to labelled axes (to be not very theoretical).
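
As a small illustration of the broadcasting rules (shapes are aligned from the right and length-1 axes are stretched):

import numpy as np

column = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                    # shape (4,)
result = column + row                 # broadcasts to shape (3, 4)
result.shape                          # (3, 4)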

The methods by which NumPy arrays are indexed or broadcast are not very relevant in practice. It is sufficient to take away that typical calculations involving arrays should work on many elements at once for speed reasons, and that this requires implementing computational kernels which work on many homogeneous elements. In practice these kernels may be specific to NumPy arrays, and for example make use of their strided memory layout. The exact work that such a kernel is expected to do, and who defines it, is a more complex topic discussed in the Kernels section below.

I will also use array-like to identify objects that have syntax compatible with NumPy, which implies that they are at least close to a HomogeneousElementwiseContainer.

Mutability and Views

The ElementwiseContainer here is mutable, but could be generalized to distinguish "frozen" and "mutable" versions. Views especially are an important additional component of the array object in practice, and are related to mutability. Since the array object itself is mutable and has view semantics, changes can affect views even when the elements themselves are considered immutable. This is most clear for a NumPy array containing immutable Python objects.
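
For example, mutating through a view is visible in the original array, even though the (integer) elements themselves are immutable:

import numpy as np

arr = np.arange(5)
view = arr[1:3]        # a view into the same memory
view[0] = 100          # mutates the shared buffer
arr                    # array([  0, 100,   2,   3,   4])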

Array of Arrays

One complication (and due to the way NumPy works, a very real one) is that array programming does not generalize well to arrays of arrays. The general issue is that an array of arrays always has to assume that the other operand is to be interpreted as an array and not as a scalar. An array of arrays thus cannot have a scalar corresponding to its elements.

In NumPy this is also a problem for non-array containers. Even though they use scalar semantics in Python, they are considered array-likes by NumPy and coerced greedily.
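
A small sketch of this greedy coercion: a plain Python list operand is treated as an array-like rather than as a single scalar element.

import numpy as np

arr = np.array([10, 20, 30])
arr + [1, 2, 3]              # the list is coerced to an array: array([11, 22, 33])
np.array([[1, 2], [3, 4]])   # nested lists become a 2-D array, not an array whose elements are lists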

HomogeneousElementwiseContainer Compared to Python Containers

As noted, the NumPy array is a type which is distinct from most Python types due to its property of indicating array-programming, elementwise (operator) behaviour. NumPy arrays are both homogeneous and elementwise. So how does this compare to other Python container types?

The homogeneous part is mainly relevant as a contract to users: all elements are described by the same scalar type. Note that even Python lists are homogeneous (they contain only Python objects), but lack the ability to provide the user with a stricter contract about the element type. The Python array.array actually provides a homogeneous container, similar to templated (container) classes in C++ and other languages.

It is important to note that the above defined HomogeneousElementwiseContainer does not, however, include Python lists or even Python arrays. With respect to array-programming principles, Python sequences have a scalar behaviour contract for operations such as ==, bool(), or +. These operations or methods may use the elements during their operation, but the result makes statements about, or modifications to, the container itself:

  • Is the container equal to the other container? This implies the elements are equal, but does not provide information about each individual element.
  • Is the sequence "truthy" (i.e. not empty)? As opposed to whether each element included is truthy.
  • + returns a new sequence that combines/concatenates both sequences. It thus operates on the container objects and, again, not on the elements.

To name just a few. Other operations, such as integer conversion (operator.index()) or and, are inherently scalar, and usually error when used on NumPy arrays. (Arguably, some of these should possibly be limited further.)
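
The contrast between the container (scalar) contract and the elementwise contract, roughly:

import numpy as np

[1, 2] == [1, 2]                      # True -- one statement about the whole container
[1, 2] + [3, 4]                       # [1, 2, 3, 4] -- a new, concatenated container
np.array([1, 2]) == np.array([1, 2])  # array([ True,  True]) -- one statement per element
np.array([1, 2]) + np.array([3, 4])   # array([4, 6]) -- elementwise addition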

The important thing to note is that, in principle, == could have a scalar contract, while a new elementwise == would have to exist, so that both scalar and array-programming semantics could be defined on a single container. Some languages designed for array-programming do this. Python does not wish to do this, so the NumPy array acts as a type which indicates a different contract for operator definitions.

There are some cases where this breaks expectations, since users cannot easily know that they are dealing with a NumPy array and thus with a different contract for most operators. Code written for generic Python containers will rarely generalize correctly to arrays. Short of making a full new set of operators, the only solution I can think of is adding a new ElementwiseContainer abstract base class to Python. This would give users that run into issues with the broken contract of array-programming (such as == returning something with a "truthyness" defined) a fighting chance to detect what is going on. (Note that in practice, for example, not a number (NaN) violates expectations of the == operator in different but comparable ways.)

Certainly some of these violations may be design flaws of the NumPy array. For example, the NumPy array iterates over the first axis instead of all elements by default (SymPy does it differently, for example), thus mimicking a list of lists, while NumPy could instead mimic, for example, a mapping. However, the general convenience of elementwise operators must be accepted as a core design principle of NumPy.

Scalars, Array Scalars, Scalar Arrays, and pure Array Programming

There is an important distinction to note beforehand: NumPy arrays are considered to have mutable content, but are not mutable in their container properties (although this is not strictly true, which is problematic and can lead to bugs). Mutability thus refers to mutable content for arrays.

There are some core concepts regarding scalars and arrays which are important in NumPy (and limit future choices somewhat). They delineate the (typical) Python scalar, the NumPy array scalar (which is immutable but mimics an array), the scalar array (zero dimensional array), and the general N-dimensional array.

The following concepts are important:

Mutability:

  • Arrays are typically mutable, independent of whether the elements themselves are considered mutable. Due to view semantics, this mutability affects not just the object itself but also other views of the same data.
  • Scalars (in Python) are typically immutable objects; this also means that scalars are hashable.
  • The array's content also inherits the mutability of its elements (scalars), just like a tuple containing a mutable object can effectively change (the container can only be immutable if all its elements are).

Mutability is important, because a zero dimensional array is typically mutable, while Python scalars typically are not. Thus, scalar arrays, unlike scalars, are mutable objects and cannot be used, for example, as dictionary keys. (Although, for example, astropy.units.Quantity ignores this property; it does not define a scalar or array scalar.)
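
Concretely (behaviour as of NumPy 1.18, which this text refers to):

import numpy as np

hash(3.0)                 # works -- Python scalars are immutable and hashable
hash(np.float64(3.0))     # works -- array scalars are immutable and hashable
# hash(np.array(3.0))     # raises TypeError: unhashable type: 'numpy.ndarray'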

Scalar vs. Elementwise Array-Programming

Some operations – mainly bool() and int() – are generally not defined on an array, because they are not consistent for a general number of elements. NumPy currently defines bool() for scalar arrays and array scalars, since with only one element, .any() and .all() always match in meaning.

Thus, scalar arrays have some freedom to behave like scalars. Although, arguably, they break the common principle of other "scalar" Python containers, which define bool() based on being empty or not. The definition of bool() is possible; the main question is whether it should be happy to discard the container information when it normally is not. (I personally think that an ideal NumPy would never allow bool() on the container, and I believe that users would not even notice for the most part. While it can be defined, it also discards container information; thus, if I were to add this, I would first want to know that its absence causes real pain in code.)
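
The current behaviour, as a sketch:

import numpy as np

bool(np.array(0.0))        # False -- allowed for a zero dimensional (scalar) array
bool(np.float64(0.0))      # False -- allowed for an array scalar
# bool(np.array([1, 0]))   # raises ValueError -- .any() and .all() no longer agree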

#### Discussion

I personally would argue that only scalars and arrays are sensible concepts and NumPy should strive to remove array scalars and scalar arrays as special cases.

##### Scalar arrays

I believe that the behaviour of scalar arrays serves no purpose, except the appearance of convenience with little proof of it being real. They seemingly ignore their container properties, which should require very good motivation. We can try (and are trying) to slowly remove some special cases, but historically they exist, and I have to admit that these choices seem good at first sight. I.e. if you try to implement scalar (typical Python) behaviour as much as possible, you end up with the current choices. However, I would argue that, considering the limits of defining this behaviour, the default choice should be to refuse to implement it. (I could imagine that at the time, scalars did not exist for all dtypes, in which case the need for "making things work" was much higher.)

##### Array Scalars

Array scalars are more complex. The main reason they exist is speed enhancement, I believe. NumPy currently converts 0-D arrays automatically to array scalars in many operations (implicitly making these objects immutable and thus discarding some container properties!). To make up for this deficiency, array scalars then pretend to be a full array, so that most users will not notice the difference.

There is probably some agreement that ideally array scalars would not exist, at least with respect to them pretending to be an array/container object. One question that has never been quite decided is whether true Python scalars should exist for all NumPy dtypes.

Personally, I feel that, yes, they should exist! There should be int64 scalars, that behave like a single element in a NumPy array typed as int64. It seems natural that if you have a container where you can assign a single value to a certain position, then you should be able to extract that element again as a scalar.

Scalars should *not* exist:

From an array-programming perspective, scalars do not need to exist. In that sense, arrays are not really containers, everything is an array in the sense that everything uses array-programming semantics. At that point "extracting an element" is not a useful concept anymore as such. The array is not a container, it is a language concept.

The above probably leads to the cleanest possible semantics for array programming. As soon as NumPy (an array) is involved, all objects obey array-programming semantics.

There are two problems with this when adding it to a language like Python:

  1. As said above, array content is mutable. This is actually specific to NumPy, and not a requirement (arrays could in theory be copy-on-write). Having no scalars thus means that use as dictionary keys is clumsy (it might require a frozenarray).
  2. NumPy builds on top of Python, and Python objects are not array objects. Assignment to the array is typically done using a Python scalar, which feels like putting a scalar into a container, when it actually is a coercion to an array and only then an assignment. This means that extracting a Python scalar can only exist as a special method, arr.item(), and not by other means (see the small example after this list).
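
A small example of both directions, assignment and extraction:

import numpy as np

arr = np.array([1.5, 2.5])
arr[0] = 3         # conceptually: the Python int 3 is coerced to an array and then assigned
arr[0].item()      # 3.0 -- extracting a true Python scalar requires .item() (or float(...))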

In my opinion, the advantage of this view is the automatic conversion to arrays in many functions, since currently np.add(scalar, scalar) is the same as np.add(np.array(scalar), np.array(scalar)). This is different from operators, where scalar + scalar lives in the scalar Python world. As such this is not a problem: functions can choose to be array-programming only. The problem arises from the fact that we currently return array scalars when the input is zero dimensional, so that users may rely on a scalar result for scalar inputs. It may also be the more useful semantics: you do not silently lose the scalar's immutability advantage. Note that an immutable array class could already achieve this, but for scalars we chose to not allow it.

I personally believe that scalars are a good idea. Users starting with Python are used to them, and all Python objects are scalars, so thinking of arrays as containers is a much easier concept! Further, with view semantics and mutable arrays, there is no good way to have immutable and hashable scalars for example to index dictionaries with.

There is often the notion that the current NumPy scalars (array scalars) duplicate behaviour and are a huge burden on maintenance. I believe that because of this, many would prefer to have no scalars at all. Personally, I think that is unfair towards scalars, in that they are blamed for the messiness of how they are implemented rather than for the idea itself.

The other issue is that the boundaries of when a scalar and when a 0-D array is to be expected seem blurry right now, and because of that it seems easier to just remove the blurry line completely. However, I believe that it is straightforward to clarify the delineation:

  1. All normal ufuncs will return arrays for array inputs, they do not create "array scalars".
  2. Functions may (ufuncs should) return scalars for scalar only input.
  3. Indexing can be confusing because arr[0] can be an array or a scalar. This is a problem with indexing semantics only. arr[0] should be a scalar, and error if the result would not be a scalar. The user should use arr[0, ...] to get the array (even if it is 0-D); see the sketch after this list. A side note to this: NumPy arrays should not be seen as lists of lists, that seems to be a problematic concept. In principle, I would prefer if NumPy arrays did not even allow iteration; it should be arr.flat or arr.iter(axis=0) to clarify the intention. (Even without scalars I think iteration is problematic, although less of a problem; you still run into more issues when code expects NumPy arrays to be typical Python objects.)
  4. Especially reductions take an axis argument, and a reduction over all axes is a zero dimensional result. I believe the delineation is clear here even now: axis=None means scalar results, axis=range(arr.ndim) should mean a zero dimensional result. Most users already use axis=None, and those who end up doing axis=(0, 1) and get a 0-D result are likely writing generic N-dimensional code which should return an array.
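A sketch of the delineation in items 3 and 4, using current behaviour where it already matches the proposal (and noting where it does not):

import numpy as np

arr = np.arange(6).reshape(2, 3)
arr.sum(axis=None)        # a scalar result (currently an array scalar)
arr.sum(axis=(0, 1))      # proposed: a 0-D array; currently this also returns a scalar

vec = np.arange(3)
vec[0]                    # indexes an element -- under the proposal always a scalar
vec[0, ...]               # explicitly keeps the container: a 0-D array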

In short, I cannot think of any operation on arrays where the intention of whether or not a scalar should be returned is not immediately obvious, by using None to indicate a "reduction" to a scalar, or by removing the list-of-lists concept through clarified indexing and iteration. (One could even say: the shape of NumPy scalars should be None, if they must have a shape attribute.)

Of course changing the way NumPy arrays are iterated and indexed would be painful, and maybe not feasible. But so is the removal of scalars, so I believe the question is what point to look for on the horizon.

Note that, for all practical purposes in the following discussion, elements of NumPy arrays should be considered to be scalars of a type matching the datatype of the array. Even if the general notion of NumPy scalars existing may be contested, this seems like a useful concept for discussion.

The Array's Datatype

A HomogeneousElementwiseContainer such as the NumPy array must know the type of each element. In NumPy, additionally the *storage details* of each element are necessary: the scalar may be int64, but NumPy also needs to know whether it is stored in little or big endian byte order. (Most scalars do not expose their innards, and do not have to worry about storage details, especially since they are typically immutable.) The primary use of the datatype as a concept is that of *informing the user*:

dtype = arr.dtype

provides an object with all the information necessary to know what can be stored inside the array. The dtype object further provides some storage details, such as endianness or itemsize.

##### User Perspective

From the user perspective, only some of this information is immediately necessary:

  1. Most users will need to know what type of *scalars* the elements are. I.e. what is currently stored as arr.dtype.type is sufficient for them. This provides all the information needed to know how the elements behave. A user does not even need to know that an int32 is stored in 32 bits; the 32 bits are only important because they limit the range of math with this scalar.
  2. Current strings are different, since they also store a length. The arr.dtype thus also provides information on *limits* of what can be stored in the array. While there may also be limits on which operations are possible for the array, this information does not affect functional behaviour: if an operation is defined, it gives the same result as for the np.str type.

In that sense the elements of an array can in general be a *strict* subset of the scalar type. Any element behaves identically inside the array and as a scalar, but not all elements can necessarily be stored inside the array, and in some cases the array may not know how to perform a certain operation. Ideally, of course, both elements and scalars are functionally equivalent. In many cases it would thus be sufficient if the array only exposed arr.dtype.type, and hid the arr.dtype storage details. However, Python provides no good way to describe "strings of length 4 or less" as a type. (I would believe this is inherent to the dynamic nature of the language?)

##### Storage Perspective

In addition to the type information, the arr.dtype provides storage information such as itemsize and endianness, associated with all elements and necessary to access or assign to them. This storage information could *in NumPy 1.18* just as well be stored on the array itself, since it always includes the exact same, fixed, set of values.
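
Concretely, the split between scalar type information and storage details looks roughly like this:

import numpy as np

arr = np.array([1, 2, 3], dtype=">i4")   # explicitly big endian 32 bit integers
arr.dtype.type        # <class 'numpy.int32'> -- the scalar type, enough for most users
arr.dtype.itemsize    # 4   -- a storage detail
arr.dtype.byteorder   # '>' -- a storage detail (endianness), typically irrelevant to behaviour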

##### Container type and dtype

From a user perspective, the *scalar* type (as mentioned above) is the main information necessary. This is much like a C++ templated container, where the template parameter is the scalar's type. In NumPy (and Python generally), this construct does not exist. There is no class for np.ndarray<np.int64>; the scalar type is instead part of the ndarray instance. From a theoretical point of view, we can still see np.ndarray<np.int64> as a valid *type*, one that encompasses all possible array shapes, strides, … and also all element properties such as endianness and itemsize. From a typing perspective (do not read class!), np.ndarray is clearly a super-type of np.ndarray<np.int64>. The (current) dtype *storage* details, however, do not have a defined home if we only consider np.ndarray<np.int64>. They could just as well be part of the np.ndarray class (i.e. fields on the C-side struct).

In NumPy (as of 1.18) that would actually work, although it would waste storage space. However, this means that np.ndarray would be required to know all possible storage variations for all possible scalar types. Thus, enter the DType type. We can describe np.ndarray as "templated" via np.ndarray<DType[np.int64]>, where DType[np.int64] encompasses all desired storage variations of storing an np.int64. Thus the np.ndarray type itself does not need to be extended for new contained scalar types; it is only necessary to define DType[new_type].

**DType instance/realization limiting the Scalar Type**

DType[scalar_type] encompasses all possible scalars and can store them faithfully as elements *when a new array is created*. However, a specific array with a specific DType[scalar_type]() instance has limited storage capabilities. Since the array is homogeneous, it thus may naturally limit the scalars that can be stored inside it. This is the string case above: the values which can be stored may be a subset of the scalar's values. The theoretical arr.dtype.type of a bytes array is not np.str, but np.str[limited to a length of N].

**Enter Quantities (Additional Homogeneous Information)**

Quantities extend the array object with additional homogeneous unit information, such as a NumPy array of "meters":

Quantity = (np.ndarray<DType[numpy_defined]>, Unit)

and can thus be implemented as a subclass of np.ndarray (as is currently done, e.g., in astropy.units.Quantity). An alternative spelling would be:

Quantity = np.ndarray<(DType[numpy_defined], Unit)>

Note that neither of the tuples (np.ndarray, Unit) or (DType[.], Unit) is necessarily easy to define generally (the second one is currently impossible). Also note that Astropy Quantities do not have a scalar associated with them right now. The above tuples show some of the tradeoffs of the approach.

**Implementation Differences:**

  1. (np.ndarray, Unit) can be defined as an array subclass (as that is currently possible).
  2. (DType[numpy_defined], Unit) encompasses the set of DTypes already defined by NumPy. It thus either needs a class/wrapper factory or must include the DType[numpy_defined] dynamically.

Technically, both methods should be similar in complexity, assuming "inheritance" is possible, although 1. is possibly a bit more straightforward, because a single subclass encompasses all existing DTypes. This advantage is probably diminished a little if you have scalars. *For both cases, the difficult part is designing a good wrapping of the existing computational kernels which, in NumPy, are defined by the universal functions.*

**Dependent ndarray type**

It should be clarified that both of the above approaches define the *identical* type (space of all possible instances). The difference is that, from a Python implementation perspective, the dynamic creation of a subclass is typically not done. Thus, there is no actual class object for np.ndarray<DType[np.float64]>, and thus type(np.ndarray<DType[np.float64]>) is np.ndarray. The user is aware of this, and OK with it. The main result of it is who defines the methods on np.ndarray<DType[np.float64]>, and how. Pandas, for example, induces/chooses a subclass based on the DType, i.e. it *may* define a class for np.ndarray<DType[np.float64]> during construction. This leads to the convenience of DType specific methods on the container. From a purist point of view, the container should maybe not have sum().
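
As a minimal sketch of option 1 above (the (np.ndarray, Unit) spelling): an ndarray subclass that merely carries a unit. Proper unit handling in operations (what astropy.units.Quantity actually does) is deliberately omitted.

import numpy as np

class Quantity(np.ndarray):
    # minimal subclass: attach a unit to the array; arithmetic does not propagate units here
    def __new__(cls, data, unit):
        obj = np.asarray(data).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # new instances and views inherit the unit (if any)
        self.unit = getattr(obj, "unit", None)

q = Quantity([1.0, 2.0], unit="m")
q.unit          # 'm'
type(q[0:1])    # Quantity -- views keep the subclass (and the unit)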

**Quantity (DType) induced Methods:**

The main advantage of the array subclass solution is that it is straightforward to define Quantity.to(new_unit) or Quantity.unit, which would otherwise have to be accessed through the dtype attribute and may require explicit passing of the array. To some degree, NumPy could of course add support for array methods to be induced by the existence of a dtype (arguably, .imag is, in a sense). An alternative to "inducing" methods could be arr.elem returning a bound, dtype provided object, which can add methods so that arr.elem.method works. It is always possible to create a subclass which *only* adds the new methods to the ndarray. But we would have to carefully weigh whether we wish to allow dtypes to provide e.g. MixIns to induce additional methods. On the up-side, a carefully designed MixIn could work with arbitrary array-like objects that implement only a minimal set of what NumPy does.

The purist point of view may be this, if methods are desired:

  • The base np.ndarray should have almost no methods (e.g. no .sum).
  • For each DType there should be a subclass of np.ndarray automatically created. The DType can then mix in DType related methods, which do not need to know about the array's shape, etc.
  • The problem is methods that work on all values; for these, either:
      • The DType could be allowed to mix in array methods (such as sum, conjugate), assuming that they are backed by ufuncs (which are array multi-methods).
      • An array library chooses to *not* have such methods, i.e. you always must use a functional approach, such as unit.to(ndarray, new_unit) instead of Quantity.to(new_unit).

**Generalization to Other Array-Likes (HomogeneousElementwiseContainer)**

A subclass of np.ndarray can only hope to generalize to other array-likes if these array-likes are "templated". For example, it could largely work for Dask, which is a distributed array: dask.array<np.ndarray> could in principle accept an np.ndarray subclass as template parameter.

Other array-likes, such as pandas dataframes or sparse arrays, however, have no chance of reusing such a subclass.

On the other hand, the DType approach generalizes to other array-likes immediately. The only concern is whether the computational kernels provided are fast for the specific array-like container.

Array-programming methods

The methods defined on NumPy arrays and ufuncs, which we discussed above, are methods defined on the type HomogeneousElementwiseContainer<DType>. Since we have the distinction of container and DType, i.e. an instance can also be written as the (np.ndarray, dtype) tuple, methods in general need to work with the full type (including the array information).

The simple solution to this is elementwise computation: every container method uses the elementwise computations defined by the scalar type (the DType only describes the storage of the scalar type). In practice this may work for C++ templates in some cases, but in NumPy it is not feasible, since working only with single elements incurs too high a performance overhead. Even in compiled languages it is not advisable in general, since e.g. SIMD operations may provide a large performance gain if they know that they are working with more than a single element.
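
A rough illustration of why per-element dispatch from Python is too slow (the exact timings are machine dependent, but the gap is typically orders of magnitude):

import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.ones_like(a)

# per-element: one Python-level call (and scalar boxing) for every single element
slow = np.array([x + y for x, y in zip(a, b)])

# one kernel call operating on many strided elements at once
fast = np.add(a, b)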

This leads to the implementation of Computational Kernels, which are wrapped inside the UFuncs.

Computational Kernels (UFuncs)

The split into computational kernels is difficult, because it necessarily blurs container and scalar functions to some degree. Unlike a typical scalar function, the computational kernel used inside the ufunc (the ufunc loop) can operate on many, strided, elements. The array object can thus call this kernel (read: ufunc loop implementation) for many elements at once, speeding up the computation while still hiding the full details of the array's memory layout.

If a computational kernel, such as currently in NumPy, can be called multiple times, it may need additional hooks before and after the full computation. These are used to prepare the kernel (e.g. allocate computational memory), do error checking, or simply perform type related work which only needs to be performed once.

Thus, the line between container and element is blurred: the kernel provided by the DType to be used in an array method must have partial knowledge of the array object's memory layout for fast computation. It does not need to know the full memory layout, but it must anticipate it by providing a sufficiently optimized kernel function.

It may be helpful to remember the two extreme cases:

  1. The "kernel" works on a single element, so that it knows nothing about the array object itself. This is too slow in practice, but clearly correct and simple.
  2. The "kernel" knows the full layout and shape of the array. It thus can perform the full workload (e.g. datashape of xnd does this) in one method call. In this organization, the array and dtype are a single unit of abstraction. For all practical purposes (ndarray, dtype) is one object and the dtype is merely an implementation detail of where methods are stored. While it may work, generally it makes no sense to use a dtype without also using np.ndarray in this context!

Currently, for many people NumPy probably falls into category 2: while internally it does use computational kernels on strided elements using DTypes, neither the DTypes nor the "kernels" (ufunc loops) are extensible. The organization into "inner-loops" is an implementation detail: the actual "kernel" is currently the full ufunc machinery minus the dispatching step! In NumPy, if you disregard dispatching by fixing the signature in a ufunc call, the kernel works exactly on ndarrays. In other words, while we may have a kernel, the input to this kernel is actually a full ndarray.
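
For example, the dispatching step can already be pinned down from Python by fixing the loop signature, which makes the "kernel is the whole ufunc minus dispatch" view fairly explicit:

import numpy as np

a = np.array([1, 2, 3], dtype=np.float64)
b = np.array([4, 5, 6], dtype=np.float64)
np.add(a, b, signature="dd->d")   # force the float64 inner loop; dispatching is effectively skipped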

You can thus even argue: a NumPy array is right now both the user facing object and the building block for computational kernels. In that regard, a Quantity subclass does not have a datatype at all, it has a NumPy array as data description (datashape) object.

Best of both worlds?

My view of the above is that NumPy should embrace the kernel design more clearly and make it an integral and exposed API choice. As of now, it is largely an implementation detail: we write ufuncs by providing computational kernels (i.e. strided loops), but then wrap them up into the bigger computational kernel that is the full ufunc (with a specified DType). From a Python perspective this makes sense, since the kernel itself is useless unless used in conjunction with an array object. Maybe as a help, the kernel design exposed to Python could look like this:

float64_add_kernel = np.add.get_implementation([np.float64, np.float64])
np.apply_kernel(float64_add_kernel, (arr1, arr2))

which could be wrapped a bit more nicely for the user. The main point is that float64_add_kernel provides low-level callbacks to operate on many elements at the same time, for a given (d)type, in an efficient manner. A different array object could choose to use the same low-level functions without needing to wrap its data into a NumPy array first (which can be slow if the memory layout does not match the NumPy one). I.e. the "kernel", in my opinion, should not be tied to or rely on a Python object, but only on a specific signature, such as the strided one dimensional loop. This also removes any direct need for the Python object, e.g. allowing the kernel to be called without the GIL.

We could also think of it as a collection of low-level callbacks:

int.__add__
int.__add__.contiguous_loop
int.__add__.strided_loop

etc. Except that we need to do the preparation and dispatching step before we can call an int.__add__.strided_loop. The choice of doing this, instead of the full array operation (although of course there could be a kernel that does the full array), is mainly due to the current organization into one dimensional loops being useful. However, there is one further huge advantage of this: it allows thinking about buffering in a much easier way.

Buffering is the main reason for the low level kernel design!

In the above discussion I forgot one important point: NumPy does casting, and casting needs to be buffered. This means that an operation is actually a composition of multiple steps, potentially casting the inputs, the calculation step itself, and potentially casting the outputs. Casting has to be buffered (or at the very least done in chunks) to be cache and memory friendly. Casting logic requires a fairly large machinery (at least it does so in NumPy), and it also requires functionality (kernels) very much like the mathematical kernels. Since the actual calculation thus has many steps in general, the full array cannot be the basic unit of kernel execution.
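
A purely conceptual (pure Python) sketch of this composition, not NumPy's actual internal machinery: hypothetical cast and compute "kernels" are applied chunk by chunk so that the intermediate buffers stay cache sized.

import numpy as np

def buffered_add(a, b, out_dtype=np.float64, buffersize=8192):
    # illustrative only: assumes 1-D inputs of equal length
    out = np.empty(a.shape[0], dtype=out_dtype)
    for start in range(0, a.shape[0], buffersize):
        stop = start + buffersize
        buf_a = a[start:stop].astype(out_dtype)   # "cast kernel" into a small buffer
        buf_b = b[start:stop].astype(out_dtype)   # "cast kernel" for the second operand
        out[start:stop] = buf_a + buf_b           # "computation kernel" on the buffered chunk
    return out

buffered_add(np.arange(10, dtype=np.int32), np.arange(10, dtype=np.int16))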

Conclusion

I believe that, due to buffering, the only reasonable organization is into low-level C-style kernels which are not tied to any specific Python object or data structure (of course the kernel design itself could be targeted at NumPy arrays). But even that could be expanded: there is no special reason why NumPy should not provide kernels, or allow adding kernels, which it does not require itself. NumPy already effectively embraces that design in its implementation, and probably people smarter than me decided to do it this way exactly for these reasons. A further reason is that casting and operations are chained, but also multiple operations could be chained in principle.

The main point of this, for me, is to clarify that the array should not be seen as the input to or host of the computational kernel. From my perspective the computational kernel is tied solely to the datatype, while the array object uses those kernels to compose the full operation efficiently: the datatypes are the orchestra, the array is only the conductor (although in reality, since the kernels are limited, that is a lot of work to do – which is done in the iterator machinery).
