ahaldane/npy_alignment.mkd

## npy_alignment.mkd

      
These are notes on how memory alignment currently works in numpy.
Numpy Alignment Goals

There are three use-cases related to memory alignment in numpy I see:

Creating structured datatypes with fields aligned like in a C-struct.
Speeding up copy operations by using word/double-word assignment instead of memcpy
Guaranteeing safe aligned access for ufuncs/setitem/casting code

Alignment variables

The word "align" is used in three relevant ways in numpy:

the align keyword of the dtype constructor (only affects structured arrays)
the dtype.alignment attribute (descr->alignment in C)
the ALIGNED flag of an ndarray, computed in _IsAligned and checked in PyArray_ISALIGNED.
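As a quick orientation, the three can be inspected from Python as in the sketch below. The field names `a` and `b` are made up for illustration, and the printed offsets and alignments depend on the platform.

```python
import numpy as np

fields = [('a', 'u1'), ('b', 'i4')]   # hypothetical field names

packed = np.dtype(fields)               # align=False is the default
aligned = np.dtype(fields, align=True)  # 1. the align keyword

for name, dt in [('packed', packed), ('aligned', aligned)]:
    offsets = {f: dt.fields[f][1] for f in dt.names}
    # 2. dtype.alignment (descr->alignment in C)
    print(name, offsets, 'itemsize:', dt.itemsize, 'alignment:', dt.alignment)

arr = np.zeros(3, dtype=aligned)
# 3. the ALIGNED flag of an ndarray
print('ALIGNED flag:', arr.flags.aligned)
```

With `align=True` the `b` field is pushed to a multiple of its own alignment and the itemsize is padded, whereas the packed dtype places `b` directly after `a`.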

Here is how they are computed, first to last:

In structured arrays, if field offsets are not manually provided, numpy determines the offsets automatically. In that case, align=True pads the structure so that each field is aligned in memory (i.e., its memory location is a multiple of field.dtype.alignment), sets dtype.alignment to the largest of the field alignments, and sets dtype.itemsize to the smallest possible multiple of this alignment. This is what C-structs usually do. Otherwise, if offsets or itemsize were manually provided, align=True simply checks that all the fields are aligned and that the total itemsize is a multiple of the largest field alignment.
dtype.alignment has arch-dependent default values for non-flexible types (defined as the alignment of the corresponding C type, except for complex types which are doubled). It is equal to 1 for flexible types (including structured types), with the exception of structured types created with align=True where it is determined in the previous step.
For an ndarray, the ALIGNED flag is determined based on dtype.alignment. It is set to True if every item in the array is at a memory location consistent with dtype.alignment, which is the case if the data pointer and all strides of the array are multiples of that alignment. (There are some recently added exceptions to this for flexible types.)
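The ALIGNED rule in the last step can be sketched in Python roughly as follows. `is_aligned_sketch` is a hypothetical helper written for these notes, not numpy's `_IsAligned`, and it ignores the special cases mentioned above. The misaligned array is built by viewing a byte buffer starting one byte past its beginning, assuming the allocator returns an aligned base address.

```python
import numpy as np

def is_aligned_sketch(arr):
    """Simplified rule: data pointer and every stride must be
    multiples of dtype.alignment (numpy's special cases are ignored)."""
    align = arr.dtype.alignment
    if arr.ctypes.data % align != 0:
        return False
    return all(stride % align == 0 for stride in arr.strides)

# A freshly allocated array is aligned.
aligned = np.zeros(4, dtype='f8')

# Deliberately misaligned float64 array: skip the first byte of a
# byte buffer, then reinterpret the remaining 32 bytes as 4 doubles.
raw = np.zeros(8 * 4 + 1, dtype='u1')
misaligned = raw[1:].view('f8')

for name, arr in [('aligned', aligned), ('misaligned', misaligned)]:
    print(name, 'flag:', arr.flags.aligned, 'sketch:', is_aligned_sketch(arr))
```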

Consequences of alignment

Here is how the variables above are used:

creating aligned structs: In order to know how to offset a field when align=True, numpy looks up field.dtype.alignment. This includes fields which are nested structured arrays.
Ufuncs: If the ALIGNED flag of an array is False, ufuncs will buffer/cast the array before evaluation. This is needed since ufunc inner loops access raw elements directly, which might fail on some archs if the elements are not aligned. A couple of get/setitem functions use ALIGNED in the same way.
Copy code: The ALIGNED flag determines which code path is used during array copies. If the itemsize of an array is equal to 1, 2, 4, 8 or 16 bytes and the ALIGNED flag is True, then instead of using memcpy(dst, src, itemsize) numpy will do *((uintN*)dst) = *((uintN*)src), where uintN is the unsigned integer type whose size matches the itemsize.
Cast code: if ALIGNED is True, this will essentially do *dst = CASTFUNC(*src). If False, it does memmove(srcval, src); dstval = CASTFUNC(srcval); memmove(dst, dstval) where dstval/srcval are aligned.
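The buffering and the choice of copy path happen in C and cannot be observed directly from Python, but the visible effect can be checked: operations on an unaligned array still give correct results. The sketch below assumes the same byte-offset trick as a way to get an unaligned array, and uses `np.require` to obtain an aligned copy when one is wanted.

```python
import numpy as np

# Unaligned float64 array: a view starting one byte into a byte buffer.
raw = np.zeros(8 * 2 + 1, dtype='u1')
unaligned = raw[1:].view('f8')
unaligned[:] = [1.5, 2.5]   # setitem copes with the misaligned destination

# The ufunc still computes the right values on the unaligned input.
result = np.add(unaligned, 1.0)
print('input ALIGNED:', unaligned.flags.aligned, 'result:', result)

# np.require copies the data when the requested flag is not already set.
fixed = np.require(unaligned, requirements=['ALIGNED'])
print('copy ALIGNED:', fixed.flags.aligned)
```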

alignment problems with complex64

Note that ufuncs/casting care about the arch's definition of alignment, but the copy code is different: it requires the items to be aligned like a uint of equal size would be, rather than like the type itself. Unfortunately it doesn't check that, and assumes that the type's alignment is equal to the equivalent uint's alignment.
Now consider the case of complex64, which is implemented as struct { float real, imag; } in C. On my system (x64 linux gcc) this has an alignment of 4 in C, but the equivalent-sized uint64 has an alignment of 8. Therefore, from the point of view of ufuncs/casting the ALIGNED flag should be calculated with 4, but from the point of view of the copy code it should be calculated with 8. Using 4 would cause misaligned access in the copy code, so numpy has artificially modified the alignment attribute of complex64 to 8 to compensate.
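The C-side numbers can be checked from Python with ctypes, as in this sketch. The `Complex64` class is a stand-in I wrote to mirror the struct above, not a numpy type. The values printed depend on the platform, and the numpy value also depends on the numpy version, so none are shown here.

```python
import ctypes
import numpy as np

class Complex64(ctypes.Structure):
    # Mirrors `struct { float real, imag; }` from the text.
    _fields_ = [('real', ctypes.c_float), ('imag', ctypes.c_float)]

c_struct_align = ctypes.alignment(Complex64)       # the type's own C alignment
uint64_align = ctypes.alignment(ctypes.c_uint64)   # what the copy code assumes
numpy_align = np.dtype('c8').alignment             # what numpy reports

print('C struct alignment:  ', c_struct_align)
print('uint64 alignment:    ', uint64_align)
print('numpy c8 alignment:  ', numpy_align)
```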
However, the fact that currently numpy says complex64 has alignment 8 means that structured types involving complex64 are incorrectly aligned relative to C (eg dtype('u1,c8', align=True)), and it means that ufuncs occasionally buffer the array when they don't need to (luckily we overestimate rather than underestimate alignment).
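One way to see whether a given numpy build lays out the struct differently from C is to compare field offsets against an equivalent ctypes structure. This is a sketch: `Complex64` and `Record` are hypothetical stand-ins for the C declarations, and whether the two offsets agree depends on the numpy version and platform.

```python
import ctypes
import numpy as np

class Complex64(ctypes.Structure):
    # Mirrors `struct { float real, imag; }`.
    _fields_ = [('real', ctypes.c_float), ('imag', ctypes.c_float)]

class Record(ctypes.Structure):
    # C counterpart of dtype('u1,c8', align=True).
    _fields_ = [('f0', ctypes.c_uint8), ('f1', Complex64)]

dt = np.dtype('u1,c8', align=True)
numpy_offset = dt.fields['f1'][1]   # where numpy places the complex field
c_offset = Record.f1.offset         # where the C compiler rules place it

print('numpy: offset', numpy_offset, 'itemsize', dt.itemsize)
print('C:     offset', c_offset, 'itemsize', ctypes.sizeof(Record))
print('layouts match:', numpy_offset == c_offset
      and dt.itemsize == ctypes.sizeof(Record))
```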