These are notes on how memory alignment currently works in numpy.
I see three use cases related to memory alignment in numpy:
- Creating structured datatypes with fields aligned like in a C struct.
- Speeding up copy operations by using word/double-word assignment instead of memcpy.
- Guaranteeing safe aligned access for ufuncs/setitem/casting code.
There are three relevant uses of the word "align" in numpy:
- the align keyword of the dtype constructor (only affects structured arrays)
- the dtype.alignment attribute (descr->alignment in C)
- the ALIGNED flag of an ndarray, computed in _IsAligned and checked in PyArray_ISALIGNED.
Here is how they are computed, first to last:
- In structured arrays, if field offsets are not manually provided, numpy determines the offsets automatically. In that case, align=True does three things. It pads the structure so that each field is aligned in memory (i.e., its memory location is a multiple of field.dtype.alignment). It sets dtype.alignment to the largest of the field alignments. It sets dtype.itemsize to the smallest possible multiple of this alignment that still holds every field. This is what C structs usually do. Otherwise, if offsets or itemsize were manually provided, align=True simply checks that all the fields are aligned and that the total itemsize is a multiple of the largest field alignment.
- dtype.alignment has arch-dependent default values for non-flexible types. These are defined as the alignment of the corresponding C type, except for complex types, whose alignment is doubled. It is equal to 1 for flexible types (including structured types), with the exception of structured types created with align=True, where it is determined in the previous step.
- For an ndarray, the ALIGNED flag is determined based on dtype.alignment. It is set to True if every item in the array is at a memory location consistent with dtype.alignment. This is the case if the data pointer and all strides of the array are multiples of that alignment. (There are some recently added exceptions to this for flexible types.)
Here is how the variables above are used:
- Creating aligned structs: to know how to offset a field when align=True, numpy looks up field.dtype.alignment. This includes fields which are nested structured arrays.
- Ufuncs: if the ALIGNED flag of an array is False, ufuncs will buffer/cast the array before evaluation. This is needed because ufunc inner loops access raw elements directly, which might fail on some archs if the elements are not aligned. A couple of get/setitem functions use ALIGNED in the same way.
- Copy code: the ALIGNED flag determines which code path is used during array copies. Suppose the itemsize of an array is 1, 2, 4, 8 or 16 bytes and the ALIGNED flag is True. Then, instead of using memcpy(dst, src, itemsize), numpy will do *(uintN *)dst = *(uintN *)src, where uintN is the unsigned integer type of the same size as an item.
- Cast code: if ALIGNED is True, this will essentially do *dst = CASTFUNC(*src). If False, it does memmove(&srcval, src, sizeof(srcval)); dstval = CASTFUNC(srcval); memmove(dst, &dstval, sizeof(dstval)), where srcval and dstval are aligned temporaries.
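The copy and cast code paths above can be mimicked with Python's struct module. This is only a sketch under stated assumptions. copy_items and cast_items are hypothetical names, and the buffers are assumed to be contiguous. Python never faults on misaligned access, so the aligned argument here only selects which branch runs; it does not make any access safe or unsafe.

```python
import struct

# Unsigned-integer formats with standard sizes; a 16-byte item is moved as two 8-byte words.
UINT_FORMATS = {1: "=B", 2: "=H", 4: "=I", 8: "=Q", 16: "=2Q"}


def copy_items(dst, src, count, itemsize, aligned):
    """Copy `count` contiguous items; return the name of the path taken."""
    use_uint = aligned and itemsize in UINT_FORMATS
    for i in range(count):
        off = i * itemsize
        if use_uint:
            # Analogue of *(uintN *)dst = *(uintN *)src: word-sized load and store.
            fmt = UINT_FORMATS[itemsize]
            struct.pack_into(fmt, dst, off, *struct.unpack_from(fmt, src, off))
        else:
            # Analogue of memcpy(dst, src, itemsize): plain byte copy.
            dst[off:off + itemsize] = src[off:off + itemsize]
    return "uint" if use_uint else "memcpy"


def cast_items(dst, src, count, src_fmt, dst_fmt, aligned):
    """Cast `count` contiguous items from src_fmt to dst_fmt (e.g. "=f" to "=d")."""
    src_size, dst_size = struct.calcsize(src_fmt), struct.calcsize(dst_fmt)
    for i in range(count):
        s, d = i * src_size, i * dst_size
        if aligned:
            # Analogue of *dst = CASTFUNC(*src): read and write the elements in place.
            (value,) = struct.unpack_from(src_fmt, src, s)
            struct.pack_into(dst_fmt, dst, d, value)
        else:
            # Analogue of the buffered path: copy into a temporary (aligned in C),
            # convert it, then copy the converted value out.
            srcval = bytes(src[s:s + src_size])
            (value,) = struct.unpack(src_fmt, srcval)
            dstval = struct.pack(dst_fmt, value)
            dst[d:d + dst_size] = dstval


src = bytearray(range(32))
dst = bytearray(32)
print(copy_items(dst, src, 4, 8, aligned=True), dst == src)  # uint True
```

Both branches produce the same bytes. The difference that matters in C is only whether the hardware is asked to perform a word-sized access on a possibly misaligned address.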
Note that ufuncs/casting care about the arch's definition of alignment, but the copy code is different. It requires the items to be aligned as an unsigned integer of equal size would be, rather than according to the type's own alignment. Unfortunately, it doesn't check this, and assumes that the type's alignment equals the alignment of the equivalent uint.
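The copy code's assumption matters whenever a type's C alignment is smaller than its itemsize. The sketch below uses a hypothetical is_aligned helper and assumed numbers: an 8-byte item whose type alignment is 4, and an 8-byte unsigned integer aligned to 8. Under these assumptions, the same array passes the check ufuncs and casting need but fails the one the uint copy path really needs.

```python
def is_aligned(data_ptr, strides, alignment):
    """Data pointer and every stride are multiples of `alignment`."""
    return data_ptr % alignment == 0 and all(stride % alignment == 0 for stride in strides)


itemsize = 8
type_alignment = 4  # assumed: the C alignment of the item type
uint_alignment = 8  # assumed: the alignment of an unsigned integer of the same size

# A 1-d contiguous array whose data pointer sits 4 bytes past an 8-byte boundary.
data_ptr, strides = 0x1004, (itemsize,)

ufunc_ok = is_aligned(data_ptr, strides, type_alignment)  # what ufuncs/casting need
copy_ok = is_aligned(data_ptr, strides, uint_alignment)   # what the uint copy path needs
print(ufunc_ok, copy_ok)  # True False
```

Computing the flag from the type alignment alone would therefore send this array down the uint copy path, even though its items are not aligned for 8-byte integer access.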
Now consider the case of complex64, which is implemented as struct { float real, imag; } in C. On my system (x86-64 Linux, gcc) this has an alignment of 4 in C, but the equivalently sized uint64 has an alignment of 8. Therefore, from the point of view of ufuncs/casting, the ALIGNED flag should be calculated with 4, but from the point of view of the copy code it should be 8. Using 4 would cause misaligned access in the copy code, so numpy has artificially set the alignment attribute of complex64 to 8 to compensate.
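The mismatch can be observed with ctypes, which reports the C layout for the running platform. Complex64 below is a hypothetical ctypes mirror of struct { float real, imag; }. The printed values depend on the platform's ABI.

```python
import ctypes


class Complex64(ctypes.Structure):
    # Mirrors the C layout struct { float real, imag; }.
    _fields_ = [("real", ctypes.c_float), ("imag", ctypes.c_float)]


itemsize = ctypes.sizeof(Complex64)
type_alignment = ctypes.alignment(Complex64)        # what ufuncs/casting care about
uint_alignment = ctypes.alignment(ctypes.c_uint64)  # what the 8-byte copy path assumes
print(itemsize, type_alignment, uint_alignment)
```

On x86-64 Linux this prints 8 4 8, matching the numbers above.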
However, because numpy currently says complex64 has alignment 8, structured types involving complex64 are incorrectly aligned relative to C (e.g. dtype('u1,c8', align=True)). It also means that ufuncs occasionally buffer arrays when they don't need to (luckily, we overestimate rather than underestimate alignment).
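The structured-type discrepancy can be checked the same way. This sketch assumes a complex64 alignment of 8, as described above. For a uint8 field followed by a complex64 field, it computes the offset and itemsize that padding to that alignment would give. It then compares them with the C layout that ctypes reports for the equivalent struct. Complex64 and U1C8 are hypothetical ctypes mirrors, not numpy objects.

```python
import ctypes


class Complex64(ctypes.Structure):
    # C layout of complex64: struct { float real, imag; }.
    _fields_ = [("real", ctypes.c_float), ("imag", ctypes.c_float)]


class U1C8(ctypes.Structure):
    # C equivalent of dtype('u1,c8', align=True): a uint8 followed by a complex64.
    _fields_ = [("f0", ctypes.c_uint8), ("f1", Complex64)]


def round_up(value, alignment):
    """Smallest multiple of `alignment` that is >= `value`."""
    return -(-value // alignment) * alignment


# Layout implied by an alignment attribute of 8 for complex64.
numpy_c8_alignment = 8
numpy_offset = round_up(1, numpy_c8_alignment)  # first multiple of 8 after the 1-byte u1 field
numpy_itemsize = round_up(numpy_offset + 8, numpy_c8_alignment)

# Layout the C compiler actually uses on this platform.
c_offset = U1C8.f1.offset
c_itemsize = ctypes.sizeof(U1C8)

print(numpy_offset, numpy_itemsize)  # 8 16
print(c_offset, c_itemsize)          # 4 12 on x86-64 Linux
```

The four bytes of extra padding before the complex field (and the larger itemsize) are what "incorrectly aligned relative to C" means here. A C struct read through such a dtype would see its fields at the wrong offsets.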