Object arrays are ndarrays with a datatype of np.object whose elements are Python objects, enabling use of numpy's vectorized operations and broadcasting rules with arbitrary Python types. Object arrays have certain special rules to resolve ambiguities that arise between Python types and numpy types, described here.
Envisioned uses of object arrays include:
- Creating ndarrays whose elements are other ndarrays of varying length
- Creating ndarrays containing number-like Python objects, for example mpmath's multiprecision types, or Python's built-in arbitrary precision integers or Decimal type.
Object arrays are often useful for storing Python string types, because they allow arbitrary string lengths (while a numpy array's string length is fixed), and because repeated strings can share storage: when a string is interned, Python stores it once and the array only holds a reference to it for each repeated use, saving memory.
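As a rough illustration of these uses, the sketch below stores Decimal values, large Python integers, and strings in object arrays. It passes the built-in object as the dtype, on the assumption that a recent NumPy release is in use where the np.object alias is no longer available; the variable names are illustrative only.

```python
from decimal import Decimal

import numpy as np

# Number-like Python objects: arithmetic is delegated to Decimal itself.
prices = np.array([Decimal("1.10"), Decimal("2.20")], dtype=object)
doubled = prices * 2          # elementwise Decimal.__mul__
total = prices.sum()          # reduces with Decimal.__add__

# Python integers keep arbitrary precision inside an object array.
big = np.array([2**100, 3**80], dtype=object)
bigger = big * big

# Fixed-width string arrays truncate on assignment; object arrays do not.
fixed = np.array(["ab", "cd"])                   # fixed width of 2 characters
fixed[0] = "abcdef"                              # silently truncated
flexible = np.array(["ab", "cd"], dtype=object)
flexible[0] = "abcdef"                           # stored in full
```

The trade-off is speed: every operation on an object array goes through the Python objects one element at a time.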
(Add a note here about how in many cases a "proper" solution is to create a new dtype, but object arrays can be a quick workaround)
Object arrays can be created using np.array and explicitly supplying np.object as the dtype argument.
>>> a = np.array([1,2,3], dtype=np.object)
Note that unlike normal coercion rules, numpy will not attempt to create an object array unless the dtype of np.object is explicitly supplied[, or unless the supplied data contains a python integer larger than is representable with the largest numpy integer type?]. This is to prevent the common error of mistakenly supplying subsequences of different lengths, which the normal coercion rules would convert to an object array.
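The following sketch shows the explicit form with ragged input. It uses the built-in object as the dtype (an assumption for recent NumPy releases). The behavior of the implicit call differs between NumPy versions, so the example only records whether it failed rather than assuming one outcome.

```python
import numpy as np

ragged = [[1, 2], [3]]  # subsequences of different lengths

# With an explicit object dtype the intent is unambiguous:
# a 1-D array whose two elements are the original lists.
arr = np.array(ragged, dtype=object)

# Without it, recent NumPy releases raise ValueError; older ones may
# instead emit a warning and fall back to an object array.
try:
    implicit = np.array(ragged)
    implicit_failed = False
except ValueError:
    implicit_failed = True
```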
In deviation from normal unpacking rules, for datatypes of np.object, np.array will only descend into subsequences of the input sequence if the subsequences are of Python list type (and not any other sequence type such as np.ndarray) and the lists are of equal length. This resolves ambiguity in the amount of nesting desired: an object [[1,2],[3,4]] will thus be interpreted as an object array with shape (2,2) containing Python integers, and not as an object array of shape (2,) containing Python lists. Creating an object array containing equal-sized Python lists is more complicated, but may be accomplished in two steps:
>>> a = np.empty(2, dtype=np.object)
>>> a[:] = [[1,2,3],[4,5,6]]
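To make the nesting distinction concrete, here is a small sketch that builds both interpretations of the same input. The helper object_array_of is hypothetical (it is not part of numpy) and fills the array one element at a time, which is a conservative way to keep each list intact regardless of how slice assignment treats nested lists in a given NumPy version.

```python
import numpy as np

nested = [[1, 2], [3, 4]]

# Equal-length lists are unpacked: a (2, 2) array of Python ints.
unpacked = np.array(nested, dtype=object)


def object_array_of(items):
    """Return a 1-D object array whose elements are `items`, unmodified."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # integer indexing stores the object itself
    return out


# Shape (2,): each element is one of the original Python lists.
lists = object_array_of(nested)
```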
Numpy defines an additional object type, np.pytype, which is used to cast to built-in Python types. Casting an ndarray to np.object dtype will create an object array but will not cast values to Python types, while casting to np.pytype will also cast to Python native types, by calling .item on each element.
>>> def printresult(v):
... print(v.dtype, type(v[0]))
>>> printresult( np.arange(10).astype(np.object) )
np.object, numpy.int64
>>> printresult( np.arange(10).astype(np.pytype) )
np.object, int
This is useful to take advantage of properties of Python's native types, such as its multiprecision integers.
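Since np.pytype is a proposed dtype rather than something available in released NumPy, the cast it describes can be approximated by hand. The helper as_pytype below is a hypothetical stand-in that follows the description above (call .item on each element); it assumes the input is a numeric, non-object array.

```python
import numpy as np


def as_pytype(arr):
    """Hypothetical stand-in for a cast to np.pytype: an object array
    holding the result of calling .item() on each element of `arr`."""
    arr = np.asarray(arr)
    out = np.empty(arr.shape, dtype=object)
    for idx in np.ndindex(arr.shape):
        out[idx] = arr[idx].item()  # numpy scalar -> native Python value
    return out


native = as_pytype(np.array([2**62, 1], dtype=np.int64))
scaled = native * 4  # Python int arithmetic, so no 64-bit overflow
```

Here 2**62 * 4 does not fit in a 64-bit integer, but the result is exact because the elements are Python ints.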
Numpy handles integers larger than its largest integer type by using object arrays. Operations involving large Python integers will often automatically coerce to object type:
>>> np.array([2**128])
array([340282366920938463463374607431768211456L], dtype=object)
>>> np.array([0], dtype=np.int64) + 2**128
array([340282366920938463463374607431768211456L], dtype=object)
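Automatic coercion to object dtype may differ between NumPy versions (an assumption worth checking against the version in use), so a more defensive pattern is to request the object dtype up front. The sketch below does this with the built-in object as the dtype.

```python
import numpy as np

# Explicit object dtype: elements stay Python ints of arbitrary size.
big = np.array([2**128], dtype=object)
incremented = big + 1

# Converting a fixed-width array first avoids relying on automatic
# promotion when mixing it with a very large Python int.
small = np.array([0], dtype=np.int64).astype(object)
shifted = small + 2**128
```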
Nesting ndarrays (and other sequence objects such as lists) inside of object arrays can be tricky because numpy will attempt to broadcast assignment operations involving two ndarrays. As a special case for object arrays, values may be assigned to each index individually to avoid broadcasting:
>>> a = np.array([8,9], dtype=np.object)
>>> a[:] = np.array([1,2]) # broadcasts
>>> a[0] = np.array([1,2]) # does not broadcast
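The difference between the two assignments can be checked by inspecting the array after each one. This is a minimal sketch using the built-in object as the dtype (an assumption for recent NumPy releases); after_slice is an illustrative name.

```python
import numpy as np

a = np.array([8, 9], dtype=object)

a[:] = np.array([1, 2])   # slice assignment broadcasts elementwise
after_slice = list(a)     # two separate elements: 1 and 2

a[0] = np.array([1, 2])   # integer index stores the ndarray itself
```

After the second assignment a[0] is a whole ndarray while a[1] is still the plain value 2.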
This also applies to fields of object type of structured scalars:
>>> a = np.empty(2, dtype='O,i8')
>>> a['f0'][0] = np.arange(3)
>>> a[0]['f0'] = np.arange(3)
This is also how one can create object arrays containing equal size lists:
>>> a = np.empty(2, dtype=np.object)
>>> a[0] = [1,2,3]
>>> a[1] = [4,5,6]
Viewing object arrays as a different type is not allowed, as it could result in modification of the underlying object pointer. Similarly, viewing a non-object array as an object array is not allowed.
>>> a = np.array([1,2,3], dtype=np.object)
>>> a.view(np.int64)
TypeError: Cannot change data-type for object array.
>>> np.array([1,2,3], dtype='i4').view(np.object)
TypeError
The array may still be cast to another type using astype as usual.
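A short sketch of the contrast between view and astype, using the built-in object as the dtype (an assumption for recent NumPy releases). The flag view_allowed is illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=object)

try:
    a.view(np.int64)          # reinterpreting object pointers is refused
    view_allowed = True
except TypeError:
    view_allowed = False

as_ints = a.astype(np.int64)  # converts each element into a new array
```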
Ufuncs operate specially on object arrays, since the objects contained in the array may not be numpy types. To evaluate a ufunc numpy tries a series of strategies in order for each element of the array (for unary ufuncs) or for each pair of elements from the arrays (for binary ufuncs):
1. If there is a Python "Special method" which corresponds to the ufunc numpy uses it to evaluate the ufunc. This step only applies to the ufuncs add, subtract, multiply, divide, true_divide, floor_divide, remainder, negative, [positive], power, mod, absolute, bitwise_and, bitwise_or, bitwise_xor, invert, left_shift, right_shift, greater, greater_equal, less, less_equal, not_equal, equal.
2. If all elements passed to the ufunc are one of: a numpy scalar, a python bool, int, long, float or complex, numpy handles evaluation of the ufunc. Unary ufuncs will return a numpy scalar if the input element was a numpy scalar and a python type otherwise. Binary ufuncs will return a numpy scalar if either input element was a numpy scalar, and a Python type otherwise. Note that multiprecision Python integers are evaluated specially to give a multiprecision result.
3. If the first element has a method with the same name as the ufunc then that method will be called, eg elem.sqrt(). Binary ufuncs such as logaddexp(x,y) will call x.logaddexp(y).
4. For a small number of ufuncs, notably np.minimum and np.maximum, numpy implements a fallback implementation using pure python code shown in the table below. For the remaining ufuncs a TypeError is raised.
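The special-method and same-named-method strategies can be sketched with a small class. Length is a hypothetical type invented for this illustration; the example relies only on np.add dispatching to __add__ and np.sqrt dispatching to a sqrt method on each element, and does not exercise the other strategies.

```python
import math

import numpy as np


class Length:
    """Hypothetical number-like type used only for this illustration."""

    def __init__(self, meters):
        self.meters = meters

    def __add__(self, other):      # special method, used by np.add
        return Length(self.meters + other.meters)

    def sqrt(self):                # same name as the ufunc np.sqrt
        return Length(math.sqrt(self.meters))


lengths = np.array([Length(4.0), Length(9.0)], dtype=object)
summed = np.add(lengths, lengths)  # elementwise Length.__add__
roots = np.sqrt(lengths)           # elementwise Length.sqrt()
```

Both results are again object arrays whose elements are Length instances.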
This provides a rough way of creating new numeric types compatible with numpy, as you can define a class which implements all ufuncs missing from step 1 as methods, and then create an object array containing elements of your type. However, note that creating a user-defined type (see "User-Defined Types") is often preferable. User-defined types will be much faster and you will have control over casting and coercion.
Internally in step 2, numpy evaluates ufuncs involving Python float or complex by converting to the equivalent numpy type, computing the ufunc, and converting back to the Python type. However, it uses a custom ufunc implementation for Python int and long (not documented here, see _objectmath.py in numpy source) to handle Python integers larger than the maximum integer numpy types can represent.
The ufunc implementations for ufuncs evaluated in step 4 are in the following table:
Numpy Ufunc | Python Implementation |
---|---|
np.logical_and(x, y) | bool(x and y) |
np.logical_or(x, y) | bool(x or y) |
np.logical_xor(x, y) | bool(x or y) and not bool(x and y) |
np.logical_not(x) | bool(not x) |
np.maximum(x, y), np.fmax(x, y) | max(x, y) |
np.minimum(x, y), np.fmin(x, y) | min(x, y) |
np.degrees(x), np.rad2deg(x) | x*180/np.pi |
np.radians(x), np.deg2rad(x) | x*np.pi/180 |
np.square(x) | x*x |
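The table entries can be tried out today by wrapping the Python expressions with np.frompyfunc, which turns a Python callable into a ufunc returning object arrays. This is only a stand-in for the proposed built-in fallbacks, not their implementation; the py_ prefixed names are hypothetical.

```python
import numpy as np

# Each wrapper applies the pure-Python expression from the table
# to every element (or pair of elements) of its object-array inputs.
py_logical_xor = np.frompyfunc(
    lambda x, y: bool(x or y) and not bool(x and y), 2, 1
)
py_maximum = np.frompyfunc(max, 2, 1)
py_square = np.frompyfunc(lambda x: x * x, 1, 1)

x = np.array([0, 0, 1, 1], dtype=object)
y = np.array([0, 1, 0, 1], dtype=object)

xor_result = py_logical_xor(x, y)
max_result = py_maximum(x, y)
squares = py_square(np.array([2**40, 3], dtype=object))
```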
Relevant issues/PRs:
explicitly specifying dtype=object:
Nesting issues:
Casting issues:
Note that with the plans above
>>> np.array([0,1,2], dtype=np.object)
will give an object array containing Python ints. To get an object array containing numpy types using np.array one must do something like
>>> np.array([np.int64(x) for x in [0,1,2]], dtype=np.object)
but really it's easier to do
>>> np.arange(3).astype(np.object)
>>> type(_[0])
numpy.int64
The general idea is that if you use np.array with dtype=np.object it will take your supplied objects exactly as they are, no conversion.
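The "no conversion" idea can be checked by looking at element types and identities after construction. A minimal sketch, using the built-in object as the dtype (an assumption for recent NumPy releases); marker and holder are illustrative names.

```python
import numpy as np

# Python ints in, Python ints out: the elements are stored untouched.
py_ints = np.array([0, 1, 2], dtype=object)

# NumPy scalars in, NumPy scalars out.
np_ints = np.array([np.int64(x) for x in [0, 1, 2]], dtype=object)

# Any object is kept by reference, not copied or converted.
marker = object()
holder = np.array([marker, None], dtype=object)
```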
On the 'pytype' type: What are the alternatives? Maybe this can already be done with vectorize? Defining itemize = vectorize(lambda x: x.item()), it looks like itemize(arr.astype(np.object)) might work, but this casts to int for some reason.
But actually maybe having a special pytype type also makes it clearer that some kind of casting is going on.
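One possible explanation for the cast to int (offered as an assumption, not something established above) is that np.vectorize infers its output dtype from the first result unless otypes is given. The sketch below compares the two; to_python is a hypothetical helper that tolerates elements without an item method.

```python
import numpy as np


def to_python(x):
    """Call .item() when available; leave other objects unchanged."""
    return x.item() if hasattr(x, "item") else x


arr = np.array([np.int64(x) for x in range(3)], dtype=object)

# Default: the output dtype is inferred from the first result, so the
# Python ints are packed back into a fixed-width integer array.
inferred = np.vectorize(to_python)(arr)

# Requesting an object output keeps the Python ints as they are.
itemize = np.vectorize(to_python, otypes=[object])
kept = itemize(arr)
```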