The principle of this aNEP is to separate the APIs for masking and for missing values, according to
- The current implementation of masked arrays
- Nathaniel Smith's proposal.
This discussion is only of the API, and not of the implementation.
Authors:
- Matthew Brett
First, missing values can be set and be displayed as np.NA, NA:
>>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
array([1., 2., NA, 7.], dtype='NA[<f8]')
As the initialization is not ambiguous, this can be written without the NA dtype:
>>> np.array([1.0, 2.0, np.NA, 7.0])
array([1., 2., NA, 7.], dtype='NA[<f8]')
Masked values can be set and be displayed as np.MASKED, MASKED:
>>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
array([1., 2., MASKED, 7.], masked=True)
As the initialization is not ambiguous, this can be written without masked=True:
>>> np.array([1.0, 2.0, np.MASKED, 7.0])
array([1., 2., MASKED, 7.], masked=True)
By default, NA values propagate:
>>> na_arr = np.array([1.0, 2.0, np.NA, 7.0])
>>> np.sum(na_arr)
NA('float64')
unless the skipna flag is set:
>>> np.sum(na_arr, skipna=True)
10.0
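As a rough analogy only, the floating-point NaN already behaves this way in plain Python: it propagates through a sum unless it is filtered out first. NaN is not the proposed NA, and the filtering step below merely stands in for skipna=True; the variable names are illustrative.

```python
import math

values = [1.0, 2.0, float("nan"), 7.0]

# NaN propagates through an ordinary sum, much as NA does by default.
total = sum(values)
print(math.isnan(total))  # True

# Dropping the NaN entries before summing plays the role of skipna=True.
total_skipping = sum(v for v in values if not math.isnan(v))
print(total_skipping)  # 10.0
```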
By default, masking does not propagate:
>>> masked_arr = np.array([1.0, 2.0, np.MASKED, 7.0])
>>> np.sum(masked_arr)
10.0
unless the propmsk flag is set:
>>> np.sum(masked_arr, propmsk=True)
MASKED
An array can be masked, and contain NA values:
>>> both_arr = np.array([1.0, 2.0, np.MASKED, np.NA, 7.0])
In the default case, the behavior is obvious:
>>> np.sum(both_arr)
NA('float64')
It's also obvious what to do with skipna=True:
>>> np.sum(both_arr, skipna=True)
10.0
>>> np.sum(both_arr, skipna=True, propmsk=True)
MASKED
To break the tie between NA and MASKED, NAs propagate harder:
>>> np.sum(both_arr, propmsk=True)
NA('float64')
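The four combinations of skipna and propmsk can be summarized in a small pure-Python sketch. This is not NumPy code and says nothing about the implementation: the sentinels NA and MASKED and the function sketch_sum are hypothetical stand-ins, and the sketch returns a bare NA rather than a typed NA('float64').

```python
class _Sentinel:
    """Printable stand-in for the proposed np.NA / np.MASKED singletons."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


NA = _Sentinel("NA")          # hypothetical stand-in for np.NA
MASKED = _Sentinel("MASKED")  # hypothetical stand-in for np.MASKED


def sketch_sum(values, skipna=False, propmsk=False):
    """Sum `values` under the propagation rules described in the text."""
    has_na = any(v is NA for v in values)
    has_masked = any(v is MASKED for v in values)
    # NA propagates unless it is skipped, and it wins over MASKED.
    if has_na and not skipna:
        return NA
    # MASKED propagates only when explicitly requested.
    if has_masked and propmsk:
        return MASKED
    # Otherwise both kinds of entries are left out of the sum.
    return sum(v for v in values if v is not NA and v is not MASKED)


both = [1.0, 2.0, MASKED, NA, 7.0]
print(sketch_sum(both))                             # NA
print(sketch_sum(both, skipna=True))                # 10.0
print(sketch_sum(both, skipna=True, propmsk=True))  # MASKED
print(sketch_sum(both, propmsk=True))               # NA
```

Checking for NA before MASKED is what encodes "NAs propagate harder": with propmsk=True and no skipna, the NA branch is reached first.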
Assignment is obvious in the NA case:
>>> arr = np.array([1.0, 2.0, 7.0])
>>> arr[2] = np.NA
TypeError('dtype does not support NA')
>>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
>>> na_arr[2] = np.NA
>>> na_arr
array([1., 2., NA], dtype='NA[<f8]')
Direct assignment in the masked case is magic and confusing, and so happens only via the mask:
>>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>>> masked_arr[2] = np.NA
TypeError('dtype does not support NA')
>>> masked_arr[2] = np.MASKED
TypeError('float() argument must be a string or a number')
>>> masked_arr.visible[2] = False
>>> masked_arr
array([1., 2., MASKED], masked=True)
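The "only via the mask" rule can be illustrated with a toy container in plain Python. The class SketchMaskedArray and the MASKED object below are hypothetical and exist only for this illustration; the attribute name visible is taken from the example above, and everything else is an assumption.

```python
MASKED = object()  # hypothetical stand-in for np.MASKED


class SketchMaskedArray:
    """Toy 1-D container: float payload plus a parallel `visible` list."""

    def __init__(self, values):
        self._data = [float(v) for v in values]
        self.visible = [True] * len(self._data)

    def __setitem__(self, index, value):
        # float() raises TypeError for anything that is not a number or a
        # string, so a sentinel such as MASKED cannot be stored as data.
        self._data[index] = float(value)

    def __repr__(self):
        shown = [
            repr(value) if seen else "MASKED"
            for value, seen in zip(self._data, self.visible)
        ]
        return "SketchMaskedArray([" + ", ".join(shown) + "])"


arr = SketchMaskedArray([1.0, 2.0, 7.0])

try:
    arr[2] = MASKED            # direct assignment is rejected
    rejected = False
except TypeError:
    rejected = True

arr.visible[2] = False         # masking goes through the mask instead
print(rejected)                # True
print(arr)                     # SketchMaskedArray([1.0, 2.0, MASKED])
```

The stored value 7.0 is untouched by masking; setting arr.visible[2] back to True would reveal it again, which is the distinction between hiding a value and marking it as missing.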