To implement a Rust function for R with extendr, users can choose to use either (or both) of the following:
- Rust types (e.g. `i32` and `&str`)
- Proxy types (e.g. `Integer` and `Character`)
For Rust types, we need to consider how to handle cases where the types are incompatible between R and Rust. Proxy types, on the other hand, are easy in that they are just a cast, so the operation should not affect the underlying data. This document therefore focuses on the conversion to Rust types.
We need to handle these types of conversions (cf. the FromLossy / TryFromLossy RFC):
- safe, non-lossy conversions (e.g. an integer to `i32`)
- fallible, non-lossy conversions (e.g. an integer to `i16`, which fails when the value is larger than `i16::MAX` or smaller than `i16::MIN`)
- safe, lossy conversions (e.g. a numeric to `f32`)
- fallible, lossy conversions (e.g. a numeric to `u32`, which fails when the value is negative, and is lossy when the value is not integer-ish (e.g. `1.5`))
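The four categories above can be illustrated with the standard library alone. This is only a sketch, not extendr's implementation: `numeric_to_u32` and `conversion_categories` are hypothetical names, and widening to `i64` stands in for the safe, non-lossy case.

```rust
use std::convert::TryFrom;

/// Fallible *and* lossy (hypothetical helper): fails on values that do not
/// fit in `u32`, and silently drops the fractional part otherwise.
fn numeric_to_u32(x: f64) -> Result<u32, String> {
    if x.is_nan() || x < 0.0 || x > u32::MAX as f64 {
        return Err(format!("{} cannot be represented as u32", x));
    }
    Ok(x as u32) // truncates toward zero, so 1.5 becomes 1
}

/// Walks through the four categories one by one.
fn conversion_categories() {
    let r_int: i32 = 40_000;
    let r_num: f64 = 0.1;

    // safe, non-lossy: every i32 fits in an i64
    let wide = i64::from(r_int);
    assert_eq!(wide, 40_000_i64);

    // fallible, non-lossy: 40_000 is larger than i16::MAX
    assert!(i16::try_from(r_int).is_err());
    assert_eq!(i16::try_from(123_i32).unwrap(), 123_i16);

    // safe, lossy: the cast always succeeds but precision is dropped
    let narrowed = r_num as f32;
    assert_ne!(f64::from(narrowed), r_num);

    // fallible, lossy
    assert!(numeric_to_u32(-1.0).is_err());
    assert_eq!(numeric_to_u32(1.5), Ok(1));
}
```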
Note that
- Even the "safe" conversions can fail when a vector is supplied to a scalar argument, or a missing value is supplied to a non-`Option` argument
- The output conversion should never fail at runtime, though it might be lossy
Currently, extendr provides two modes of auto-conversion with the `#[extendr]` macro.
I personally feel the "safety-first" mode should be the default.
- The inputs might or might not be checked, for the sake of speed. The conversion might be fallible or lossy.
- It's the user's responsibility to validate and ensure these values are within the documented limits; otherwise the behavior is considered undefined.
- The inputs are strictly checked and, if the conversion would be lossy, it fails with an error.
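As a rough illustration of the difference between the two modes, here is a sketch for a numeric (`f64`) converted to `f32`. The function names are hypothetical, and "speed-first" is only a label used here for the unchecked mode; neither function is part of extendr.

```rust
/// "Speed-first" sketch: a plain cast with no checks.
/// It may silently lose precision.
fn to_f32_unchecked(x: f64) -> f32 {
    x as f32
}

/// "Safety-first" sketch: refuse any value that does not survive a
/// round trip through `f32`.
fn to_f32_strict(x: f64) -> Result<f32, String> {
    let y = x as f32;
    // NaN never compares equal to itself, so it is accepted separately.
    if x.is_nan() || f64::from(y) == x {
        Ok(y)
    } else {
        Err(format!("{} cannot be represented exactly as f32", x))
    }
}
```

With these definitions, `to_f32_strict(0.5)` succeeds because `0.5` is exactly representable, while `to_f32_strict(0.1)` returns an error; `to_f32_unchecked(0.1)` quietly returns the nearest `f32`.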
To enable the behavior described above, we should implement traits and methods following this convention:
- Use `From` for safe, non-lossy conversion
- Use `TryFrom` for fallible, non-lossy conversion
- Provide other traits (i.e. `FromRobj`) or methods (e.g. `<T>::from_xxx()`, `<T>::to_xxx()`) for lossy conversions
(For the rationale, please refer to the Appendix)
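A minimal sketch of how the convention might look in code. `RInt` is a hypothetical stand-in for a scalar R integer and `FromLossy` is a hypothetical trait named after the RFC mentioned earlier; neither is an actual extendr type or trait.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a scalar R integer (not an extendr type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct RInt(i32);

/// Hypothetical trait for lossy conversions, kept apart from
/// `From` and `TryFrom` so that lossiness is visible at the call site.
trait FromLossy<T> {
    fn from_lossy(value: T) -> Self;
}

// Safe, non-lossy: every 32-bit integer fits in an i64.
impl From<RInt> for i64 {
    fn from(x: RInt) -> Self {
        i64::from(x.0)
    }
}

// Fallible, non-lossy: out-of-range values are rejected, never altered.
impl TryFrom<RInt> for i16 {
    type Error = std::num::TryFromIntError;

    fn try_from(x: RInt) -> Result<Self, Self::Error> {
        i16::try_from(x.0)
    }
}

// Lossy: large magnitudes are rounded to the nearest f32.
impl FromLossy<RInt> for f32 {
    fn from_lossy(x: RInt) -> Self {
        x.0 as f32
    }
}
```

Keeping the lossy path behind its own trait means that a caller who writes `i64::from(x)` or `i16::try_from(x)` can rely on the value being preserved whenever the call succeeds.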
An input whose length is more than one can be converted from/to
- `Vec<T>`
- `&[T]`
The conversion should fail when a vector is supplied to a scalar argument.
An input that is possibly a missing value can be converted from/to
- `Option<T>`
The conversion should fail when a missing value is supplied to a non-`Option` argument.
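The length and missing-value rules above can be sketched as follows, assuming an R integer vector is seen from Rust as a slice of `i32` in which `NA_integer_` is stored as the smallest 32-bit integer. The function names are hypothetical.

```rust
/// R stores `NA_integer_` as the smallest 32-bit integer.
const NA_INTEGER: i32 = i32::MIN;

/// Vector argument: every element becomes `Some(value)`, or `None` for NA.
fn to_option_vec(xs: &[i32]) -> Vec<Option<i32>> {
    xs.iter()
        .map(|&x| if x == NA_INTEGER { None } else { Some(x) })
        .collect()
}

/// Scalar `Option` argument: the length must be exactly 1, NA maps to `None`.
fn to_option_scalar(xs: &[i32]) -> Result<Option<i32>, String> {
    match xs {
        [x] if *x == NA_INTEGER => Ok(None),
        [x] => Ok(Some(*x)),
        _ => Err(format!("expected a scalar, got length {}", xs.len())),
    }
}

/// Scalar non-`Option` argument: NA is an error as well.
fn to_scalar(xs: &[i32]) -> Result<i32, String> {
    to_option_scalar(xs)?
        .ok_or_else(|| "NA supplied to a non-Option argument".to_string())
}
```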
A logical (i.e. `TRUE`, `FALSE`, or `NA`) can be converted from/to
- `bool`
- `Option<bool>`
The conversions can fail or be lossy in the following cases:
- attempt to convert `NA` to a non-`Option` type (fail)
Note that it seems there's no lossy case.
In the other direction (Rust to R), there's no fallible or lossy case.
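The logical mapping can be sketched in both directions, assuming the raw representation R uses internally: a 32-bit integer where `NA` shares the bit pattern of `NA_integer_`. The function names are hypothetical.

```rust
/// R stores logicals as 32-bit integers; `NA` is the smallest one.
const NA_LOGICAL: i32 = i32::MIN;

/// R to Rust, `Option` target: never fails.
fn logical_to_option_bool(x: i32) -> Option<bool> {
    if x == NA_LOGICAL {
        None
    } else {
        Some(x != 0)
    }
}

/// R to Rust, plain `bool` target: fails only on `NA`.
fn logical_to_bool(x: i32) -> Result<bool, String> {
    logical_to_option_bool(x)
        .ok_or_else(|| "NA cannot be converted to bool".to_string())
}

/// Rust to R: every `Option<bool>` has exactly one logical value.
fn option_bool_to_logical(x: Option<bool>) -> i32 {
    match x {
        Some(true) => 1,
        Some(false) => 0,
        None => NA_LOGICAL,
    }
}
```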
An integer (R's integer is 32-bit) can be converted to
- signed integer types (`i8`, `i16`, `i32`, `i64`)
- unsigned integer types (`u8`, `u16`, `u32`, `u64`)
- floating-point types (`f32`, `f64`)
- any of the above types wrapped with `Option`
The conversions can fail or be lossy in the following cases:
- convert a value that exceeds `<T>::MIN` or `<T>::MAX` to an integer type (fail)
- convert a negative value to an unsigned integer type (fail)
- convert `NA_integer_` to a non-`Option` type (fail)
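Note that conversion to `f32` can be lossy: `f32` has a 24-bit significand, so integers whose magnitude exceeds 2^24 may be rounded (e.g. `16777217_i32 as f32` gives `16777216.0`). The round trip `i64::MAX as f32 as i64` appears to preserve the value only because the float-to-integer cast saturates. Conversion to the integer types and to `f64` is not lossy.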
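The failure cases for integer targets can be sketched generically with `TryFrom<i32>`, assuming `NA_integer_` arrives as the smallest 32-bit integer. `integer_to` is a hypothetical helper, not an extendr function.

```rust
use std::convert::TryFrom;

/// R stores `NA_integer_` as the smallest 32-bit integer.
const NA_INTEGER: i32 = i32::MIN;

/// Converts a scalar R integer to any target implementing `TryFrom<i32>`.
/// NA and out-of-range values (including negatives for unsigned targets)
/// are reported as errors.
fn integer_to<T: TryFrom<i32>>(x: i32) -> Result<T, String> {
    if x == NA_INTEGER {
        return Err("NA_integer_ supplied to a non-Option argument".to_string());
    }
    T::try_from(x).map_err(|_| format!("{} is out of range for the target type", x))
}
```

For example, `integer_to::<u8>(255)` succeeds, while `integer_to::<u8>(256)` and `integer_to::<u32>(-1)` both return an error.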
In the other direction (Rust to R), the conversions can fail or be lossy in the following cases:
- convert a value that is smaller than `-.Machine$integer.max` or larger than `.Machine$integer.max` (fail)
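Assuming this limit applies when a Rust integer is handed back to R, the range check can be sketched as follows. The lower bound is `-.Machine$integer.max` rather than `i32::MIN` because that bit pattern is reserved for `NA_integer_`. The names are hypothetical.

```rust
/// The largest value R's integer type can hold (`.Machine$integer.max`).
const R_INTEGER_MAX: i64 = 2_147_483_647;

/// Converts an `i64` to a value that is safe to store in an R integer.
fn i64_to_r_integer(x: i64) -> Result<i32, String> {
    if x < -R_INTEGER_MAX || x > R_INTEGER_MAX {
        Err(format!("{} does not fit in an R integer", x))
    } else {
        // The range check above guarantees this cast does not wrap.
        Ok(x as i32)
    }
}
```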
A numeric (R's numeric is 64-bit, cf. https://stackoverflow.com/q/50217954) can be converted to
- signed integer types (`i8`, `i16`, `i32`, `i64`)
- unsigned integer types (`u8`, `u16`, `u32`, `u64`)
- floating-point types (`f32`, `f64`)
- any of the above types wrapped with `Option`
The conversions can fail or be lossy in the following cases:
- attempt to convert a value that exceeds `<T>::MIN` or `<T>::MAX` to an integer type (fail)
- convert an infinite value to an integer type (fail)
- convert a `NaN` to an integer type (fail)
- convert a negative value to an unsigned integer type (fail)
- convert `NA_real_` to a non-`Option` type (fail)
- convert to `f32` (lossy)
- convert a non-integerish value (e.g. `1.5`) to an integer type (lossy)
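A strict ("safety-first") check covering each case in the list above might look like this for a `u8` target. `NumericError` and `numeric_to_u8_strict` are hypothetical names, and the sketch does not distinguish `NA_real_` from other NaN values, since `NA_real_` is itself a NaN.

```rust
/// Hypothetical error type with one variant per failure reason.
#[derive(Debug, PartialEq)]
enum NumericError {
    NaOrNan,
    Infinite,
    Negative,
    OutOfRange,
    NotIntegerish,
}

/// Converts a scalar R numeric to `u8`, rejecting lossy inputs too.
fn numeric_to_u8_strict(x: f64) -> Result<u8, NumericError> {
    if x.is_nan() {
        return Err(NumericError::NaOrNan); // also catches NA_real_
    }
    if x.is_infinite() {
        return Err(NumericError::Infinite);
    }
    if x < 0.0 {
        return Err(NumericError::Negative);
    }
    if x > f64::from(u8::MAX) {
        return Err(NumericError::OutOfRange);
    }
    if x.fract() != 0.0 {
        return Err(NumericError::NotIntegerish);
    }
    Ok(x as u8)
}
```

In a mode that tolerates lossy conversions, the `NotIntegerish` check would simply be dropped, so that `1.5` truncates to `1`.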
In the other direction (Rust to R), there's no fallible or lossy case.
A character can be converted to
- `&str`
- `String`
- any of the above types wrapped with `Option`
The conversions can fail or be lossy in the following cases:
- convert `NA_character_` to a non-`Option` type (fail)
Note that
- Even when the character is invalid as UTF-8, the operation will never fail because it's handled by `std::str::from_utf8_unchecked()`.
- The encoding information is always lost, so, in that sense, this might be considered a lossy conversion.
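For comparison, here is a sketch of the checked and unchecked standard-library paths. The wrapper names are hypothetical. Note that the standard library documents passing invalid UTF-8 to `from_utf8_unchecked()` as undefined behavior, so "never fails" only holds in the sense that no error is reported.

```rust
use std::str;

/// Checked: returns an error for bytes that are not valid UTF-8.
fn bytes_to_str_checked(bytes: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(bytes)
}

/// Unchecked: never reports an error and skips the validation cost.
///
/// # Safety
/// The caller must guarantee that `bytes` is valid UTF-8.
unsafe fn bytes_to_str_unchecked(bytes: &[u8]) -> &str {
    unsafe { str::from_utf8_unchecked(bytes) }
}
```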
In the other direction (Rust to R), there's no fallible or lossy case.
A factor should be handled in the same manner as a character. The only difference is that it's always lossy in that the levels get lost in the conversion.
A list can be converted to
- `HashMap<&str, Robj>`
- `HashMap<String, Robj>`
The conversions can fail or be lossy in the following cases:
- The list contains any unnamed elements (fail)
- The list contains elements with duplicated names (fail)
Note that `HashMap` doesn't have the information about the order of the elements, so, in that sense, this might be considered a lossy conversion.
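The two failure cases can be sketched with a generic value type standing in for `Robj`. The input shape (a vector of optional names paired with values) and the function name are assumptions made for illustration; an empty name is treated as unnamed.

```rust
use std::collections::HashMap;

/// Builds a map from (name, value) pairs, failing on unnamed elements
/// and on duplicated names. `V` stands in for `Robj`.
fn list_to_map<V>(items: Vec<(Option<String>, V)>) -> Result<HashMap<String, V>, String> {
    let mut map = HashMap::with_capacity(items.len());
    for (name, value) in items {
        let name = match name {
            Some(n) if !n.is_empty() => n,
            _ => return Err("the list contains an unnamed element".to_string()),
        };
        if map.contains_key(&name) {
            return Err(format!("duplicated name: {}", name));
        }
        map.insert(name, value);
    }
    Ok(map)
}
```

If element order matters to the caller, a `Vec` of pairs would preserve it where a `HashMap` cannot.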
This won't be provided?
An environment doesn't have a corresponding Rust type.
A function doesn't have a corresponding Rust type.