@yutannihilation
Last active July 23, 2021 11:41

(Draft) extendr's conversion specs and semantics

Scope of this document

To implement a Rust function for R with extendr, users can choose to use either (or both) of the following:

  1. Rust types (e.g. i32, and &str)
  2. Proxy types (e.g. Integer, and Character)

For Rust types, we need to consider how to handle the case where the types are incompatible between R and Rust. On the other hand, proxy types are easy in that they are just cast, so the operation should not affect the underlying data. So, this document focuses on the conversion to Rust types.

Type of conversions

We need to handle these types of conversions (cf. the FromLossy / TryFromLossy RFC):

  • safe, non-lossy conversions (e.g. an integer to i32)
  • fallible, non-lossy conversions (e.g. an integer to i16, which fails when the value is larger than i16::MAX or smaller than i16::MIN)
  • safe, lossy conversions (e.g. a numeric to f32)
  • fallible, lossy conversions (e.g. a numeric to u32, which fails when the value is negative, and is lossy when the value is not integer-ish (e.g. 1.5))

Note that

  • Even the "safe" conversions can fail when a vector is supplied to a scalar argument, or a missing value is supplied to a non-Option argument.
  • The output conversion should never fail at runtime, though it might be lossy.
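As a rough illustration, the four categories can be reproduced with plain standard-library conversions, independent of extendr. The function names below are hypothetical and exist only for this sketch.

```rust
use std::convert::TryFrom;

/// Safe, non-lossy: every i32 fits in an i64, so `From` applies.
fn widen(x: i32) -> i64 {
    i64::from(x)
}

/// Fallible, non-lossy: out-of-range values are rejected, never altered.
fn narrow(x: i32) -> Result<i16, std::num::TryFromIntError> {
    i16::try_from(x)
}

/// Safe, lossy: always succeeds, but rounds to the nearest f32.
fn to_single(x: f64) -> f32 {
    x as f32
}

/// Fallible, lossy: rejects negative, non-finite, or too-large input,
/// and silently drops the fractional part otherwise.
fn to_unsigned(x: f64) -> Option<u32> {
    if x.is_finite() && x >= 0.0 && x <= u32::MAX as f64 {
        Some(x as u32)
    } else {
        None
    }
}
```

Here `to_unsigned(1.5)` succeeds with `Some(1)` (lossy), while `to_unsigned(-1.0)` gives `None` (fail).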

General convention

#[extendr] macro

Currently, extendr provides two modes of auto-conversion with the #[extendr] macro. I personally feel the "safety-first" mode should be the default.

Speed-first mode (the current default)

  • The inputs might or might not be checked, for the sake of speed. The conversion might be fallible or lossy.
  • It's the user's responsibility to validate the inputs and ensure they are within the documented limits; otherwise, the behavior is considered undefined.

Safety-first mode (i.e. use_try_from = true)

  • The inputs are strictly checked and, if the conversion would be lossy, it fails with an error.
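To make the difference concrete, here is a minimal sketch of the two modes for a numeric-to-i32 conversion. It is not extendr's actual generated code: `speed_first` and `safety_first` are hypothetical names, and a bare `as` cast merely stands in for "unchecked" conversion.

```rust
/// Speed-first sketch: a bare `as` cast never fails. NaN becomes 0,
/// out-of-range values saturate, and fractions are truncated toward zero.
fn speed_first(x: f64) -> i32 {
    x as i32
}

/// Safety-first sketch: refuse any input that the cast would alter.
fn safety_first(x: f64) -> Result<i32, String> {
    if !x.is_finite() {
        return Err(format!("{} is not a finite number", x));
    }
    if x < i32::MIN as f64 || x > i32::MAX as f64 {
        return Err(format!("{} is out of range for i32", x));
    }
    if x.fract() != 0.0 {
        return Err(format!("{} is not integer-ish", x));
    }
    Ok(x as i32)
}
```

With these definitions, `speed_first(1e10)` quietly returns `i32::MAX`, whereas `safety_first(1e10)` returns an error.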

Related implementations

To enable the behavior described above, we should implement traits and methods following this convention:

  • Use From for safe, non-lossy conversion
  • Use TryFrom for fallible, non-lossy conversion
  • Provide other traits (i.e. FromRobj) or methods (e.g. <T>::from_xxx(), <T>::to_xxx()) for lossy conversions

(For the rationale, please refer to the Appendix)

Conversion specs

Length (scalar or vector)

An input whose length is more than one can be converted from/to

  • Vec<T>
  • &[T]

The conversion should fail when a vector is supplied to a scalar argument

Missing values

An input that is possibly a missing value can be converted from/to

  • Option<T>

The conversion should fail when a missing value is supplied to a non-Option argument
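The length and missing-value rules can be sketched together. This assumes, purely for illustration, that an R vector is modelled as a slice of `Option<T>` where `None` stands for NA; `ArgError`, `as_optional_scalar`, and `as_scalar` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum ArgError {
    /// The input had this many elements instead of exactly one.
    NotScalar(usize),
    /// The single element was NA but the argument is not an Option.
    MissingValue,
}

/// Scalar `Option<T>` argument: the length must be exactly 1; NA is allowed.
fn as_optional_scalar<T: Copy>(input: &[Option<T>]) -> Result<Option<T>, ArgError> {
    match input {
        [x] => Ok(*x),
        _ => Err(ArgError::NotScalar(input.len())),
    }
}

/// Scalar `T` argument: the length must be 1 and the value must not be NA.
fn as_scalar<T: Copy>(input: &[Option<T>]) -> Result<T, ArgError> {
    as_optional_scalar(input)?.ok_or(ArgError::MissingValue)
}
```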

Types

Logical

A logical (i.e. TRUE, FALSE, or NA) can be converted from/to

  • bool
  • Option<bool>

R-to-Rust conversions

The conversions can fail or be lossy in the following cases:

  • attempt to convert NA to a non-Option type (fail)

Note that it seems there's no lossy case.

Rust-to-R conversions

There's no fallible or lossy case.
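For reference, R stores a logical as a C int, with NA represented by the minimum int value. A minimal sketch of both directions over that raw representation follows; the function names are hypothetical and this is not extendr's implementation.

```rust
/// R's NA_LOGICAL is the minimum C int (the same bit pattern as NA_INTEGER).
const NA_LOGICAL: i32 = i32::MIN;

/// R-to-Rust, Option target: NA maps to None, so nothing can fail.
fn to_option_bool(raw: i32) -> Option<bool> {
    if raw == NA_LOGICAL {
        None
    } else {
        Some(raw != 0)
    }
}

/// R-to-Rust, non-Option target: NA is the only failure case.
fn to_bool(raw: i32) -> Result<bool, &'static str> {
    to_option_bool(raw).ok_or("NA cannot be converted to bool")
}

/// Rust-to-R: every input has an exact representation.
fn from_option_bool(x: Option<bool>) -> i32 {
    match x {
        Some(true) => 1,
        Some(false) => 0,
        None => NA_LOGICAL,
    }
}
```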

Integer

An integer (R's integer is 32bit) can be converted to

  • signed integer types (i8, i16, i32, i64)
  • unsigned integer types (u8, u16, u32, u64)
  • floating-point types (f32, f64)
  • any of the above types wrapped with Option

R-to-Rust conversions

The conversions can fail or be lossy in the following cases:

  • convert a value that is below <T>::MIN or above <T>::MAX to an integer type (fail)
  • convert a negative value to an unsigned integer type (fail)
  • convert NA_integer_ to a non-Option type (fail)

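Note that conversions to an integer type or to f64 are never lossy, but conversion to f32 can be: f32 has a 24-bit significand, so an integer whose magnitude exceeds 2^24 (e.g. 16777217) is rounded to a neighbouring value. (i64::MAX as f32 as i64 does give back i64::MAX, but only because the float-to-integer as cast saturates.)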
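The failure cases above can be sketched with `TryFrom` over the raw 32-bit value, treating `i32::MIN` as NA_integer_ (which is how R encodes it). The helper names are hypothetical. One caveat worth checking is f32: the last helper shows that not every 32-bit integer survives that conversion exactly.

```rust
use std::convert::TryFrom;

/// R encodes NA_integer_ as the minimum 32-bit integer.
const NA_INTEGER: i32 = i32::MIN;

/// Option target: NA becomes None; range violations are errors.
fn r_int_to_option<T: TryFrom<i32>>(raw: i32) -> Result<Option<T>, String> {
    if raw == NA_INTEGER {
        return Ok(None);
    }
    T::try_from(raw)
        .map(Some)
        .map_err(|_| format!("{} is out of range for the target type", raw))
}

/// Non-Option target: NA is an error as well.
fn r_int_to<T: TryFrom<i32>>(raw: i32) -> Result<T, String> {
    r_int_to_option(raw)?.ok_or_else(|| "NA_integer_ needs an Option type".to_string())
}

/// True when `x` is exactly representable as an f32.
fn is_exact_in_f32(x: i32) -> bool {
    (x as f32) as f64 == x as f64
}
```

For example, `is_exact_in_f32(16_777_216)` holds but `is_exact_in_f32(16_777_217)` does not.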

Rust-to-R conversions

The conversions can fail or be lossy in the following cases:

  • convert a value outside the range from -.Machine$integer.max to .Machine$integer.max (i.e. ±2147483647; i32::MIN itself is R's NA_integer_)
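A sketch of a checked Rust-to-R direction for a wider type such as i64 follows. Note that `i32::MIN` fits in 32 bits but collides with NA_integer_, so it is rejected too. Whether this case should be an error or a lossy fallback is left open here; the function names are hypothetical.

```rust
use std::convert::TryFrom;

/// R encodes NA_integer_ as the minimum 32-bit integer.
const NA_INTEGER: i32 = i32::MIN;

/// Accept only values within +/- .Machine$integer.max.
fn to_r_integer(x: i64) -> Result<i32, String> {
    match i32::try_from(x) {
        Ok(v) if v != NA_INTEGER => Ok(v),
        _ => Err(format!("{} cannot be represented as an R integer", x)),
    }
}

/// `None` is the one input that is allowed to produce NA_integer_.
fn option_to_r_integer(x: Option<i64>) -> Result<i32, String> {
    match x {
        Some(v) => to_r_integer(v),
        None => Ok(NA_INTEGER),
    }
}
```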

Numeric

A numeric (R's numeric is 64bit; cf. https://stackoverflow.com/q/50217954) can be converted to

  • signed integer types (i8, i16, i32, i64)
  • unsigned integer types (u8, u16, u32, u64)
  • floating-point types (f32, f64)
  • any of the above types wrapped with Option

R-to-Rust conversions

The conversions can fail or be lossy in the following cases:

  • convert a value that is below <T>::MIN or above <T>::MAX to an integer type (fail)
  • convert an infinite value to an integer type (fail)
  • convert a NaN to an integer type (fail)
  • convert a negative value to an unsigned integer type (fail)
  • convert NA_real_ to a non-Option type (fail)
  • convert to f32 (lossy)
  • convert a non-integerish value (e.g. 1.5) to an integer type (lossy)
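The cases in this list can be told apart with a small classifier. The sketch below targets u32 and models NA_real_ as `None` for simplicity (a real implementation would have to detect R's NA bit pattern); `NumericError` and the function names are hypothetical. A strict mode is assumed, so the lossy non-integerish case is reported as an error rather than truncated.

```rust
#[derive(Debug, PartialEq)]
enum NumericError {
    Missing,       // NA_real_ supplied to a non-Option target
    NotFinite,     // NaN or +/-Inf
    OutOfRange,    // negative, or larger than u32::MAX
    NotIntegerish, // has a fractional part, e.g. 1.5
}

/// Strict numeric -> u32 conversion.
fn numeric_to_u32(x: Option<f64>) -> Result<u32, NumericError> {
    let x = x.ok_or(NumericError::Missing)?;
    if !x.is_finite() {
        return Err(NumericError::NotFinite);
    }
    if x < 0.0 || x > u32::MAX as f64 {
        return Err(NumericError::OutOfRange);
    }
    if x.fract() != 0.0 {
        return Err(NumericError::NotIntegerish);
    }
    Ok(x as u32)
}

/// True when converting `x` to f32 loses no information.
fn f32_is_exact(x: f64) -> bool {
    (x as f32) as f64 == x
}
```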

Rust-to-R conversions

There's no fallible or lossy case.

Character

A character can be converted to

  • &str
  • String
  • either of the above types wrapped with Option

R-to-Rust conversions

The conversions can fail or be lossy in the following cases:

  • convert NA_character_ to a non-Option type (fail)

Note that

  • Even when the character is invalid as UTF-8, the operation will never fail because it's handled by std::str::from_utf8_unchecked().
  • The encoding information is always lost, so, in that sense, this might be considered a lossy conversion.
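If a stricter mode ever wanted to reject invalid UTF-8 instead of trusting the bytes, the checked counterpart `std::str::from_utf8` could be used. The sketch below assumes a character element is modelled as optional raw bytes, with `None` standing for NA_character_; `to_str_checked` is a hypothetical name.

```rust
/// Checked variant: NA maps to None, invalid UTF-8 is reported as an error.
fn to_str_checked(bytes: Option<&[u8]>) -> Result<Option<&str>, std::str::Utf8Error> {
    match bytes {
        None => Ok(None),
        Some(b) => std::str::from_utf8(b).map(Some),
    }
}
```

The trade-off is a validation pass over every string, which is exactly the cost the unchecked function avoids.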

Rust-to-R conversions

There's no fallible or lossy case.

Factor

A factor should be handled in the same manner as a character. The only difference is that the conversion is always lossy, in that the levels get lost.
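As a sketch of what "the same manner as a character" could mean: a factor is stored as 1-based integer codes plus a levels attribute, so each code is looked up in the levels and NA codes become `None`. The function name and the slice-based modelling are assumptions made for this example.

```rust
use std::convert::TryFrom;

/// R encodes NA_integer_ as the minimum 32-bit integer.
const NA_INTEGER: i32 = i32::MIN;

/// Map 1-based factor codes to their labels; the set of levels itself
/// (including unused levels and their order) is not preserved.
fn factor_to_strings(codes: &[i32], levels: &[&str]) -> Result<Vec<Option<String>>, String> {
    codes
        .iter()
        .map(|&code| {
            if code == NA_INTEGER {
                return Ok(None);
            }
            let label = usize::try_from(code)
                .ok()
                .and_then(|c| c.checked_sub(1)) // codes are 1-based
                .and_then(|i| levels.get(i));
            match label {
                Some(l) => Ok(Some(l.to_string())),
                None => Err(format!("code {} has no matching level", code)),
            }
        })
        .collect()
}
```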

List

A list can be converted to

  • HashMap<&str, Robj>
  • HashMap<String, Robj>

R-to-Rust conversions

The conversions can fail or be lossy in the following cases:

  • The list contains any unnamed elements (fail)
  • The list contains duplicated element names (fail)

Note that

  • HashMap doesn't have the information about the order of the elements, so, in that sense, this might be considered a lossy conversion.
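The two failure cases can be sketched as follows, modelling a list as a slice of (name, value) pairs in which an unnamed element carries the empty string as its name (which is what names() reports in R). `ListError` and `list_to_map` are hypothetical names, and the value type is left generic rather than using Robj.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum ListError {
    /// Zero-based position of the first unnamed element.
    Unnamed(usize),
    /// The first name that appears more than once.
    Duplicated(String),
}

fn list_to_map<V: Clone>(elements: &[(&str, V)]) -> Result<HashMap<String, V>, ListError> {
    let mut map = HashMap::with_capacity(elements.len());
    for (position, (name, value)) in elements.iter().enumerate() {
        if name.is_empty() {
            return Err(ListError::Unnamed(position));
        }
        // `insert` returns the previous value when the key already existed.
        if map.insert(name.to_string(), value.clone()).is_some() {
            return Err(ListError::Duplicated(name.to_string()));
        }
    }
    Ok(map)
}
```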

Rust-to-R conversions

This won't be provided?

Environment

An environment doesn't have a corresponding Rust type.

Function

A function doesn't have a corresponding Rust type.

Appendix

Overview

extendr implements three types of traits for conversions:

  • From
  • TryFrom
  • FromRobj

and provides these name formats of methods:

  • as_*
  • to_*
  • into_*

This document will discuss which should be used for what case, and how the conversion should handle R values.

Rust's conventions

Rust has several conventions and recommendations about the traits and the methods. Let's review them here briefly.

Rust API Guidelines

Rust API Guidelines is

a set of recommendations on how to design and present APIs for the Rust programming language. They are authored largely by the Rust library team, based on experiences building the Rust standard library and other crates in the Rust ecosystem.

This has two sections related to the topic in question.

Conversions use the standard traits From, AsRef, AsMut (C-CONV-TRAITS)

url: https://rust-lang.github.io/api-guidelines/interoperability.html#conversions-use-the-standard-traits-from-asref-asmut-c-conv-traits

The main point is that Into and TryInto should not be implemented as they are provided by blanket implementation based on From and TryFrom.

Ad-hoc conversions follow as_, to_, into_ conventions (C-CONV)

url: https://rust-lang.github.io/api-guidelines/naming.html#ad-hoc-conversions-follow-as_-to_-into_-conventions-c-conv

The main point is this table. For more details with realistic examples, please refer to the URL above.

Prefix  Cost       Ownership
as_     Free       borrowed -> borrowed
to_     Expensive  borrowed -> borrowed
                   borrowed -> owned (non-Copy types)
                   owned -> owned (Copy types)
into_   Variable   owned -> owned (non-Copy types)
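The standard library's string methods give one example per row. The wrapper functions below are hypothetical; the methods they call (`str::as_bytes`, `str::to_uppercase`, `String::into_bytes`) are the real std ones.

```rust
/// as_: free, borrowed -> borrowed (a view of the same bytes).
fn view(s: &str) -> &[u8] {
    s.as_bytes()
}

/// to_: expensive, borrowed -> owned (allocates a new String).
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// into_: owned -> owned, consuming the input and reusing its buffer.
fn unwrap_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}
```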

The traits' documentation

In short, if the conversion might fail, choose TryFrom, not From.

From

url: https://doc.rust-lang.org/std/convert/trait.From.html

From is

Used to do value-to-value conversions while consuming the input value. It is the reciprocal of Into.

and the important point is

Note: This trait must not fail. If the conversion can fail, use TryFrom.

TryFrom

url: https://doc.rust-lang.org/std/convert/trait.TryFrom.html

TryFrom is

Simple and safe type conversions that may fail in a controlled way under some circumstances. It is the reciprocal of TryInto.

Existing discussions

Can we use From (or even TryFrom) for lossy conversion?

While the document of From doesn't explicitly prohibit lossy conversion, it should probably be avoided.

For example, the FromLossy / TryFromLossy RFC categorizes the types of the conversions as the following:

Several types of conversion are possible:

  • safe, exact conversions (e.g. u32 -> u64) are handled by the From trait (RFC 529)
  • fail conversions (e.g. u32 -> i8) are handled by the TryFrom trait (RFC 1542)
  • lossy conversions (e.g. i64 -> f32)
  • lossy fail conversions (e.g. f64 -> u64)
  • truncations on unsigned integers (e.g. u64 -> u32, dropping unused high bits)
  • sign-ignoring coercions/transmutations (e.g. i8 -> u8, i64 -> i32); these can yield totally different values due to interpretation of the sign bit (e.g. 3i32 << 14 is 49152=2^14+2^15; converting to i16 yields -16384=2^14-2^15)
  • conversions between types with platform-dependent size (i.e. usize and isize)

They propose to clarify that From can be used only for exact conversions, and even to remove the existing lossy From implementations (and actually they are already removed? I can't find such a From in packed_simd_2):

Nightly rust currently has several implementations of From on SIMD types which should be removed (e.g. f32x4 -> i8x4 (fail) and u64x4 -> f32x4 (lossy i.e. not injective)).

While the RFC has not been accepted, I feel we should follow this approach. I'm also wondering if we should move away from using as, but this is probably a battle for another day.
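To illustrate why `as` is worth a second look, the sketch below reproduces the truncation and sign-reinterpretation cases from the RFC's list and contrasts them with `TryFrom`. The function names are hypothetical.

```rust
use std::convert::TryFrom;

/// Truncation: `as` keeps only the low 32 bits.
fn truncate(x: u64) -> u32 {
    x as u32
}

/// Sign reinterpretation: the low 16 bits are read back as a signed value.
fn reinterpret(x: i32) -> i16 {
    x as i16
}

/// The checked alternative reports the problem instead of changing the value.
fn checked(x: i32) -> Option<i16> {
    i16::try_from(x).ok()
}
```

With the RFC's own example, `reinterpret(3 << 14)` yields -16384 while `checked(3 << 14)` yields `None`.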

Other references