@lionel-
Last active July 15, 2024 13:09
Compatibility of tidyverse with the public C API of R

Summary of meeting between Tidyverse members and Luke Tierney at useR! 2024.

Frame/Environment inspection

Frontends and low level tools need to know what kind of bindings they are dealing with. Objectives include:

  • Avoiding side effects such as triggering a promise or causing a missing argument error. Low level tools often can't afford to protect against those for every variable lookup. Figuring out what happened by inspecting errors is also ambiguous, and sometimes impossible (promises may cause longjumps in a variety of ways).

  • Transparency in debugging/development settings. Providing context to the user about what will happen if they attempt to retrieve the value of a binding (e.g. an active binding invocation, or a promise forcing that leads to the evaluation of such and such expression).

  • Completeness of the API to inspect and manipulate bindings. It should be possible to write an environment cloner using these tools: iterate over the bindings; retrieve each binding's type; given the type, retrieve its components (prexpr, prenv, active binding function, etc.); given the components, create a duplicate binding in the new environment.

  • tidyeval (the NSE framework for the tidyverse) needs to obtain both the expression and the original frame environment of substituted dots.
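
The cloner objective can be sketched without R at all. The block below is a toy model under stated assumptions: every `toy_` name is hypothetical, the structs are not R's internal layout, and the forced promise is assumed to carry both its expression and its cached value. It only shows the shape of the loop: read the type, read the components that type allows, and rebuild the binding without ever triggering a lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types are NOT R's internals. */
typedef enum {
    TOY_VALUE,            /* plain value */
    TOY_MISSING,          /* missing argument */
    TOY_DELAYED_PROMISE,  /* expression + environment, not yet evaluated */
    TOY_FORCED_PROMISE,   /* expression + cached value */
    TOY_ACTIVE            /* function invoked on every read */
} toy_binding_type;

typedef struct {
    const char *name;
    toy_binding_type type;
    const char *expr;   /* promise expression, if any */
    const void *env;    /* promise environment, if delayed */
    const char *value;  /* value, if available without evaluation */
    const char *fun;    /* active binding function, if any */
} toy_binding;

#define TOY_MAX_BINDINGS 16

typedef struct {
    toy_binding bindings[TOY_MAX_BINDINGS];
    size_t n;
} toy_env;

/* Clone `src` into `dst` by copying components, never by reading values
   through a lookup that could force a promise or call an active binding. */
static void toy_env_clone(const toy_env *src, toy_env *dst) {
    assert(src->n <= TOY_MAX_BINDINGS);
    dst->n = 0;
    for (size_t i = 0; i < src->n; ++i) {
        const toy_binding *b = &src->bindings[i];
        toy_binding copy = { b->name, b->type, NULL, NULL, NULL, NULL };
        switch (b->type) {
        case TOY_VALUE:           copy.value = b->value; break;
        case TOY_MISSING:         break;
        case TOY_DELAYED_PROMISE: copy.expr = b->expr; copy.env = b->env; break;
        case TOY_FORCED_PROMISE:  copy.expr = b->expr; copy.value = b->value; break;
        case TOY_ACTIVE:          copy.fun = b->fun; break;
        }
        dst->bindings[dst->n++] = copy;
    }
}
```

A delayed promise cloned this way stays delayed in the copy, which is the property a lookup-based cloner cannot guarantee.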

API considerations

Binding type

Existing API:

Rboolean R_existsVarInFrame(SEXP env, SEXP sym);  // Unfortunate inconsistency in param order
Rboolean R_BindingIsActive(SEXP sym, SEXP env);

New API:

typedef enum {
    R_BindingTypeUnbound = 0,          /* Unbound in this environment */
    R_BindingTypeValue = 1,            /* Direct value binding */
    R_BindingTypeMissing = 2,          /* Missing argument */
    R_BindingTypeDelayedPromise = 3,   /* Delayed promise */
    R_BindingTypeForcedPromise = 4,    /* Forced promise */
    R_BindingTypeActive = 5,           /* Active binding */
} R_BindingType;

R_BindingType R_GetBindingType(SEXP sym, SEXP env);
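
As an illustration of how a frontend might consume this enum, here is a minimal sketch. The enum is copied from the proposal above so the block compiles without R's headers; the two `toy_` helpers are hypothetical and not part of any R API. The first answers "can I fetch this value without running R code or signalling an error?", the second produces a label a debugger could show next to a variable.

```c
#include <assert.h>
#include <stdbool.h>

/* Copy of the enum proposed above; it is not part of R's current API. */
typedef enum {
    R_BindingTypeUnbound = 0,
    R_BindingTypeValue = 1,
    R_BindingTypeMissing = 2,
    R_BindingTypeDelayedPromise = 3,
    R_BindingTypeForcedPromise = 4,
    R_BindingTypeActive = 5
} R_BindingType;

/* Hypothetical helper: true when fetching the value cannot run R code
   or signal an error. */
static bool toy_lookup_is_inert(R_BindingType type) {
    assert((int) type <= (int) R_BindingTypeActive);
    switch (type) {
    case R_BindingTypeValue:
    case R_BindingTypeForcedPromise:
        return true;
    case R_BindingTypeUnbound:         /* nothing to fetch */
    case R_BindingTypeMissing:         /* evaluating it signals an error */
    case R_BindingTypeDelayedPromise:  /* forcing evaluates arbitrary code */
    case R_BindingTypeActive:          /* calls the binding function */
        return false;
    }
    return false;
}

/* Hypothetical helper: label to display before the user asks for the value. */
static const char *toy_binding_label(R_BindingType type) {
    switch (type) {
    case R_BindingTypeUnbound:        return "unbound";
    case R_BindingTypeValue:          return "value";
    case R_BindingTypeMissing:        return "missing argument";
    case R_BindingTypeDelayedPromise: return "promise (not yet forced)";
    case R_BindingTypeForcedPromise:  return "promise (forced)";
    case R_BindingTypeActive:         return "active binding";
    }
    return "unknown";
}
```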

Binding components

Existing:

SEXP R_ActiveBindingFunction(SEXP sym, SEXP env);

New:

SEXP R_DelayedPromiseBindingExpression(SEXP sym, SEXP env);
SEXP R_DelayedPromiseBindingEnvironment(SEXP sym, SEXP env);

SEXP R_ForcedPromiseBindingExpression(SEXP sym, SEXP env);

Binding creation

Existing:

void R_MakeActiveBinding(SEXP sym, SEXP fun, SEXP env);
void Rf_setVar(SEXP sym, SEXP value, SEXP env); // Value
void R_removeVarFromFrame(SEXP sym, SEXP env);  // Unbound

New:

void R_MakeDelayedPromiseBinding(SEXP sym, SEXP promiseExpr, SEXP promiseEnv, SEXP env);
void R_MakeForcedPromiseBinding(SEXP sym, SEXP promiseExpr, SEXP env);
void R_MakeMissingBinding(SEXP sym, SEXP env);

We need a way to create forced promises that work with substitute(). This could be achieved by passing a NULL environment or by splitting the constructor into two variants.

Simpler promise API

If we use a NULL environment as an indicator for forced promises, we can simplify the API by sharing the type, accessors, and constructor:

typedef enum {
    R_BindingTypeUnbound = 0,    /* Unbound in this environment */
    R_BindingTypeValue = 1,      /* Direct value binding */
    R_BindingTypeMissing = 2,    /* Missing argument */
    R_BindingTypePromise = 3,    /* Delayed or forced promise */
    R_BindingTypeActive = 4,     /* Active binding */
} R_BindingType;

SEXP R_PromiseBindingExpression(SEXP sym, SEXP env);
SEXP R_PromiseBindingEnvironment(SEXP sym, SEXP env);

void R_MakePromiseBinding(SEXP sym, SEXP promiseExpr, SEXP promiseEnv, SEXP env);
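
To make the trade-off concrete, the following toy sketch shows how a consumer could recover the delayed/forced distinction from the environment alone. It assumes C `NULL` as a stand-in for whatever "NULL environment" ends up meaning in the real API; the `toy_` names are hypothetical and the struct is not R's promise layout.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a promise binding: not R's PROMSXP layout. */
typedef struct {
    const char *expr;  /* what substitute() would report */
    const void *env;   /* NULL marks a forced promise */
} toy_promise;

typedef enum { TOY_PROMISE_DELAYED, TOY_PROMISE_FORCED } toy_promise_state;

/* One constructor covers both cases, mirroring R_MakePromiseBinding(). */
static toy_promise toy_make_promise(const char *expr, const void *env) {
    assert(expr != NULL);
    toy_promise p = { expr, env };
    return p;
}

/* The state is derived from the environment rather than stored as a type. */
static toy_promise_state toy_promise_get_state(const toy_promise *p) {
    return p->env == NULL ? TOY_PROMISE_FORCED : TOY_PROMISE_DELAYED;
}
```

The cost of the simpler API is that every caller interested in the distinction has to perform this check itself.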

Iterating over dots

Useful to do at C level for two things:

typedef enum {
    R_DotsBindingTypeValue = 0,      /* Direct value binding */
    R_DotsBindingTypePromise = 1,    /* Delayed or forced promise */
} R_DotsBindingType;

typedef struct {
    R_DotsBindingType type;    
    SEXP name;
} R_DotsIteratorItem;

/* Returns a private LISTSXP containing: the iterator state as a RAWSXP in the
   CAR, a protecting container in the CDR (for extra safety we might want to
   protect the current binding), and a type identifier in the TAG (for runtime
   error checking). The caller must protect this object and consider it opaque. 
   
   The behaviour in case `env` does not contain a DOTSEXP could be an error
   (check the binding type for `...` beforehand) or an empty iterator. */
SEXP R_MakeDotsIterator(SEXP env);

/* Returns true if advanced, in which case `item` is safely readable. */
Rboolean R_DotsNext(SEXP dotsIterator, R_DotsIteratorItem *item);

SEXP R_DotsPromiseBindingExpression(SEXP dotsIterator);
SEXP R_DotsPromiseBindingEnvironment(SEXP dotsIterator);
SEXP R_DotsValueBinding(SEXP dotsIterator);

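/* Example: print the value or the promise expression of each argument in `...`. */
SEXP iter = PROTECT(R_MakeDotsIterator(env));  /* the caller must protect the iterator */
R_DotsIteratorItem item;

while (R_DotsNext(iter, &item)) {
    switch (item.type) {
        case R_DotsBindingTypeValue: Rf_PrintValue(R_DotsValueBinding(iter)); break;
        case R_DotsBindingTypePromise: Rf_PrintValue(R_DotsPromiseBindingExpression(iter)); break;
    }
}

UNPROTECT(1);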
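
The "advance, then read" protocol of `R_DotsNext()` can be tried out with a self-contained toy model. This is only a sketch: the `toy_` names are hypothetical, the iterator walks a plain C array instead of a DOTSXP, and none of the protection concerns discussed above apply to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_DOTS_VALUE = 0, TOY_DOTS_PROMISE = 1 } toy_dots_type;

typedef struct {
    toy_dots_type type;
    const char *name;   /* NULL for unnamed arguments */
} toy_dots_item;

typedef struct {
    const toy_dots_item *items;
    size_t n;
    size_t pos;   /* number of items already handed out */
} toy_dots_iterator;

static toy_dots_iterator toy_make_dots_iterator(const toy_dots_item *items, size_t n) {
    assert(items != NULL || n == 0);
    toy_dots_iterator it = { items, n, 0 };
    return it;
}

/* Returns true if advanced, in which case `*item` is safely readable. */
static bool toy_dots_next(toy_dots_iterator *it, toy_dots_item *item) {
    if (it->pos >= it->n) return false;
    *item = it->items[it->pos++];
    return true;
}

/* Example consumer: count the arguments still wrapped in promises. */
static size_t toy_count_promises(const toy_dots_item *items, size_t n) {
    toy_dots_iterator it = toy_make_dots_iterator(items, n);
    toy_dots_item item;
    size_t count = 0;
    while (toy_dots_next(&it, &item)) {
        if (item.type == TOY_DOTS_PROMISE) ++count;
    }
    return count;
}
```

An empty array behaves like the "empty iterator" option mentioned in the comment on `R_MakeDotsIterator()`: the first call to the next function returns false and the loop body never runs.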

Attributes

Currently our main concern is to avoid materialising row names. In the future, getAttrib() should return an ALTREP string sequence for automatic row names. In the meantime, if an object already has ALTREP row names, getAttrib() should not materialise them, which it currently does via INTEGER().
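
For background, and as an assumption not stated in these notes: R stores automatic row names in a compact two-element integer form, `c(NA_integer_, -n)` (the sign of the second element may vary), which is what makes it possible to learn the number of rows without expanding anything. The sketch below works on a plain `int` array standing in for the raw attribute; the `toy_` names are hypothetical and `TOY_NA_INTEGER` mirrors R's `NA_INTEGER` so that the block compiles without R's headers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* R represents NA_integer_ as INT_MIN; redefined here so the sketch
   compiles without R's headers. */
#define TOY_NA_INTEGER INT_MIN

/* True if the raw attribute has the compact form c(NA, n) or c(NA, -n). */
static bool toy_is_compact_row_names(const int *rn, size_t len) {
    assert(rn != NULL);
    return len == 2 && rn[0] == TOY_NA_INTEGER;
}

/* Number of rows from the raw attribute, without expanding it to 1:n. */
static size_t toy_row_count(const int *rn, size_t len) {
    if (toy_is_compact_row_names(rn, len)) {
        long long n = rn[1];      /* widen first so negation cannot overflow */
        if (n < 0) n = -n;
        return (size_t) n;
    }
    return len;  /* one stored row name per row */
}
```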

It might be useful to have a way of getting and setting a list of attributes, but we'll first try to manage without that.
