mraleph/ffi.md

# Dart VM FFI Vision

## Background

The aim of the Dart FFI project (tracked as Issue #34452) is to provide a low-boilerplate, low-ceremony and low-overhead way of interoperating with native C/C++ code.

The motivation behind this project is twofold:

- One of the most common requests for Flutter is a low-overhead, synchronous mechanism for interacting with native (C/C++) code (see Issue #7053).
- We want a replacement for the Dart VM C API that reflects how the Dart language looks today and the contexts in which it is used.

Currently Flutter supports interacting with platform-specific code written in Java (Kotlin) and Objective-C (Swift) via platform channels. This mechanism is based on asynchronous message passing and requires people to write glue code in both Dart and the respective platform language. It is a high-overhead solution, both in terms of performance and in terms of the boilerplate code that the programmer is required to write.

The Dart VM provides a C API defined in the `dart_api.h` header and a mechanism for binding Dart code to native C/C++ code via native extensions. However, this mechanism is not integrated with Flutter and can't be used out of the box.

While it is possible to make the necessary changes to the Flutter engine and tools to enable developers to write VM API based native extensions (see the Native Flutter Extensions Prototype doc), we believe that using the Dart VM C API is not the right way forward, for the following reasons:


- The C API is name based, e.g. `Dart_Handle Dart_GetField(Dart_Handle container, Dart_Handle name);`. This:
  - makes it AOT unfriendly, because the AOT compiler cannot statically tell which members will be looked up by name;
  - makes it slow, because the results of name resolution are not cached.
- It is reflective: native functions have the signature `void (Dart_NativeArguments args)`, which allows them to accept any arguments and return any results. This is true even though the signature of the function on the Dart side is usually much stricter and gives the compiler enough information to perform the necessary marshalling automatically. Unwrapping arguments and wrapping results requires multiple round trips through the API boundary and can't be optimized by the Dart compilation toolchain. One of the core ideas behind FFI is:

  > If a native function with a statically known signature is bound to a Dart function with a known signature, then marshalling arguments and results based on statically known types is more efficient than reflective marshalling of arguments via the Dart C API.

- It is verbose.


Based on these observations, we also expect that a leaner way to integrate with native code would benefit current users of the Dart VM C API. For example, we expect that moving the Flutter engine from the C API to FFI should significantly reduce the overhead of crossing the boundary between Dart and native code.

## Design Sketch

### Note about type system

In general, we try to fit the FFI design into the existing Dart type system as much as possible, so that things like code completion and static errors work as expected.

However, as the sections below show, this is not always possible. Usually the type system lacks features that would allow us to encode the necessary information into static types and enforce additional typing rules.

This means that the FFI implementation might have to come with its own extensions to the Dart type system. The rules would be enforced by an additional Kernel transformation at the CFE level and by a lint at the analyzer level. Because the Dart front-ends are not fully unified, this work will unfortunately have to be duplicated, just as it is for other language features.

### Accessing Native Types from Dart

The first pillar of an FFI is a way to access native memory from Dart. The design of how this is expressed in Dart code is constrained by Dart semantics:

- Dart types are reference types.
- The mapping between native types and Dart builtin types is usually many-to-one. For example, native `int8_t` and `int32_t` both correspond to the `int` type on the Dart side.
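Here is a concrete illustration of the many-to-one mapping. In the VM, Dart's `int` is a 64-bit integer, so a `load<int>()` through a narrower native type has to widen the value. The correct widening depends on signedness: sign extension for `Int8`/`Int32`, zero extension for `Uint8`/`Uint32`. Below is a small C sketch of those conversions. The `load_*` helper names are illustrative only and are not part of the proposed API.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative helpers: read a native value at p and widen it to a
   64-bit integer (the width of Dart's int in the VM). memcpy avoids
   alignment and strict-aliasing problems. */
static int64_t load_int8(const void *p) {
  int8_t v;
  memcpy(&v, p, sizeof v);
  return (int64_t)v; /* sign extension */
}

static int64_t load_uint8(const void *p) {
  uint8_t v;
  memcpy(&v, p, sizeof v);
  return (int64_t)v; /* zero extension */
}

static int64_t load_int32(const void *p) {
  int32_t v;
  memcpy(&v, p, sizeof v);
  return (int64_t)v; /* sign extension */
}

static int64_t load_uint32(const void *p) {
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return (int64_t)v; /* zero extension */
}
```

For the same byte `0xFF`, `load_int8` yields -1 while `load_uint8` yields 255. That is exactly the distinction `Pointer<Int8>` and `Pointer<Uint8>` encode in their static types.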

#### Pointers and Primitives

```dart
library dart.ffi;

/// Classes representing native types (integers, floating point numbers
/// and void). They are not constructible in Dart code and serve purely
/// as markers in type signatures.
class _NativeType {}
class _NativeInteger extends _NativeType {}
class _NativeDouble extends _NativeType {}
class Int8 extends _NativeInteger {}
class Int16 extends _NativeInteger {}
class Int32 extends _NativeInteger {}
class Int64 extends _NativeInteger {}
class Uint8 extends _NativeInteger {}
class Uint16 extends _NativeInteger {}
class Uint32 extends _NativeInteger {}
class Uint64 extends _NativeInteger {}
class IntPtr extends _NativeInteger {}
class Float extends _NativeDouble {}
class Double extends _NativeDouble {}
class Void extends _NativeType {}

// Note: do we need to have a Char type?
// Note: do we need to have a ConstPointer type that only supports loads?

/// A class representing a pointer into the native heap.
abstract class Pointer<T extends _NativeType> extends _NativeType {
  /// Cast Pointer<T> to Pointer<U>.
  Pointer<U> cast<U extends _NativeType>();

  /// Pointer arithmetic (takes element size into account).
  Pointer<T> elementAt(int index);

  /// Pointer arithmetic (byte offset).
  Pointer<T> offsetBy(int offsetInBytes);

  /// Store a value of Dart type R into this location.
  void store<R>(R value);

  /// Load a value of Dart type R from this location.
  R load<R>();

  /// Access to the raw pointer value and construction from a raw value.
  int toInt();
  external factory Pointer.fromInt(int ptr);
}
```
Note how the `store` and `load` methods have their own type parameter `R`, which denotes the Dart representation of the stored/loaded value. Unfortunately, the Dart type system does not allow us to express mutual constraints on `T` and `R` (e.g. if `T extends _NativeInteger` then `R` should be `int`). Such violations would have to be reported by an "FFI typing pass":

```dart
import 'dart:ffi' as ffi;

ffi.Pointer<ffi.Int32> ptr;
final i = ptr.load<int>();     // valid
final s = ptr.load<String>();  // compile-time error (FFI typing pass)
```

Note that we would rely on the FFI typing pass to outlaw any usage of `Pointer<T>` where `T` (or `R`) is not statically known:

```dart
// Compile-time error: Pointer<T> has to be statically instantiated.
int loadAny<T extends _NativeType>(Pointer<T> p) => p.load();

// Compile-time error: R has to be statically instantiated.
R loadAs<R>(Pointer<Int32> p) => p.load<R>();
```

This restriction exists to ensure that the backend can generate the simplest, monomorphic code for pointer loads.

Note: `load<R>` and `store<R>` could become extension methods if Dart supported them. Then one could write:

```dart
extension IntegerPointer<T extends _NativeInteger> on Pointer<T> {
  external int load();
  external void store(int value);
}
```
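The two arithmetic methods on `Pointer<T>` differ only in their units. `elementAt(index)` scales by the size of `T`, while `offsetBy(offsetInBytes)` does not. The following C sketch shows the equivalent address computations for `T = Int32`. `element_at` and `offset_by` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Equivalent of ptr.elementAt(index) for Pointer<Int32>: C pointer
   arithmetic already scales the index by sizeof(int32_t). */
static int32_t *element_at(int32_t *p, ptrdiff_t index) {
  return p + index;
}

/* Equivalent of ptr.offsetBy(offsetInBytes): do the arithmetic on a
   byte pointer so that no scaling happens, then reinterpret. */
static int32_t *offset_by(int32_t *p, ptrdiff_t offset_in_bytes) {
  return (int32_t *)((char *)p + offset_in_bytes);
}
```

So `elementAt(i)` and `offsetBy(i * sizeOf<Int32>())` denote the same address. `offsetBy` is mainly useful for reaching fields of mixed types, as in the struct example later in this document.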
#### Alternatives considered to `load<R>`/`store<R>`

A question here is what kind of operations can be performed with Pointer<T>. The obvious idea is to allow loading and storing values of type T via this pointer:

```dart
abstract class Pointer<T extends _NativeType> {
  void store(T value);
  T load();
}
```
But that does not make sense: `Pointer<Int32>` would then dereference to `Int32`, whereas we would like it to dereference to `int`, a type that Dart programmers know how to use. This is similar to how `Int32List.[]` returns `int` and not `Int32`.

Unfortunately, the Dart type system does not allow us to write something like this:

```dart
abstract class Pointer<T extends _NativeType> {
  void store(Representation(T) value);
  Representation(T) load();
}
```

where `Representation(T)` is:

- `int` when `T extends _NativeInteger`;
- `double` when `T extends _NativeDouble`;
- `T` when `T extends Pointer`.

A possible approach is to introduce Pointer subclasses for those different cases:
```dart
/// A class representing a pointer into the native heap.
abstract class Pointer<T extends _NativeType> extends _NativeType {
  /// Cast Pointer<T> to any other pointer type.
  P cast<P extends Pointer>();
}

abstract class IntPointer<T extends _NativeInteger> extends Pointer<T> {
  void store(int value);
  int load();
}

abstract class DoublePointer<T extends _NativeDouble> extends Pointer<T> {
  void store(double value);
  double load();
}

abstract class PointerPointer<T extends Pointer> extends Pointer<T> {
  void store(T value);
  T load();
}
```
Another way to look at this is to treat any `Pointer<T>` as convertible to a typed array:

```dart
/// A class representing a pointer into the native heap.
abstract class Pointer<T extends _NativeType> {
  U asList<U extends TypedData>();
}
```

Then you could write code like this:

```dart
import 'dart:ffi' as ffi;
import 'dart:typed_data';

ffi.Pointer<ffi.Void> ptr;
// Equivalent of ptr.cast<IntPointer<Int32>>().store(10);
ptr.asList<Int32List>()[0] = 10;
```

Note: the Dart type system does not allow us to express a constraint on `U` ensuring that `U` is a concrete subclass of `TypedData` and not, for example, `TypedData` itself.

The subclass-based approach, on the other hand, leads to very verbose types such as `PointerPointer<IntPointer<Int32>>`.
#### Allocating and Freeing Memory

```dart
library dart.ffi;

/// Allocate [count] elements of type [T] and return a pointer
/// to the newly allocated memory.
external Pointer<T> allocate<T extends _NativeType>({int count = 1});

/// Free memory pointed to by [p].
external void free<P extends Pointer>(P p);

/// Return a pointer object that has a finalizer attached to it. When this
/// pointer object is collected by the GC, the given finalizer is invoked.
///
/// Note: the pointer object passed to the finalizer is not the same as
/// the pointer object that is returned from [finalizable]: it points
/// to the same memory region but has a different identity.
external Pointer<T> finalizable<T extends _NativeType>(
    Pointer<T> p, void finalizer(Pointer<T> ptr));
```
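Note that `count` in `allocate` counts elements of `T`, not bytes. The document does not say how `allocate` and `free` are implemented. Assuming they are backed by the C heap, the native side could reduce to something like the following sketch. `ffi_allocate` and `ffi_free` are hypothetical names, and the explicit check guards against `count * sizeof(T)` overflowing.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical native backing for allocate<T>(count: count), where
   element_size would be sizeOf<T>(). Returns NULL on overflow or on
   allocation failure. */
static void *ffi_allocate(size_t element_size, size_t count) {
  if (element_size != 0 && count > SIZE_MAX / element_size) {
    return NULL; /* count * element_size would overflow */
  }
  return malloc(element_size * count);
}

/* Hypothetical native backing for free(p). */
static void ffi_free(void *p) {
  free(p);
}
```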
#### Structures/Unions

In general, pointers alone are enough to work with structured data:

```dart
import 'dart:ffi' as ffi;

/// Same as
///
///     struct Point {
///       double x;
///       double y;
///       struct Point* next;
///     };
///
class Point {
  final ffi.Pointer<ffi.Uint8> _ptr;

  Point.fromPtr(ffi.Pointer<ffi.Void> ptr) : _ptr = ptr.cast<ffi.Uint8>();

  Point(double x, double y, Point next)
      : _ptr = ffi.allocate<ffi.Uint8>(
            count: ffi.sizeOf<ffi.Double>() * 2 +
                ffi.sizeOf<ffi.Pointer<ffi.Void>>()) {
    this.x = x;
    this.y = y;
    this.next = next;
  }

  ffi.Pointer<ffi.Double> get _xPtr =>
      _ptr.offsetBy(0).cast<ffi.Double>();
  set x(double v) { _xPtr.store(v); }
  double get x => _xPtr.load();

  ffi.Pointer<ffi.Double> get _yPtr =>
      _ptr.offsetBy(ffi.sizeOf<ffi.Double>() * 1).cast<ffi.Double>();
  set y(double v) { _yPtr.store(v); }
  double get y => _yPtr.load();

  // The next field holds a pointer, so cast to Pointer<Pointer<Void>>.
  ffi.Pointer<ffi.Pointer<ffi.Void>> get _nextPtr => _ptr
      .offsetBy(ffi.sizeOf<ffi.Double>() * 2)
      .cast<ffi.Pointer<ffi.Void>>();
  set next(Point v) { _nextPtr.store(v._ptr.cast<ffi.Void>()); }
  Point get next => Point.fromPtr(_nextPtr.load());
}
```
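The byte offsets hard-coded in the accessors above are 0, `sizeOf<Double>()` and `2 * sizeOf<Double>()`. These match what a C compiler uses for this struct on mainstream 32- and 64-bit targets: the two doubles occupy 16 bytes, and a pointer there never needs more than 8-byte alignment. The C sketch below makes that assumption checkable with `offsetof`. The `k*Offset` constants and `layout_matches` are illustrative names.

```c
#include <stddef.h>

struct Point {
  double x;
  double y;
  struct Point *next;
};

/* Byte offsets used by the hand-written Dart accessors. */
static const size_t kXOffset = 0;
static const size_t kYOffset = sizeof(double);
static const size_t kNextOffset = 2 * sizeof(double);

/* Returns 1 if the compiler's layout agrees with the accessor offsets. */
static int layout_matches(void) {
  return offsetof(struct Point, x) == kXOffset &&
         offsetof(struct Point, y) == kYOffset &&
         offsetof(struct Point, next) == kNextOffset;
}
```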
However, this sort of code is very verbose, so we want to hide it under a layer of syntactic sugar. The core idea is that we use normal field declarations to describe the layout, and each field has two types associated with it:

- the normal Dart type of the field specifies how the field is exposed to Dart code;
- an annotation specifies the native storage format for the corresponding field.

For example, a declaration like this:

```dart
import 'dart:ffi' as ffi;

@ffi.struct  // Specifies the layout (either ffi.struct or ffi.union).
class Point extends ffi.Pointer<Point> {
  @ffi.Double()  // () are confusing :-(
  double x;

  @ffi.Double()
  double y;

  @ffi.Pointer()  // To distinguish from the case when one struct embeds
  Point next;     // another by value.
}
```

can be transformed by the front-end into code that matches the more verbose declaration above.

Note: a few questions remain to be answered here:

- How to conveniently cast `Pointer<Point>` to `Point`?
- What kind of constructors should `Point` have?
- ...

#### Structure Layouts and Portability

Structure layouts are inherently non-portable between platforms. For example, `struct stat`, used by the POSIX file status APIs, has a different layout on Mac OS X and Linux.

Dart does not have an equivalent of the C preprocessor, so specifying platform-specific layouts requires some other mechanism.

A potential way to do this could be something like the following:

```dart
import 'dart:ffi' as ffi;

@ffi.struct({
  'x64 && linux': {  // Layout on 64-bit Linux
    'x': ffi.Field(ffi.Double, 0),
    'y': ffi.Field(ffi.Double, 8),
    'next': ffi.Field(ffi.Pointer, 16),
  },
  'arm && ios': {  // Layout on 32-bit iOS
    'x': ffi.Field(ffi.Float, 4),
    'y': ffi.Field(ffi.Float, 8),
    'next': ffi.Field(ffi.Pointer, 0),
  },
})
class Point extends ffi.Pointer<Point> {
  double x;
  double y;
  Point next;
}
```
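Instead of being written by hand, per-platform tables like the one above could in principle be derived from each field's size and alignment on the target. Below is a sketch of the usual C layout rule, assuming fields are laid out in declaration order. Each field goes at the next offset that is a multiple of its alignment, and the total size is rounded up to the largest alignment. `FieldSpec` and `compute_layout` are hypothetical names.

```c
#include <stddef.h>

typedef struct {
  size_t size;  /* field size in bytes on the target */
  size_t align; /* field alignment in bytes on the target (power of two) */
} FieldSpec;

/* Round offset up to the next multiple of align (align is a power of 2). */
static size_t align_up(size_t offset, size_t align) {
  return (offset + align - 1) & ~(align - 1);
}

/* Writes each field's offset into offsets[] and returns the struct size,
   including tail padding. */
static size_t compute_layout(const FieldSpec *fields, size_t n,
                             size_t *offsets) {
  size_t offset = 0;
  size_t max_align = 1;
  for (size_t i = 0; i < n; i++) {
    offset = align_up(offset, fields[i].align);
    offsets[i] = offset;
    offset += fields[i].size;
    if (fields[i].align > max_align) max_align = fields[i].align;
  }
  return align_up(offset, max_align);
}
```

With three 8-byte fields, this reproduces the offsets 0, 8 and 16 of the `'x64 && linux'` entry. With three 4-byte fields declared in the order `next`, `x`, `y`, it reproduces the `'arm && ios'` entry.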
### Function types

Before we dive into the details of how function pointers can be represented in Dart, let us outline what has to happen to invoke a native function from Dart and vice versa.

#### Invoking Native Function from Dart

To invoke a native function from Dart we need to:

1. Convert outgoing arguments from their Dart representation into their native representation.

   Note: an important decision here is how much automatic argument marshalling we want to allow. For example, does a `String` get automatically converted to `uint8_t*`, or must the programmer do this conversion explicitly when invoking the function?

2. If the callee can re-enter Dart ("non-leaf"), record the exit frame information so that the Dart GC is able to find it.

   Note: declaring that a function is a leaf (i.e. it will not enter Dart code) is an optimization. It simplifies marshalling of arguments and the transition from Dart into native code. It also allows optimizations across such a function call, because such a function can't affect pure Dart objects. If a function is a leaf, then converting a `String` into a `const uint8_t*` parameter might be as simple as passing a pointer into the string's body (if it is a one-byte string).

   Note: in general this also means that FFI can't (easily) interoperate with native non-local control flow (`longjmp` or exceptions) that transfers control from one native frame to another, bypassing the Dart frames sandwiched in between. (There are ways to interoperate with exceptions, but they are non-trivial and are left out of scope for now.)

3. Arrange outgoing arguments on the stack and in registers according to the calling convention of the callee.
4. Invoke the callee.
5. When the callee returns, convert the result into its Dart representation and tear down the exit frame.

   Note: an important question here is how to represent structs returned by value. (The closest idea is to allocate them on the native heap and return a pointer with a finalizer instead of a value.)

#### Invoking Dart Function from Native

Invoking a Dart function from native code is not that different from the process described above; the steps are just somewhat inverted. There are a few questions to answer:

- Do we allow invocation of closures and *class methods*, or do we limit ourselves to static functions?
  - If we allow them, how are these functions and their receivers represented in native code? (Note: so far we have only talked about passing native data back and forth. Passing Dart objects into native code requires a handle system, so that the GC knows about references held by native code.)
- Do we expect the thread invoking a Dart function to be attached to the *isolate* (e.g. via the `Dart_EnterIsolate` API call)? Do we guard against the possibility that users misuse the FFI and try to invoke a function (e.g. a callback) on the wrong thread? Should the FFI be structured in a way that highlights the possibility of such an error and allows reporting it, or should we just crash?

#### Representing Function Pointers

Imagine we want to convert this code to Dart FFI:

```c
#include <stdint.h>

typedef int32_t (*binary_t)(int32_t x, int32_t y);

struct Ops {
  binary_t add;
  binary_t sub;
};

// Invoke by pointer
int32_t invoke(binary_t f, int32_t x, int32_t y) {
  return f(x, y);
}
```
We can follow the same design we used for fields: use a combination of two types. One describes the native nature of a function pointer, and the other describes how it will be used from Dart. For example, we could extend the `Pointer` class with a way to coerce it to a Dart function, and also create a `NativeFunction` class to represent the type of native functions:

```dart
library dart.ffi;

abstract class Pointer<T extends _NativeType> {
  // Should only be valid if T is a function type. Creates a function that
  // will marshal all incoming parameters, perform an invocation via
  // this pointer and then unmarshal the result.
  U asFunction<U extends Function>();
}

class NativeFunction<T extends Function> extends _NativeType {}
```

This can be used like this:

```dart
import 'dart:ffi' as ffi;

typedef NativeBinaryOp = ffi.Int32 Function(ffi.Int32, ffi.Int32);
typedef BinaryOp = int Function(int, int);

@ffi.struct
class Ops extends ffi.Pointer<Ops> {
  // Front-end ensures that the type of the annotation is consistent with
  // the declared Dart type of the field.
  @ffi.NativeFunction<ffi.Int32 Function(ffi.Int32, ffi.Int32)>()
  BinaryOp add;

  @ffi.NativeFunction<ffi.Int32 Function(ffi.Int32, ffi.Int32)>()
  BinaryOp sub;
}

// Invoke by pointer. Note: we have to write
// ffi.Pointer<ffi.NativeFunction<...>> because Pointer constrains T
// to be a subtype of _NativeType.
int invoke(ffi.Pointer<ffi.NativeFunction<NativeBinaryOp>> op, int x, int y) {
  return op.asFunction<BinaryOp>()(x, y);
}
```

Note: we want the code to be AOT compilable. We will therefore specify that the invocation `Pointer<F>.asFunction<G>()` depends only on the static values of `F` and `G`, not on the reified type of the receiver; otherwise we cannot precompile all necessary marshalling stubs. (A language feature to specify generic invariance/exactness would be beneficial here.)
This looks relatively clean, but unfortunately it does not capture some of the required information:

- the calling convention;
- whether the function is a leaf or not.

Unfortunately, it is not entirely clear what the best way is to encode this information into the type of the pointer. One possible way is to do something like this:

```dart
class _CallingConvention {}
class Cdecl extends _CallingConvention {}
class StdCall extends _CallingConvention {}

class _Leafness {}
class Leaf extends _Leafness {}
class NotLeaf extends _Leafness {}

class NativeFunction<T extends Function,
                     CC extends _CallingConvention,
                     L extends _Leafness> extends _NativeType {}
```

But this might be too verbose, especially because Dart does not support default values for type parameters.
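To make the marshalling performed by `asFunction` more concrete, here is a C sketch of what a generated stub might do when binding `Int32 Function(Int32, Int32)` as `int Function(int, int)`. It assumes that Dart `int` arguments arrive as 64-bit integers and that out-of-range values are truncated to the native width; the document does not specify either behavior. Because both signatures are statically known, every conversion is a fixed cast rather than a reflective, per-argument type check. The names `native_binary_op`, `binary_op_stub` and `native_add` are hypothetical.

```c
#include <stdint.h>

typedef int32_t (*native_binary_op)(int32_t x, int32_t y);

/* Hypothetical stub for ptr.asFunction<int Function(int, int)>() where
   ptr has type Pointer<NativeFunction<Int32 Function(Int32, Int32)>>. */
static int64_t binary_op_stub(native_binary_op target, int64_t x, int64_t y) {
  /* Marshal: keep the low 32 bits of each 64-bit Dart int (assumed
     truncation semantics). The uint32_t -> int32_t step is
     implementation-defined above INT32_MAX but wraps on mainstream
     compilers. */
  int32_t nx = (int32_t)(uint32_t)x;
  int32_t ny = (int32_t)(uint32_t)y;
  /* Invoke via the function pointer. */
  int32_t result = target(nx, ny);
  /* Unmarshal: widen the int32_t result back to a 64-bit Dart int. */
  return (int64_t)result;
}

/* A sample native function to bind against. */
static int32_t native_add(int32_t x, int32_t y) { return x + y; }
```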
### Conversion between builtin types and native types

TODO(vegorov): describe helpers that can be used to convert, for example, between a pointer and a string, or between a pointer and an array. Note: external typed data and strings can be used for efficient conversions.

### Converting Dart Functions to Function Pointers

What if a native function requires you to pass a callback in?

```c
#include <stdint.h>

typedef intptr_t (*callback_t)(void* baton, void* something);
void with_something(callback_t cb, void* baton);
```

If we want to invoke this from Dart, how do we pass a function in?

For simplicity, initially we should only allow passing down *static methods*. This is very simple to implement because static methods can simply have redirecting trampolines.

For APIs that allow associating *batons* with callbacks, users can pass closures using handmade persistent handles, along these lines:

```dart
import 'dart:ffi' as ffi;

typedef int Callback(ffi.Pointer<ffi.Void> something);

int _id = 0;
final _i2cb = <int, Callback>{};
final _cb2i = <Callback, int>{};

// The single static function passed down to native code; the baton
// carries the id of the closure to invoke.
int _trampoline(ffi.Pointer<ffi.Void> baton, ffi.Pointer<ffi.Void> something) {
  return _i2cb[baton.toInt()](something);
}

// Assigns a stable integer id to each distinct closure and encodes it
// as a pointer-sized baton.
ffi.Pointer<ffi.Void> _toHandle(Callback cb) {
  return ffi.Pointer<ffi.Void>.fromInt(_cb2i.putIfAbsent(cb, () {
    _i2cb[_id] = cb;
    return _id++;
  }));
}

void withSomething(Callback cb) {
  with_something(_trampoline, _toHandle(cb));
}
```

Note that this leaks memory, so it really only works for APIs that are one-shot or support deregistration.

For APIs that don't have batons, there is still a way to pass closures as function pointers: have a closure-specific trampoline for each closure. However, this works only if the number of closures passed to the other side is small, because AOT has to pregenerate a fixed number of trampolines. Again, it really only works for APIs that support both registration and deregistration.
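The baton technique works because the native API never interprets the baton; it only passes it back. The C sketch below shows both sides. `with_something` is a hypothetical implementation of the API declared above, the `Closure` struct stands in for a Dart closure, and the handle table plays the role of `_i2cb`. All of these helper names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

typedef intptr_t (*callback_t)(void *baton, void *something);

/* Hypothetical implementation of the native API: it never inspects the
   baton, it only hands it back to the callback together with its data. */
void with_something(callback_t cb, void *baton) {
  int value = 42;
  cb(baton, &value);
}

/* Stand-in for a Dart closure: a function plus its captured state. */
typedef struct {
  intptr_t (*fn)(void *state, void *something);
  void *state;
} Closure;

/* Handle table playing the role of _i2cb. Entries are never removed
   (the same leak as in the Dart version). Id 0 is unused so that no
   handle is ever NULL. */
#define MAX_HANDLES 16
static Closure handles[MAX_HANDLES + 1];
static intptr_t last_id = 0;

/* Like _toHandle: registers a closure and encodes its id as a baton. */
static void *to_handle(Closure c) {
  if (last_id >= MAX_HANDLES) return NULL;
  handles[++last_id] = c;
  return (void *)last_id;
}

/* Like _trampoline: the single static function given to native code. */
static intptr_t trampoline(void *baton, void *something) {
  Closure *c = &handles[(intptr_t)baton];
  return c->fn(c->state, something);
}

/* Example "closure" body: adds the received int to a captured counter. */
static intptr_t add_to_counter(void *state, void *something) {
  *(int *)state += *(int *)something;
  return 0;
}
```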
### Binding Native Code to Dart Methods

The previous section already covers the possibility of invoking native code from Dart via function pointers. So if the `dart:ffi` library provides `dlopen`/`dlsym`-like primitives, that alone would be enough to cross the boundary in that direction:

```dart
library dart.ffi;

abstract class DynamicLibrary {
  // Equivalent of dlopen.
  external factory DynamicLibrary.open(String name);

  // Equivalent of dlsym.
  Pointer<SymbolType> lookup<SymbolType extends _NativeType>(String symbolName);

  // Helper that combines lookup and cast to a Dart function.
  // Note: user code would not be permitted to be generic like this,
  // however FFI's own code can be.
  // Note: ignoring leafness and calling convention for brevity.
  F lookupFunction<SymbolType extends Function, F extends Function>(
      String symbolName) {
    return lookup<NativeFunction<SymbolType>>(symbolName)?.asFunction<F>();
  }
}
```

```dart
import 'dart:ffi' as ffi;

void main() {
  // Invoke int32_t add(int32_t, int32_t) from the library libfoo.so.
  final lib = ffi.DynamicLibrary.open('libfoo.so');
  final add = lib.lookupFunction<ffi.Int32 Function(ffi.Int32, ffi.Int32),
      int Function(int, int)>('add');
  print(add(1, 2));
}
```

However, code in this style is unnecessarily verbose, so we should also provide a declarative way of binding Dart functions to native functions. For example:
```dart
library dart.ffi;

/// An annotation that can be used to make the FE/VM generate binding code
/// between an external static method declaration and native code.
class Import<NativeType> {
  /// Native library that contains the target native method.
  /// Can be null, in which case the symbol is resolved globally.
  final String library;

  /// Symbol to bind to.
  final String symbol;

  /// Specifies whether the target function is a leaf, i.e. whether it is
  /// guaranteed not to call back into Dart code.
  final bool isLeaf;

  final Type callingConvention;

  const Import({
    this.library,
    this.symbol,
    this.isLeaf = true,
    this.callingConvention = Cdecl,  // Note: Cdecl is a Type literal.
  });
}
```

```dart
import 'dart:ffi' as ffi;

@ffi.Import<ffi.Int32 Function(ffi.Int32, ffi.Int32)>(
  library: 'foo',  // Q: should the library name be mangled in a platform-specific way?
  symbol: 'add',
)
external int nativeAdd(int a, int b);

@ffi.Import<ffi.Int32>(symbol: 'g_counter')
external int globalCounter;
```
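On the open question of mangling the `library` name: if it were mangled, the natural choice would be the usual shared-library naming conventions. These are `libfoo.so` on Linux and Android, `libfoo.dylib` on macOS, and `foo.dll` on Windows. Below is a sketch of such a mapping using a hypothetical `mangle_library_name` helper; it is not part of the proposal.

```c
#include <stdio.h>
#include <string.h>

typedef enum { OS_LINUX, OS_ANDROID, OS_MACOS, OS_WINDOWS } TargetOs;

/* Hypothetical helper: writes the platform-specific file name of the
   library `name` into `out` (capacity `cap`). Returns 0 on success and
   -1 if the buffer is too small. */
static int mangle_library_name(const char *name, TargetOs os,
                               char *out, size_t cap) {
  const char *prefix = "lib";
  const char *suffix = ".so";
  switch (os) {
    case OS_LINUX:
    case OS_ANDROID:
      break;
    case OS_MACOS:
      suffix = ".dylib";
      break;
    case OS_WINDOWS:
      prefix = "";
      suffix = ".dll";
      break;
  }
  size_t needed = strlen(prefix) + strlen(name) + strlen(suffix) + 1;
  if (needed > cap) return -1;
  snprintf(out, cap, "%s%s%s", prefix, name, suffix);
  return 0;
}
```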
#### About platform specific bindings

TODO(vegorov): we need to consider how bindings that are portable between different platforms (e.g. Linux, Android, MacOS, iOS, Windows) would have to be structured.

This potentially requires using conditional imports.

#### Generating Dart bindings from C headers

TODO(vegorov): as a stretch goal, we could imagine a tool that generates Dart bindings from C headers.

### Binding Native Functions to Dart Code

[Note: this is a stretch goal and is not going to be included in the initial prototype.]

Imagine the inverse of the problem described in the previous section: a Dart program defines a static method `int add(int x, int y)`, and we would like to invoke it from C++ code.

The core of the idea is to introduce an annotation, `ffi.Export`:

```dart
library foo;

import 'dart:ffi' as ffi;

@ffi.Export<ffi.Int32 Function(ffi.Int32, ffi.Int32)>(symbol: 'add')
int add(int a, int b) => a + b;
```

This annotation would instruct the VM to generate an externally callable trampoline with the corresponding native signature `int32_t (int32_t, int32_t)`.

From native code, a developer could then do:

```c
typedef int32_t (*add_t)(int32_t, int32_t);
add_t f = Dart_LookupFFIExport("foo", "add");
f(1, 2);
```

This can be taken further. We could have a tool that generates binding modules from annotations, containing code like the following:

```cpp
#if defined(DART_AOT_USING_DLL)
// The AOT compiler would generate a symbol that can be hooked up by
// the normal dynamic linking process.
extern "C" int32_t dart_foo_add(int32_t x, int32_t y);
#else
// In JIT or blob-based AOT we have to look the symbol up dynamically.
int32_t dart_foo_add(int32_t x, int32_t y) {
  static int32_t (*f)(int32_t, int32_t) = Dart_LookupFFIExport("foo", "add");
  return f(x, y);
}
#endif
```