skids/coerce-native

## coerce-native
A deeper look at coercion.

This is both an elaboration on a few points that were touched on
in ab5tract's Day 4 advent post, and a deep look at the definition
of "coerce" focusing on the soon-to-be-timely subject of buffers
and native data types, and "coercive types".

The most important part of the specification when it comes to
defining what it means to "coerce" is this (in S02):

    ...if you say:

         $fido = Dog.new($spot)

    it certainly creates a new C<Dog> object.  But if you say:

         $fido = Dog($spot)

    it might call C<Dog.new>, or it might pull a C<Dog> with Spot's
    identity from the dog cache, or it might do absolutely nothing if
    C<$spot> already knows how to be a C<Dog>. ...

It is this last part that is critical, because a mutable object
coerced to another type may be modifiable through the product
of that coercion.

Currently on Rakudo that does not seem to be the case for built-in types.
We can look at the simple case of a type being cast to itself:

    my @a = Array.new(0,1,2);
    my @b := Array(@a);
    @b[0] = 2;
    @a.say; # 0 1 2 ... this is not guaranteed on all implementations
    @b.say; # 2 1 2
    @b := @a.Array;
    @b[0] = 2;
    @a.say; # 0 1 2 ... neither is this guaranteed to be same as above
    @b.say; # 2 1 2

As an aside, the second comment merely draws attention to the fact that
the rules for finding the right way to "coerce" are different for the
method and sub forms, and there are effectively two different paths by
which we can define coercion -- one from the coercer and the other
from the coercee.

Anyway, any module author can decide to do it this way:

    class A {
        has $.foo is rw;
        method Int () is rw { $!foo }
    }
    my $a = A.new(:foo(4));
    my $b := Int($a);
    $b = 5;
    $a.foo.say; # 5

Of course, in the majority of cases coercion is simply used to
read from the object, and in a lot of cases there is no good
way to implement writing back to the original object from the
product of the coercion.

Now take a look at the next sentence in the spec:

    As a fallback, if no method responds to a coercion request, the
    class will be asked to attempt to do C<Dog.new($spot)> instead.

When this scenario happens, the product of the coercion may end
up being a mutable copy of the thing we coerced.  So to recap,
a coercion may produce:

1) A mutable copy of the original object
2) An immutable view into the original object that pretends to
   be the new type for all readable purposes
3) A mutable view into the original object that allows
   modification of the original object
4) Some mixture of 2 and 3 for complex classes

TLDR: Coercion is at heart a way to fit square pegs in round
holes, and those pegs can be mashed in with a wide selection
of hammers.
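
To make the difference between outcomes 1 and 3 concrete, here is a minimal sketch. The classes `CopyingBag` and `SharingBag` are made up for illustration and are not from any real module. Both answer the same `.Array` coercion, but one hands back a fresh array and the other hands back its own storage.

```raku
# Hypothetical classes, purely illustrative.
class CopyingBag {
    has @.items;
    # Outcome 1: build a new Array, so the caller gets a mutable copy.
    method Array { Array.new(@!items) }
}

class SharingBag {
    has @.items;
    # Outcome 3: return the attribute itself, so the caller gets a mutable view.
    method Array { @!items }
}

my $copying = CopyingBag.new(items => (1, 2, 3));
my @copy := $copying.Array;
@copy[0] = 99;
say $copying.items[0];   # 1 ... the original is untouched

my $sharing = SharingBag.new(items => (1, 2, 3));
my @view := $sharing.Array;
@view[0] = 99;
say $sharing.items[0];   # 99 ... the write went through to the original
```

Nothing at the call site tells the two apart, which is exactly why the behavior has to be documented per class.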

Modules, of course, can document what their behavior is
in their manpage along with all the other particulars of
a module.  For base types we have a situation where a coder
will have to know which types coerced to which other types
result in a writeable view of the original value, which
of those views are copies as opposed to write-back, and on
which implementations that behavior changes.

It is tempting to say "I don't want to memorize another table
like the operator precedence chart, let's just spec all base type
coercions to behave the same way."  In fact this is what has
been done with "coercion types".  Oh but wait, we are
getting ahead of ourselves for those not on IRC.  What's a
coercion type?

That's this:

    sub foo (Int(Rat) $f) { $f + 1 };
    foo(5/4).say;  # 2

The above did not even parse until today when the 6pe merged.  This
deprecated syntax does what the above is supposed to on Rakudo,
for those that don't take their code intravenously:

    sub foo (Rat $f as Int) { $f + 1 };
    foo(5/4).say; # 2

This is not merely a syntactic macro that converts the above to:

    sub foo (Rat $f) { $f = Int($f); $f + 1 };

...as the "Int(Rat)" is actually going to be a type unto
itself, apparently.  Which means you can introspect it.  Which,
we will see later, might be a very good thing.
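
For the curious, a small sketch of what "a type unto itself" buys you. This assumes a Rakudo where coercion types have landed. I am only relying on the type object answering `.^name`; what else the meta-object exposes is left open here.

```raku
sub foo (Int(Rat) $f) { $f + 1 }

say foo(5/4);          # 2 ... 5/4 is coerced to the Int 1 before the body runs

# The coercion type can be held in a variable and asked about itself,
# which the longhand "coerce it inside the body" version cannot offer.
my $type = Int(Rat);
say $type.^name;
```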

So, where were we?  Oh yes, what the spec currently says about
"coercive types" is this:

    This only works for one-way coercion, so you may not declare any C<rw>
    parameter with a coercive type.

I suspect there might be some back-pressure from users over that clause
in the spec, especially when it comes to native arrays and buffers, because
it seems kind of arbitrary to force just those cases to do it longhand,
and also it is less introspectable that way.

So let's talk about bufs and native arrays for a bit.  First let's look
at what the spec has to say about the relationship between buffers and
native typed arrays a.k.a. "compact arrays":

    A compact array is for most purposes interchangeable with the
    corresponding buffer type.  For example, apart from the sigil,
    these are equivalent declarations:

        my uint8 @buffer;
        my buf8 $buffer;

    (Note: If you actually said both of those, you'd still get two
    different names, since the sigil is part of the name.)

    So given C<@buffer> you can say

        $piece = substr(@buffer, $beg, $end - $beg);

    and given C<$buffer> you can also say

        @pieces = $buffer[$n ..^ $end];

I haven't been able to find it in the spec explicitly, but in addition to
the above, this is currently implemented behavior:

    $ perl6 -e 'my @b := buf8.new(1,2,3); @b[1].say; @b.say'
    2
    Buf[uint8]:0x<01 02 03>

...which makes sense since a buffer does Positional.  Another critical
part of the spec is the following:

    the presence of a low-level type tells Perl that it is free to
    implement the array with "compact storage", that is, with a chunk
    of memory containing contiguous (or as contiguous as practical)
    elements of the specified type without any fancy object boxing that
    typically applies to undifferentiated scalars.

...but Buf is defined as:

    A mutable container for an array of integer values in contiguous
    memory.

So Bufs are guaranteed to be stored contiguously in memory, while
native typed arrays are only contiguous on the backend to the point
that it is practical to do so.  NativeCall users take note: Buf/Blob is
what is safe to pass to C functions.
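
A quick sketch of how this looks from the user's side, assuming a Rakudo recent enough to have native arrays. The one way I know to get from a native array to a guaranteed-contiguous Buf is to construct one, and that is a copy.

```raku
my uint8 @buffer = 1, 2, 3, 4;      # native ("compact") array
my $buffer = buf8.new(@buffer);     # Buf built from it by construction

say $buffer ~~ Positional;          # True
say $buffer[1 ..^ 3];               # (2 3)
say $buffer.subbuf(1, 2).list;      # (2 3)

# Construction copied the values, so the two are independent afterwards.
$buffer[0] = 9;
say @buffer[0];                     # 1
```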

This tells us that we have to look really hard at what "is rw", which
normally pertains only to the "container" part of an argument, means in
the case of these objects when they are bound directly to an @-sigiled
parameter.  I don't know the answer to that; I suspect it will be
banged out in time (last I saw from outside "the loop", there was some
chafe between implementation and spec as to how deep the default
read-only protection is supposed to go even on undifferentiated Scalar
arrays.)

The arrival of native typed arrays will remove roadblocks to more
sophisticated handling of buffers than we have had previously.  Also the
new support for Buf in NativeCall is going to have module authors
working with more buffers in ways that have not been thoroughly
exercised before.  Buffers are different than objects in that they
have an increased tendency to be very large.  In the case of crypto
functionality, they are also potential targets for a lot of iterative
math, and could play a big role on the back-end of hyperoperators.

So, efficiency topics like this come up:

    grondilu: anyway so me, if I had to convert a Buf[uint8] to a Buf[uint16],
              I'd first get the list of bytes, group them by two and then
              create the corresponding 16-bits words.
    jnthn:    I'm sure we can provide a better way to do that :)
    grondilu: ideally there should be a constructor candidate that takes a
              buffer of an other type as argument.
              something like:  my Buf[uint16] $a .= new: Buf[uint8].new: ^10
    grondilu: That's not a bad idea.

Now, it isn't clear exactly what the use case was here: there are indeed
situations where you need to copy the contents of a buf8 into a buf16
and then modify the buf16 while leaving the original buf8 untouched.
There are many more situations where all you need to do is read values
from the same memory area as the buf8 but read them as words, so there's
no good reason to be copying all the values to a new memory location.
And finally, there are some situations where you need to have writes to
the buf16 not only alter the buf16, but also alter the corresponding
values in the original buf8 view.

And while construction can always be used to get copy semantics, the
specification of "coercion" is (necessarily) too broad to allow us
to ask explicitly for the other two behaviors.  Also, even when you
want copy semantics, you may want "copy on write" a.k.a. lazy mutation,
a.k.a. "COW" semantics, so that large buffers are not copied until
someone actually decides to write to them.  Or you may not, depending
on when you want the performance hit to occur.
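
Here is a minimal sketch of the "group them by two" approach from the chat above, done by plain construction. The little-endian byte order is my own arbitrary choice for the example. Note that the result is a copy: writes to the buf16 never reach the buf8.

```raku
my $bytes = buf8.new(0x01, 0x02, 0x03, 0x04);

# Pair the bytes up (low byte first) and compute one 16-bit word per pair.
my @words = $bytes.list.map(-> $lo, $hi { $lo + $hi * 256 });

# Build a brand new buf16 from those words.
my $words = buf16.new(@words);
say $words.list;      # (513 1027)

# The new buffer has its own storage.
$words[0] = 0xFFFF;
say $bytes.list;      # (1 2 3 4)
```

Every element gets visited and stored a second time, which is the cost the no-copy and write-back cases would like to avoid.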

Now, structured native data (CStruct in NativeCall, "compact structs" in
the spec) is also supposed to behave as if it is packed, even if the
implementation plays tricks on the back-end.  That behavior means you
can pass it back to C (or whatever) as a properly serialized structure.
Specifically the spec says:

    The packing serialization is performed by coercion to an appropriate
    buffer type.  The unpacking is performed by coercion of such a buffer
    type back to the type of the compact struct.

    Of course, a lazy implementation will probably find it easiest just
    to keep the object in its serialized form all the time.  In particular,
    an array of compact structs must be stored in their serialized form
    (see next section).

Again, Buf is what is safe to pass to NativeCall, though NativeCall has
rules about its REPRs that make this seamless by skipping a manual Buf
coercion.  Also again, the definition of "coerce" when it comes to
mutability, write-back, and COW behavior is left up to the implementation
and also to individual modules.

TLDR: There are 4 types of behavior C interfacers and pure-Perl6
data acrobats will need to be able to explicitly ask Perl 6 for when
working with native data aggregates in their serialized Buf forms:

1) Read-only views with no copy performed when possible.
2) Mutable copies that are copied when they are created.
3) Mutable copies that copy-on-write ("COWercion"?)
4) Mutable views that write mutations back to the originating object.

...and this is currently unspecced territory.
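
For comparison, a sketch of the closest things I know how to spell explicitly today. Both are eager copies: the Blob is read-only, but it is not a view, and I know of no spelling for the COW or write-back cases.

```raku
my $buf = buf8.new(1, 2, 3);

# Behavior 2: a mutable copy, made at construction time.
my $copy = buf8.new($buf);
$copy[0] = 9;
say $buf[0];          # 1

# Read-only, but still a copy rather than a view of $buf.
my $frozen = blob8.new($buf);
try { $frozen[0] = 9 }
say $!.defined;       # True ... the write was refused

$buf[0] = 7;
say $frozen[0];       # 1 ... later changes to $buf are not visible
```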