## p6pack.txt
RFC: A more Perl6-esque "unpack"
================================

This is an idea for an "unpack" replacement. The basic reasoning behind it is
that number encodings and string encodings needn't be treated all that
differently. Instead of passing the name of a string encoding, you can pass
a native type object. When decoding things of determinable lengths, any number
of types can be given.

A variable-length item without a length indication can only appear at the end
of a template, because nothing marks where it stops.


Decode according to a template:

    $blob.decode( [ ... ] )

Decode a string:

    my $s = $blob.decode("utf8");
    # actually short for: $blob.decode([ ::Inf => "utf8" ])

Decode a natively encoded numeric value:

    my $i = $blob.decode(uint16);

Decode a natively encoded numeric value, and a string:

    my ($n, $s) = $blob.decode([ num, "latin1" ]);

This doesn't work:

    my ($s, $i) = $blob.decode([ "latin1", uint16 ]);  # FAILS
    # Can't determine string length!
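
To make the reason concrete, here is a rough sketch of what a decoder would have to do for a template like `[ uint16, "latin1" ]`, written by hand with the `read-uint16`, `subbuf` and `decode` methods that Blobs in a reasonably recent Rakudo already have. It only illustrates the idea and is not the proposed implementation; the sample bytes and the little-endian byte order are made up for the example.

```raku
# Hand-rolled equivalent of the proposed template [ uint16, "latin1" ].
# Made-up data: a little-endian uint16 followed by two characters.
my $blob = Blob.new(0x34, 0x12, 0x68, 0x69);

# A uint16 always takes 2 bytes, so we know exactly where it ends...
my $n = $blob.read-uint16(0, LittleEndian);

# ...and the string simply gets everything that is left.
my $s = $blob.subbuf(2).decode('iso-8859-1');

say $n;  # 4660
say $s;  # hi
```

With the order reversed, the string would come first and there would be no way to tell where it stops and the uint16 begins.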

Force endianness for a single value:

    my $i = $blob.decode([ :big(uint32) ]);

Set default endianness for the rest of the template:

    my @i = $blob.decode([ :big, uint32, uint16, uint8 ]);
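
The byte order hint matters because the same bytes give different numbers depending on it. A small illustration on made-up data, using Rakudo's existing `read-uint32` rather than the proposed API:

```raku
# The same four bytes, read with both byte orders.
my $blob = Blob.new(0x00, 0x00, 0x01, 0x00);

my $big    = $blob.read-uint32(0, BigEndian);
my $little = $blob.read-uint32(0, LittleEndian);

say $big;     # 256
say $little;  # 65536
```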

Decode two byte-length-prefixed blobs:

    my ($blob1, $blob2) = $blob.decode([ ::uint32 => Blob, ::uint32 => Blob ]);

or:

    my ($blob1, $blob2) = $blob.decode([ (::uint32 => Blob) xx 2 ]);
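
For comparison, this is roughly the bookkeeping that `::uint32 => Blob` would take off your hands. The helper `take-prefixed` is hypothetical, and big-endian lengths are an assumption made for this example only.

```raku
# Hypothetical helper: read a uint32 byte length at $pos, then that many bytes.
# Returns the chunk and the position right after it.
sub take-prefixed(Blob $blob, Int $pos) {
    my $len   = $blob.read-uint32($pos, BigEndian);
    my $chunk = $blob.subbuf($pos + 4, $len);
    return $chunk, $pos + 4 + $len;
}

# Made-up data: length 2, bytes 0xAA 0xBB, then length 1, byte 0xCC.
my $blob = Blob.new(0,0,0,2, 0xAA,0xBB, 0,0,0,1, 0xCC);

my ($blob1, $next) = take-prefixed($blob, 0);
my ($blob2, $end)  = take-prefixed($blob, $next);

say $blob1.elems;  # 2
say $blob2.elems;  # 1
```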

Decode any number of byte-length-prefixed blobs:

    my @blobs = $blob.decode([ ::Inf => [ ::uint32 => Blob ] ]);

Decode any number of byte-length-prefixed strings:

    my @strings = $blob.decode([ ::Inf => [ ::uint32 => "Windows-1252" ] ]);
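
The `::Inf` count boils down to "keep going until the data runs out". A hand-written sketch of that loop follows; `take-all-prefixed` is a made-up name, and the big-endian length prefix is again just an assumption.

```raku
# Hypothetical helper: keep reading uint32-length-prefixed strings until the
# data runs out.
sub take-all-prefixed(Blob $blob, Str $encoding) {
    my @strings;
    my $pos = 0;
    while $pos < $blob.elems {
        my $len = $blob.read-uint32($pos, BigEndian);
        @strings.push: $blob.subbuf($pos + 4, $len).decode($encoding);
        $pos += 4 + $len;
    }
    return @strings;
}

# Made-up data: "ab" and "c", each preceded by its byte length.
my $blob = Blob.new(0,0,0,2, 0x61,0x62, 0,0,0,1, 0x63);
my @strings = take-all-prefixed($blob, 'windows-1252');

say @strings;  # [ab c]
```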

A list of things of the same type, with an element count prefix (as opposed to a
byte length prefix):

    my @i = $blob.decode([ :elems(uint8) => uint32 ]);

A sub-template with a typed byte length prefix:

    [ ::uint32 => [ int32, uint16, "latin1" ] ]

A list of things of the same type, with a BYTE length prefix:

    [ ::uint32 => uint32 ]
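
The difference between an element count and a byte length is easy to get wrong, so here is a hand-written comparison on made-up data. Both variants carry the same two uint32 values; only the prefix differs. Big-endian is assumed throughout, and none of this is the proposed API.

```raku
# Element count prefix, as in :elems(uint8) => uint32.
# The first byte says "2 elements", followed by the values 1 and 2.
my $counted = Blob.new(2, 0,0,0,1, 0,0,0,2);
my $count   = $counted[0];
my @a = (^$count).map: { $counted.read-uint32(1 + 4 * $_, BigEndian) };

# Byte length prefix, as in ::uint32 => uint32.
# The first four bytes say "8 bytes", which means 8 div 4 = 2 elements.
my $sized = Blob.new(0,0,0,8, 0,0,0,1, 0,0,0,2);
my $bytes = $sized.read-uint32(0, BigEndian);
my @b = (^($bytes div 4)).map: { $sized.read-uint32(4 + 4 * $_, BigEndian) };

say @a;  # [1 2]
say @b;  # [1 2]
```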

Skipping a byte with Nil (when packing, i.e. encoding, Nil becomes \0):

    [ int, int, int, Nil, int, int ]

User-defined number encoding in the mix:

    my ($command, $param) = $blob.decode([ :big, uint8, MQTT::Length => Blob ]);
    if $command == 0x30 {
        my ($topic, $message) = $param.decode([:big,
            ::uint16 => "utf8",
            Blob
        ]);
    }
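
`MQTT::Length` isn't defined anywhere in this document. In MQTT itself, the "remaining length" field is a variable-length number: 7 bits per byte, least significant group first, with the high bit set when more bytes follow. A decoder for that could look roughly like the sketch below. The sub name and the way it reports the number of bytes consumed are made up for the example, there is no bounds checking, and how such a type would actually plug into decode is left open.

```raku
# Hypothetical sketch of an MQTT "remaining length" decoder.
# Returns the decoded value and the number of bytes it occupied.
sub read-mqtt-length(Blob $blob, Int $start = 0) {
    my $value = 0;
    my $shift = 0;
    my $pos   = $start;
    loop {
        my $byte = $blob[$pos++];
        $value  += ($byte +& 0x7F) +< $shift;  # low 7 bits carry data
        $shift  += 7;
        last unless $byte +& 0x80;             # high bit: more bytes follow
    }
    return $value, $pos - $start;
}

my ($len, $used) = read-mqtt-length(Blob.new(0xC1, 0x02));
say $len;   # 321
say $used;  # 2
```

Because the prefix itself has no fixed size, the decoder needs to learn from the type how many bytes were consumed before it can find the start of the Blob that follows.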

Note that:

* The KEY of a pair is part of the template, but NOT of the actual data returned
  by decode. This holds true for length prefixes (key is a type object) and for
  hints like :big and :little (key is a string).
* Pairs can nest like this:

      :big(uint16) => Blob
      :elems(:big(uint16)) => uint64
* Passed directly in an argument list, pairs would be eaten by the compiler,
  which takes them for named arguments. This is why templates are arrays.
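
The named argument issue from the last point can be seen with a small test sub. `count-args` is made up for this illustration and has nothing to do with the proposed decode method.

```raku
# Counts how many positional and named arguments arrive.
sub count-args(*@positional, *%named) {
    return @positional.elems, %named.elems;
}

# Passed directly, :big is taken as a named argument...
my ($p1, $n1) = count-args(:big, 42);
say "$p1 positional, $n1 named";  # 1 positional, 1 named

# ...but inside an array it stays an ordinary Pair among the other values.
my ($p2, $n2) = count-args([:big, 42]);
say "$p2 positional, $n2 named";  # 2 positional, 0 named
```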

Things that P5's unpack does that this proposal does not cover:

* Hexadecimal, binary, or uuencoded strings. These are actually string
  encodings, and should be implemented as such. (p5 <b B h H u U>)

* Absolute position based extraction ('@' and '.' in p5's pack). Don't know if
  this is actually ever used, or how it even works.

* Pointers to strings.

* Null-terminated strings. Just have a Nil in there.


Juerd <juerd@tnx.nl>