Ovid/data-types-in-perl.md

Data Types in Perl

This document is meant to open discussion about what it takes to create an
optional, native data type constraint system for Perl. It is not about
building a type system.
In his book Types and Programming Languages, Benjamin Pierce writes:

A type system is a tractable syntactic method for proving the absence of
certain program behaviors by classifying phrases according to the kinds of
values they compute.

We're not getting that, but in fact, Perl already has existing type systems
and we may very well have to work around those. (For a gentle introduction to
type systems, read
this.)
For example, enabling strict ensures compile-time failures for all sorts of
unwanted behavior.  Or trying to access a hashref as an arrayref generates a
runtime failure because we don't allow that. And then there's taint checking,
a type system which attempts to prevent developers from using unvalidated data
in unsafe ways.
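
The runtime check mentioned above can be seen in a few lines of today's Perl. This is only an illustration; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $config = { name => 'Spot' };    # a hash reference

# Dereferencing a hash reference as an array is rejected at runtime,
# so we trap the exception with eval and keep the message.
my $ok    = eval { my @items = @$config; 1 };
my $error = $ok ? '' : $@;

print "Caught: $error";
```

The failure only happens when the line runs, which is what separates this kind of check from the compile-time failures that strict provides.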
For another approach, in the 2005 Google Summer of Code
project, Gary
Jackson attempted to develop a type inference engine for Perl. What came out
of that is Devel::TypeCheck.
Sadly, that work is both incomplete and out-of-date. It would certainly be
amazing if someone picked that up and got it working again, but that's not the
approach we're looking at here.  Reini Urban attempted this for
5.10, but the author was
not responsive (it was several years after the work, so the author has likely
moved on).
Goals

Major Caveat: Everything in this document is open for discussion. None of
this is set in stone.
Before we attempt to create a design and/or implementation for types in Perl,
we should identify the goals we wish to achieve.

Easy-to-use
Optional
No infectious data types
Must be extendable
Can be extended to signatures (including return types)
Can work with Corinna

All of the above is achievable and will be explained as we go.
Syntax

There are two aspects of this proposal/discussion: semantics and syntax.
Hopefully we can address these separately.
I've seen several proposals for how data types can be attached to Perl
variables, the most common of which is this syntax:
my Dog $spot = Dog->new( name => 'Spot' );
Currently, perldoc -f my shows the following grammar:
my VARLIST
my TYPE VARLIST
my VARLIST : ATTRS
my TYPE VARLIST : ATTRS

And has this caveat (emphasis mine):

The exact semantics and interface of TYPE and ATTRS are still evolving.
TYPE may be a bareword, a constant declared with "use constant", or
"PACKAGE".  It is currently bound to the use of the fields pragma, and
attributes are handled using the attributes pragma, or starting from Perl
5.8.0 also via the Attribute::Handlers module.  See "Private Variables via
my()" in perlsub for details.
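
To make the caveat concrete, here is a minimal sketch of how the existing my TYPE form interacts with the fields pragma. The Dog class and its name field are invented for the illustration, and the string eval is only there so the compile-time error can be captured and printed.

```perl
use strict;
use warnings;

package Dog {
    use fields qw(name);    # declares the allowed hash keys at compile time

    sub new {
        my ( $class, %args ) = @_;
        my Dog $self = fields::new($class);
        $self->{name} = $args{name};
        return $self;
    }
}

# Because $spot is declared with the Dog type, the misspelled field name
# is rejected while the string is being compiled, before anything runs.
my $compiled = eval q{
    my Dog $spot = Dog->new( name => 'Spot' );
    my $name = $spot->{nmae};
    1;
};
my $error = $compiled ? '' : $@;

print "Compile-time error: $error";
```

Code like this already depends on the TYPE slot, which is why reusing that slot for type constraints carries some risk.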

Thus, any attempt to repurpose the current syntax for type constraints could
potentially break existing code. We should tread lightly here. Instead, I
suggest we go all in on the KIM
syntax
proposed by Damian Conway for Corinna. KIM stands for "Keyword Identifier
Modifier" and proposes a standard declarative syntax for identifying things:
KEYWORD IDENTIFIER MODIFIERS? DEFINITION?

For example, from Corinna:
# keyword   identifier   modifier     definition
class       Customer     :isa(Person) {
    # keyword   identifier     modifiers
    field       $customer_id   :param :reader;
}
For a data type constraint, we might have something like this:
# keyword    identifier   modifier    definition
my           $counter     :Type(UInt) = 0;
Pushing Perl forward using KIM syntax essentially means we're adding new
behavior to the language without the need for additional syntax (not true for
signatures) or new keywords littering our code. Further, by having a
standard declaration syntax going forward, Perl can evolve to be a more
predictable language.
For signatures, types could look like this:
sub factorial( $bar :Type(UInt) ) :Returns(PositiveInt) {
    ...
}
That might complicate the parser, but we want this to be as easy to use as
possible.
Types for Humans, not Computers

The C language provides primitive types such as char, int, float,
double, and so on. This helps the C compiler generate code like a race car
that runs blazingly fast and crashes all the time.
Java, on the other hand, has primitive and non-primitive types. The
non-primitive types include classes, interfaces (sort of like roles), and
arrays.
The non-primitive types allow us to create types which specifically fit our
problem domain. Consider this pseudo-code:
var int temp1   = 32;
var int temp2   = 0;
var int average = ( temp1 + temp2 ) / 2;

The above might be masking a very common kind of error. So let's rewrite it.
var Celsius    temp1   = 32;
var Fahrenheit temp2   = 0;
var int        average = ( temp1 + temp2 ) / 2;

Most developers glancing at the above code can tell it's not correct. But how
do we prevent this?
At its core, a data type is:

A name for the type
A set of allowed values for that type
A set of allowed operations for that type

I think we can accomplish the first two, but the third might be difficult.
Consider:
my $count :Type(UInt) = 23;
$count += "19 apples";
say $count;
What should happen there? We're adding a string to an unsigned integer.
Currently, without types, the above generates a warning, but we still get
our answer.
Argument "19 apples" isn't numeric in addition (+) at -e line 1.
42

We could:

Make this fatal
Make this a warning
Ignore it

For now, I would suggest maintaining the warning to allow developers to
gradually upgrade existing code. Disallowing this "illegal" operation might be
a bridge too far for the Perl community. However, this defeats the purpose of
the type constraint.
Clearly the following is illegal, but we get a warning, apples is coerced to
zero, and it prints 23.
my $count :Type(UInt) = 23;
$count += "apples";
Do we want this behavior? I would argue, "no", but many billion-dollar
companies who rely on Perl's current behavior may have a different point of
view. Perhaps explicit type casting could help here, but this requires
discussion.
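
For comparison, the "warning" and "fatal" options can both be approximated in current Perl without any type syntax. The sketch below uses only the standard warnings pragma; it is not the proposed mechanism, just a way to see what each choice feels like in practice.

```perl
use strict;
use warnings;

# "Make this a warning": today's default. We collect the warning
# instead of letting it go to STDERR.
my $count = 23;
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    $count += "19 apples";
}

# "Make this fatal": promote the numeric warning category to an
# exception inside one lexical scope.
my $survived = eval {
    use warnings FATAL => 'numeric';
    my $total = 23;
    $total += "apples";
    1;
};
my $fatal_error = $survived ? '' : $@;

print "count: $count\n";
print "warning: $warnings[0]";
print "fatal: $fatal_error";
```

Because the FATAL setting is lexically scoped, new code could opt in to the strict behavior without changing how older code in the same program behaves.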
Type the Variable, not the Data

I once read (but can't find) an article about a company that was delighted
with a third-party type inference package that saved them much grief (and
money) when writing new code. They stopped using it. Why? Because when the
code crossed the boundary between the new code and libraries that didn't use
type inference, it kept breaking. Retro-fitting a type system on a language
not designed for it is hard. Thus, I suggest we type the variable and not
the data. Consider:
my $count :Type(UInt) = $var;
$count = code_i_did_not_write($count);

# elsewhere
sub code_i_did_not_write ($count) {
    my $temp = $count;
    $count   = undef;
    ...
    return $temp;
}
If we typed the data, the above code would fail on the $count = undef line,
even if that code is perfectly correct under Perl standards.
However, if we type the variable, so long as code_i_did_not_write returned
a scalar containing an unsigned integer, we're good to go. But what if they do
this:
sub code_i_did_not_write {
    my $temp = $_[0];
    $_[0]    = undef;
    ...
    return $temp;
}
That breaks because using the @_ array directly is using an alias to the
original variable. More discussion needed on this.
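
Both cases above can be simulated today with a tied scalar, which attaches the check to the variable rather than to the value. This is a rough sketch under that assumption: UIntVar, works_on_copy, and works_on_alias are hypothetical names, and a native implementation would not necessarily use tie.

```perl
use strict;
use warnings;

# Hypothetical constraint class: every store into the tied variable is checked.
package UIntVar {
    sub _check {
        my ($value) = @_;
        die "Value is not an unsigned integer\n"
            unless defined $value && $value =~ /\A[0-9]+\z/;
        return $value;
    }

    sub TIESCALAR {
        my ( $class, $value ) = @_;
        my $slot = _check($value);
        return bless \$slot, $class;
    }

    sub FETCH { my ($self) = @_; return $$self }
    sub STORE { my ( $self, $value ) = @_; $$self = _check($value); return }
}

# Copies its argument, so clearing the copy never touches the caller's variable.
sub works_on_copy {
    my ($count) = @_;
    my $temp = $count;
    $count = undef;
    return $temp;
}

# Writes through the @_ alias, so the caller's constrained variable is hit.
sub works_on_alias {
    my $temp = $_[0];
    $_[0] = undef;
    return $temp;
}

tie my $count, 'UIntVar', 23;

$count = works_on_copy($count);    # fine: only an unconstrained copy was cleared

my $ok    = eval { works_on_alias($count); 1 };
my $error = $ok ? '' : $@;

print "count is still $count\n";
print "alias write rejected: $error";
```

The copy inside works_on_copy carries no constraint, so legacy code stays free to do what it likes with it. The aliasing case fails because the write really does land on the constrained variable.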
Type Libraries

Native Types

If this works, we will want a native type library for Perl. I'm using
Type::Tiny types as a reference point,
not as a suggestion (though they're likely to be more familiar to Perl
developers). Because we use attributes, the type names will not infect the
current namespace.
Extended Types

However, businesses would certainly want to extend these core types with their
own, so we'll need some way to handle this. Again, using Type::Tiny-like
syntax:
builtin::types->register(
    subtype Probability :Type(Num) = sub ($val) { 0 <= $val <= 1 };
);
The above is conceptual, not suggested.
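
To show the idea in runnable form, here is a small sketch of a registry where a subtype is its parent's check plus one extra constraint. Everything in it (%TYPES, register_subtype, check_type) is hypothetical and stands in for whatever interface a native library might eventually offer.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical registry: type name => code reference returning true or false.
my %TYPES = (
    Num => sub { defined $_[0] && !ref $_[0] && looks_like_number( $_[0] ) },
);

# A subtype must pass the parent check first, then its own constraint.
sub register_subtype {
    my ( $name, $parent, $constraint ) = @_;
    my $parent_check = $TYPES{$parent}
        or die "Unknown parent type '$parent'\n";
    $TYPES{$name} = sub { $parent_check->( $_[0] ) && $constraint->( $_[0] ) };
    return;
}

# Returns 1 if the value satisfies the named type, 0 otherwise.
sub check_type {
    my ( $name, $value ) = @_;
    my $check = $TYPES{$name} or die "Unknown type '$name'\n";
    return $check->($value) ? 1 : 0;
}

register_subtype( Probability => 'Num', sub { $_[0] >= 0 && $_[0] <= 1 } );

print "0.25:  ", check_type( Probability => 0.25 ),    "\n";
print "1.5:   ", check_type( Probability => 1.5 ),     "\n";
print "often: ", check_type( Probability => 'often' ), "\n";
```

Running the parent check first means the subtype's constraint can assume it is looking at a number, which avoids numeric warnings for values such as 'often'.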
Classes

For the Probability type above, that's simply a float constrained to values
between zero and one, inclusive. We also want to declare something to be a
class. Also, we don't want the class names to conflict with native types.
The InstanceOf[ClassName] syntax is cumbersome. I think a single
unary plus would work:
class Probability { ... }

my $probability :Type(+Probability) = Probability->new;
Alternatively, we could infer the type:
my $probability :Type = Probability->new;
That raises interesting questions. Something like my $count :Type = 3; might
infer an integer type. However, we might expect
my $count :Type = $prev_count; to be an integer, but in reality, $prev_count
might have been stringified, hold an overloaded object, be an array reference,
and so on. Inferred types, if used, should probably only be allowed for
classes and typed variables.
Complex Types

The above syntax suggestions allow us to work around complex types:
builtin::types->register(
    subtype Record, as HashRef[HashRef[Str]];
);
But having to do that every time we want a complex type is annoying,
especially if it's a one off. We want something like this:
my @records : Type(HashRef[HashRef[Str]]);
I don't think Attribute::Handlers will handle that case well, but we will
want complex types. If the arguments to an attribute could be code instead of
a string, this might be feasible. Suggestions welcome.
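
One way to picture "code instead of a string" is to build a nested type out of closures. The sketch below assumes hypothetical Str and HashRef constructor functions that each return a checking code reference; it says nothing about how an attribute would actually receive them.

```perl
use strict;
use warnings;

# Hypothetical constructors: each returns a code reference that checks one value.
sub Str {
    return sub { defined $_[0] && !ref $_[0] };
}

sub HashRef {
    my ($inner) = @_;    # optional check applied to every hash value
    return sub {
        my ($value) = @_;
        return 0 unless ref($value) eq 'HASH';
        return 1 unless $inner;
        for my $item ( values %$value ) {
            return 0 unless $inner->($item);
        }
        return 1;
    };
}

# Rough equivalent of HashRef[HashRef[Str]], composed from the pieces above.
my $is_record_set = HashRef( HashRef( Str() ) );

my $good = { alice => { city => 'Paris' } };
my $bad  = { alice => { tags => [ 'a', 'b' ] } };

print "good: ", $is_record_set->($good), "\n";
print "bad:  ", $is_record_set->($bad),  "\n";
```

Because each constructor returns an ordinary code reference, a one-off complex type needs no registration step, which is the convenience the text is asking for.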