This document is meant to open a discussion about what it takes to create an optional, native data type constraint system for Perl. It is not about building a type system.
In the book Types and Programming Languages, Benjamin Pierce writes:
A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute.
We're not getting that, but Perl already has type systems of a sort, and we may very well have to work around them. (For a gentle introduction to type systems, read this.)
For example, enabling strict ensures compile-time failures for all sorts of unwanted behavior. Or trying to access a hashref as an arrayref generates a runtime failure because we don't allow that. And then there's taint checking, a type system which attempts to prevent developers from using unvalidated data in unsafe ways.
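The first two behaviors can be seen in today's Perl. The following is only a small illustration with made-up variable names: it uses eval to capture the compile-time error that strict raises for an undeclared variable, and the runtime error raised when a hash reference is dereferenced as an array.

```perl
use strict;
use warnings;

# Compile-time: under strict, code that uses an undeclared variable
# fails to compile. A string eval lets us capture that error.
my $compiled      = eval q{ use strict; $undeclared = 1; 1 };
my $compile_error = $@;

# Runtime: dereferencing a hash reference as an array dies when executed.
my $ref           = { name => 'Spot' };
my $ran           = eval { my @items = @$ref; 1 };
my $runtime_error = $@;

print "compile-time failure: $compile_error";
print "runtime failure: $runtime_error";
```

Neither check knows anything about the values involved beyond "declared or not" and "which kind of reference", which is why they only go so far.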
For another approach, in a 2005 Google Summer of Code project, Gary Jackson attempted to develop a type inference engine for Perl. What came out of that is Devel::TypeCheck. Sadly, that work is both incomplete and out of date. It would certainly be amazing if someone picked it up and got it working again, but that's not the approach we're looking at here. Reini Urban tried to get it working for 5.10, but the original author was not responsive (it was several years after the work, so the author has likely moved on).
Major Caveat: Everything in this document is open for discussion. None of this is set in stone.
Before we attempt to create a design and/or implementation for types in Perl, we should identify the goals we wish to achieve.
- Easy-to-use
- Optional
- No infectious data types
- Must be extendable
- Can be extended to signatures (including return types)
- Can work with Corinna
All of the above is achievable and will be explained as we go.
There are two aspects of this proposal/discussion: semantics and syntax. Hopefully we can address these separately.
I've seen several proposals for how data types can be attached to Perl variables, the most common of which is this syntax:
my Dog $spot = Dog->new( name => 'Spot' );
Currently, perldoc -f my shows the following grammar:
my VARLIST
my TYPE VARLIST
my VARLIST : ATTRS
my TYPE VARLIST : ATTRS
It also carries this caveat (emphasis mine):
The exact semantics and interface of TYPE and ATTRS are still evolving. TYPE may be a bareword, a constant declared with "use constant", or "PACKAGE". It is currently bound to the use of the fields pragma, and attributes are handled using the attributes pragma, or starting from Perl 5.8.0 also via the Attribute::Handlers module. See "Private Variables via my()" in perlsub for details.
Thus, any attempt to repurpose the current syntax for type constraints could break existing code. We should tread lightly here. Instead, I suggest we go all in on the KIM syntax proposed by Damian Conway for Corinna. KIM stands for "Keyword Identifier Modifier" and proposes a standard declarative syntax for identifying things:
KEYWORD IDENTIFIER MODIFIERS? DEFINITION?
For example, from Corinna:
# keyword  identifier  modifier      definition
class      Customer    :isa(Person)  {
    # keyword  identifier    modifiers
    field      $customer_id  :param :reader;
}
For a data type constraint, we might have something like this:
# keyword  identifier  modifier     definition
my         $counter    :Type(UInt)  = 0;
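As far as I can tell, a declaration shaped like this already parses in current Perl: per the attributes pragma, attributes on a my variable are handed, as plain strings, to a MODIFY_SCALAR_ATTRIBUTES subroutine in the current package. The sketch below only records the declared type name. The %declared_type hash is hypothetical, and nothing here enforces a constraint.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical bookkeeping: variable address => declared type name.
my %declared_type;

# Called by the attributes pragma for scalar declarations in this package.
# It must return the attributes it does not recognize.
sub MODIFY_SCALAR_ATTRIBUTES {
    my ( $package, $ref, @attributes ) = @_;
    my @unrecognized;
    for my $attribute (@attributes) {
        if ( $attribute =~ /\AType\((\w+)\)\z/ ) {
            $declared_type{ refaddr($ref) } = $1;
        }
        else {
            push @unrecognized, $attribute;
        }
    }
    return @unrecognized;
}

my $counter :Type(UInt) = 0;

my $type = $declared_type{ refaddr( \$counter ) };
print "counter is declared as $type\n";
```

The attribute argument arrives as the string "Type(UInt)", so anything richer than a simple name has to be parsed by the handler.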
Pushing Perl forward using KIM syntax essentially means we're adding new behavior to the language without the need for additional syntax (not true for signatures) or new keywords littering our code. Further, by having a standard declaration syntax going forward, Perl can evolve to be a more predictable language.
For signatures, types could look like this:
sub factorial( $bar :Type(UInt) ) :Returns(PositiveInt) {
    ...
}
That might complicate the parser, but we want this to be as easy to use as possible.
The C language provides primitive types such as char, int, float, double, and so on. This helps the C compiler generate code like a race car that runs blazingly fast and crashes all the time.
Java, on the other hand, has primitive and non-primitive types. The non-primitive types include classes, interfaces (sort of like roles), and arrays.
The non-primitive types allow us to create types which specifically fit our problem domain. Consider this pseudo-code:
var int temp1 = 32;
var int temp2 = 0;
var int average = ( temp1 + temp2 ) / 2;
The above might be masking a very common kind of error. So let's rewrite it.
var Celsius temp1 = 32;
var Fahrenheit temp2 = 0;
var int average = ( temp1 + temp2 ) / 2;
Most developers glancing at the above code can tell it's not correct. But how do we prevent this?
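In today's Perl, one partial answer is operator overloading. The sketch below is not part of this proposal. It wraps each unit in a hypothetical class (Temperature, Celsius, and Fahrenheit are names invented for the example) whose overloaded addition refuses to mix units.

```perl
use strict;
use warnings;

package Temperature {
    # Overloaded addition refuses to mix different units.
    use overload '+' => sub {
        my ( $left, $right ) = @_;
        my $right_kind = ref($right) || 'a plain scalar';
        die "Cannot add " . ref($left) . " and $right_kind\n"
            unless ref($left) eq ref($right);
        return ref($left)->new( $left->degrees + $right->degrees );
    };

    sub new {
        my ( $class, $degrees ) = @_;
        return bless { degrees => $degrees }, $class;
    }

    sub degrees { $_[0]{degrees} }
}

package Celsius    { our @ISA = ('Temperature') }
package Fahrenheit { our @ISA = ('Temperature') }

my $temp1 = Celsius->new(32);
my $temp2 = Fahrenheit->new(0);

# Same units: allowed.
my $same_units = Celsius->new(32) + Celsius->new(0);
print "Celsius sum: ", $same_units->degrees, "\n";

# Mixed units: the overloaded operator dies.
my $sum   = eval { $temp1 + $temp2 };
my $error = $@;
print "Mixed units: $error";
```

This works, but it turns every number into an object and every operation into a method call, which is a heavy price for what should be a declaration.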
At its core, a data type is:
- A name for the type
- A set of allowed values for that type
- A set of allowed operations for that type
I think we can accomplish the first two, but the third might be difficult. Consider:
my $count :Type(UInt) = 23;
$count += "19 apples";
say $count;
What should happen there? We're adding a string to an unsigned integer. Currently, without types, the above generates a warning, but we still get our answer.
Argument "19 apples" isn't numeric in addition (+) at -e line 1.
42
We could:
- Make this fatal
- Make this a warning
- Ignore it
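For comparison, the first option can already be approximated by hand in current Perl by promoting the numeric warning category to a fatal error within a lexical scope. This is a sketch of existing behavior, not the proposed mechanism.

```perl
use strict;
use warnings;

my $count = 23;

my $ok = eval {
    # Lexically scoped: only this block treats "isn't numeric" as fatal.
    use warnings FATAL => 'numeric';
    $count += "19 apples";
    1;
};
my $error = $@;

print $ok ? "count is now $count\n" : "died: $error";
```

The difference is that here the developer opts in per scope, while a type constraint would attach the rule to the variable itself.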
For now, I would suggest maintaining the warning to allow developers to gradually upgrade existing code. Disallowing this "illegal" operation might be a bridge too far for the Perl community. However, allowing it defeats the purpose of the constraint.
Clearly the following is illegal, but we get a warning, apples is coerced to zero, and $count is still 23.
my $count :Type(UInt) = 23;
$count += "apples";
Do we want this behavior? I would argue, "no", but many billion-dollar companies who rely on Perl's current behavior may have a different point of view. Perhaps explicit type casting could help here, but this requires discussion.
I once read (but can't find) an article about a company that was delighted with a third-party type inference package that saved them much grief (and money) when writing new code. They stopped using it. Why? Because when the code crossed the boundary between the new code and libraries that didn't use type inference, it kept breaking. Retro-fitting a type system on a language not designed for it is hard. Thus, I suggest we type the variable and not the data. Consider:
my $count :Type(UInt) = $var;
$count = code_i_did_not_write($count);
# elsewhere
sub code_i_did_not_write ($count) {
    my $temp = $count;
    $count = undef;
    ...
    return $temp;
}
If we typed the data, the above code would fail on the $count = undef line, even if that code is perfectly correct under Perl standards.
However, if we type the variable, so long as code_i_did_not_write returned a scalar containing an unsigned integer, we're good to go. But what if they do this:
sub code_i_did_not_write {
    my $temp = $_[0];
    $_[0] = undef;
    ...
    return $temp;
}
That breaks because the elements of @_ are aliases to the caller's original variables, so assigning undef to $_[0] assigns it to the typed variable itself. More discussion is needed on this.
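Both cases can be emulated today with a tied scalar, which checks every assignment to one particular variable. Tie is only a stand-in here, since this document does not say how variable-level constraints would be implemented, and the UIntVar, by_copy, and by_alias names are hypothetical.

```perl
use strict;
use warnings;

package UIntVar;

# Tie constructor: validate and store the initial value.
sub TIESCALAR {
    my ( $class, $value ) = @_;
    my $self = bless { value => undef }, $class;
    $self->STORE($value);
    return $self;
}

sub FETCH { $_[0]{value} }

# Every assignment to the tied variable passes through this check.
sub STORE {
    my ( $self, $value ) = @_;
    die "Value is not an unsigned integer\n"
        unless defined $value && $value =~ /\A[0-9]+\z/;
    $self->{value} = $value;
}

package main;

tie my $count, 'UIntVar', 23;

# Copies its argument, so the callee's variable is unconstrained.
sub by_copy { my ($copy) = @_; my $temp = $copy; $copy = undef; return $temp }

# Writes through the @_ alias, so it assigns to the caller's variable.
sub by_alias { my $temp = $_[0]; $_[0] = undef; return $temp }

my $copy_ok     = eval { $count = by_copy($count);  1 };
my $alias_ok    = eval { $count = by_alias($count); 1 };
my $alias_error = $@;

print "by_copy:  ", ( $copy_ok  ? "ok" : "failed" ), "\n";
print "by_alias: ", ( $alias_ok ? "ok" : "failed: $alias_error" );
```

The copying version succeeds because the constraint stays with the caller's variable, while the aliasing version trips the check inside code the caller did not write.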
If this works, we will want a native type library for Perl. I'm using Type::Tiny types as a reference point, not as a suggestion (though they're likely to be more familiar to Perl developers). Because we use attributes, the type names will not infect the current namespace.
However, businesses would certainly want to extend these core types with their own, so we'll need some way to handle this. Again, using Type::Tiny-like syntax:
builtin::types->register(
subtype Probability :Type(Num) = sub ($val) { 0 <= $val <= 1 };
);
The above is conceptual, not suggested.
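To make the idea of registering a subtype concrete, here is a minimal plain-Perl sketch. It is not the builtin::types interface imagined above: the %TYPES registry and the register_subtype and check_type functions are hypothetical, and a subtype is modeled simply as the parent's check plus one extra constraint.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical registry: type name => check returning true or false.
my %TYPES = (
    Num => sub { defined $_[0] && !ref $_[0] && looks_like_number( $_[0] ) },
);

# A subtype must satisfy its parent's check and its own constraint.
sub register_subtype {
    my ( $name, $parent, $constraint ) = @_;
    die "Unknown parent type '$parent'\n" unless exists $TYPES{$parent};
    die "Type '$name' is already registered\n" if exists $TYPES{$name};
    my $parent_check = $TYPES{$parent};
    $TYPES{$name} = sub { $parent_check->( $_[0] ) && $constraint->( $_[0] ) };
    return;
}

sub check_type {
    my ( $name, $value ) = @_;
    die "Unknown type '$name'\n" unless exists $TYPES{$name};
    return $TYPES{$name}->($value) ? 1 : 0;
}

register_subtype(
    Probability => 'Num',
    sub { my ($val) = @_; 0 <= $val && $val <= 1 },
);

for my $candidate ( 0.5, 1.5, 'apples' ) {
    printf "%-6s %s\n", $candidate,
        check_type( Probability => $candidate ) ? 'ok' : 'rejected';
}
```

Because the parent check runs first, the Probability constraint never sees a non-numeric value and can compare without warnings.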
For the Probability type above, that's simply a float constrained to values between zero and one, inclusive. We also want to declare something to be a class. Also, we don't want the class names to conflict with native types.
The InstanceOf[ClassName] syntax is cumbersome. I think a single unary plus would work:
class Probability { ... }
my $probability :Type(+Probability) = Probability->new;
Alternatively, we could infer the type:
my $probability :Type = Probability->new;
That raises interesting questions. Something like my $count :Type = 3; might infer an integer type. However, we might expect my $count :Type = $prev_count; to be an integer, but in reality, $prev_count might have been stringified, be an overloaded object, be an array reference, and so on.
Inferred types, if used, should probably only be allowed for classes and typed variables.
The above syntax suggestions allow us to work around complex types:
builtin::types->register(
subtype Record, as HashRef[HashRef[Str]];
);
But having to do that every time we want a complex type is annoying, especially if it's a one off. We want something like this:
my @records : Type(HashRef[HashRef[Str]]);
I don't think Attribute::Handlers will handle that case well, but we will want complex types. If the arguments to an attribute could be code instead of a string, this might be feasible. Suggestions welcome.
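If the argument does arrive as a string, one fallback is to parse it. The sketch below assumes a deliberately tiny grammar, a type name with at most one bracketed parameter, and uses a hypothetical %CHECKS table and compile_type function to turn a string such as HashRef[HashRef[Str]] into a nested check.

```perl
use strict;
use warnings;

# Hypothetical checks; each receives the value and an optional inner check.
my %CHECKS = (
    Str => sub {
        my ($value) = @_;
        return defined $value && !ref $value;
    },
    HashRef => sub {
        my ( $value, $inner ) = @_;
        return 0 unless ref $value eq 'HASH';
        return 1 unless $inner;
        for my $item ( values %$value ) {
            return 0 unless $inner->($item);
        }
        return 1;
    },
);

# Turn "Name" or "Name[Inner]" into a code reference, recursing on Inner.
sub compile_type {
    my ($spec) = @_;
    my ( $name, $param ) = $spec =~ /\A\s*(\w+)\s*(?:\[(.*)\])?\s*\z/s
        or die "Cannot parse type '$spec'\n";
    my $check = $CHECKS{$name} or die "Unknown type '$name'\n";
    my $inner = defined $param ? compile_type($param) : undef;
    return sub { $check->( $_[0], $inner ) };
}

my $is_record = compile_type('HashRef[HashRef[Str]]');

print $is_record->( { alice => { role => 'admin' } } ) ? "ok\n" : "rejected\n";
print $is_record->( { alice => { role => [] } } )      ? "ok\n" : "rejected\n";
```

A real implementation would need multiple parameters, unions, and better error messages, which is part of why code-valued attribute arguments are attractive.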
Note that all the above is a work in progress. There are many areas of undefined semantics, but I'm hopeful we can start a discussion before getting too far into the weeds.