
@Ovid
Created December 23, 2022 11:12
Data Types in Perl

This document is meant to open a discussion about what it would take to create an optional, native data type constraint system for Perl. It is not about building a type system.

In the book Types and Programming Languages by Benjamin Pierce, he writes:

A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute.

We're not getting that. In fact, Perl already has type systems, and we may very well have to work around them. (For a gentle introduction to type systems, read this.)

For example, enabling strict turns many kinds of unwanted behavior into errors: undeclared variables and stray barewords fail at compile time, and symbolic references fail at runtime. Likewise, trying to dereference a hashref as an arrayref is a runtime error. And then there's taint checking, a type system that attempts to prevent developers from using unvalidated data in unsafe ways.
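Both of those existing runtime checks are easy to observe. The short script below is a sketch with illustrative variable names: it traps the error from dereferencing a hashref as an arrayref, and the error from using a plain string as a reference under strict refs.

```perl
use strict;
use warnings;

my $hashref = { name => 'Spot' };

# Dereferencing a hash reference as an array reference dies at runtime.
my $ok = eval { my @items = @$hashref; 1 };
my $deref_error = $ok ? '' : $@;

# Under "use strict", using a plain string as a reference also dies at runtime.
my $name = 'items';
$ok = eval { my @items = @$name; 1 };
my $strict_error = $ok ? '' : $@;

print "1: $deref_error";    # 1: Not an ARRAY reference at ...
print "2: $strict_error";   # 2: Can't use string ("items") as an ARRAY ref while "strict refs" in use at ...
```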

For another approach, in the 2005 Google Summer of Code, Gary Jackson attempted to develop a type inference engine for Perl. What came out of that is Devel::TypeCheck. Sadly, that work is both incomplete and out of date. It would certainly be amazing if someone picked it up and got it working again, but that's not the approach we're looking at here. Reini Urban attempted to update it for Perl 5.10, but the original author was not responsive (this was several years after the original work, so the author had likely moved on).

Goals

Major Caveat: Everything in this document is open for discussion. None of this is set in stone.

Before we attempt to create a design and/or implementation for types in Perl, we should identify the goals we wish to achieve.

  • Easy to use
  • Optional
  • No infectious data types
  • Extensible
  • Can be extended to signatures (including return types)
  • Can work with Corinna

All of the above is achievable and will be explained as we go.

Syntax

There are two aspects of this proposal/discussion: semantics and syntax. Hopefully we can address these separately.

I've seen several proposals for how data types can be attached to Perl variables, the most common of which is this syntax:

my Dog $spot = Dog->new( name => 'Spot' );

Currently, perldoc -f my shows the following grammar:

my VARLIST
my TYPE VARLIST
my VARLIST : ATTRS
my TYPE VARLIST : ATTRS

It also carries this caveat (emphasis mine):

The exact semantics and interface of TYPE and ATTRS are still evolving. TYPE may be a bareword, a constant declared with "use constant", or "PACKAGE". It is currently bound to the use of the fields pragma, and attributes are handled using the attributes pragma, or starting from Perl 5.8.0 also via the Attribute::Handlers module. See "Private Variables via my()" in perlsub for details.
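To see why that caveat matters, note that the TYPE slot is accepted today but checks nothing. In this sketch, Dog is a throwaway class for illustration: perl requires the TYPE to name a package it has already seen, yet it happily stores a value that is not a Dog at all.

```perl
use strict;
use warnings;

package Dog;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package main;

# Legal today: TYPE must name a package perl has already seen...
my Dog $spot = Dog->new( name => 'Spot' );

# ...but it is not a constraint. Nothing stops a non-Dog value.
my Dog $not_a_dog = 42;

print ref($spot), "\n";    # Dog
print "$not_a_dog\n";      # 42
```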

Thus, any attempt to repurpose the current syntax for type constraints could break existing code, so we should tread lightly here. Instead, I suggest we go all in on the KIM syntax that Damian Conway proposed for Corinna. KIM stands for "Keyword Identifier Modifier" and proposes a standard declarative syntax for declaring things:

KEYWORD IDENTIFIER MODIFIERS? DEFINITION?

For example, from Corinna:

# keyword   identifier   modifier     definition
class       Customer     :isa(Person) {
    # keyword   identifier     modifiers
    field       $customer_id   :param :reader;
}

For a data type constraint, we might have something like this:

# keyword    identifier   modifier    definition
my           $counter     :Type(UInt) = 0;

Pushing Perl forward using KIM syntax essentially means we're adding new behavior to the language without the need for additional syntax (signatures are the exception) or new keywords littering our code. Further, by having a standard declaration syntax going forward, Perl can evolve to be a more predictable language.
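Perl's existing attributes mechanism can already parse a declaration like my $counter :Type(UInt) = 0; without a handler, perl rejects it as an invalid SCALAR attribute. As a minimal sketch that only records the attribute and enforces nothing, the core MODIFY_SCALAR_ATTRIBUTES hook (documented in perldoc attributes) shows what such a handler receives. Note that the argument arrives as the raw string "Type(UInt)".

```perl
use strict;
use warnings;

# Records every attribute applied to a scalar declared in this package.
my @seen;

# perl calls this hook for attributes on scalars declared in package main.
# Returning an empty list tells perl every attribute was accepted.
sub MODIFY_SCALAR_ATTRIBUTES {
    my ($package, $ref, @attributes) = @_;
    push @seen, @attributes;
    return;
}

my $counter :Type(UInt) = 0;

print "attributes: @seen\n";    # attributes: Type(UInt)
print "counter: $counter\n";    # counter: 0
```

A real implementation would still need to attach enforcement to the variable; this hook is only the point where that could happen.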

For signatures, types could look like this:

sub factorial( $bar :Type(UInt) ) :Returns(PositiveInt) {
    ...
}

That might complicate the parser, but we want this to be as easy to use as possible.
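For comparison, here is roughly what those two annotations would save us from writing by hand today. The is_uint and is_positive_int helpers are hypothetical stand-ins for the UInt and PositiveInt types, not an existing API.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helpers standing in for the UInt and PositiveInt types.
sub is_uint {
    my ($value) = @_;
    return defined $value && !ref $value && $value =~ /\A[0-9]+\z/;
}

sub is_positive_int {
    my ($value) = @_;
    return is_uint($value) && $value > 0;
}

# The checks that :Type(UInt) and :Returns(PositiveInt) would replace.
sub factorial {
    my ($n) = @_;
    croak "factorial: argument is not a UInt" unless is_uint($n);

    my $result = 1;
    $result *= $_ for 2 .. $n;

    croak "factorial: result is not a PositiveInt" unless is_positive_int($result);
    return $result;
}

print factorial(5), "\n";    # 120
```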

Types for Humans, not Computers

The C language provides primitive types such as char, int, float, double, and so on. This helps the C compiler generate code like a race car that runs blazingly fast and crashes all the time.

Java, on the other hand, has primitive and non-primitive types. The non-primitive types include classes, interfaces (sort of like roles), and arrays.

The non-primitive types allow us to create types which specifically fit our problem domain. Consider this pseudo-code:

var int temp1   = 32;
var int temp2   = 0;
var int average = ( temp1 + temp2 ) / 2;

The above might be masking a very common kind of error. So let's rewrite it.

var Celsius    temp1   = 32;
var Fahrenheit temp2   = 0;
var int        average = ( temp1 + temp2 ) / 2;

Most developers glancing at the above code can tell it's not correct. But how do we prevent this?

At its core, a data type is:

  1. A name for the type
  2. A set of allowed values for that type
  3. A set of allowed operations for that type
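For class-based types, Perl can already restrict the third item, the allowed operations, through the core overload pragma. In this minimal sketch the Temperature, Celsius, and Fahrenheit classes are illustrative, not part of this proposal: adding two temperatures on the same scale works, while the Celsius-plus-Fahrenheit mistake from the earlier example dies at runtime. Plain, unblessed scalars offer no such hook.

```perl
use strict;
use warnings;

package Temperature;
use Carp qw(croak);

# Overload "+" so that only temperatures on the same scale can be added.
use overload '+' => \&add;

sub new {
    my ($class, $degrees) = @_;
    return bless { degrees => $degrees }, $class;
}

sub add {
    my ($self, $other) = @_;
    croak 'Cannot add ' . ref($self) . ' to ' . (ref($other) || 'a plain number')
        unless ref($other) && ref($other) eq ref($self);
    return ref($self)->new( $self->{degrees} + $other->{degrees} );
}

package Celsius;
use parent -norequire, 'Temperature';

package Fahrenheit;
use parent -norequire, 'Temperature';

package main;

my $sum = Celsius->new(10) + Celsius->new(22);    # a Celsius object, 32 degrees

my $ok    = eval { my $bad = Celsius->new(32) + Fahrenheit->new(0); 1 };
my $error = $ok ? '' : $@;
print "error: $error" if $error;    # error: Cannot add Celsius to Fahrenheit at ...
```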

I think we can accomplish the first two, but the third might be difficult. Consider:

my $count :Type(UInt) = 23;
$count += "19 apples";
say $count;

What should happen there? We're adding a string to an unsigned integer. Today, with the :Type attribute removed and warnings enabled, the above generates a warning, but we still get our answer:

Argument "19 apples" isn't numeric in addition (+) at -e line 1.
42

We could:

  • Make this fatal
  • Make this a warning
  • Ignore it

For now, I would suggest keeping the warning so that developers can upgrade existing code gradually. Disallowing this "illegal" operation might be a bridge too far for the Perl community. However, this undermines the type constraint. The following is clearly illegal, but we only get a warning: "apples" is coerced to zero, and the code prints 23.

my $count :Type(UInt) = 23;
$count += "apples";
say $count;

Do we want this behavior? I would argue "no", but many billion-dollar companies that rely on Perl's current behavior may have a different point of view. Perhaps explicit type casting could help here, but this requires discussion.
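Perl already offers a lexical, opt-in form of the "make this fatal" option: the numeric warnings category can be made fatal with use warnings FATAL => 'numeric'. This sketch works on plain scalars, not on the proposed :Type attribute, and it shows that the default and the fatal behavior can coexist in one file.

```perl
use strict;
use warnings;

# Collect ordinary warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Default behavior: a warning, and the leading "19" is used.
my $count = 23;
$count += "19 apples";

# Opt-in: numeric-conversion warnings are fatal inside this block only.
my $strict_count = 23;
my $ok = eval {
    use warnings FATAL => 'numeric';
    $strict_count += "19 apples";
    1;
};
my $error = $ok ? '' : $@;

print "count: $count\n";    # count: 42
print "error: $error";      # error: Argument "19 apples" isn't numeric in addition (+) at ...
```

The trade-off is granularity: the fatal setting applies to a lexical scope, not to one typed variable.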

Type the Variable, not the Data

I once read (but can't find) an article about a company that was delighted with a third-party type inference package that saved them much grief (and money) when writing new code. Yet they stopped using it. Why? Because whenever code crossed the boundary between the new code and libraries that didn't use type inference, it kept breaking. Retrofitting a type system onto a language not designed for it is hard. Thus, I suggest we type the variable, not the data. Consider:

my $count :Type(UInt) = $var;
$count = code_i_did_not_write($count);

# elsewhere
sub code_i_did_not_write ($count) {
    my $temp = $count;
    $count   = undef;
    ...
    return $temp;
}

If we typed the data, the above code would fail on the $count = undef line, even though that code is perfectly valid Perl.

However, if we type the variable, we're good to go so long as code_i_did_not_write returns a scalar containing an unsigned integer. But what if it does this instead:

sub code_i_did_not_write {
    my $temp = $_[0];
    $_[0]    = undef;
    ...
    return $temp;
}

That breaks because the elements of @_ are aliases to the caller's variables: assigning undef to $_[0] assigns undef to the typed $count itself. More discussion is needed on this.
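Perl's closest existing approximation to "typing the variable" is tie, which attaches behavior to a container rather than to a value. In this hedged sketch, TypedScalar and is_uint are hypothetical names, and tie is only a stand-in for native support: the first routine copies its argument and works, while the second writes through $_[0] and trips the constraint, exactly as described above.

```perl
use strict;
use warnings;

# Hypothetical class: a scalar container that validates every assignment.
package TypedScalar;
use Carp qw(croak);

sub TIESCALAR {
    my ($class, $type_name, $check, $value) = @_;
    my $self = bless { type => $type_name, check => $check }, $class;
    $self->STORE($value);
    return $self;
}

sub FETCH { return $_[0]{value} }

sub STORE {
    my ($self, $value) = @_;
    croak "Value does not satisfy type $self->{type}"
        unless $self->{check}->($value);
    $self->{value} = $value;
}

package main;

# Hypothetical UInt check: a defined, non-reference string of digits.
sub is_uint {
    my ($value) = @_;
    return defined $value && !ref $value && $value =~ /\A[0-9]+\z/;
}

tie my $count, 'TypedScalar', 'UInt', \&is_uint, 23;

# Works: the constraint belongs to $count, not to the copy in $arg.
sub copies_argument {
    my ($arg) = @_;
    my $temp = $arg;
    $arg = undef;
    return $temp;
}

# Fails: $_[0] is an alias, so this assigns undef to the tied $count.
sub aliases_argument {
    my $temp = $_[0];
    $_[0] = undef;
    return $temp;
}

$count = copies_argument($count);
my $ok    = eval { $count = aliases_argument($count); 1 };
my $error = $ok ? '' : $@;

print "count is still $count\n";    # count is still 23
print "error: $error";              # error: Value does not satisfy type UInt at ...
```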

Type Libraries

Native Types

If this works, we will want a native type library for Perl. I'm using Type::Tiny types as a reference point, not as a suggestion (though they're likely to be more familiar to Perl developers). Because we use attributes, the type names will not infect the current namespace.

Extended Types

However, businesses would certainly want to extend these core types with their own, so we'll need some way to handle this. Again, using Type::Tiny-like syntax:

builtin::types->register(
    subtype Probability :Type(Num) = sub ($val) { 0 <= $val <= 1 };
);

The above is conceptual, not suggested.
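The intent of that conceptual snippet can be approximated in plain Perl today. In this sketch, register_type, check_type, and the %TYPES registry are hypothetical names (not builtin::types or Type::Tiny), and a subtype is simply a named check layered on top of its parent's check.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical registry: type name => { parent => name or undef, check => code }.
my %TYPES;

sub register_type {
    my (%args) = @_;
    my $parent = $args{parent};
    die "Unknown parent type '$parent'\n"
        if defined $parent && !exists $TYPES{$parent};
    $TYPES{ $args{name} } = { parent => $parent, check => $args{check} };
    return;
}

# A value satisfies a subtype only if it first satisfies every ancestor.
sub check_type {
    my ($name, $value) = @_;
    my $type = $TYPES{$name} or die "Unknown type '$name'\n";
    return 0 if defined $type->{parent} && !check_type($type->{parent}, $value);
    return $type->{check}->($value) ? 1 : 0;
}

register_type(
    name  => 'Num',
    check => sub { defined $_[0] && !ref $_[0] && looks_like_number($_[0]) },
);

# Perl 5.32+ also accepts the chained form: 0 <= $_[0] <= 1
register_type(
    name   => 'Probability',
    parent => 'Num',
    check  => sub { 0 <= $_[0] && $_[0] <= 1 },
);

print check_type('Probability', 0.25),  "\n";    # 1
print check_type('Probability', 1.5),   "\n";    # 0
print check_type('Probability', 'abc'), "\n";    # 0
```

Checking the parent first means the numeric comparison in Probability never sees a non-numeric string.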

Classes

The Probability type above is simply a float constrained to values between zero and one, inclusive. We also want to be able to declare that a variable holds an instance of a class, without class names conflicting with native type names. The InstanceOf[ClassName] syntax is cumbersome. I think a single leading plus sign would work:

class Probability { ... }

my $probability :Type(+Probability) = Probability->new;

Alternatively, we could infer the type:

my $probability :Type = Probability->new;

That raises interesting questions. Something like my $count :Type = 3; might infer an integer type. We might likewise expect my $count :Type = $prev_count; to infer an integer, but in reality $prev_count might have been stringified, might hold an overloaded object, might be an array reference, and so on.

Inferred types, if used, should probably only be allowed for classes and typed variables.
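The class half of that rule is easy to express with the core Scalar::Util module's blessed function. In this sketch, infer_type and its return-undef-otherwise behavior are hypothetical: it infers a type only from objects, spelled with the leading plus suggested above, and declines to guess for plain scalars or unblessed references.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package Probability;
sub new { my ($class) = @_; return bless {}, $class }

package main;

# Hypothetical rule: infer a type only from objects, spelled "+ClassName";
# every other value gets no inferred type (undef).
sub infer_type {
    my ($value) = @_;
    my $class = blessed($value);
    return defined $class ? "+$class" : undef;
}

# Plain values such as 3, "3", and [3] carry no declared type to recover,
# so this rule declines to infer one for them.
my @values   = ( 3, "3", [3], Probability->new );
my @inferred = map { infer_type($_) } @values;

print join( ', ', map { defined $_ ? $_ : 'none' } @inferred ), "\n";    # none, none, none, +Probability
```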

Complex Types

The above syntax suggestions allow us to work around complex types:

builtin::types->register(
    subtype Record, as HashRef[HashRef[Str]];
);

But having to do that every time we want a complex type is annoying, especially for a one-off. We want something like this:

my @records :Type(HashRef[HashRef[Str]]);

I don't think Attribute::Handlers will handle that case well, but we will want complex types. If the arguments to an attribute could be code instead of a string, this might be feasible. Suggestions welcome.
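One way to express a complex type as code rather than as a string is to build it from ordinary closures. In this sketch, Str and HashRef are hypothetical helpers that only mirror Type::Tiny's naming; they are not Type::Tiny. Note that validating a nested type means walking the whole structure on every check.

```perl
use strict;
use warnings;

# Hypothetical constraint constructors: each returns code that validates one value.
sub Str {
    return sub { defined $_[0] && !ref $_[0] };
}

sub HashRef {
    my ($inner) = @_;    # optional constraint for every value in the hash
    return sub {
        my ($value) = @_;
        return 0 unless ref $value eq 'HASH';
        return 1 unless $inner;
        for my $element (values %$value) {
            return 0 unless $inner->($element);
        }
        return 1;
    };
}

# HashRef[HashRef[Str]] becomes nested function calls.
my $Record = HashRef( HashRef( Str() ) );

my $good = $Record->( { spot => { species => 'dog' } } );
my $bad  = $Record->( { spot => { tags => [] } } );

print $good ? "valid\n" : "invalid\n";    # valid
print $bad  ? "valid\n" : "invalid\n";    # invalid
```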

@Ovid

Ovid commented Dec 23, 2022

Note that all the above is a work in progress. There are many areas of undefined semantics, but I'm hopeful we can start a discussion before getting too far into the weeds.

@perigrin

There is an entire thread about this on Mastodon, https://fosstodon.org/@ovid/109546525135152664, that may take some work to explore because of the way the fediverse works.
