@Ovid
Last active August 25, 2023 13:56
Perl class tutorial

This is a work-in-progress based on this class tutorial. It covers features that are not yet implemented.

NAME

perlclasstut - Object-Oriented Programming via the class keyword

DISCLAIMER

This tutorial is a rough work-in-progress and only covers features of the new object syntax that are well-defined. The implementation is ongoing and while most of the basics are mapped out, there are some edge cases still being nailed down. We will not discuss those here.

DESCRIPTION

With the release of Perl version X, the Perl language has added a new object system. Originally code-named Corinna, the new object system is a result of years of collaboration between the Corinna team and the Perl community to bring a modern, easy-to-use object system that still feels like Perl. If you're looking for information about Perl's original object system, see perlobj (and later, perlootut to see object systems built on top of the legacy system).

Note that the following assumes you understand Perl and have a beginner's knowledge of object-oriented programming. You should understand classes, instances, inheritance, and methods. We'll cover the rest. Also, this is only an introduction, not a full manual.

Further, for simplicity, we'll often refer to "object-oriented programming" as OOP.

The legacy object system in Perl remains and no code will be broken by this change.

OBJECT-ORIENTED FUNDAMENTALS

There are many ways to describe object systems. Some focus on the implementation ("structs with behavior"), but we'll focus on the purpose. Objects are experts about a problem domain. You construct them with the information they need to do their job. For example, consider the case of an LRU cache. An LRU cache is a type of cache that keeps cache size down by deleting the least-recently-used cache entry. Let's construct a hypothetical cache:

my $cache = Cache::LRU->new( max_size => 20 );

In the above example, we will assume that max_size is the maximum number of entries in the cache. Adding a 21st unique entry will cause the "least recently used" entry to be ejected from the cache.

And then you can tell the object to do something, by calling "methods" on the object. Let's save an item in the cache and retrieve it.

$cache->set( $customer_id => $customer );

my $cached_customer = $cache->get($customer_id);

How does it work internally? You don't care. You should trust the object to do the right thing. Read the docs. That's the published interface.

In this tutorial, we'll build the Cache::LRU class so you can see how this works, but only after we have described a few fundamentals.

The Four Keywords

use feature 'class';

When you use the class feature, four new keywords are introduced into the current scope.

  • class

    Declare a class.

  • method

    Declare a method

  • field

    Declare a field (data for the class)

  • role

    Declare a role.

Note the use of the word "declare" for all of those definitions. Use of the class feature allows a declarative way of writing OOP code in Perl. It's both concise and expressive. Because you're declaring your intent instead of manually wiring all of the bits together, there are fewer opportunities for bugs.

The general syntax for each of these keywords is:

KEYWORD IDENTIFIER MODIFIERS? DEFINITION?

For example:

class Employee :isa(Person) {
    ...
}

In the above, class is the KEYWORD, Employee is the IDENTIFIER (the unique name of the thing), :isa(Person) is an optional MODIFIER that assigns additional properties to the thing you've identified (in this case, Employee inherits from Person), and the postfix block is the DEFINITION of the class.

Note that, like the package declarator, class does not require a postfix-block, even though we'll show some examples using it.

Also, modifiers are almost always regular Perl attributes, with an exception made for declaring the class version.

class

The class keyword declares a class and the namespace for that class. In future versions of Perl, it's possible we'll have private classes which are lexically bound, so do not make assumptions about the implementation.

Let's get started on our Cache::LRU class.

 use feature 'class'; # from now on, this will be assumed

 class Cache::LRU {}
 # or
 class Cache::LRU;

The above declares a class, and you can now make a new instance of it:

my $cache = Cache::LRU->new;

if ( $cache->isa('Cache::LRU') ) { # true
    ...
}
else {
    # we never get to here
}

Of course, that's all you can do. It's kinda useless, but we'll cover more in a bit.

Note that the new method is provided for you automatically. Do not declare your own new method in the class.

Versions

Any valid v-string may be used to declare the class version. This should be after the identifier:

class Cache::LRU v0.1;
my $cache = Cache::LRU->new;
say $cache->VERSION; # prints v0.1

Note: due to how the Perl grammar works, the version declaration must come before any attributes.

Inheritance

In OOP, sometimes you want a class to inherit from another class. This means that your class will extend the behavior of the parent class (er, that's the simple explanation. We'll keep it simple).

For example, a Cat might inherit from Mammal. In OOP, we often say that a Cat isa Mammal. You do this with the :isa(...) modifier.

class Cat :isa(Mammal);

Note that classes declared with class are single-inheritance only. As an alternative to multiple inheritance, we provide roles. More on that later.

Abstract Classes

In OOP, an abstract class is a class that cannot be instantiated. Instead, another class must inherit from the abstract class and provide the full functionality. In the "Inheritance" example above, the Mammal class might be abstract, so we declare it with the :abstract modifier.

class Mammal :abstract {
    ...
}

Any attempt to instantiate an abstract class is a fatal error.

my $mammal = Mammal->new; # boom

Methods declared with a forward declaration (i.e. any method whose name is declared, but without any corresponding code block) must be provided by a subclass, either via direct implementation or via a role. At the present time, forward declarations of methods do not take signatures because more work is needed to make signatures introspectable.

class Mammal :abstract {
    method eat; # must be provided by a subclass at compile-time
}
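
For comparison, here is a rough sketch of how the same two guarantees are often approximated in legacy Perl. The Mammal::Legacy and Cat::Legacy names are hypothetical and exist only for this illustration. Note that the checks here happen at runtime when new is called, whereas the class syntax described above is meant to catch a missing method at compile-time.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical legacy-Perl classes, for comparison only.
package Mammal::Legacy {
    use Carp 'croak';

    sub new {
        my ($class) = @_;

        # Emulates ":abstract": refuse to instantiate the base class itself.
        croak "Cannot instantiate abstract class $class"
          if $class eq __PACKAGE__;

        # Emulates "method eat;": subclasses must supply an eat method.
        croak "$class must implement eat" unless $class->can('eat');

        return bless {}, $class;
    }
}

package Cat::Legacy {
    our @ISA = ('Mammal::Legacy');
    sub eat { return 'The cat eats.' }
}

my $cat = Cat::Legacy->new;
say $cat->eat;                                      # The cat eats.

my $mammal = eval { Mammal::Legacy->new };          # dies, so eval returns undef
say 'Mammal::Legacy->new failed' unless $mammal;    # Mammal::Legacy->new failed
```

All of that hand-written checking is what the :abstract modifier and a forward declaration are intended to replace.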

Multiple Modifiers

Note that modifiers may not be duplicated, but the order in which they're specified does not matter.

class Mammal v1.0 :abstract :isa(Animalia);

class Mammal v1.0 :isa(Animalia) :abstract; # same thing

(With apologies to the biology fans who know that biological taxonomy is both misrepresented here and more complex than this simple hierarchy).

field

The field keyword allows you to create data storage for your class. You can create instance data and class data. This data is stored in normal Perl variables, but with special syntax to bind them to the class.

Instance Data

Classes are not very useful without data. In our Cache::LRU class, we have a max_size field to indicate how many cache entries we can have. Let's declare that field, provide a "reader" for that field, and a default value of 20.

Underneath the hood, we'll also use the Hash::Ordered module to provide the actual caching. Note that Hash::Ordered is written using legacy Perl, but you shouldn't (and don't) have to care about that.

class Cache::LRU {
    use Hash::Ordered;

    field $cache            { Hash::Ordered->new };

    field $max_size :reader { 20 };
}

my $cache = Cache::LRU->new;
say $cache->max_size;    # 20

In the above example, both $cache and $max_size are instance variables, which are unique to every instance of the class. They are never available outside the class. For each of them, we have an optional postfix block to assign a default value to those fields. If you omit the block, those fields will contain the value undef unless your class assigns a value to them.

Unlike Perl's legacy OOP system, you cannot use $cache->{cache}, $cache->{'$cache'}, or any other tricks to get at this data. It's completely encapsulated. However, in case of emergency, the meta-object protocol (MOP) will allow access to this data (but that's beyond the scope of this tutorial).

So how can we read the max_size data? Through the :reader attribute (also called a "modifier"). By default, the :reader modifier removes the $ sigil from the variable name and that becomes the name of a read-only method. So declaring field $foo :reader will create a foo method that will return the value contained in $foo. However, you can change the name of the method:

field $max_size :reader(max_entries);

Naturally, we provide a corresponding :writer modifier:

field $rank :reader :writer;

By default, the :writer modifier will prepend a set_ to the method name, so the above allows:

say $object->rank;             # returns the value of $rank

$object->set_rank('General');  # sets the value of $rank.

Important: being able to mutate an object (i.e. change the values of its fields via writer methods) is often a dangerous thing, as other code using that object may have already made decisions or assumptions based on the previous value of that field. If that previous value is no longer valid, those decisions or assumptions may now be inconsistent or incorrect.

Each writer method returns its own invocant to allow chaining:

$object->set_rank('General')
       ->set_name('Toussaint Louverture');

Though it's discouraged, you can set the name of the writer to the same name as the reader:

field $rank :writer(rank) :reader;

This allows for a common Perl convention of creating a single reader/writer method by overloading the behaviour of the method based on whether or not it is passed an argument:

say $object->rank;         # returns the value of $rank
$object->rank('General');  # sets the value of $rank.

Obviously, the rank method now does two entirely separate things, which can be confusing and error-prone, but this technique is ingrained in Perl OOP culture, so we support this edge case.
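
To make the behaviour of these generated methods concrete, here is a minimal sketch of roughly equivalent hand-written accessors in legacy Perl. The Officer class is hypothetical, and the real generated methods may differ in detail (for example, in how they handle extra arguments).

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical legacy-Perl class, for comparison only.
package Officer {
    sub new {
        my ( $class, %args ) = @_;
        return bless { rank => $args{rank}, name => $args{name} }, $class;
    }

    # Roughly what ":reader" provides: a read-only accessor.
    sub name { return $_[0]{name} }

    # Roughly what ":writer" provides: set the value, return the invocant.
    sub set_name {
        my ( $self, $new_name ) = @_;
        $self->{name} = $new_name;
        return $self;
    }

    # Combined reader/writer, as with ":reader :writer(rank)".
    sub rank {
        my $self = shift;
        return $self->{rank} unless @_;    # no argument: act as a reader
        $self->{rank} = shift;             # argument: act as a writer
        return $self;                      # allow chaining
    }
}

my $officer = Officer->new( rank => 'Colonel', name => 'Unknown' );
$officer->rank('General')->set_name('Toussaint Louverture');

say $officer->rank;    # General
say $officer->name;    # Toussaint Louverture
```

Notice how the combined rank method has two code paths that depend on the argument count. That is exactly the ambiguity the warning above is about.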

Having a default of 20 for max_size is useful, but we need to allow the programmer to say what the max size is. We do this with the :param modifier.

field $max_size :reader :param { 20 };

This tells the class that this value may be passed as a named parameter to the constructor.

my $cache = Cache::LRU->new( max_size => 100 );
say $cache->max_size; # 100

It's important to remember that any :param field without a default must be passed to the constructor. Thus, if we have this:

class NamedPoint {
    field ( $x, $y ) :param :reader {0};
    field $name      :param :reader;
}

The above would allow you to do any of these:

my $point = NamedPoint->new( name => 'Origin' );
my $point = NamedPoint->new( name => 'Origin', x => 3 );
my $point = NamedPoint->new( name => 'Origin', x => 3, y => 3.14 );

But not this:

my $point = NamedPoint->new( x => 23, y => 42 );   # Missing 'name' initializer

If a field is required, but not passed to the constructor, you will get a fatal runtime error.
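
If you're used to legacy Perl, the following sketch shows roughly the constructor you would otherwise write by hand to get the same "required unless defaulted" behaviour. NamedPoint::Legacy is a hypothetical name, and the exact error text produced by the class syntax may differ from the message used here.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical legacy-Perl class, for comparison only.
package NamedPoint::Legacy {
    use Carp 'croak';

    sub new {
        my ( $class, %args ) = @_;

        # "name" has no default, so it must be supplied.
        croak "Missing 'name' initializer" unless exists $args{name};

        my %self = (
            x    => exists $args{x} ? $args{x} : 0,    # default of 0
            y    => exists $args{y} ? $args{y} : 0,    # default of 0
            name => $args{name},
        );
        return bless \%self, $class;
    }

    sub x    { return $_[0]{x} }
    sub y    { return $_[0]{y} }
    sub name { return $_[0]{name} }
}

my $origin = NamedPoint::Legacy->new( name => 'Origin' );
say join ', ', $origin->name, $origin->x, $origin->y;    # Origin, 0, 0

# Omitting the required parameter is fatal, so we trap the error here.
my $error = eval { NamedPoint::Legacy->new( x => 23, y => 42 ); 1 } ? '' : $@;
say "Constructor failed: $error" if $error;
```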

method

Now that we know how to construct a basic object, we probably want to do things with it. To do that, we write methods. Methods use the method keyword instead of sub. They also take argument lists. Let's look at a "transposable" point class (i.e. X,Y --> Y,X).

class Point {
    field ( $x, $y ) :reader :param;

    method invert () {
        ( $x, $y ) = ( $y, $x );
    }

    method to_string () {
        return sprintf "(%d, %d)" => $x, $y;
    }
}

my $point = Point->new( x => 23, y => 42 );
say $point->to_string; # (23, 42)

$point->invert;
say $point->to_string; # (42, 23)

In the above, you can see that methods have direct access to field variables. However, they also have $self injected in them. So you could also write invert as follows:

    method invert () {
        ( $x, $y ) = ( $self->y, $self->x );
    }

However, method calls are slower than direct variable access and require more typing. Plus, if we don't use :reader for a given field, we have no method to call.

Putting all of this together, we get the following as a very basic Cache::LRU class:

use feature 'class';
class Cache::LRU {
    use Hash::Ordered;

    field $cache                   { Hash::Ordered->new };
    field $max_size :param :reader { 20 };

    method set( $key, $value ) {
        $cache->unshift( $key, $value );    # new values in front
        if ( $cache->keys > $max_size ) {
            $cache->pop;
        }
    }

    method get($key) {
        return unless $cache->exists($key);
        my $value = $cache->get($key);
        $cache->unshift( $key, $value );    # put it at the front
        return $value;
    }
}

With the above, we have a working LRU cache. It doesn't have a lot of features, but it shows you the core of writing OOP code with the class feature. We have a powerful, well-encapsulated declarative means of writing objects without having to wire together all of the various bits and pieces.
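
If you'd like to experiment with the eviction behaviour on a Perl that doesn't have the syntax above, here is a sketch of the same idea using legacy bless-based OOP and only core Perl. Cache::LRU::Legacy is a hypothetical name, and it uses a plain hash plus an array of keys instead of Hash::Ordered, so it trades speed for having no dependencies.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-in for Cache::LRU, written with legacy OOP.
package Cache::LRU::Legacy {
    sub new {
        my ( $class, %args ) = @_;
        my $self = {
            max_size => $args{max_size} // 20,
            values   => {},    # key => value
            order    => [],    # keys, most recently used first
        };
        return bless $self, $class;
    }

    sub max_size { return $_[0]{max_size} }

    # Move (or add) a key to the front of the recency list.
    sub _touch {
        my ( $self, $key ) = @_;
        @{ $self->{order} } = ( $key, grep { $_ ne $key } @{ $self->{order} } );
        return;
    }

    sub set {
        my ( $self, $key, $value ) = @_;
        $self->{values}{$key} = $value;
        $self->_touch($key);
        if ( @{ $self->{order} } > $self->{max_size} ) {
            my $evicted = pop @{ $self->{order} };    # least recently used
            delete $self->{values}{$evicted};
        }
        return;
    }

    sub get {
        my ( $self, $key ) = @_;
        return unless exists $self->{values}{$key};
        $self->_touch($key);                          # reading counts as use
        return $self->{values}{$key};
    }
}

my $cache = Cache::LRU::Legacy->new( max_size => 2 );
$cache->set( a => 1 );
$cache->set( b => 2 );
$cache->get('a');         # "a" is now the most recently used
$cache->set( c => 3 );    # evicts "b", the least recently used

say defined $cache->get('b') ? 'b kept' : 'b evicted';    # b evicted
say $cache->get('a');                                     # 1
```

Compare the amount of manual wiring here (the constructor, the default, the reader) with the declarative version above.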

role

The new class syntax only provides for single inheritance. Sometimes you need additional behavior that you would like to "transparently" provide. For example, you might want two or more unrelated classes to be able to serialize themselves to JSON, even though each class itself has nothing to do with JSON. Let's do that with our Cache::LRU class.

To provide functionality shared across unrelated classes, we use the role keyword. A role is similar to a class, but it cannot be instantiated. Instead, it is "consumed" by a class and the class provides the specifics of the role behavior. Roles can both provide methods and require methods. For our JSON role, it might look like this:

use feature 'class';

role Role::Serializable::JSON {
    use JSON::PP 'encode_json';  # provided in core Perl since v5.13.9

    method to_hash;   # forward declaration: the class must provide this

    method to_json () {
        encode_json( $self->to_hash );
    }
}

And you can use this in your class with the :does attribute.

class Cache::LRU :does(Role::Serializable::JSON) {
    ...
}

But our class fails at compile-time because it doesn't have a to_hash method. So let's write one.

class Cache::LRU  v0.1.0  :does(Role::Serializable::JSON) {
    use Hash::Ordered;
    use Carp 'croak';

    field $cache                   { Hash::Ordered->new };
    field $max_size :param :reader { 20 };

    method set ( $key, $value ) {...}

    method get($key) {...}

    method to_hash () {
        my %entries;
        foreach my $key ($cache->keys) {
            my $value      = $cache->get($key);
            my $ref        = defined $value ? (ref $value || 'SCALAR') : 'UNDEF';
            $entries{$key} = $ref;
        }
        return {
            max_size => $max_size,
            entries => \%entries,
        }
    }
}

In the above, the method to_hash; forward declaration defines a method that the Role::Serializable::JSON role requires the consuming class to provide. It can do so by either having the method defined in the class or consuming it from another role.

The method to_json provided by the role will be "flattened" into the Cache::LRU class almost as if it had been written there. However, fields defined in the class are always lexically scoped (like a my or state variable) and so are not directly accessible to the role method.

With that, we can do this:

my $cache = Cache::LRU->new(max_size => 5);
$cache->set( first  => undef );
$cache->set( second => 'bob' );
$cache->set( third  => { foo => 'bar' } );
say $cache->to_json;

And we should get output similar to the following:

{"max_size":5,"entries":{"first":"UNDEF","third":"HASH","second":"SCALAR"}}
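
The output is only "similar" because Perl does not guarantee hash key order. If you need reproducible output (in tests, for example), JSON::PP's canonical option sorts the keys. The sketch below shows this in plain Perl. The to_canonical_json helper and the Demo::Thing class are hypothetical and are not part of the role above. The helper also shows a runtime version of the "must provide to_hash" requirement.

```perl
use strict;
use warnings;
use feature 'say';
use JSON::PP ();

# Hypothetical helper: serialize any object that provides to_hash.
sub to_canonical_json {
    my ($object) = @_;
    die "Object must provide a to_hash method\n"
      unless $object->can('to_hash');

    # canonical() makes JSON::PP emit hash keys in sorted order.
    return JSON::PP->new->canonical->encode( $object->to_hash );
}

# Hypothetical class standing in for a consumer of the role.
package Demo::Thing {
    sub new { return bless {}, shift }

    sub to_hash {
        return {
            max_size => 5,
            entries  => { first => 'UNDEF', second => 'SCALAR' },
        };
    }
}

my $json = to_canonical_json( Demo::Thing->new );
say $json;
# {"entries":{"first":"UNDEF","second":"SCALAR"},"max_size":5}
```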

You can also consume multiple roles:

class Foo :does(Role1) :does(Role2) {
    ...
}

Roles may declare fields, but those field variables are private to that role. This protects against the case where a class and a role might both define field $x.

If any method defined directly in the class has the same name as a method provided by a role, a compile-time error will result. If two roles have duplicate method names, this will also cause a compile-time failure if they're consumed together. Traditionally, roles have syntax for "excluding" or "aliasing" methods, but this is not (yet) provided by the new mechanism. In practice, we find this is rarely an issue, but as roles are more widely shared, this will need to be addressed.

As a workaround, you can create a new object that consumes the role and store that object in a field, or you can use interstitial base classes that consume the role. Neither solution is great.

Miscellaneous

Non-scalar Fields

You can declare arrays and hashes as fields, with or without defaults:

field @colors { qw/green yellow red/ };
field %seen;

However, array and hash fields cannot have modifiers:

field %seen :reader;  # compile-time error
field @array :param;  # compile-time error
field %hash :writer;  # compile-time error

Class Data, Methods, and Phasers

Class data and methods are shared by all instances of a given class. They are declared with the :common attribute. For example, let's say you're making a game and you only allow 20 point objects to be created. How do you track how many are created? You don't. That's the responsibility of the class. Let's use class data for this.

class Point {
    field $num_points :common :reader { 0 }; # all instances share this
    field ( $x, $y ) :param  :reader { 0 };

    ADJUST {
        $num_points++;
        if ( $num_points > 20 ) {
            die "No more than 20 points may be created at any time";
        }
    }

    DESTRUCT { $num_points-- }
}

In the above, ADJUST is a phaser (like BEGIN or END) that is called every time the class is instantiated. You can have multiple ADJUST phasers, and they are called in the order declared. So you could also write the above ADJUST as follows:

    ADJUST { $num_points++ }

    ADJUST {
        if ( $num_points > 20 ) {
            die "No more than 20 points may be created at any time";
        }
    }

The DESTRUCT phaser behaves similarly to ADJUST, but only fires when the reference count of the object drops to zero (typically when the last variable holding the object goes out of scope or is undefined).

We can now do this:

say Point->num_points;   # 0

my $point1 = Point->new( x => 2, y => 4 );
say Point->num_points;   # 1
# or
say $point1->num_points; # 1

my $point2 = Point->new; # accepts defaults
say Point->num_points;   # 2
undef $point1;           # triggers DESTRUCT
say Point->num_points;   # 1

There's a lot more to say about ADJUST and DESTRUCT, but some of the finer points are still being nailed down.
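
In the meantime, the reference-counting behaviour can be observed today with legacy Perl, where a file-scoped lexical plays the part of class data and DESTROY plays the part of DESTRUCT. This is only a sketch for comparison: Point::Counted is a hypothetical name, and the limit is checked before the object is created, so a rejected object never touches the counter.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical legacy-Perl class, for comparison only.
package Point::Counted {
    my $num_points = 0;    # one variable shared by every instance

    sub new {
        my ( $class, %args ) = @_;
        die "No more than 20 points may be created at any time\n"
          if $num_points >= 20;
        $num_points++;
        return bless { x => $args{x} // 0, y => $args{y} // 0 }, $class;
    }

    # Callable on the class or on an instance.
    sub num_points { return $num_points }

    # Runs when the last reference to an instance goes away.
    sub DESTROY { $num_points-- }
}

say Point::Counted->num_points;    # 0

my $point1 = Point::Counted->new( x => 2, y => 4 );
my $point2 = Point::Counted->new;
say Point::Counted->num_points;    # 2

undef $point1;                     # last reference gone, DESTROY fires
say Point::Counted->num_points;    # 1
```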

Types?

In Moose, you can declare attributes like this:

has limit => (
    is  => 'rw',
    isa => 'Int',
);

With that, you cannot pass anything but an integer to the constructor, nor can you later do $object->limit('unlimited'). Sadly, we do not have this at the present time for the class syntax, but there is a workaround: Types::Standard and ADJUST. Note that this workaround is only safe for immutable objects. Mutable objects will (for the time being) have to jump through more hoops to ensure type safety.

The following trivial example shows the potential, but obviously, there's a lot more you could do with Types::Standard to make this more robust.

class Point {
    use Types::Standard qw(is_Int);

    field ( $x, $y ) :reader :param;

    ADJUST {
        my @errors;
        is_Int($x) or push @errors => "x must be an integer, not $x.";
        is_Int($y) or push @errors => "y must be an integer, not $y.";
        if (@errors) {
            die join ', ' => @errors;
        }
    }
}

With the above, you can guarantee that your Point object only has integer values for $x and $y.
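
As for the "more hoops" that mutable objects need: the usual approach is to run the same check in every writer as in the constructor. Here is a sketch of that idea in legacy Perl so that it runs today. Point::Checked is a hypothetical name, and the _is_int regex is a simplified stand-in for is_Int from Types::Standard, not a reimplementation of it.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical legacy-Perl class, for comparison only.
package Point::Checked {
    use Carp 'croak';

    # Simplified stand-in for Types::Standard's is_Int (an assumption).
    sub _is_int {
        my ($value) = @_;
        return defined $value && !ref $value && $value =~ /\A-?[0-9]+\z/;
    }

    # One shared check, used by both the constructor and the writer.
    sub _check {
        my ( $name, $value ) = @_;
        croak "$name must be an integer" unless _is_int($value);
        return $value;
    }

    sub new {
        my ( $class, %args ) = @_;
        my %self = (
            x => _check( x => $args{x} ),
            y => _check( y => $args{y} ),
        );
        return bless \%self, $class;
    }

    sub x { return $_[0]{x} }

    sub set_x {
        my ( $self, $value ) = @_;
        $self->{x} = _check( x => $value );    # validate on every write
        return $self;
    }
}

my $point = Point::Checked->new( x => 1, y => 2 );
$point->set_x(10);

my $ok = eval { $point->set_x('unlimited'); 1 };
say $ok ? 'accepted' : 'rejected';    # rejected
say $point->x;                        # 10
```

Because the failed write dies before the assignment, the object keeps its last valid value.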

Putting It All Together

use feature 'class';
role Role::Serializable::JSON {
    use JSON::PP 'encode_json'; # provided in core Perl since v5.13.9
    method to_hash; # the class must provide this

    method to_json () {
        encode_json($self->to_hash);
    }
}

class Cache::LRU v0.1.0 :does(Role::Serializable::JSON) {
    use Hash::Ordered;
    use Carp 'croak';

    field $num_caches :common :reader { 0 };
    field $cache                      { Hash::Ordered->new };
    field $max_size   :param  :reader { 20 };
    field $created    :reader         { time };

    ADJUST { # called after new()
        $num_caches++;
        if ( $max_size < 1 ) {
            croak(...);
        }
    }

    DESTRUCT { $num_caches-- }

    method set( $key, $value ) {
        $cache->unshift( $key, $value );
        if ( $cache->keys > $max_size ) {
            $cache->pop;
        }
    }

    method get($key) {
        return unless $cache->exists($key);
        my $value = $cache->get($key);
        $cache->unshift( $key, $value );
        return $value;
    }

    method to_hash () {
        my %entries;
        foreach my $key ($cache->keys) {
            my $value = $cache->get($key);
            my $ref        = defined $value ? (ref $value || 'SCALAR') : 'UNDEF';
            $entries{$key} = $ref;
        }
        return {
            max_size   => $max_size,
            entries    => \%entries,
            created    => $created,
            num_caches => $num_caches,
        }
    }
}

Conclusion

I hope you've enjoyed this far-too-brief introduction to the new class keyword. This has been the result of years of design effort from the Corinna design team and the Perl community at large.

This work is dedicated to the memory of Jeff Goff and David Adler, two prominent members of the Perl community who were wonderful people and left this life far too soon.

Contributors

Paul "LeoNerd" Evans and Damian Conway both were kind enough to help with some of my silly mistakes.
