Cor — A minimal OO proposal for the Perl core
This is version 0.10 of this document.
Curtis "Ovid" Poe
Nothing in the following proposal is set in stone.
It has been repeatedly proposed that we have OO in the Perl 5 core. I support this notion. However, there's been much disagreement over what that OO should look like. I propose a simple OO syntax that would nonetheless be modern, but still "feel like Perl 5." Here's a small taste (will be shown again later in the document):
class Cache::LRU {
use Hash::Ordered;
has cache => ( default => method { Hash::Ordered->new } );
has max_size => ( default => method { 20 } );
method set ( $key, $value ) {
if ( self->cache->exists($key) ) {
self->cache->delete($key);
}
elsif ( self->cache->keys >= self->max_size ) {
self->cache->shift;
}
self->cache->set( $key, $value );
}
method get ($key) { self->cache->get($key) }
}
To distinguish this OO system from the (too) many others, such as Moose, Moo, Dios, Class::InsideOut, Mu, Spiffy, Class::Simple, Rubyish::Class, Class::Easy, Class::Tiny, Class::Std, and so on, I'm going to call this one "Cor" (short for "Corinna", a possibly fictional woman to whom the poet Ovid wrote poems). Using the name "Cor" is only for disambiguation. I hope Cor would become core and thus not need a name.
This document should be considered a "rough draft". While I (Ovid) am the initial author, this document has been heavily updated via feedback from Sawyer X and Stevan Little (and a bit from Matt Trout and Peter Mottram). Also, many of the underlying ideas have been directly "liberated" from Stevan's work.
Also, note that the intent is that this will ultimately be implemented in perl (the interpreter), not Perl (the language). Thus, it would be written in C and likely be much faster than current options.
In creating this proposal, I assumed the following:
-
No Implementation
This document describes a possible OO system. It does not contain information about implementation. Further, general OO "details" about how roles work, how inheritance works, and so on, are mostly omitted.
-
Feature Compatibility
We should not take away anything core Perl 5 supports. Thus, multiple-inheritance must be supported. I considered dropping it and saying "no, you have to use single inheritance", but we have a host of popular modules, such as Catalyst and DBIx::Class, which use MI and could thus not be easily ported were the authors ever inclined to do so.
-
Simplicity
I strove to make this proposed syntax as simple as possible. This makes implementation easier and leaves fewer grounds for objection.
-
Roles Must Be Included
Most modern Perl 5 developers who use Moose/Moo use roles. Many of them use roles heavily. Thus, they will need to be supported.
-
Lexical Scope
If the suggested changes can be limited to a given lexical scope, I suspect it will be easier to use the new classes with old code.
-
use v5.3X;
This would be implemented as a feature and automatically be enabled if you write use v5.3X (or similar syntax). This would avoid having to jump through special hoops to use the new OO syntax. It would simply be there.
-
Safety
Cor roles and classes assume strict and warnings by default. They also use subroutine signatures.
-
Hash References
Assumes we use blessed hash references for the first pass. This may be revisited in the future.
-
Role Implementation
Role implementation assumes Traits: The Formal Model (pdf) rather than the less formal Traits: Composable Units of Behavior (pdf) that is usually cited. The authors are the same, but the "Formal Model" is explicit about several assumptions made in the better-known paper.
Below is a minimal and almost certainly incorrect grammar as a starting point for discussion.
(*
cheating by allowing regexes and character classes
*)
Cor ::= CLASS | ROLE
CLASS ::= DESCRIPTOR? 'class' NAMESPACE VERSION? DECLARATION BLOCK?
DESCRIPTOR ::= 'abstract'
ROLE ::= 'role' NAMESPACE VERSION? DECLARATION BLOCK?
NAMESPACE ::= IDENTIFIER { '::' IDENTIFIER } VERSION?
DECLARATION ::= { PARENTS | ROLES } | { ROLES | PARENTS }
PARENTS ::= 'isa' NAMESPACE { ',' NAMESPACE }
ROLES ::= 'does' NAMESPACE { ',' NAMESPACE }
IDENTIFIER ::= [:alpha:] {[:alnum:]}
VERSION ::= 'v' DIGIT '.' DIGIT {DIGIT}
DIGIT ::= [0-9]
BLOCK ::= # Work in progress. Described below
The bulk of this is to simply provide two things, classes and roles. The Cor syntax is deliberately simple and would be familiar to Perl 5/6 programmers, as well as programmers of other languages (single quotes imply exact text):
DESCRIPTOR 'class' NAMESPACE VERSION? DECLARATION BLOCK?
'role' NAMESPACE VERSION? DECLARATION BLOCK?
-
DESCRIPTOR
Optional. Currently, if present, must be the keyword abstract, which indicates a class that cannot be instantiated and must be subclassed.
-
'class' or 'role'
One of class or role, indicating the type of this code. Required.
-
NAMESPACE
The name (package) of the class or role. Follows current naming rules. Required.
-
VERSION
A v-string identifying the version of this class/role. our $VERSION = inside of the BLOCK is also still allowed. Optional.
-
DECLARATION
This will be described later, but essentially allows us to declare what classes, if any, we inherit from, and what roles, if any, we consume.
-
BLOCK
The block of code defining the body of the class or role.
Only the 'class'/'role' and NAMESPACE are required:
class Person;
...
role Comparable;
...
If the BLOCK is not supplied, the changes are file-scoped. Otherwise, they are block-scoped:
class Person { ... }
role Comparable { ... }
If possible, any other syntax changes suggested by this proposal would only apply to the scope of the BLOCK or file and be an error outside of it.
In Perl 5, classes and packages are the same thing. While this has some drawbacks, it's worked reasonably well and we'll stick with this.
Cor introduces a new, simplified syntax:
class Dog v0.1 {
method speak () { return 'Woof!' }
}
my $dog = Dog->new;
say $dog->speak; # prints 'Woof!'
Note there is no trailing semicolon required.
Alternatively, if no arguments are required, we can omit the parens with the method keyword:
class Dog v0.1 {
method speak { return 'Woof!' }
}
Declaring inheritance is done via the isa keyword and takes a comma-separated list of class names (whitespace allowed). Some restrictions:
-
Cor
You may only inherit from Cor classes as we cannot guarantee the behavior of non-Cor classes. This restriction may be removed in the future. However, for now we would prefer to maintain this restriction to avoid the possibility that Cor and non-Cor classes might need a different UNIVERSAL base class, thus altering their behavior.
-
C3
C3 method resolution order is assumed.
# cannot be instantiated
abstract class Animal {
# forward declarations are abstract methods
# the method body must be defined by the time it's called
method speak;
}
class Dog isa Animal {
method speak () { return 'Woof!' }
}
In the above, Dog inherits from Animal.
By using whitespace to separate the classname from the version, we can also specify versions we require:
abstract class Animal v1.9 {
method speak;
}
class Dog isa Animal v2.0 {
method speak { return 'Woof!' }
}
The above should work according to current Perl 5 semantics (principle of least surprise).
We can also do:
class Kill::Me::Now isa I, Despise, Multiple::Inheritance { ... }
In the above, the class Kill::Me::Now inherits from I, Despise, and Multiple::Inheritance, in that order.
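To see why the choice of C3 matters once multiple inheritance is allowed, here is a small sketch in today's Perl 5 (not Cor syntax). The class names are hypothetical, and it relies only on the core mro module. It compares Perl's default depth-first resolution order with C3 for a diamond-shaped hierarchy:

```perl
use strict;
use warnings;
use mro;    # core module; enables the 'c3' type for mro::get_linear_isa

# Hypothetical diamond hierarchy, written with today's @ISA mechanism.
package Base   { sub speak { 'Base' } }
package Left   { our @ISA = ('Base') }
package Right  { our @ISA = ('Base'); sub speak { 'Right' } }
package Bottom { our @ISA = ( 'Left', 'Right' ) }

package main;

# Depth-first search (Perl's default) visits Base before Right ...
my @dfs = @{ mro::get_linear_isa( 'Bottom', 'dfs' ) };

# ... while C3 keeps every class ahead of its own parents.
my @c3 = @{ mro::get_linear_isa( 'Bottom', 'c3' ) };

print "dfs: @dfs\n";    # dfs: Bottom Left Base Right
print "c3:  @c3\n";     # c3:  Bottom Left Right Base
```

Under depth-first resolution, Bottom->speak would find Base's method before Right's override; under C3 the override in Right is found first.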
Classes consume roles with the does keyword:
class My::Worker does Serializable, Runnable {
...
}
Of course, you can combine this with inheritance:
# obviously, if My::Worker consumes these roles, we do not need to repeat
# this here. This is only an example
class My::Worker::Fast isa My::Worker does Serializable, Runnable {
...
}
You may specify the does before the isa:
class My::Worker::Fast does Serializable, Runnable isa My::Worker {
...
}
We don't envision supporting excluding or renaming role methods at the start, but please see the "Future Work" section.
Methods are declared via the method keyword. Object slots (see below) are accessed via the self keyword. Methods use signatures, but a method with no arguments (aside from the invocant) may omit the signature:
method speak { say "Woof!" }
method allowed_to_vote (@people) {
my @voters;
foreach my $person (@people) {
push @voters => $person
if self->is_on_voter_role($person);
}
return @voters;
}
Further:
class Foo {
use List::Util 'sum';
...
method dimsum() { ... }
}
Foo->new->sum;
The above would issue a runtime error similar to Can't find method 'sum' because the dispatcher would recognize sum as a subroutine, not a method. Further, roles would provide methods, not subroutines. This approach should eliminate the need for namespace::autoclean and friends.
Method dispatch would be resolved via the invocant class and the method name. The arguments to the method will not be considered.
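For contrast, this is how current Perl 5 behaves with the same shape of class. The sketch below uses only core modules; the dimsum body is made up for illustration. Because packages do not distinguish subroutines from methods, the imported sum is reachable as a method:

```perl
use strict;
use warnings;

# Hypothetical class written with today's package-based OO.
package Foo {
    use List::Util 'sum';    # imported helper, never meant as a method
    sub new    { return bless {}, shift }
    sub dimsum { return sum( 1, 2, 3 ) }
}

package main;

# Current Perl 5 cannot tell the imported sub from a method:
my $leaks = Foo->can('sum') ? 1 : 0;
print "Foo->can('sum'): $leaks\n";    # Foo->can('sum'): 1
```

This leak is what namespace::autoclean cleans up today, and what a separate method keyword would make unnecessary.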
Note: "slots" are internal data for the object. They provide no public API. By not defining standard is => 'ro', is => 'rw', etc., we avoid the trap of making it natural to expose everything. Instead, just a little extra work is needed by the developer to wrap slots with methods, thereby providing an affordance to keep the public interface smaller (which is generally accepted as good OO practice).
has SLOT OPTIONS_KV;
has [SLOTS] OPTIONS_KV;
The basic slot declaration is simple:
has 'name';
By default, all slots are read-only and required to be passed to the constructor. Thus, to create an immutable point object:
class Point {
has [ 'x', 'y' ];
method to_string {
# self->x and self->y are not available outside of this class
return sprintf "[%d, %d]" => self->x, self->y;
}
}
my $point = Point->new( x => 3, y => 7 );
say $point->x; # fatal error
say $point->to_string; # [3, 7]
Point->new( x => 4 ); # exception thrown because y is required
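As a rough point of comparison, the behavior above can be approximated by hand in current Perl 5. This is only a sketch under stated assumptions: it uses inside-out storage purely to emulate privacy (the proposal itself assumes blessed hash references), and the error message text is invented.

```perl
use strict;
use warnings;

# Hypothetical Perl 5 approximation; Cor itself would generate all of this.
package Point {
    use Carp 'croak';
    use Scalar::Util 'refaddr';

    my %slots;    # keyed by object address; invisible outside this block

    sub new {
        my ( $class, %args ) = @_;
        for my $name (qw(x y)) {
            croak "Slot '$name' is required" unless exists $args{$name};
        }
        my $self = bless \( my $anon ), $class;
        $slots{ refaddr $self } = { x => $args{x}, y => $args{y} };
        return $self;
    }

    sub to_string {
        my $self = shift;
        my $data = $slots{ refaddr $self };
        return sprintf '[%d, %d]', $data->{x}, $data->{y};
    }

    sub DESTROY { delete $slots{ refaddr $_[0] } }
}

package main;

my $point = Point->new( x => 3, y => 7 );
print $point->to_string, "\n";    # [3, 7]

my $no_accessor = !eval { $point->x; 1 };               # no public x method
my $missing_y   = !eval { Point->new( x => 4 ); 1 };    # y is required
```

The amount of boilerplate here is a fair illustration of what a two-line has declaration is meant to replace.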
To provide a default:
has days_ago => ( default => method {20} );
Also, per conversation with MST, it's possible that all default slots should be automatically lazy.
Alternatively, if the default is a string, it's a method name to call (makes it easier to subclass):
has _dbh => ( default => '_build_dbh' );
method _build_dbh () { ... }
We should separate default and builder, yes? Anything with a builder would not be passed to the constructor.
Lazy slots (requires default):
has days_ago => (
default => method { ... },
lazy => 1,
);
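For readers less familiar with lazy attributes, the intended semantics can be sketched in plain Perl 5. The class name, the build counter, and the accessor are all hypothetical and exist only to make the timing observable; Cor would not expose a public accessor by default.

```perl
use strict;
use warnings;

# Hypothetical class; the names and the counter are illustrative only.
package Report {
    my $build_calls = 0;

    sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

    # Lazy slot: the default is computed on first read, then cached.
    sub days_ago {
        my $self = shift;
        return $self->{days_ago} //= $self->_build_days_ago;
    }

    sub _build_days_ago { $build_calls++; return 20 }
    sub build_calls     { return $build_calls }
}

package main;

my $lazy     = Report->new;
my $before   = Report->build_calls;    # default not computed yet
my $first    = $lazy->days_ago;
my $second   = $lazy->days_ago;        # served from the slot
my $supplied = Report->new( days_ago => 3 )->days_ago;

print "calls before: $before, after: ", Report->build_calls, "\n";
print "values: $first $second $supplied\n";
```

The default runs once, on first access, and never runs when the constructor supplies a value.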
We may wish to make the has function extensible, so that people can experiment with isa to manage their own types.
Exposing slot data requires writing a method:
class Point {
has [qw/x y/];
method x {self->x}
method y {self->y}
}
Summary of slot options:
default => CodeRef|Str (provide a default value if one is not supplied)
lazy => Bool (if default is provided, don't call it until it's asked for. Default is true?)
weaken => Bool (weaken the reference in the slot. Default is false)
optional => Bool (is this required by the constructor? Default is false) (unsure about this one)
rw => Bool (read-write, but only via the self keyword. Default is false)
The new method would be in UNIVERSAL::Cor and should not be overridden in a subclass, though that will be allowed. It would take an even-sized list of key/value pairs, omitting the need for a hashref:
Object->new( this => 1, that => 2 ); # good
Object->new({ this => 1, that => 2 }); # bad
A BUILD method, just like Moose/Moo's BUILD method, will allow for additional customization:
method BUILD (%args) {
unless ( self->verbose xor self->silent ) {
# Speculative. We don't address exceptions in this proposal
self->throw("You must specify one and only one of 'verbose' or 'silent' ");
}
}
The BUILDARGS method is how Moose/Moo messes around with arguments to allow us to do things like write this:
Point->new( $x, $y );
Instead of this:
Point->new( x => $x, y => $y );
However, the BUILDARGS method has always been a bit clumsy. We don't yet have a proposal for this.
In order to not paint ourselves into a Cor-ner (hey, I'm a papa. I can tell bad papa jokes), we should have a separate object base class which all Cor objects implicitly inherit from. At minimum:
abstract class UNIVERSAL::Cor v.01 {
method new(%args) { ... }
method can () { ... }
method does () { ... }
method isa () { ... }
}
That mirrors the UNIVERSAL class we currently inherit from, but there's room for more:
abstract class UNIVERSAL::Cor v.01 {
method new(%args) { ... }
method can () { ... }
method does () { ... }
method isa () { ... }
# these new methods are merely being mentioned, not
# suggested. All can be overridden
method to_string () { ... } # overloaded?
method clone () { ... } # (shallow?)
method object_id () { ... }
method meta () { ... }
method equals () { ... }
method dump () { ... }
method throw($message) { ... }
}
This is still an open discussion. We don't want to pack too much into the API and cause developers pain, but there are so many "common" use cases for objects that we're tired of rewriting ad nauseam that, like many other programming languages, it might be reasonable to put them into the base class.
Opinions welcome. However, this is such a core (no pun intended) need that whether we need a separate UNIVERSAL class for Cor should be decided before Cor is ready for prime time (see also Stevan Little's UNIVERSAL::Object).
This, incidentally, is why Cor classes cannot inherit from non-Cor classes and vice-versa.
Here's a simple LRU cache in Moose:
package Cache::LRU {
use Moose;
use Hash::Ordered;
use namespace::autoclean;
has '_cache' => (
is => 'ro', # without "is", Moose generates no accessor
isa => 'Hash::Ordered',
default => sub { Hash::Ordered->new },
);
has 'max_size' => (
is => 'ro',
default => 20,
);
sub set {
my ( $self, $key, $value ) = @_;
if ( $self->_cache->exists($key) ) {
$self->_cache->delete($key);
}
elsif ( $self->_cache->keys >= $self->max_size ) {
$self->_cache->shift;
}
$self->_cache->set( $key, $value );
}
sub get {
my ( $self, $key ) = @_;
$self->_cache->get($key)
}
__PACKAGE__->meta->make_immutable;
}
Here it is in Cor:
class Cache::LRU {
use Hash::Ordered;
has cache => ( default => method { Hash::Ordered->new } );
has max_size => ( default => method { 20 } );
method set ( $key, $value ) {
if ( self->cache->exists($key) ) {
self->cache->delete($key);
}
elsif ( self->cache->keys >= self->max_size ) {
self->cache->shift;
}
self->cache->set( $key, $value );
}
method get ($key) { self->cache->get($key) }
}
Note that in the Cor version, any attempt to directly access the cache or max_size slots from outside the class is an error, though you can still override their defaults via the constructor:
my $cache = Cache::LRU->new( max_size => 100 );
my $hash_ordered = $cache->cache; # fatal error
Role syntax is also simple and clear:
role Whiny {
method whine($message) { ... }
}
class My::Class does Whiny {
...
}
My::Class->new->whine('some message');
Roles both provide and require methods. Any methods fully defined in the role body will be composed into the consuming class.
Any methods defined via a forward declaration are "required" to be provided by the consuming class or another role consumed by the same class.
role MyRole {
method this; # class must provide this
method that; # class must provide this
method foo ($bar) { ... } # this is provided
}
Like classes, roles may also have slots, declared in the same manner, and those slots will be provided to the consuming class. (What happens if the class defines that slot in a different way from the role? For example, if the role slot is read-write but the class slot is read-only, bugs are awaitin').
Of course, roles can consume other roles:
role SomeRole does ThisRole, ThatRole { ... }
Strictly adhering to the concept that roles are guaranteed to be both commutative (the order of application doesn't matter) and associative (for a given set of roles, it doesn't matter which consumes what, so long as the final set is the same), the above is equivalent to:
role ThatRole does SomeRole, ThisRole { ... }
And their aggregate behavior will be the same if one or more roles consume other roles and are in turn consumed by a class:
role ThatRole does ThisRole { ... }
role SomeRole does ThatRole { ... }
class SomeClass does SomeRole { ... } # role-provided behaviors are identical
The MOP module is likely sufficient for our needs.
For the first pass, we need:
Is the current proposed syntax acceptable? I argue that it is because I feel that it still feels "Perlish", while also feeling clean enough that developers from other languages will feel right at home.
This is problematic:
has 'x';
The above is a private, read-only slot with no value. Unless we require it to be passed to the constructor, it's useless and should possibly be an error. This is where our BUILDARGS work (or replacement) might come in handy. It would be good if behaviors are specified declaratively, rather than procedurally.
How do we handle class data and methods?
Some argue that class data is a code smell. Fine: argue that all you want. Multiple inheritance is also a code smell, but that doesn't mean we can tell Perl developers "no." But the semantics of that can get tricky.
Class methods, however, are important. For example, you may very well want a factory class with an interface like this:
my $message = Message::Factory->create(@message_list);
Internally:
static method create (@list) {
if ( 1 == @list ) {
return Message::String->new( message => $list[0] );
}
else {
return Message::Collection->new( messages => \@list );
}
}
Inside that method, any attempt to call a method on the self keyword would be a syntax error. It could internally call other static (class) methods directly (?) without an invocant.
For simplicity. We currently don't have a clear vision of how non-hashrefs can be done transparently with this. Falling back to core OO may be a solution for some. For those who know they need a blessed regex, they'll (hopefully) know enough about core OO to go ahead and run with scissors.
Cor tries to be as small as possible to avoid overreach. That means "no modifiers" at this time. However, they cause an issue for roles.
Let's say a method returns the number 10. One role modifies that number by adding a 20% VAT, making the result 12. Another role modifies that to offer a discount of 3, making the result 9. However, if the discount is applied and then the VAT is added, the result is 8.4. Thus, a developer could sort the list of consumed roles and change the behavior.
In the original traits research, one of the issues they were trying to work around was the fact that inheritance order could change code behavior. Consumption of roles, however, was guaranteed to be both commutative (the order of application doesn't matter) and associative (for a given set of roles, it doesn't matter which consumes what, so long as the final set is the same). Method modifiers break this guarantee.
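The VAT and discount arithmetic can be checked with a few lines of plain Perl 5. This is a minimal sketch in which closures stand in for the "around"-style modifiers two roles might apply; the variable names are hypothetical and no role machinery is involved.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for two roles that each wrap the same method.
my $price        = sub { return 10 };
my $add_vat      = sub { my $orig = shift; return sub { $orig->() * 1.2 } };
my $add_discount = sub { my $orig = shift; return sub { $orig->() - 3 } };

# VAT applied first, then the discount: (10 * 1.2) - 3
my $vat_then_discount = $add_discount->( $add_vat->($price) )->();

# Discount applied first, then VAT: (10 - 3) * 1.2
my $discount_then_vat = $add_vat->( $add_discount->($price) )->();

printf "%.1f vs %.1f\n", $vat_then_discount, $discount_then_vat;    # 9.0 vs 8.4
```

The same two wrappers give different totals depending only on the order in which they are applied, which is the order dependence that roles were supposed to rule out.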
I believe the initial core of Cor should be as simple as we can possibly make it to avoid too much up front work and possibly making mistakes that we cannot walk back. However, any good design of something that is both long-lasting and that we know will grow should at least be aware of future considerations. Otherwise, we might make it harder to address issues.
This will wait until the core OO is there. Making has extensible might help.
I have no suggested syntax for this, but they're extremely useful.
Matt Trout argues, and I agree, that excluding role methods or renaming them is a code smell. However, if you don't have control over the role source code (downloading from the CPAN or being supplied by another software team), you don't always have the luxury of refactoring the code. Thus, we need to support excluding methods and renaming them.
The syntax for this is less clear at this time, but I envision something like this:
class My::Worker isa Some::Parent::Class
does Serializable(
excludes => ['some_method'],
renames => { old_method => 'new_method' } ) {
method some_method { ... }
method old_method (@args) {
# do something with @args
return self->new_method(@args);
}
}
Excluding or renaming a method automatically makes it a "required" method. This is because, even if you don't use them in your class, the role might use them internally.
This raises an issue. In the original Smalltalk traits papers, they made it clear that a role is defined by its name and the methods it supplies (methods are defined by their signature, not just the name). It's possible that someone might do this:
if ( $object->DOES('Serializable') ) {
...
}
At this point, we don't know if the $object class excluded any methods from the Serializable role. Thus, we don't know if any methods we expect from Serializable will conform to expectations. Thus, the naïve DOES('Serializable') check may be wrong because merely having the role name isn't enough to know if the class exhibits the desired behavior.
In proper OO, the replacement methods should be semantically identical, even if they're doing different things. In reality, we know that these guarantees are often tossed out the window. I don't know that this is really a serious issue because I haven't been hit with this, but I also know that safety in building large scalable systems suggests avoiding pitfalls.
I do not have a recommendation for this, but I point it out so people can be aware of the background.
I have no suggested syntax at this time, but this generally involves reblessing an object into an anonymous subclass which consumes the role or roles. Naturally, it's harder to guarantee object behavior, especially if several roles are applied at runtime in separate statements.
Stevan Little has been working on an object system for Perl for years. And given his background—including creating Moose and Moxie—and his constant research into a "better" way to write OO, he laid much of the groundwork for Cor.
I had been working on a pure-Perl implementation of Cor (because clearly we don't have enough object modules on the CPAN) and discussing it with the Pumpking, Sawyer, at the 2019 EU Perl Conference in Riga, when he said he wanted a spec, not an implementation.
And he's right: with P5P, there are plenty of implementors, but there's been no agreement about what should be implemented. So, working with Stevan and Sawyer, I've had to suffer the humiliation of them laughing at my amateurish mistakes, but it's made this document better as a result.
Any mistakes, of course, are mine.
Putting updates here so they can be easily spotted without consulting the history.
What does this do?
method foo {
some_external_function(self);
}
We can add a check on self to ensure that private slots cannot be accessed unless we're in a class or a subclass of ref self, but that seems clumsy. Or we can tell developers "don't do that", but we all know what that means.
If we remove self, we need an easy way for the class to access its internal state. Lexical variables have also been proposed:
class Foo {
has $x => ( rw => 1 );
method bar ($new_x) {
$x = $new_x;
}
}
Feels unperlish to me, but hey, what do I know? :)
We're trying to figure out a better syntax.
@Ovid: the $cache accessor in that example is lexical, so the accessor cannot be called from outside the class block. (It's even stored inside out.) And I have lately been rethinking whether defaulting to is=>rw was a sensible decision. Personally I prefer is=>ro as a default, and there are good arguments in favour of is=>bare. I might change the default at some point soon. My thought process on defaulting to is=>rw was "yes, read only is better, but if you've got read-write, nobody's forcing you to write to it".