My perl5 TODO list

Below is a formal list of possible optimizations, which most would agree on. We had these discussions in 2001 with Damian, when perl6 and perl5i took off. I'd like to work on these for the perl5 core and need decisions. Most p5p hackers seem to be informed about the general possibilities and directions, but not all. We need this to improve general perl5 performance, and also to help static compilation.[1]

We had this before, so I'd like to keep it formal: each proposal gets a perl6-like name, and replies should change the subject to that name. I chose PDD for "Perl Design Draft".

Beforehand: "compiler" means op.c, not B::C. compile-time and run-time should be obvious.

PDD01 const / readonly lexicals

The CONST op currently is an SVOP, holding a global gvsv. A CONST op might also hold lexicals, as a PADOP type. The more constants the compiler knows at compile-time, the better it can optimize. The following datatypes need to be representable as const:

  • PADSV (lexicals and esp. function arguments)

  • "PDD02 final classes - const @ISA"

  • "PDD03 immutable classes - const %class::"

Esp. readonly function arguments need to be parsed into lexical consts, but "my const $i" or "my $i:ro" as well. I have no strong opinion on "my $i is ro", though it may be the best choice. See "PDD05 Function and method signatures".

Datatypes:

SVf_READONLY is already good enough to hold this information in the data. But the compiler does not want to optimize on datatypes; the information needs to be represented as an OP. Only for the special cases @ISA and stashes is it not needed.
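
The data side already works today, e.g. via the undocumented Internals::SvREADONLY; a minimal sketch (what is missing is the compile-time OP representation):

my $i = 42;
Internals::SvREADONLY($i, 1);   # sets the SVf_READONLY flag on the lexical
eval { $i = 43 };               # dies: "Modification of a read-only value attempted"
print $@;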

So either add a mixin svop+padop type for CONST, discriminated by an OPpCONST_PAD flag, add a CONST flag to PADSV,

or add a new CONSTPAD op, replacing PADSV/const, which needs to be added to all current CONST checks in the compiler.

CONST with OPpCONST_PAD flag:

Pro: Easier and faster for the compiler.

Contra: The logic for the new OP type, which is a union of SVOP and PADOP, needs to be added to all accessors: B and its libraries, but also XS walkers.

PADSV with OPpPAD_CONST flag:

Pro: Does not break libraries

Contra: CONST checks need to check PADSV's also.

CONSTPAD:

Pro: Does not break libraries

Contra: CONST checks need to check CONSTPAD also.

Personally I lean against CONSTPAD.

Keywords: (how to parse)

The following variants are being considered for lexicals and globals:

my const $i; my const ($i, $j) = (0, 1);   (as const keyword upfront)
my $i :ro;
my $i is ro;

See "PDD05 Function and method signatures"

sub call (const $i) {}
sub call ($i:ro) {}
sub call ($i is ro) {}

See "PDD02 final classes - const @ISA"

const our @ISA = ('MyBase');
our @ISA :ro = ('MyBase');
our @ISA is ro = ('MyBase');
class MyClass is final {
  our @ISA = ('MyBase');
}
class MyClass (extends => ('MyBase'), is_final => 1) {}

See "PDD03 immutable classes - const %class::"

const package MyClass { } and const package MyClass;
const %MyClass::;
class MyClass is immutable {}
class MyClass (is_immutable => 1) {}

No keyword: immutable should be the new default for the class keyword; old-style packages stay mutable.

Keyword discussion:

The type qualifier const, which creates a CONST/CONSTPAD op and sets the SVf_READONLY flag, can be represented as a new keyword "const", which looks most natural but is hardest to parse. Larry opposed it initially because it looked too C++ish. But nowadays it looks best.

The attribute variant would be easiest to parse, as a MYTERM already parses and handles attributes; the MYTERM type just needs to be extended for signatures. It also looks natural.

The perl6-like type trait is harder to parse, and a bit unnatural for lexicals.

The Moose style hash attributes only work for classes, not for lexicals and sigs.

PDD02 final classes - const @ISA

A const isa is commonly known via the "final" keyword. The class is not extendable, so the compiler can do compile-time method resolution, i.e. convert a method call to a function call.

Pro: Compile-time method resolution

If the compiler knows at compile-time, for each method, that all isa's up to where the method is found are const and that those classes are immutable (const), the method call can be converted to a function call. That would be a huge performance win, esp. for classes that favor methods over hash accessors.

Note that the accessor typo problem could also be solved with const hashes in the object representation, but nobody is using that yet. A const class (const %classname::) does not solve it, as it is independent of the underlying object representation, which is usually a blessed hash.

Function calls are slow, and method calls are at least 10% slower still. (10% for immediately found methods; for a deeper search the run-time costs are higher.)
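
A hypothetical illustration of the rewrite (MyClass and area are made-up names): with a const @ISA and an immutable stash the compiler could turn the first call into the second:

my $obj = MyClass->new;
$obj->area;              # run-time lookup through @ISA on every call
MyClass::area($obj);     # what compile-time method resolution would emit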

Contra:

The "final" problem should be known from Java. Since the compiler needs to know the inheritances in advance, it is not possible to extend final classes and override their methods. One cannot extend Java strings. Thanks to Michael Schwern for the discussion.

Solutions:

  1. (Reini): Define the following convention. No additional keywords needed. Libraries may use final, but finalization is deferred until the application is processed and all libraries (use statements) are already loaded. So mocking is still possible, but the default is to use compile-time method resolution. Schwern sees a problem in that scheme which I haven't understood yet.

  2. (Larry): Libraries may use final, but the application with a #pragma final has the final word.

See also pddtypes.pod

PDD03 immutable classes - const %class::

Classes should default to immutable; packages keep the dynamic behaviour unless a package is declared as const. (Damian)

Some might know from Moose that immutable classes make it 20x faster, even if not all possible optimizations are done yet.
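
For comparison, the existing Moose idiom that triggers these optimizations (a real API; Point is a made-up class):

package Point;
use Moose;
has 'x' => (is => 'ro', isa => 'Int');
has 'y' => (is => 'ro', isa => 'Int');
__PACKAGE__->meta->make_immutable;   # closes the class, inlines the constructor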

PDD04 Types

They are already parsed for lexicals, just not for named arguments. The 3 coretypes int, num, string need to be reserved. p5-mop will probably define more. bool probably needs to be added as well.

Type conventions in core are needed to

  1. talk to other languages, like json, perl6 or java,

  2. to specify the wanted behavior for methods acting on types, such as smartmatch or multi-methods, or

  3. for special performance purposes, e.g. int loop counters, int arithmetic, smaller and faster typed arrays or hashes, or to enforce compile-time method resolution.

See pddtypes.pod and perltypes.pod. I had an old version at my blog and at YAPC.

An initial benefit would be natively typed arrays and hashes in core, with const hashes even optimizable hashes (so-called "perfect hashes"). Further type checks and optimizers are left to modules.
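
The lexical side already parses today; a minimal sketch of the existing "my TYPE $var" syntax (Dog is a made-up package):

package Dog;
sub new { bless {}, shift }
package main;
my Dog $spot = Dog->new;   # parses, but currently carries almost no semantics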

PDD04.1 CHECK_SCALAR_ATTRIBUTES

A compile-time attribute hook for our three coretypes, to be able to use attributes in my declarations.

Note: Attributes still suffer from an over-architected and broken Attribute::Handlers implementation, which evals the attribute value.

our $name:help(print the name); will call eval "print the name";

Without fixing this, attributes have no chance of being accepted. The syntax is nice, and it is already parsed.
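
A sketch of the problem using the real Attribute::Handlers API (Help is a made-up attribute; uppercase, as all-lowercase attribute names are reserved):

package main;
use Attribute::Handlers;
sub Help : ATTR(SCALAR) {
    my ($pkg, $sym, $ref, $attr, $data) = @_;
    # $data is the result of eval'ing the parenthesized attribute text;
    # only if that eval fails is the raw string passed through instead
    print "Help data: $data\n";
}
our $name : Help(print the name);   # "print the name" gets eval'd first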

PDD05 Function and method signatures

The current prototype syntax explicitly allows named arguments. There are several implementations already.

But there are several decisions required.

In order to optimize function and method calls, we need to define type qualifiers, and eventually return types, even if they are not used yet.

New syntax allows changing the semantics.

Let's follow perl6:

  • is bind (default) vs is copy (old semantics)

  • is ro (default) vs is rw (old semantics)

  • allow passing types and attributes to functions. Attributes allow user-defined hooks as now, just on function entries, not on variable declarations.

Optional arguments are defined by specifying defaults.

If we do not follow the perl6 syntax with "is", we need attributes to specify ":rw" and possibly "\$" to specify bind (by reference).

e.g. sub myadder (\$i, $num = 1) { $i += $num }
or   sub myadder ($i:rw, $num = 1) { $i += $num }

bind ro is by far the fastest calling convention, optimizable and checkable by the compiler. copy is the safe way; rw uses the old $_[n] alias semantics.
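
The old alias semantics that "is rw" would preserve are visible today through @_ (bump is a made-up name):

sub bump { $_[0] += 1 }   # @_ elements alias the caller's variables
my $n = 5;
bump($n);                 # $n is now 6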

I outlined my proposal in pddtypes.pod

Q: Do function args and return values keep constness?

A: Only function args by ref. This is current behaviour and makes sense.

PDD06 Function return types

Any optimizer needs to stop if a function's return type is not known. We don't even know if any value is returned at all, so we have to check @_ at every LEAVESUB, though the parser already knows the context information. By optionally declaring return types, a type checker and optimizer can kick in, esp. for coretypes like int, num, str, or void, or a const qualifier.

There exist old and wildly different syntaxes for return types, but they are unused. Use the perl6 syntax, which is C-like.
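
A hypothetical sketch of such C-like declarations (not valid perl5 today; the syntax is an assumption based on this proposal):

sub int count_items (@list) { return scalar @list }   # known to return an int
sub void log_msg ($msg)     { print STDERR $msg }     # known to return nothing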

Q: What about libraries declaring their return values constant? Can I then not change them, and have to copy them?

A: No. Return values so far are not const. Only if you declare a function to return a const will it be so.

PDD07 Compile-time entersub optimizations

Calling a function via ENTERSUB and cleaning up at LEAVESUB is by far the slowest part of perl.

We can check our functions for the following situations: exceptions, jumps out, lexicals, locals, function calls, recursive calls.

If none of these occur, the function can be inlined.
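
A reduced form of this exists today: subs with an empty prototype and a constant body are already inlined by the current compiler:

sub PI () { 3.14159 }    # empty prototype + constant body
my $r = 2;
my $c = 2 * PI * $r;     # the entersub is folded away at compile-time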

We also need to check for tail calls and arguments. (signatures)

If no exceptions and no locals occur, the parts of ENTERSUB and LEAVESUB which deal with them can be skipped.

We need to store the context and possible return type in ENTERSUB and LEAVESUB to speed up @_ handling.

We need to separate XS calls from ENTERSUB.

PDD08 Compile-time op-arg optimizations

Our current optree resolves op argument types (the compile-time op flags and also the POP'ed flags) at run-time. For cases where the op itself specifies the behavior, or the argument type can be determined at compile-time (lvalue, context, magic, ...), an optimized op version should be used.

Promote type pessimization to all affected ops, and use optimized ops for the non-pessimized ones. Similar to i_opt (integer constant folding) when all operands are non-magic IVs.

The biggest blockers are function borders. Without named arguments passed as bind (alias), each function must optimize from scratch and loses all information.

PDD09 Compile-time function inlining

See "PDD07 Compile-time entersub optimizations". entersub (and leavesub) needs to hold compiler information about the function, which requires waiting for parsing all embedded functions.

Even functions with arguments can be inlined: safe versions with arguments by copy, and destructive ones with arguments by bind. They just need a scope block.

PDD10 Compile-time method resolution

We can easily change run-time method calls at compile-time to function calls. What is left is a decision on "PDD02 final classes - const @ISA" and "PDD03 immutable classes - const %class::"

Outlined in how-perl-calls-subs-and-methods and further refined in "Compile-time type optimizations" in perltypes.

PDD11 Compile-time method inlining

This just does method resolution (change to functions) and then does function inlining.

PDD12 Run-time method caching

This is trivial, as there are already isa change hooks. METHOD_NAMED and METHOD just need to check a global method or object cache.
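
A Perl-level sketch of the idea; the real cache would live in the METHOD/METHOD_NAMED ops and be flushed by the existing isa hooks (cached_call and %method_cache are made-up names):

my %method_cache;
sub cached_call {
    my ($obj, $name, @args) = @_;
    my $class = ref $obj;
    my $code  = $method_cache{$class}{$name} //= $obj->can($name)
        or die qq{Can't locate object method "$name" via package "$class"};
    return $code->($obj, @args);
}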

PDD13 Multi

multi needs types. (As smartmatch needs types to work reliably.)

As for the syntax, multi can be implemented traditionally, where the compiler generates the different methods per type automatically, or the perl6 way, with a separate keyword. I see no problem with the first approach; it would need no new keyword.
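
A hand-rolled sketch of what the compiler would generate for the first approach: one variant per argument type plus a small dispatcher (all names made up):

sub area_Circle { my ($c) = @_; 3.14159 * $c->{r} ** 2 }
sub area_Square { my ($s) = @_; $s->{side} ** 2 }
sub area {
    my ($shape) = @_;
    my $variant = main->can('area_' . ref $shape)   # pick the variant by type
        or die 'no multi variant for ' . ref $shape;
    $variant->($shape);
}
print area(bless { r => 2 }, 'Circle'), "\n";   # dispatches to area_Circle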

PDD14 MOP

The current MOP discussion and opinion is mainly about the new class and method keywords, but a MOP has nothing to do with that, nor with Moose or a new object system. A MOP allows the definition of new behaviour for classes, methods, attributes, types, roles, inheritance and so on: how they are initialized, the layout, the behavior. A definition of alternate object systems. It is mainly proposed to overcome a Moose problem with anonymous packages, to separate classes from stashes.

Introducing a MOP is good if the current object system is not good enough. The current object system is not good enough for Moose and should be improved. There need to be two separate discussions: one about what improvements Moose needs from the traditional stash-based objects (global vs lexical namespaces, anon packages), and a second about the MOP itself.

I have no opinion on the mop. Just this: why bother with a mop while some basic language features are not yet decided upon? Moose does not even use types properly yet. This smells of premature hooks. But pmichaud is highly convinced that a p5 mop is a good thing.

PDD20 no vivify

Something like autovivification needs to be added to the core to improve the optree. As shown in http://blogs.perl.org/users/rurban/2012/10/optimizing-compiler-benchmarks-part-3.html, disabling vivification of arrays and also hashes will lead to compile-time optimizations and dramatic performance improvements, similar to const arrays or hashes, but even better.

The autovivification module is not usable for optimizations: it is hard to detect, there's no API for an optimizer, and it needs to be in the compiler to be able to generate better code. Currently it adds checks, but does not improve the optree.
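
What the existing CPAN pragma does today (a real, lexically scoped module):

my %h;
if ($h{a}{b}) { }        # merely reading silently creates $h{a} = {}
{
    no autovivification;
    if ($h{c}{d}) { }    # no longer creates $h{c}
}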

PDD21 no magic

Similar to no vivify or const lexicals, a lexical 'no magic' pragma can lead to compile-time optimizations and dramatic performance improvements.
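
A hypothetical usage sketch (no such pragma exists yet; scoping is assumed to work like "no autovivification"):

{
    no magic;               # compiler may assume no FETCH/STORE magic in this scope
    my @plain = (1 .. 10);  # element accesses could skip all magic checks
}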

PDD22 slimmer nextstate

Slimmer nextstate op variants can be chosen at compile-time: variants which do not reset PL_taint, do not reset the stack pointer, and do not call FREETMPS.

PDD23 loop unrolling

As shown in http://blogs.perl.org/users/rurban/2012/10/optimizing-compiler-benchmarks-part-2.html AELEMFAST is about 2 times faster than the generic AELEM, but it needs to know the index at compile-time. This is easy to do for loops.

Unroll loops with lots of AELEM into AELEMFAST accesses.
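
A sketch of the transformation; constant indices let the compiler pick AELEMFAST over the generic AELEM:

my @a = (1 .. 4);
my $sum = 0;
$sum += $a[$_] for 0 .. 3;               # AELEM: index only known at run-time
$sum = $a[0] + $a[1] + $a[2] + $a[3];    # unrolled: four AELEMFAST accesses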

PDD30 Alternative parser

The worst part of perl is the parser. It is a hack, and it is fast, but changing and esp. adding rules in a sane manner is hard, because the parser deviates in too many ways from a clean lexer/tokenizer separation. To add new syntax you usually cannot just add the syntax rules to perly.y.

Second, generating a traditional AST is worthwhile, from which a better optree (better optimizable), JIT code or native code can be emitted.

PDD31 Alternative vm

Our VM is a stack machine, which handles the stack on the heap. There are no typed alternatives.

There are integer-optimized ops, but they are rarely used. "use integer" and "my int" can overcome this, but overflow behaviour needs to be defined: either slow promotion to number, or fast integer wrap, unsigned or signed. With "my int" this behaviour can be chosen.
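
What "use integer" already does today (a core pragma, lexically scoped):

{
    use integer;
    print 10 / 3, "\n";   # 3: integer ops with native wraparound semantics
}
print 10 / 3, "\n";       # 3.33333333333333: the generic divide op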

The VM is simple and easy to write XS for, but has major problems. An alternative VM could be based on parrot or vmkit, or could simply reuse the existing ops with a different compiler and different stack handling.

A C-stack based compiler could arrange the optree as a natively compiled or jit'ed C program. Before each op call the op arguments (0-2 SV pointers) are put on the stack, lexicals also as in native closures, and functions are called natively via cdecl or stdcall, depending on whether we need varargs.

By using LLVM even a register based (fastcall) layout can be arranged.

PDD32 Jit

A JIT could solve the run-time decisions for dynamic cases which are not solvable at compile-time. But the VM should be JIT-friendly. The current VM is quite JIT-friendly, but the ops themselves are too dynamic; there's not enough compile-time information in the ops, only in the operands at run-time. There need to be pre-compiled optimized alternatives for certain ops with known argument types.

To be practical, I'm thinking of adding labels with a naming scheme to most ops, into which a JIT or LLVM could hook.

Just some random examples from pp.c, to give you an idea.

PP(pp_pos)
{
    dVAR; dSP; dPOPss;

    if (PL_op->op_flags & OPf_MOD || LVRET) {
      pp_pos_mod:
	  SV * const ret = sv_2mortal(newSV_type(SVt_PVLV));
	  sv_magic(ret, NULL, PERL_MAGIC_pos, NULL, 0);
	  LvTYPE(ret) = '.';
	  LvTARG(ret) = SvREFCNT_inc_simple(sv);
	  PUSHs(ret);    /* no SvSETMAGIC */
	  RETURN;
    }
    else {
	  if (SvTYPE(sv) >= SVt_PVMG && SvMAGIC(sv)) {
        pp_pos_mg:
	    const MAGIC * const mg = mg_find(sv, PERL_MAGIC_regex_global);
	    if (mg && mg->mg_len >= 0) {
	      dTARGET;
	      I32 i = mg->mg_len;
	      if (DO_UTF8(sv))
	        sv_pos_b2u(sv, &i);
	      PUSHi(i);
	      RETURN;
	    }
	  }
	  RETPUSHUNDEF;
    }
}

PP(pp_refgen)
{
    dVAR; dSP; dMARK;
    if (GIMME != G_ARRAY) {
      pp_refgen_gimme_not_array:
	  if (++MARK <= SP)
	    *MARK = *SP;
	  else
	    *MARK = &PL_sv_undef;
	  *MARK = refto(*MARK);
	  SP = MARK;
	  RETURN;
    }
    pp_refgen_gimme_array:
    EXTEND_MORTAL(SP - MARK);
    while (++MARK <= SP)
	  *MARK = refto(*MARK);
    RETURN;
}

Footnotes:

  1. "Ertl and Gregg analyze the performance of the following interpreters: Gforth, OCaml, Scheme48, Yap, Perl, Xlisp. While Gforth, OCaml, Scheme48 and Yap are categorized as efficient interpreters, Perl and Xlisp benchmarks are used for comparison purposes as inefficient interpreters.

While efficient interpreters perform with a slowdown by a factor of 10 when compared to an optimizing native code compiler, inefficient interpreters have a slowdown by a factor of 1000."

M. Anton Ertl and David Gregg. The structure and performance of efficient interpreters. Journal of Instruction-Level Parallelism, 5:1-25, November 2003. https://students.ics.uci.edu/~sbruntha/cgi-bin/download.py?key=thesis
