
@wimvanderbauwhede
Created November 24, 2020 16:21
Writing faster Raku code


In an earlier article, I discussed the result of my attempts to optimise the performance of an expression parser written in Perl. I have recently been writing quite a bit of Raku code, but so far I had not looked at its performance. Out of curiosity, I decided to write and optimise my Fortran expression parser in Raku.

Raku performance as we know it

In the current stage of its development, Raku prioritises functionality over performance. So an easily-made argument is that if you want performance, you should not write your code in Raku. And it is of course true that compiled code will almost always be faster. However, often, rewriting in a compiled language is not an option, so it is important to know how to get the best possible performance in Raku.

The Raku documentation has a page on performance which offers good advice in general terms. But for my needs I did not find the answers there, nor anywhere else. So I created some simple test cases to find out more. I used Raku version 2020.09 built on MoarVM version 2020.09, the most recent one when I ran the tests, but the results should be quite similar for slightly earlier and later versions.

Testing some hunches

Before going into the details on the expression parser, here are some results of performance comparisons that influenced design decisions for the compiler. I was curious to see if they would turn out different in Raku.

Hash key testing is faster than regexp matching

Fortran code essentially consists of a list of statements which can contain expressions, and the parser labels each of the statements once using a hashmap, ever the workhorse data structure in Perl. Every parsed line of code is stored as a pair with this hashmap (which I call $info):

my $parsed_line = [ $src_line, $info ];

This means that in principle I can choose to match a pattern in $str using a regex or use one of the labels in $info. So I tested the performance of hash key testing versus regexp matching, using some genuine FORTRAN 77 code:

my $str = lc('READ( 1, 2, ERR=8, END=9, IOSTAT=N ) X');
my %info =();   
if ($str~~/read/) {
    %info<ReadCall> = 1;
}
my $count=0;

constant NITERS = 10_000_000;
if VER==1 {
    for 1..NITERS -> $i {
        if ($str~~/read/) {
            $count+=$i;
        }
    }
} elsif VER==2 {
    for 1..NITERS -> $i {
        if (%info<ReadCall>:exists) {
            $count+=$i;
        }
    }
} else {
    for 1..NITERS -> $i {
        $count+=$i;
    }
}

Without the if-condition, the loop takes 3 s on my laptop. The loop with the hash key existence test takes 5 s; the regexp match condition takes 53 s. So the actual condition evaluation takes 2 s for the hash key existence check and 50 s for the regexp match. So testing hash keys is 25 times faster than simple regexp matching.

Custom tree traversals are faster

I tested the cost of using higher-order functions for tree traversal. Basically, this is the choice between a generic traversal which takes an arbitrary function that operates on the tree nodes:

sub _traverse_ast_with_action($ast_, $acc_, &f) {
    my $ast=$ast_; my $acc=$acc_;
    if <cond> { 
        $acc=&f($ast,$acc);
    } else { 
        $acc=&f($ast,$acc);
        for  1 .. $ast.elems - 1  -> $idx {
            (my $entry, $acc) = 
                _traverse_ast_with_action($ast[$idx],$acc, &f);
            $ast[$idx] = $entry;
        }
    }
    return ($ast, $acc);
} 

or a custom traversal:

sub _traverse_ast_custom($ast_, $acc_) {
    my $ast=$ast_; my $acc=$acc_;
    if <cond> { 
        $acc=< custom code acting on $ast and $acc>;
    } else { 
        $acc=< custom code acting on $ast and $acc>;
        for 1 .. $ast.elems - 1  -> $idx {
            (my $entry, $acc) = 
                _traverse_ast_custom($ast[$idx],$acc);
            $ast[$idx] = $entry;
        }
    }
    return ($ast, $acc);
} 

For the case of the tree data structures in my compiler, the higher-order implementation takes more than twice as long as the custom traversal, so for performance this is not a good choice. Therefore I don't use higher-order functions in the parser, but I do use them in the later refactoring passes.
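To make the comparison concrete, here is a runnable sketch of the generic traversal. The `<cond>` placeholder is filled in with a hypothetical leaf test (the node is not an Array), and the tree and node-counting action are illustrative values, not the parser's real data:

```raku
# Sketch only: <cond> is assumed to be a leaf test; the real parser's
# condition and actions differ.
sub traverse($ast is copy, $acc is copy, &f) {
    if $ast !~~ Array {            # leaf node
        $acc = f($ast, $acc);
    } else {                       # internal node: visit it, then recurse
        $acc = f($ast, $acc);
        for 1 .. $ast.elems - 1 -> $idx {
            (my $entry, $acc) = traverse($ast[$idx], $acc, &f);
            $ast[$idx] = $entry;
        }
    }
    return ($ast, $acc);
}

# Count the nodes of a small hypothetical tree
my $ast = ['+', ['*', '2', 'v'], '1'];
my ($tree, $count) = traverse($ast, 0, -> $node, $acc { $acc + 1 });
say $count;   # 5
```

The custom traversal has the same shape, but with the body of &f inlined at the two call sites, which is what makes it faster.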

The fastest way to process a list

Raku has several ways to iterate through a list. I tested six of them, as follows:

constant NITERS = 2_000_000;
if CASE==0 { # 6.2 s
    my @src = map {$_}, 1 .. NITERS;
    my @res = map {2*$_+1}, @src;
} elsif CASE==1 { # 7.9 s
    my @res=();
    my @src=();
    for 1..NITERS -> $elt {
        push @src, $elt;
    }
    for @src -> $elt {
        push @res, 2*$elt+1;
    }
} elsif CASE==2 { # 6.2 s
    my @res=();
    my @src=();
    for 0..NITERS-1 -> $idx {
        my $elt=$idx+1;
        @src[$idx] = $elt;
    }
    for 0..NITERS-1 -> $idx {
        my $elt=@src[$idx];
        @res[$idx] = 2*$elt+1;
    }
} elsif CASE==3 { # 11.0 s
    my @res=();
    my @src=();
    loop (my $idx=0;$idx < NITERS;++$idx) {
        my $elt=$idx+1;
        @src[$idx] = $elt;
    }
    loop (my $idx2=0;$idx2 < NITERS;++$idx2) {
        my $elt=@src[$idx2];
        @res[$idx2] = 2*$elt+1;
    }
} elsif CASE==4 { # 3.7 s
    my @src = ();
    my @res=();
    push @src, $_ for 1 .. NITERS;
    push @res, 2*$_+1 for @src;
} elsif CASE==5 { # 3.5 s
    my @src = ($_ for 1 .. NITERS);
    my @res= (2*$_+1 for @src);
}

The fastest way is to use list comprehension (case 5, 3.5 s), very closely followed by the suffix-style for (case 4, 3.7 s). The C-style loop construct (case 3) is the slowest (11 s). The map version performs the same as the index-based for loop (both 6.2 s). It is a bit odd that the list-based for loop, probably the most common loop construct, is slower than these two (7.9 s).

Expression parsing

What I loosely call an expression parser is actually a combination of a lexer and a parser: it turns a string of source code into a tree-like data structure which expresses the structure of the expression and the purpose of its constituents. For example, if the expression is 2*v+1, the result of the expression parser will be a data structure which identifies the top-level expression as the sum of a multiplication and the integer constant 1, and the multiplication as the product of the integer constant 2 and the variable v. So how do we build a fast expression parser? It is not my intention to go into the computing science details, but instead to discuss the choices to be considered.

A fast data structure

First, the choice of the data structure matters. As we need a tree-like ordered data structure, it would have to be either an object or a list. But objects in Raku are slow, so I use a nested list.

['+',
    ['*',
        2,
        ['$','v']
    ],
    1
]

Integer comparison is faster than string comparison

This data structure is fine if you don't need to do a lot of work on it. However, because every node is labelled with a string, testing against the node type is a string comparison. I tested this as follows:

if VER==1 { # 7.3 - 5.3 = 2 s net
    for 1 .. NITERS -> $i {
        my $str = chr($i % 43);
        if $str eq '*' {
            $count+=$i;
        }
    }
} 
elsif VER==2 { # 3.3 - 3.1 = 0.3
    for 1..NITERS -> $i {
        my $c = $i % 43;
        if $c == 42 {
            $count+=$i;
        }
    }
} elsif VER==3 { # 5.3
    for 1..NITERS -> $i {
        my $str = chr($i % 43);
    }
} elsif VER==4 { # 3.1
    for 1..NITERS -> $i {
        my $c = $i % 43;
    }
}

I populate the string or integer based on the loop iterator and then perform a comparison to a constant string or integer. By subtracting the time taken for the assignment I obtain the actual time for the comparison.

On my laptop, the version with string comparison takes 2 s net, the integer comparison 0.3 s. So doing string comparisons is at least 5 times slower than doing integer comparisons. Therefore my data structure uses integer labels. Also, I label the constants so that I can have different labels for string, integer and real constants, and because in this way all nodes are arrays. This avoids having to test if a node is an array or a scalar, which is a slow operation.

So the example becomes:

[ 3,
  [ 5,
    [ 29, '2' ],
    [ 2, 'v' ]
  ],
  [ 29, '1' ]
]

Less readable, but faster and easier to extend.
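With integer labels, a node-type test becomes a cheap integer comparison instead of a string comparison. A small illustration, using the label values from the example above (5 is the multiplication label there):

```raku
my $node = [5, [29, '2'], [2, 'v']];   # the '*' sub-tree from the example
if $node[0] == 5 {                     # integer compare, not $node[0] eq '*'
    say 'multiplication node';
}
```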

Parsing: regular expressions, string comparisons or list operations?

Then we have to decide how to parse the expression string. The traditional way to build an expression parser is using a Finite State Machine, consuming one character at a time (if needed, with one or more characters of look-ahead) and keeping track of the identified portion of the string. This is very fast in a language such as C, but in Raku I was not so sure, because in Raku a character is actually a string of length one, so every test against a character is a string comparison. On the other hand, Raku has a sophisticated regular expression engine. Yet another way is to turn the string into an array and parse using list operations. I created a test bench to see which approach was fastest:

constant NITERS = 100_000;
my $str='This means we need a stack per type of operation and run until the end of the expression';
my @chrs =  $str.comb;
if (VER==0) { # 5.8 s
    for 1 .. NITERS -> $ct {
        my @words=();
        my $word='';
        map(-> \c { 
            if (c ne ' ') {
                $word ~= c;
            } else {
                push @words, $word;
                $word='';
            }
        }, @chrs);
        push @words, $word;
    }
} elsif VER==1 { # 2.7 s    
     for 1 .. NITERS -> $ct {
        my @words=();
        my $str='This means we need a stack per type of operation and run until the end of the expression';
        while my $idx=$str.index( ' ' ) {
            push @words, $str.substr(0,$idx);
            $str .= substr($idx+1);
        }
        push @words, $str;
    }         
} elsif VER==2 {  # 11.7 s
    for 1 .. NITERS -> $ct {
        my @words=();
        my @chrs_ = @chrs; 
        my $word='';      
        while @chrs_ {
            my $chr = shift @chrs_;
            if ($chr ne ' ') {
                $word~=$chr;
            } else {
                push @words, $word;
                $word='';
            }
        }
        push @words, $word;
    }
} elsif VER==3 { # 101 s
    for 1 .. NITERS -> $ct {
        my @words=();
        my $str='This means we need a stack per type of operation and run until the end of the expression';
        while $str.Bool {
            $str ~~ s/^$<w> = [ \w+ ]//;
            if ($<w>.Bool) {
                push @words, $<w>.Str;
            }
            else {
                $str ~~ s/^\s+//;
            } 
        }
    }   
} elsif VER==4 { # 64 s
    for 1 .. NITERS -> $ct {
        my \res = reduce(
        -> \acc, \c { 
            if (c ne ' ') {
                acc[0],acc[1] ~ c;
            } else {
                ( |acc[0], acc[1] ),'';
            }
        }, ((),''), |@chrs);
        my @words = |res[0],res[1];
    }
}

For the list-based version, the overhead is 1.6 s; for the string-based versions, 0.8 s.

The results are rather striking. Clearly the regexp version is by far the slowest. This was a surprise, because in my Perl implementation the regexp version was twice as fast as the next best choice. Of the other implementations, the string-based FSM which uses the index and substr methods is by far the fastest: without the overhead it takes 1.9 s, which is more than 50 times faster than the regexp version. The map-based version comes second but is nearly twice as slow. What is surprising, and actually a bit disappointing, is that the reduce-based version, which works the same as the map-based one but on immutable data, is also very slow, at 64 s.

In any case, the choice is clear. It is possible to make the fastest version marginally faster (1.6 s instead of 1.9 s) by not reducing the string but instead moving the index through the string. However, for the full parser I want to have the convenience of the trim-leading and starts-with methods, so I choose to consume the string.
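As a sketch of that index-moving variant (my own reconstruction, not the code from the repo): keep a cursor into the string and use index's optional start position, so the string itself is never modified or copied. Testing .defined rather than truthiness also avoids the corner case where index returns position 0.

```raku
my $str = 'This means we need a stack per type of operation and run until the end of the expression';
my @words = ();
my $start = 0;
# Advance a cursor through the string instead of consuming it with substr
while (my $idx = $str.index(' ', $start)).defined {
    @words.push($str.substr($start, $idx - $start));
    $start = $idx + 1;
}
@words.push($str.substr($start));   # final word
```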

A faster expression parser

With the choices of string parsing and data structure made, I focused on the structure of the overall algorithm. The basic approach is to loop through a number of states and in every state perform a specific action. In the Perl version this was very simple because we use regular expressions to identify tokens, so most of the state transitions are implicit. I wanted to keep this structure, so I emulate the regexp s/// operation with comparisons, indexing and substring operations.

my $prev_lev=0;
my $lev=0;
my @ast=();
my $op;
my $state=0;
while $str.chars > 0 {
     # Match unary prefix operations
     # Match terms
     # Add prefix operations if matched
     # Match binary operators
     # Append to the AST
}

The matching rules and operations are very simple (I use <pattern> and <integer> as placeholders for the actual values). Here is the Perl version for reference:

  • prefix operations:
if ( $str=~s/^<pattern>// ) { $state=<integer>; } 
  • terms:
if ( $str=~s/^(<pattern>)// ) { $expr_ast=[<integer>,$1]; }
  • operators:
$prev_lev=$lev;
if ( $str=~s/^<pattern>// ) { $lev=<integer>; $op=<integer>; }

In the Raku version I used the given/when construct, which is as fast as an if statement but a bit neater.

  • prefix operations:
given $str {
    when .starts-with(<token>) {
        .=substr(<length of token>);
        $state=<integer>;
    }
}
  • terms:
given $str {
    when .starts-with(<token start>) {
        $expr_ast=[<integer>,$term];
    }
}
  • operators:
given $str {
    when .starts-with(<token>) {
        .=substr(<length of token>);
        $lev=<integer>;
        $op=<integer>;
    }
}

One of the more complex patterns to match is the case of an identifier followed by an opening parenthesis with optional whitespace. Using regular expressions this pattern would be:

if  $str ~~ s:i/^ $<token> = [ [a .. z] \w*] \s* \( // { 
    my $var=$<token>.Str;
    ... 
}

Without regular expressions, we first check for a character between 'a' and 'z' using 'a' le .substr(0,1).lc le 'z'. If that matches, we store it in $var and run a while loop for as long as the following characters are alphanumeric or '_', appending them to $var. Then we remove the matched identifier from $str, strip any whitespace and test for '('.

when 'a' le (my $var = .substr(0,1)).lc le 'z' {
    my $idx=1;
    my $c = .substr($idx,1);
    while 'a' le $c.lc le 'z' or $c eq '_' 
        or '0' le $c le '9' {
        $var~=$c;
        $c = .substr(++$idx,1);
    }
    .=substr($idx);
    .=trim-leading;
    if .starts-with('(') {
        ...
    }
}

Another complex pattern is that for a floating-point number. In Fortran, the pattern is more complicated because the sub-pattern .e can be part of a floating-point constant but could also be part of the equality operator .eq.. Furthermore, the separator between the mantissa and the exponent can be not just e but also d or q. So the regular expression is rather involved:

if (
    (
        !($str~~rx:i/^\d+\.eq/) and
        $str~~s:i/^([\d*\.\d*][[e|d|q][\-|\+]?\d+]?)//
    )
    or
    $str~~s:i/^(\d*[e|d|q][\-|\+]?\d+)//
) {
    $real_const_str=$/.Str;
}

Without regular expressions, the implementation is as follows. We first detect a character between 0 and 9 or a dot. Then we try to match the mantissa, separator, sign and exponent. The latter three are optional; if they are not present and the mantissa does not contain a dot, we have matched an integer.

when '0' le .substr(0,1) le '9' or .substr(0,1) eq '.' { 
    my $sep='';
    my $sgn='';
    my $exp='';
    my $real_const_str='';

    # first char of mantissa
    my $mant = .substr(0,1);
    # try and match more chars of mantissa
    my $idx=1;
    my $h = .substr($idx,1);
    while '0' le $h le '9' or $h eq '.' {
        $mant ~=$h;
        $h = .substr(++$idx,1);
    }
    $str .= substr($idx);

    # reject .eq.
    if not ($mant.ends-with('.') and .starts-with('eq',:i)) { 
        if $h.lc eq 'e' | 'd' | 'q' {
            # we found a valid separator
            $sep = $h;            
            my $idx=1;
            $h =.substr(1,1);
            # now check if there is a sign
            if $h eq '-' or $h eq '+' {
                ++$idx;
                $sgn = $h;
                $h =.substr($idx,1);
            }
            # now check if there is an exponent            
            while '0' le $h le '9' {
                ++$idx;
                $exp~=$h;
                $h =.substr($idx,1);
            }
            $str .= substr($idx);
            if $exp ne '' {
                $real_const_str="$mant$sep$sgn$exp";
                $expr_ast=[30,$real_const_str];
            } else {
                # parse error
            }
        } elsif $mant.index('.').defined { # .defined also matches a leading dot
            # a mantissa-only real number
            $real_const_str=$mant;
            $expr_ast=[30,$real_const_str];
        }
        else { # no dot and no sep, so an integer
            $expr_ast=[29,$mant];   
        }
    } else { # .eq., backtrack and carry on
        $str ="$mant$str";        
        proceed;
    }            
}

A final example of how to handle patterns is the case of whitespace in comparison and logical operators. Fortran has operators of the form <dot word dot>, for example .lt. and .xor.. But annoyingly, it allows whitespace between the dot and the word, e.g. . not .. Using regular expressions, this is of course easy to handle, for example:

if $str~~s/^\.\s*ge\s*\.// {
    $lev=6;
    $op=20;
} 

I check for a pattern that starts with a dot and contains a space before the next dot. Then I remove all spaces from that substring using trans and replace the original string with this trimmed version.

when .starts-with('.') and  .index( ' ' ) 
    and (.index( ' ' ) < (my $eidx = .index('.',2 ))) {
    
    # Find the keyword with spaces
    my $match = .substr(0, $eidx+1);
    # remove the spaces
    $match .= trans( ' ' => '' );
    # update the string
    $str = $match ~ .substr( $eidx+1);
    proceed;
}

Conclusion

Overall the optimised expression parser in Raku is still very close to the Perl version. The key difference is that the Raku version does not use regular expressions. With the above examples I wanted to illustrate how it is possible to write code with the same functionality as a regular expression s/// operation, using some of Raku's built-in string operations:

  • substr : substring
  • index : location of a substring in a string
  • trim-leading : strip leading whitespace
  • starts-with
  • ends-with
  • trans : used to remove whitespace using the ' ' => '' pattern
  • lc : used in range tests instead of testing against both upper and lower case
  • le, lt, ge, gt: for very handy range comparisons, e.g. 'a' le $str le 'z'
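As a small illustration (hypothetical input, not taken from the parser), these builtins combine as follows:

```raku
my $str = '  .LT. rest';
$str .= trim-leading;               # '.LT. rest'
say $str.starts-with('.');          # True
say $str.index('rest');             # 5
my $op = $str.substr(0, 4).lc;      # '.lt.'
say 'a' le $op.substr(1, 1) le 'z'; # True: range test on the lower-cased char
say $op.trans(' ' => '');           # '.lt.' (no spaces to remove here)
```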

The resulting code is of course much longer but arguably more readable than regular expressions, and currently four times faster.

I ran a lot more tests, and compared performance against Perl and Python as well, but that is another story. All code for the tests is available in my GitHub repo.

@JJ commented Nov 27, 2020

  1. Please add a bit of context, just the gist of what's an expression parser and why you want it optimized so that people don't go away from this article to the one you link.
  2. Don't quite agree with Raku prioritizing functionality over performance. Most improvements in the last 2 years are related to performance. I'm not getting much out of that paragraph, I'm afraid. I would like these articles to be good introductory articles for people unfamiliar, or not too familiar, with Raku, not tips and tricks for the aficionado.
  3. Maybe you can link the Fortran code?
  4. What do you call a hashmap? It's a two-element array?
  5. How do we know there are labels in $info?
  6. Please mind indentation of curlies, the one on top of the elsif statement is kind of confusing. Also, why not use given/when? You are also repeating the 1..NITERS range, which you could assign somewhere else. Not really clear what you're doing there.
  7. You say "without the if-condition", but there are two ifs. Why don't you time the two loops independently? Also, I don't really see the point of the comparison. In an actual statement, you will have to check for /read/ every time, right?
  8. In which context are you using custom tree traversals? Is it an alternative to what? What tree are you talking about? Also, what does it mean in this context of making Raku faster? Are you measuring the two implementations mentioned?
  9. What list are you processing? It might be the case that all that info is in the linked article, but as said above, can't you put the gist of it here? Should be as self-contained as possible. Or is this simply a different tip for a different context?
  10. Also, it would be convenient in the next example to wrap a benchmark around every piece of code, maybe even spin it off to a routine. Would make it more readable.
  11. You come back to expression parsing later. Not clear if it follows from the previous text or not. Also, link everything: lexer, parser. Very likely that section should be bumped up to the beginning, because it provides context for the rest.
  12. There's a comma missing in your nested list. Also, it's an array:
  13. Once I'm getting to this point, it's likely that this article is a bit long for an Advent article, which are usually short and a few minutes read. Maybe the part "faster expression parser" could go to a second advent calendar article? As long as you're OK with that. Anyway, it's a good job, but I would appreciate if you helped make it better by addressing some of the concerns above.

@nige123 commented Nov 27, 2020

Just some suggestions:

  • some of the optimisation/speed problems are being addressed by the Jonathan Worthington's RakuAST work - it would be good to link to this work - it might be that some of the optimisations you've identified may no longer apply in future

  • another thing to consider is optimising for human readability/understandability - which version is easiest to understand for the hoomans?

@wimvanderbauwhede (author)

@JJ I'll do what I can but you are asking for a lot of changes and some of them will take me a lot of time.

Re splitting it into two parts, I'm OK with that if you think the first part on its own is interesting enough. But maybe the 2nd part does not have to be an advent calendar article, I can publish it on my own blog.

I just want to address one point here: "Don't quite agree with Raku prioritizing functionality over performance." Currently Raku is 10x slower than Perl and also many times slower than Python for all actions that I have been testing. You can see the results in the spreadsheet in my repo if you are interested.
If you say Raku is not prioritizing functionality over performance, do you mean that the consensus is that performance at the moment is already good enough? In particular regular expression performance is very disappointing, it is more than 50x slower than Perl. To get the language widely accepted, I think better performance is important.

I took care not to make any relative comparisons in my article, and I said that the current focus is on functionality, because I did not want to put Raku in a poor light. But my own conclusion is that sadly, Raku is currently way too slow for me to use it in my work.

@nige123 commented Dec 2, 2020

The process of development has been make it work first - and then secondly make it work fast (enough). This second part of the plan is currently in progress now: RakuAST

@wimvanderbauwhede (author)

@nige123 I am aware of RakuAST. In the article I discuss current trade-offs. But I will put in a link to it.

@JJ commented Dec 2, 2020

@wimvanderbauwhede they're not mandatory. Do your best.

WRT performance/functionality: there's that, and there's also whether we want to make that point in an article which should be laudatory for Raku. OTOH, while it was true that at the very beginning functionality was the priority, there's no single release since then that has not included improvements in performance. It still lags behind others, but that does not belie the fact that right now, performance is one of the priorities.

@wimvanderbauwhede (author)

No worries, I removed that paragraph.
