
Writing faster Raku code, Part 1

Last year, in Perl land, I discussed the result of my attempts to optimize the performance of an expression parser which is part of my Perl-based Fortran source-to-source compiler. An expression parser takes strings representing expressions in a programming language (in my case Fortran) and turns them into a data structure called a parse tree, which the compiler uses for further analysis and code generation.

I have recently been writing quite a bit of Raku code but so far I had not looked at its performance. Out of curiosity I decided to rewrite and optimise this Fortran expression parser in Raku.

Expression parsing

What I loosely call an expression parser is actually a combination of a lexer and a parser: it turns a string of source code into a tree-like data structure which expresses the structure of the expression and the purpose of its constituents. For example, if the expression is 2*v+1, the result of the expression parser will be a data structure which identifies the top-level expression as the sum of a multiplication and the integer constant 1, where the multiplication multiplies the integer constant 2 with the variable v.

So how do we build a fast expression parser in Raku? In this first part of the article, I look at some of the choices and trade-offs to be considered. In the follow-up article, I will discuss the actual implementation of the expression parser.

Raku performance testing

The Raku documentation has a page on performance which offers good advice in general terms. But it did not answer my questions about the specific trade-offs that I might have to make. So I created some simple test cases to find out more. I used Raku version 2020.09 built on MoarVM version 2020.09, the most recent one when I ran the tests, but the results should be quite similar for slightly earlier and later versions, at least until the new RakuAST model is finished, as that will likely have a large impact on performance.

I test the performance using a series of small testbenches, each with several cases selected by a command-line argument. I use the time command to obtain the wall-clock time and take the average over 5 runs. For example,

$ time raku test_hash_vs_regex.raku 1
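To automate the averaging, a small wrapper can be used. Below is a minimal sketch of such a harness (illustrative, not the harness from my repo; the script name is the one from the example above):

# Run one testbench case five times and report the average wall-clock time.
my $case = @*ARGS[0] // '1';
my @times = gather for ^5 {
    my $t0 = now;
    run 'raku', 'test_hash_vs_regex.raku', $case;  # same as `time raku ...`
    take now - $t0;
};
say "average over 5 runs: { ([+] @times) / @times.elems } s";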

There is more than one way to do it, but only one will be the fastest

Parsing involves taking strings and turning them into other data structures, so there are many decisions to be made about the data structures, and about the ways to turn strings into them and manipulate them. Here are some results of performance comparisons that influenced design decisions for the compiler. I was curious to see if they would turn out differently in Raku.

Hash key testing is faster than regexp matching

Fortran code essentially consists of a list of statements which can contain expressions, and in my compiler the statement parser labels each of the statements once using a hashmap. Every parsed line of code is stored as a pair of the original string $src_line with this hashmap, called $info:

my $parsed_line = [ $src_line, $info ];

The labels and values stored in $info depend on the type of statement. It is not a priori clear if matching a pattern in $src_line using a regex is faster or slower than looking up the corresponding label in $info. So I tested the performance of hash key testing versus regexp matching, using some genuine FORTRAN 77 code, a READ I/O call, labelled in $info as ReadCall:

my \CASE = @*ARGS[0].Int;   # test case selected by the command-line argument
constant NITERS = 10_000_000;

my $str = lc('READ( 1, 2, ERR=8, END=9, IOSTAT=N ) X');
my $info = {};
if ($str~~/read/) {
    $info<ReadCall> = 1;
}
my $count=0;

if CASE==1 {
    for 1..NITERS -> $i {
        # regexp
        if ($str~~/read/) {
            $count+=$i;
        }
    }
} elsif CASE==2 {
    for 1..NITERS -> $i {
        # hash lookup
        if ($info<ReadCall>:exists) {
            $count+=$i;
        }
    }
} elsif CASE==3 {
    for 1..NITERS -> $i {
        # overhead
        $count+=$i;
    }
}

Without the if-condition in its body (CASE==3), the for 1..NITERS loop takes 3 s on my laptop. The loop with the hash key existence test takes 5 s; the loop with the regexp match condition takes 53 s. So the actual condition evaluation takes 2 s for the hash key existence check and 50 s for the regexp match.

Result: Testing hash keys is 25 times faster than simple regexp matching. So we trade some memory for computation: we identify the statement once using a regexp, and store the identifying label in $info for subsequent passes.
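In code, the pattern looks roughly like this (a sketch; @src_lines, @parsed_lines and handle-read are illustrative names, not the compiler's actual code):

# Pass 1: label each statement once, using the (slow) regexp match.
my @parsed_lines;
for @src_lines -> $src_line {
    my $info = {};
    $info<ReadCall> = 1 if $src_line ~~ m:i/ 'read' \s* '(' /;
    @parsed_lines.push( [ $src_line, $info ] );
}

# Later passes: only a (fast) hash key existence test is needed.
for @parsed_lines -> [ $src_line, $info ] {
    handle-read($src_line) if $info<ReadCall>:exists;
}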

A fast data structure for the parse tree: integer versus string comparison

The choice of the data structure for the parsed expression matters. As we need a tree-like ordered data structure, it would have to be either an object or a list-like data structure. But objects in Raku are slow, so I use a nested array.

['+',
    ['*',
        2,
        ['$','v']
    ],
    1
]

This data structure is fine if you don't need to do a lot of work on it. However, because every node is labelled with a string, testing the node type requires a string comparison. Simply testing against a constant string or integer is not good enough, as the compiler might optimise this away. So I tested this as follows, making sure that $str and $c get a new value on every iteration:

my \CASE = @*ARGS[0].Int;   # test case selected by the command-line argument
constant NITERS = 10_000_000;
my $count=0;

if CASE==1 { # 7.3 - 5.3 = 2 s net
    for 1 .. NITERS -> $i {
        # string equality
        my $str = chr($i % 43);
        if $str eq '*' {
            $count+=$i;
        }
    }
}
elsif CASE==2 { # 3.3 - 3.1 = 0.3
    for 1..NITERS -> $i {
        # int equality
        my $c = $i % 43;
        if $c == 42 {
            $count+=$i;
        }
    }
} elsif CASE==3 { # 5.3
    for 1..NITERS -> $i {
        # string equality overhead
        my $str = chr($i % 43);
    }
} elsif CASE==4 { # 3.1
    for 1..NITERS -> $i {
        # int equality overhead
        my $c = $i % 43;
    }
}

I populate the string or integer based on the loop iterator and then perform a comparison to a constant string or integer. By subtracting the time taken for the assignment (cases 3 and 4) I obtain the actual time for the comparison.

On my laptop, the version with string comparison takes 2 s net, the integer comparison 0.3 s. So string comparisons are at least 5 times slower than integer comparisons. Therefore my data structure uses integer labels. I also label the constants, both to have different labels for string, integer and real constants, and so that every node is an array. This avoids having to test whether a node is an array or a scalar, which is a slow operation.

So the example becomes:

[ 3,
  [ 5,
    [ 29, '2' ],
    [ 2, 'v' ]
  ],
  [ 29, '1' ]
]

Less readable, but faster and easier to extend. In what follows, what I call the parse tree is this data structure.
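For readability, the integer labels can be given names. Here is a minimal sketch using the numbers from the example above (the names are illustrative; the actual mapping covers many more node types):

# Named node labels, matching the example tree above.
constant Var      = 2;   # variables, e.g. [ Var, 'v' ]
constant Add      = 3;   # '+'
constant Mul      = 5;   # '*'
constant IntConst = 29;  # integer constants, e.g. [ IntConst, '2' ]

# Testing a node type is now an integer comparison:
my $node = [ Mul, [ IntConst, '2' ], [ Var, 'v' ] ];
if $node[0] == Mul {
    say 'multiplication';
}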

Result: String comparisons are at least 5 times slower than integer comparisons.

Custom tree traversals are faster

I tested the cost of using higher-order functions for parse tree traversal (recursive descent). Basically, this is the choice between a generic traversal using a higher-order function which takes an arbitrary function that operates on the parse tree nodes:

sub _traverse_ast_with_action($ast_, $acc_, &f) {
    my $ast=$ast_; my $acc=$acc_;
    if <cond> { # e.g. the node is a leaf
        $acc=&f($ast,$acc);
    } else {
        $acc=&f($ast,$acc);
        for 1 .. $ast.elems - 1 -> $idx {
            (my $entry, $acc) =
                _traverse_ast_with_action($ast[$idx],$acc, &f);
            $ast[$idx] = $entry;
        }
    }
    return ($ast, $acc);
}

or a custom traversal:

sub _traverse_ast_custom($ast_, $acc_) {
    my $ast=$ast_; my $acc=$acc_;
    if <cond> { # e.g. the node is a leaf
        $acc=< custom code acting on $ast and $acc >;
    } else {
        $acc=< custom code acting on $ast and $acc >;
        for 1 .. $ast.elems - 1 -> $idx {
            (my $entry, $acc) =
                _traverse_ast_custom($ast[$idx],$acc);
            $ast[$idx] = $entry;
        }
    }
    return ($ast, $acc);
}

For the case of the parse tree data structures in my compiler, the higher-order implementation takes more than twice as long as the custom traversal, so for performance this is not a good choice. Therefore I don't use higher-order functions in the parser, but I do use them in the later refactoring passes.
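As an illustration of the generic version (assuming the <cond> placeholder has been filled in with a leaf test, and $ast holds a tree such as the one for 2*v+1 above), here is how it could be used to count the variable nodes:

# Action: keep the node unchanged, increment the accumulator for
# every variable node (integer label 2).
sub count-vars($node, $acc) {
    $node ~~ Array && $node[0] == 2 ?? $acc + 1 !! $acc
}

my ($tree, $nvars) = _traverse_ast_with_action($ast, 0, &count-vars);
say $nvars; # 1 for the parse tree of 2*v+1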

Result: Higher-order implementations of recursive descent take more than twice as long as custom traversals.

The fastest way to process a list

The internal representation of a Fortran program in my compiler is a list of [ $src_line, $info ] pairs, and the $info hash stores the parse tree as a nested array. So iterating through lists and arrays is a major factor in the performance.

Raku has several ways to iterate through a list-like data structure. I tested six of them, as follows:

my \CASE = @*ARGS[0].Int;   # test case selected by the command-line argument
constant NITERS = 2_000_000;

if CASE==0 { # 6.2 s
    # map
    my @src = map {$_}, 1 .. NITERS;
    my @res = map {2*$_+1}, @src;
} elsif CASE==1 { # 7.9 s
    # for each elt in list
    my @res=();
    my @src=();
    for 1..NITERS -> $elt {
        push @src, $elt;
    }
    for @src -> $elt {
        push @res, 2*$elt+1;
    }
} elsif CASE==2 { # 6.2 s
    # for with index
    my @res=();
    my @src=();
    for 0..NITERS-1 -> $idx {
        my $elt=$idx+1;
        @src[$idx] = $elt;
    }
    for 0..NITERS-1 -> $idx {
        my $elt=@src[$idx];
        @res[$idx] = 2*$elt+1;
    }
} elsif CASE==3 { # 11.0 s
    # loop (C-style)
    my @res=();
    my @src=();
    loop (my $idx=0;$idx < NITERS;++$idx) {
        my $elt=$idx+1;
        @src[$idx] = $elt;
    }
    loop (my $idx2=0;$idx2 < NITERS;++$idx2) {
        my $elt=@src[$idx2];
        @res[$idx2] = 2*$elt+1;
    }
} elsif CASE==4 { # 3.7 s
    # postfix for with push
    my @src = ();
    my @res=();
    push @src, $_ for 1 .. NITERS;
    push @res, 2*$_+1 for @src;
} elsif CASE==5 { # 3.5 s
    # comprehension
    my @src = ($_ for 1 .. NITERS);
    my @res= (2*$_+1 for @src);
}

The fastest way is to use a list comprehension (case 5, 3.5 s), very closely followed by the postfix for (case 4, 3.7 s). The C-style loop construct (case 3) is the slowest (11 s). The map version performs the same as the index-based for loop (both 6.2 s). It is a bit odd that the list-based for loop, probably the most common loop construct, is slower than these two (7.9 s).

Result: List comprehensions are fastest, almost twice as fast as for-loops or maps; the C-style loop is very slow.

Conclusions so far

With this set of rather diverse experiments, we have learned the following:

  • Testing hash keys is 25 times faster than regexp matching, so match once and store the result in a hash.
  • String comparisons are at least 5 times slower than integer comparisons, so if you care about speed, prefer integer comparisons.
  • Higher-order implementations of recursive descent take more than twice as long as custom traversals, so copy-pasting the traversal code is faster than abstracting it out.
  • List comprehensions are fastest, almost twice as fast as for-loops or maps; the C-style loop is very slow. So for the fastest list iteration, use a comprehension.

Apart from the last one, these conclusions are the same as for Perl. In the follow-on article we'll look at the performance of parsing strings and the final design of the expression parser.

All code for the tests is available in my GitHub repo.
