
@wimvanderbauwhede
Created November 24, 2020 16:21
Writing faster Raku code


In an earlier article, I discussed the result of my attempts to optimise the performance of an expression parser written in Perl. I have recently been writing quite a bit of Raku code, but so far I had not looked at its performance. Out of curiosity, I decided to write and optimise my Fortran expression parser in Raku.

Raku performance as we know it

In the current stage of its development, Raku prioritises functionality over performance. So an easily-made argument is that if you want performance, you should not write your code in Raku. And it is of course true that compiled code will almost always be faster. However, often, rewriting in a compiled language is not an option, so it is important to know how to get the best possible performance in Raku.

The Raku documentation has a page on performance which offers good advice in general terms. But for my needs I did not find the answers there, nor anywhere else. So I created some simple test cases to find out more. I used Raku version 2020.09 built on MoarVM version 2020.09, the most recent one when I ran the tests, but the results should be quite similar for slightly earlier and later versions.

Testing some hunches

Before going into the details on the expression parser, here are some results of performance comparisons that influenced design decisions for the compiler. I was curious to see if they would turn out different in Raku.

Hash key testing is faster than regexp matching

Fortran code essentially consists of a list of statements which can contain expressions, and the parser labels each of the statements once using a hashmap, ever the workhorse data structure in Perl. Every parsed line of code is stored as a pair with this hashmap (which I call $info):

my $parsed_line = [ $src_line, $info ];

This means that in principle I can choose to match a pattern in $str using a regex or use one of the labels in $info. So I tested the performance of hash key testing versus regexp matching, using some genuine FORTRAN 77 code:

my $str = lc('READ( 1, 2, ERR=8, END=9, IOSTAT=N ) X');
my %info =();   
if ($str~~/read/) {
    %info<ReadCall> = 1;
}
my $count=0;

constant NITERS = 10_000_000;
if VER==1 {
    for 1..NITERS -> $i {
        if ($str~~/read/) {
            $count+=$i;
        }
    }
} elsif VER==2 {
    for 1..NITERS -> $i {
        if (%info<ReadCall>:exists) {
            $count+=$i;
        }
    }
} else {
    for 1..NITERS -> $i {
        $count+=$i;
    }
}

Without the if-condition, the loop takes 3 s on my laptop. The loop with the hash key existence test takes 5 s; the regexp match condition takes 53 s. So the actual condition evaluation takes 2 s for the hash key existence check and 50 s for the regexp match. So testing hash keys is 25 times faster than simple regexp matching.

Custom tree traversals are faster

I tested the cost of using higher-order functions for tree traversal. Basically, this is the choice between a generic traversal which takes an arbitrary function that operates on the tree nodes:

sub _traverse_ast_with_action($ast_, $acc_, &f) {
    my $ast=$ast_; my $acc=$acc_;
    if <cond> { 
        $acc=&f($ast,$acc);
    } else { 
        $acc=&f($ast,$acc);
        for  1 .. $ast.elems - 1  -> $idx {
            (my $entry, $acc) = 
                _traverse_ast_with_action($ast[$idx],$acc, &f);
            $ast[$idx] = $entry;
        }
    }
    return ($ast, $acc);
} 

or a custom traversal:

sub _traverse_ast_custom($ast_, $acc_) {
    my $ast=$ast_; my $acc=$acc_;
    if <cond> { 
        $acc=< custom code acting on $ast and $acc>;
    } else { 
        $acc=< custom code acting on $ast and $acc>;
        for 1 .. $ast.elems - 1  -> $idx {
            (my $entry, $acc) = 
                _traverse_ast_custom($ast[$idx],$acc);
            $ast[$idx] = $entry;
        }
    }
    return ($ast, $acc);
} 

For the case of the tree data structures in my compiler, the higher-order implementation takes more than twice as long as the custom traversal, so for performance this is not a good choice. Therefore I don't use higher-order functions in the parser, but I do use them in the later refactoring passes.
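To make the comparison concrete, here is a runnable sketch of the generic traversal. The `<cond>` placeholder is filled in with a hypothetical leaf test (the node is not an Array), and the tree and node-counting action are illustrative values, not the parser's real data:

```raku
# Sketch only: <cond> is assumed to be a leaf test; the real parser's
# condition and actions differ.
sub traverse($ast is copy, $acc is copy, &f) {
    if $ast !~~ Array {            # leaf node
        $acc = f($ast, $acc);
    } else {                       # internal node: visit it, then recurse
        $acc = f($ast, $acc);
        for 1 .. $ast.elems - 1 -> $idx {
            (my $entry, $acc) = traverse($ast[$idx], $acc, &f);
            $ast[$idx] = $entry;
        }
    }
    return ($ast, $acc);
}

# Count the nodes of a small hypothetical tree
my $ast = ['+', ['*', '2', 'v'], '1'];
my ($tree, $count) = traverse($ast, 0, -> $node, $acc { $acc + 1 });
say $count;   # 5
```

The custom traversal has the same shape, but with the body of &f inlined at the two call sites, which is what makes it faster.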

The fastest way to process a list

Raku has several ways to iterate through a list. I tested six of them, as follows:

constant NITERS = 2_000_000;
if CASE==0 { # 6.2 s
    my @src = map {$_}, 1 .. NITERS;
    my @res = map {2*$_+1}, @src;
} elsif CASE==1 { # 7.9 s
    my @res=();
    my @src=();
    for 1..NITERS -> $elt {
        push @src, $elt;
    }
    for @src -> $elt {
        push @res, 2*$elt+1;
    }
} elsif CASE==2 { # 6.2 s
    my @res=();
    my @src=();
    for 0..NITERS-1 -> $idx {
        my $elt=$idx+1;
        @src[$idx] = $elt;
    }
    for 0..NITERS-1 -> $idx {
        my $elt=@src[$idx];
        @res[$idx] = 2*$elt+1;
    }
} elsif CASE==3 { # 11.0 s
    my @res=();
    my @src=();
    loop (my $idx=0;$idx < NITERS;++$idx) {
        my $elt=$idx+1;
        @src[$idx] = $elt;
    }
    loop (my $idx2=0;$idx2 < NITERS;++$idx2) {
        my $elt=@src[$idx2];
        @res[$idx2] = 2*$elt+1;
    }
} elsif CASE==4 { # 3.7 s
    my @src = ();
    my @res=();
    push @src, $_ for 1 .. NITERS;
    push @res, 2*$_+1 for @src;
} elsif CASE==5 { # 3.5 s
    my @src = ($_ for 1 .. NITERS);
    my @res= (2*$_+1 for @src);
}

The fastest way is to use list comprehension (case 5, 3.5 s), very closely followed by the suffix-style for (case 4, 3.7 s). The C-style loop construct (case 3) is the slowest (11 s). The map version performs the same as the index-based for loop (both 6.2 s). It is a bit odd that the list-based for loop, probably the most common loop construct, is slower than these two (7.9 s).

Expression parsing

What I loosely call an expression parser is actually a combination of a lexer and a parser: it turns a string of source code into a tree-like data structure which expresses the structure of the expression and the purpose of its constituents. For example, if the expression is 2*v+1, the result of the expression parser will be a data structure which identifies the top-level expression as the sum of a multiplication and the integer constant 1, and the multiplication as the product of the integer constant 2 and the variable v. So how do we build a fast expression parser? It is not my intention to go into the computing science details, but instead to discuss the choices to be considered.

A fast data structure

First, the choice of the data structure matters. As we need a tree-like ordered data structure, it would have to be either an object or a list. But objects in Raku are slow, so I use a nested list.

['+',
    ['*',
        2,
        ['$','v']
    ],
    1
]

Integer comparison is faster than string comparison

This data structure is fine if you don't need to do a lot of work on it. However, because every node is labelled with a string, testing against the node type is a string comparison. I tested this as follows:

if VER==1 { # 7.3 - 5.3 = 2 s net
    for 1 .. NITERS -> $i {
        my $str = chr($i % 43);
        if $str eq '*' {
            $count+=$i;
        }
    }
} 
elsif VER==2 { # 3.3 - 3.1 = 0.3
    for 1..NITERS -> $i {
        my $c = $i % 43;
        if $c == 42 {
            $count+=$i;
        }
    }
} elsif VER==3 { # 5.3
    for 1..NITERS -> $i {
        my $str = chr($i % 43);
    }
} elsif VER==4 { # 3.1
    for 1..NITERS -> $i {
        my $c = $i % 43;
    }
}

I populate the string or integer based on the loop iterator and then perform a comparison to a constant string or integer. By subtracting the time taken for the assignment I obtain the actual time for the comparison.

On my laptop, the version with string comparison takes 2 s net, the integer comparison 0.3 s. So doing string comparisons is at least 5 times slower than doing integer comparisons. Therefore my data structure uses integer labels. Also, I label the constants so that I can have different labels for string, integer and real constants, and because in this way all nodes are arrays. This avoids having to test if a node is an array or a scalar, which is a slow operation.

So the example becomes:

[ 3,
  [ 5,
    [ 29, '2' ],
    [ 2, 'v' ]
  ],
  [ 29, '1' ]
]

Less readable, but faster and easier to extend.
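With integer labels, a node-type test becomes a cheap integer comparison instead of a string comparison. A small illustration, using the label values from the example above (5 is the multiplication label there):

```raku
my $node = [5, [29, '2'], [2, 'v']];   # the '*' sub-tree from the example
if $node[0] == 5 {                     # integer compare, not $node[0] eq '*'
    say 'multiplication node';
}
```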

Parsing: regular expressions, string comparisons or list operations?

Then we have to decide how to parse the expression string. The traditional way to build an expression parser is using a Finite State Machine, consuming one character at a time (if needed, with one or more characters of look-ahead) and keeping track of the identified portion of the string. This is very fast in a language such as C, but in Raku I was not so sure, because in Raku a character is actually a string of length one, so every test against a character is a string comparison. On the other hand, Raku has a sophisticated regular expression engine. Yet another way is to turn the string into an array and parse using list operations. I created a test bench to see which approach was fastest:

constant NITERS = 100_000;
my $str='This means we need a stack per type of operation and run until the end of the expression';
my @chrs =  $str.comb;
if (VER==0) { # 5.8 s
    for 1 .. NITERS -> $ct {
        my @words=();
        my $word='';
        map(-> \c { 
            if (c ne ' ') {
                $word ~= c;
            } else {
                push @words, $word;
                $word='';
            }
        }, @chrs);
        push @words, $word;
    }
} elsif VER==1 { # 2.7 s    
     for 1 .. NITERS -> $ct {
        my @words=();
        my $str='This means we need a stack per type of operation and run until the end of the expression';
        while my $idx=$str.index( ' ' ) {
            push @words, $str.substr(0,$idx);
            $str .= substr($idx+1);
        }
        push @words, $str;
    }         
} elsif VER==2 {  # 11.7 s
    for 1 .. NITERS -> $ct {
        my @words=();
        my @chrs_ = @chrs; 
        my $word='';      
        while @chrs_ {
            my $chr = shift @chrs_;
            if ($chr ne ' ') {
                $word~=$chr;
            } else {
                push @words, $word;
                $word='';
            }
        }
        push @words, $word;
    }
} elsif VER==3 { # 101 s
    for 1 .. NITERS -> $ct {
        my @words=();
        my $str='This means we need a stack per type of operation and run until the end of the expression';
        while $str.Bool {
            $str ~~ s/^$<w> = [ \w+ ]//;
            if ($<w>.Bool) {
                push @words, $<w>.Str;
            }
            else {
                $str ~~ s/^\s+//;
            } 
        }
    }   
} elsif VER==4 { # 64 s
    for 1 .. NITERS -> $ct {
        my \res = reduce(
        -> \acc, \c { 
            if (c ne ' ') {
                acc[0],acc[1] ~ c;
            } else {
                ( |acc[0], acc[1] ),'';
            }
        }, ((),''), |@chrs);
        my @words = |res[0],res[1];
    }
}

For the list-based version, the overhead is 1.6 s; for the string-based versions, 0.8 s.

The results are rather striking. Clearly the regexp version is by far the slowest. This was a surprise, because in my Perl implementation the regexp version was twice as fast as the next best choice. Of the other implementations, the string-based FSM which uses the index and substr methods is by far the fastest: without the overhead it takes 1.9 s, which is more than 50 times faster than the regexp version. The map-based version comes second but is nearly twice as slow. What is surprising, and actually a bit disappointing, is that the reduce-based version, which works the same as the map-based one but on immutable data, is also very slow, at 64 s.

In any case, the choice is clear. It is possible to make the fastest version marginally faster (1.6 s instead of 1.9 s) by not reducing the string but instead moving the index through the string. However, for the full parser I want to have the convenience of the trim-leading and starts-with methods, so I choose to consume the string.
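As a sketch of that index-moving variant (my own reconstruction, not the code from the repo): keep a cursor into the string and use index's optional start position, so the string itself is never modified or copied. Testing .defined rather than truthiness also avoids the corner case where index returns position 0.

```raku
my $str = 'This means we need a stack per type of operation and run until the end of the expression';
my @words = ();
my $start = 0;
# Advance a cursor through the string instead of consuming it with substr
while (my $idx = $str.index(' ', $start)).defined {
    @words.push($str.substr($start, $idx - $start));
    $start = $idx + 1;
}
@words.push($str.substr($start));   # final word
```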

A faster expression parser

With the choices of string parsing and data structure made, I focused on the structure of the overall algorithm. The basic approach is to loop through a number of states and in every state perform a specific action. In the Perl version this was very simple because we use regular expressions to identify tokens, so most of the state transitions are implicit. I wanted to keep this structure, so I emulate the regexp s/// operation with comparisons, indexing and substring operations.

my $prev_lev=0;
my $lev=0;
my @ast=();
my $op;
my $state=0;
while $str.chars > 0 {
     # Match unary prefix operations
     # Match terms
     # Add prefix operations if matched
     # Match binary operators
     # Append to the AST
}

The matching rules and operations are very simple (I use <pattern> and <integer> as placeholders for the actual values). Here is the Perl version for reference:

  • prefix operations:
if ( $str=~s/^<pattern>// ) { $state=<integer>; } 
  • terms:
if ( $str=~s/^(<pattern>)// ) { $expr_ast=[<integer>,$1]; }
  • operators:
$prev_lev=$lev;
if ( $str=~s/^<pattern>// ) { $lev=<integer>; $op=<integer>; }

In the Raku version I used the given/when construct, which is as fast as an if statement but a bit neater.

  • prefix operations:
given $str {
    when .starts-with(<token>) {
        .=substr(<length of token>);
        $state=<integer>;
    }
}
  • terms:
given $str {
    when .starts-with(<token start>) {
        $expr_ast=[<integer>,$term];
    }
}
  • operators:
given $str {
    when .starts-with(<token>) {
        .=substr(<length of token>);
        $lev=<integer>;
        $op=<integer>;
    }
}

One of the more complex patterns to match is the case of an identifier followed by an opening parenthesis with optional whitespace. Using regular expressions this pattern would be:

if  $str ~~ s:i/^ $<token> = [ [a .. z] \w*] \s* \( // { 
    my $var=$<token>.Str;
    ... 
}

Without regular expressions, we first check for a character between 'a' and 'z' using 'a' le .substr(0,1).lc le 'z'. If that matches, we store it in $var and run a while loop for as long as the following characters are alphanumeric or '_', appending them to $var. Then we remove the matched identifier from $str, strip any whitespace and test for '('.

when 'a' le (my $var = .substr(0,1)).lc le 'z' {
    my $idx=1;
    my $c = .substr($idx,1);
    while 'a' le $c.lc le 'z' or $c eq '_' 
        or '0' le $c le '9' {
        $var~=$c;
        $c = .substr(++$idx,1);
    }
    .=substr($idx);
    .=trim-leading;
    if .starts-with('(') {
        ...
    }
}

Another complex pattern is that for a floating-point number. In Fortran, the pattern is more complicated because the sub-pattern .e can be part of a floating-point constant but could also be part of the equality operator .eq.. Furthermore, the separator between the mantissa and the exponent can be not just e but also d or q. So the regular expression is rather involved:

if (
    (
        !($str~~rx:i/^\d+\.eq/) and
        $str~~s:i/^([\d*\.\d*][[e|d|q][\-|\+]?\d+]?)//
    )
    or
    $str~~s:i/^(\d*[e|d|q][\-|\+]?\d+)//
) {
    $real_const_str=$/.Str;
}

Without regular expressions, the implementation is as follows. We first detect a character between 0 and 9 or a dot. Then we try to match the mantissa, separator, sign and exponent. The latter three are optional; if they are not present and the mantissa does not contain a dot, we have matched an integer.

when '0' le .substr(0,1) le '9' or .substr(0,1) eq '.' { 
    my $sep='';
    my $sgn='';
    my $exp='';
    my $real_const_str='';

    # first char of mantissa
    my $mant = .substr(0,1);
    # try and match more chars of mantissa
    my $idx=1;
    my $h = .substr($idx,1);
    while '0' le $h le '9' or $h eq '.' {
        $mant ~=$h;
        $h = .substr(++$idx,1);
    }
    $str .= substr($idx);

    # reject .eq.
    if not ($mant.ends-with('.') and .starts-with('eq',:i)) { 
        if $h.lc eq 'e' | 'd' | 'q' {
            # we found a valid separator
            $sep = $h;            
            my $idx=1;
            $h =.substr(1,1);
            # now check if there is a sign
            if $h eq '-' or $h eq '+' {
                ++$idx;
                $sgn = $h;
                $h =.substr($idx,1);
            }
            # now check if there is an exponent            
            while '0' le $h le '9' {
                ++$idx;
                $exp~=$h;
                $h =.substr($idx,1);
            }
            $str .= substr($idx);
            if $exp ne '' {
                $real_const_str="$mant$sep$sgn$exp";
                $expr_ast=[30,$real_const_str];
            } else {
                # parse error
            }
        } elsif $mant.index('.').defined { # .defined also matches a leading dot
            # a mantissa-only real number
            $real_const_str=$mant;
            $expr_ast=[30,$real_const_str];
        }
        else { # no dot and no sep, so an integer
            $expr_ast=[29,$mant];   
        }
    } else { # .eq., backtrack and carry on
        $str ="$mant$str";        
        proceed;
    }            
}

A final example of how to handle patterns is the case of whitespace in comparison and logical operators. Fortran has operators of the form <dot word dot>, for example .lt. and .xor.. But annoyingly, it allows whitespace between the dot and the word, e.g. . not .. Using regular expressions, this is of course easy to handle, for example:

if $str~~s/^\.\s*ge\s*\.// {
    $lev=6;
    $op=20;
} 

I check for a pattern that starts with a dot and contains a space before the next dot. Then I remove all spaces from that substring using trans and replace the original string with this trimmed version.

when .starts-with('.') and  .index( ' ' ) 
    and (.index( ' ' ) < (my $eidx = .index('.',2 ))) {
    
    # Find the keyword with spaces
    my $match = .substr(0, $eidx+1);
    # remove the spaces
    $match .= trans( ' ' => '' );
    # update the string
    $str = $match ~ .substr( $eidx+1);
    proceed;
}

Conclusion

Overall the optimised expression parser in Raku is still very close to the Perl version. The key difference is that the Raku version does not use regular expressions. With the above examples I wanted to illustrate how it is possible to write code with the same functionality as a regular expression s/// operation, using some of Raku's built-in string operations:

  • substr : substring
  • index : location of a substring in a string
  • trim-leading : strip leading whitespace
  • starts-with
  • ends-with
  • trans : used to remove whitespace using the ' ' => '' pattern
  • lc : used in range tests instead of testing against both upper and lower case
  • le, lt, ge, gt: for very handy range comparisons, e.g. 'a' le $str le 'z'
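As a small illustration (hypothetical input, not taken from the parser), these builtins combine as follows:

```raku
my $str = '  .LT. rest';
$str .= trim-leading;               # '.LT. rest'
say $str.starts-with('.');          # True
say $str.index('rest');             # 5
my $op = $str.substr(0, 4).lc;      # '.lt.'
say 'a' le $op.substr(1, 1) le 'z'; # True: range test on the lower-cased char
say $op.trans(' ' => '');           # '.lt.' (no spaces to remove here)
```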

The resulting code is of course much longer but arguably more readable than regular expressions, and currently four times faster.

I ran a lot more tests, and compared performance against Perl and Python as well, but that is another story. All code for the tests is available in my GitHub repo.

@JJ commented Nov 27, 2020

  1. Please add a bit of context, just the gist of what's an expression parser and why you want it optimized so that people don't go away from this article to the one you link.
  2. Don't quite agree with Raku prioritizing functionality over performance. Most improvements in the last 2 years are related to performance. I'm not getting much out of that paragraph, I'm afraid. I would like these articles to be good introductory articles for people unfamiliar, or not too familiar, with Raku, not tips and tricks for the aficionado.
  3. Maybe you can link the Fortran code?
  4. What do you call a hashmap? It's a two-element array?
  5. How do we know there are labels in $info?
  6. Please mind indentation of curlies, the one on top of the elsif statement is kind of confusing. Also, why not use given/when? You are also repeating the 1..NITERS range, which you could assign somewhere else. Not really clear what you're doing there.
  7. You say "without the if-condition", but there are two ifs. Why don't you time the two loops independently? Also, I don't really see the point of the comparison. In an actual statement, you will have to check for /read/ every time, right?
  8. In which context are you using custom tree traversals? Is it an alternative to what? What tree are you talking about? Also, what does it mean in this context of making Raku faster? Are you measuring the two implementations mentioned?
  9. What list are you processing? It might be the case that all that info is in the linked article, but as said above, can't you put the gist of it here? Should be as self-contained as possible. Or is this simply a different tip for a different context?
  10. Also, it would be convenient in the next example to wrap a benchmark around every piece of code, maybe even spin it off to a routine. Would make it more readable.
  11. You come back to expression parsing later. Not clear if it follows from the previous text or not. Also, link everything: lexer, parser. Very likely that section should be bumped up to the beginning, because it provides context for the rest.
  12. There's a comma missing in your nested list. Also, it's an array:
  13. Once I'm getting to this point, it's likely that this article is a bit long for an Advent article, which are usually short and a few minutes read. Maybe the part "faster expression parser" could go to a second advent calendar article? As long as you're OK with that. Anyway, it's a good job, but I would appreciate if you helped make it better by addressing some of the concerns above.

@nige123 commented Nov 27, 2020

Just some suggestions:

  • some of the optimisation/speed problems are being addressed by the Jonathan Worthington's RakuAST work - it would be good to link to this work - it might be that some of the optimisations you've identified may no longer apply in future

  • another thing to consider is optimising for human readability/understandability - which version is easiest to understand for the hoomans?

@wimvanderbauwhede (author)

@JJ I'll do what I can but you are asking for a lot of changes and some of them will take me a lot of time.

Re splitting it into two parts, I'm OK with that if you think the first part on its own is interesting enough. But maybe the 2nd part does not have to be an advent calendar article, I can publish it on my own blog.

I just want to address one point here: "Don't quite agree with Raku prioritizing functionality over performance." Currently Raku is 10x slower than Perl and also many times slower than Python for all actions that I have been testing. You can see the results in the spreadsheet in my repo if you are interested.
If you say Raku is not prioritizing functionality over performance, do you mean that the consensus is that performance at the moment is already good enough? In particular regular expression performance is very disappointing, it is more than 50x slower than Perl. To get the language widely accepted, I think better performance is important.

I took care not to make any relative comparisons in my article, and I said that the current focus is on functionality, because I did not want to put Raku in a poor light. But my own conclusion is that sadly, Raku is currently way too slow for me to use it in my work.

@nige123 commented Dec 2, 2020

The process of development has been make it work first - and then secondly make it work fast (enough). This second part of the plan is currently in progress now: RakuAST

@wimvanderbauwhede (author)

@nige123 I am aware of RakuAST. In the article I discuss current trade-offs. But I will put in a link to it.

@JJ commented Dec 2, 2020

@wimvanderbauwhede they're not mandatory. Do your best.

WRT performance/functionality: there's that, and there's also whether we want to make that point in an article which should be laudatory for Raku. OTOH, while it was true that at the very beginning functionality was the priority, there's no single release since then that has not included improvements in performance. It still lags behind others, but that does not belie the fact that right now, performance is one of the priorities.

@wimvanderbauwhede (author)

No worries, I removed that paragraph.
