@alabamenhu
Last active December 11, 2022 14:48
Advent Day — Santa's Making a List

If there's anything that Santa and his elves ought to know, it's how to make a list. After all, they're reading lists that children send in, and Santa maintains his very famous list. Another thing we know is that Santa and his elves are quite multilingual.

So one day one of the elves decided that, rather than hand typing out a list of gifts based on the data they received (requiring elves who spoke all the world's languages), he would take advantage of the power of Unicode's CLDR (Common Locale Data Repository), one of Unicode's lesser-known projects. As luck would have it, Raku has a module providing access to the data, called Intl::CLDR. The elf figured he could probably use some of the data in it to automate their list formatting.

He began by installing Intl::CLDR and decided to play around with it in the terminal.
The module was designed to allow some degree of exploration in a REPL, so the elf did the following after reading the provided README:

                         # Repl response
use Intl::CLDR;          # Nil
my $english = cldr<en>   # [CLDR::Language: characters,context-transforms,
                         #  dates,delimiters,grammar,layout,list-patterns,
                         #  locale-display-names,numbers,posix,units]

The module loaded up the data for English and the object returned has a neat gist that provides information about the elements it contains. For a variety of reasons, Intl::CLDR objects can be referenced either as attributes or as keys. Most of the time, the attribute reference is faster in performance, but the key reference is more flexible (because let's be honest, $english{$foo} looks nicer than $english."$foo"(), and it also enables listy assignment via e.g. $english<grammar numbers>).

In any case, the elf saw that one of the data points is list-patterns, so he explored further:

                                            # Repl response
$english.list-patterns;                     # [CLDR::ListPatterns: and,or,unit]
$english.list-patterns.and;                 # [CLDR::ListPattern: narrow,short,standard]
$english.list-patterns.and.standard;        # [CLDR::ListPatternWidth: end,middle,start,two]
$english.list-patterns.and.standard.start;  # {0}, {1}
$english.list-patterns.and.standard.middle; # {0}, {1}
$english.list-patterns.and.standard.end;    # {0}, and {1}
$english.list-patterns.and.standard.two;    # {0} and {1}

Aha! He found the data he needed.

List patterns are catalogued by their function (and-ing them, or-ing them, and a unit one designed for formatting conjoined units such as 2ft 1in or similar). Each pattern has three different lengths. Standard is what one would use most of the time, but if space is a concern, some languages might allow for even slimmer formatting. Lastly, each of those widths has four forms. The two form combines, well, two elements. The other three are used to collectively join three or more: start combines the first and second element, end combines the penultimate and final element, and middle combines all second to penultimate elements.
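
To make the roles of the four forms concrete, here is a minimal sketch that fills the placeholders properly instead of clipping them. It assumes hard-coded copies of the English patterns shown above; %pattern, fill, and join-list are hypothetical names rather than part of Intl::CLDR. It builds the result from the right (end first, then middle, then start), which is how I understand CLDR intends the patterns to nest.

```raku
# Hypothetical hard-coded copies of the English and/standard patterns shown above
my %pattern =
    start  => '{0}, {1}',
    middle => '{0}, {1}',
    end    => '{0}, and {1}',
    two    => '{0} and {1}';

# Replace {0} and {1} in a single pass, so text inside the items is never re-scanned
sub fill($pattern, $a, $b) {
    $pattern.subst(/ '{' (<[01]>) '}' /, -> $m { ~($a, $b)[+$m[0]] }, :g)
}

sub join-list(@items) {
    return ''                           if @items == 0;
    return ~@items[0]                   if @items == 1;
    return fill(%pattern<two>, |@items) if @items == 2;

    # Three or more: join the last two with "end", then work towards the front
    my $result = fill %pattern<end>, @items[*-2], @items[*-1];
    $result = fill(%pattern<middle>, $_, $result) for @items[1 .. *-3].reverse;
    fill %pattern<start>, @items[0], $result
}

say join-list(<apples bananas>);           # apples and bananas
say join-list(<apples bananas kiwis>);     # apples, bananas, and kiwis
say join-list(<apples bananas kiwis figs>); # apples, bananas, kiwis, and figs
```

For English the start and middle patterns are identical, so the nesting is invisible; it only matters for languages where they differ.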

He then wondered what this might look like for other languages. Thankfully, testing this out in the repl was easy enough:

my &and-pattern = { cldr{$^language}.list-patterns.and.standard<start middle end two>.join: "\t" }
                  # Repl response (RTL corrected, s/\t/' '+/)
and-pattern 'es'  # {0}, {1}    {0}, {1}    {0} y {1}    {0} y {1}
and-pattern 'ar'  # ‮{0} و{1}     {0} و{1}    {0} و{1}    {0} و{1}
and-pattern 'ko'  # {0}, {1}    {0}, {1}    {0} 및 {1}    {0} 및 {1}
and-pattern 'my'  # {0} - {1}   {0} - {1}   {0}နှင့် {1}    {0}နှင့် {1}
and-pattern 'th'  # {0} {1}     {0} {1}     {0} และ{1}    {0}และ{1}

He quickly saw that there was quite a bit of variation!
Thank goodness someone else had already catalogued all of this for him. So he went about trying to create a simple formatting routine. To begin, he created a very detailed signature and then imported the modules he'd need.

#| Lengths for list format.  Valid values are 'standard', 'short', and 'narrow'.
subset ListFormatLength of Str where any(<standard short narrow>);

#| Types for list format.  Valid values are 'and', 'or', and 'unit'.
subset ListFormatType of Str where any(<and or unit>);

use User::Language;     # obtains default languages for a system
use Intl::LanguageTag;  # use standardized language tags
use Intl::CLDR;         # accesses international data

#| Formats a list of items in an internationally-aware manner
sub format-list(
                     +@items,                   #= The items to be formatted into a list
    LanguageTag()    :$language = user-language, #= The language to use for formatting
    ListFormatLength :$length   = 'standard',   #= The formatting width
    ListFormatType   :$type     = 'and'         #= The type of list to create
) {
    ...
    ...
    ...
}

That's a bit of a big bite, but it's worth taking a look at. First, the elf decided to use declarator POD wherever possible. This can really help out people who might want to use his eventual module in an IDE, for autogenerating documentation, or for curious users in the REPL. (If you type in ListFormatLength.WHY, the text “Lengths for list format … and 'narrow'” will be returned.) For those unaware of declarator POD, you can use either #| to apply a comment to the following symbol declaration (in the example, for the subsets and the sub itself), or #= to apply it to the preceding symbol declaration (most common with attributes).

Next, he imports two modules that will be useful. User::Language detects the system language, and he uses this to provide sane defaults. Intl::LanguageTag is one of the most fundamental modules in the international ecosystem. While he wouldn't strictly need it (we'll see he'll ultimately only use them in string-like form), it helps to ensure at least a plausible language tag is passed.

If you're wondering what the +@items means, it applies a DWIM logic to the positional arguments. If one does format-list @foo, presumably the list is @foo, and so @items will be set to @foo. On the other hand, if someone does format-list $foo, $bar, $xyz, presumably the list isn't $foo, but all three items. Since the first item isn't a Positional, Raku assumes that $foo is just the first item and the remaining positional arguments are the rest of the items. The extra () in LanguageTag() means that it will take either a LanguageTag or anything that can be coerced into one (like a string).
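
A small sketch of both behaviours, using made-up subs (count-items and next-year are only for illustration, and Int() stands in for the LanguageTag() coercion so the example needs no modules):

```raku
# +@items: a single Positional argument is used as the list itself
sub count-items(+@items) { @items.elems }

my @gifts = <doll train puzzle>;
say count-items @gifts;                      # 3
say count-items 'doll', 'train', 'puzzle';   # 3
say count-items @gifts, 'kite';              # 2 (the array is now just one item)

# Int(): accept an Int, or anything that can be coerced into one
sub next-year(Int() $year) { $year + 1 }

say next-year 2022;     # 2023
say next-year '2022';   # 2023
```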

Okay, so with that housekeeping stuff out of the way, he gets to coding the actual formatting, which is devilishly simple:

    my $format = cldr{$language}.list-patterns{$type}{$length};
    my ($start, $middle, $end, $two) = $format<start middle end two>;
    
    
    if    @items  > 2 { ...                          }
    elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
    elsif @items == 1 { @items.head                  }
    else              { ''                           }

He paused here to check and see if stuff would work. So he added the following tests to his script and ran it:

                                  # output
format-list <>,    :language<en>; # '' 
format-list <a>,   :language<en>; # 'a'
format-list <a b>, :language<en>; # 'a{0} and {1}b'

While the simplest two cases were easy, the first one to use CLDR data didn't work quite as expected. The elf realized he'd need to actually replace the {0} and {1} with the item. While technically he should use subst or similar, after going through the CLDR, he realized that all of them begin with {0} and end with {1}. So he cheated and changed the initial assignment line to

    my $format = cldr{$language}.list-patterns{$type}{$length};
    my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3, *-3);
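
The shortcut relies on every pattern starting with {0} and ending with {1}. A slightly more defensive sketch checks that assumption before clipping; the clip name and the error message are illustrative, not part of the elf's code:

```raku
# Clip the leading '{0}' and trailing '{1}', refusing patterns that don't fit the assumption
sub clip(Str $pattern) {
    die "Unexpected list pattern: $pattern"
        unless $pattern.starts-with('{0}') && $pattern.ends-with('{1}');
    # Start after '{0}' (3 characters), and stop 3 characters before the end
    $pattern.substr(3, *-3)
}

say clip('{0}, and {1}').raku;   # ", and "
say clip('{0} and {1}').raku;    # " and "
```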

Now his two-item function worked well. For the three-or-more condition though, he had to think a bit harder about how to combine things. There are actually quite a few different ways to do it! The simplest way for him was to take the first item, then the $start combining text, then join the second through penultimate, and then finish off with the $end and final item:

    if @items > 2 { 
        ~ @items[0]
        ~ $start
        ~ @items[1..*-2].join($middle)
        ~ $end
        ~ @items[*-1]
    }
    elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
    elsif @items == 1 { @items.head                  }
    else              { ''                           }

Et voilà! His formatting function was ready for prime-time!

                                      # output
format-list <>,        :language<en>; # '' 
format-list <a>,       :language<en>; # 'a'
format-list <a b>,     :language<en>; # 'a and b'
format-list <a b c>,   :language<en>; # 'a, b, and c'
format-list <a b c d>, :language<en>; # 'a, b, c, and d'

Perfect! Except for one small problem. When they actually started using this, the computer systems overheated and melted some of the snow away. Every single time they called the function, the CLDR database needed to be queried and the strings needed to be clipped. The elf had to come up with something a bit more efficient.

He searched high and wide for a solution, and eventually found himself in the dangerous lands of here be dragons, otherwise known in Raku as EVAL. He knew that EVAL could potentially be dangerous, but that for his purposes, he could avoid those pitfalls. What he would do is query CLDR just once, and then produce a code block that would do the simple logic based on the number of items in the list. The string values could probably be hard coded, sparing some variable look ups too.

There be dragons here

EVAL should be used with great caution. All it takes is one errant unescaped string being accepted from an unknown source and your system could be taken over. This is why it requires you to affirmatively type use MONKEY-SEE-NO-EVAL in a scope that requires it. However, in situations like this, where we control all inputs going in, things are much safer. In tomorrow's article, we'll discuss ways to do this in an even safer manner, although it adds a small degree of complexity.

Back to the regularly scheduled program

To begin, the elf imagined his formatting function as if the pattern text were hardcoded, with $start, $middle, $end, and $two standing in for that text (for English: ', ', ', ', ', and ', and ' and '):

sub format-list(+@items) {
    if    @items  > 2 { @items[0] ~ $start ~ @items[1..*-2].join($middle) ~ $end ~ @items[*-1] }
    elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
    elsif @items == 1 { @items[0] }
    else              { '' }
}

That was ... really simple! But he needed this in a string format. One way to do that would be to just use straight string interpolation, but he decided to use Raku's equivalent of a heredoc, q:to. For those unfamiliar, in Raku, quotation marks are actually just a form of syntactic sugar to enter into the Q (for quoting) sublanguage. Using quotation marks, you only get a few options: ' ' means no escaping except for \\, and using " " means interpolating blocks and $-sigiled variables. If we manually enter the Q-language (using q or Q), we get a LOT more options. If you're more interested in those, you can check out Elizabeth Mattijsen's 2014 Advent Calendar post on the topic. Our little elf decided to use q:to together with the :s adverb, which interpolates $-sigiled variables but leaves @items and the curly braces alone, enabling him to keep his code as is.

my $format = cldr{$language}.list-patterns{$type}{$length};
my ($start, $middle, $end, $two) = $format<start middle end two>;

# :s interpolates only scalars, so $start and friends are filled in here
my $code = q:s:to/FORMATCODE/;
    sub format-list(+@items) {
        if    @items  > 2 { @items[0] ~ $start ~ @items[1..*-2].join($middle) ~ $end ~ @items[*-1] }
        elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
        elsif @items == 1 { @items[0] }
        else              { '' }
    }
    FORMATCODE
use MONKEY-SEE-NO-EVAL;
EVAL $code;

The only small catch is that he'd need to get a slightly different version of the text from CLDR. If the text and were placed verbatim where $two is, that block would end up being @items[0] ~ and ~ @items[1] which will cause a compile error. Luckily, Raku has a method to help out here! By calling .raku, we get a Raku code form for almost any object. For instance:

              # REPL output
'abc'.raku    # "abc"
"abc".raku    # "abc"
<a b c>.raku  # ("a", "b", "c")

So he just changed his initial assignment line to chain one more method (.raku):

my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3,*-3).raku;
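
To see why the extra .raku matters, here is a small sketch comparing the two generated lines. The $awkward value is a made-up example, chosen only to show that embedded quotes are escaped and survive a round trip through EVAL:

```raku
use MONKEY-SEE-NO-EVAL;   # only needed for the round-trip check at the end

my $two = ' and ';

# Pasting the raw text into source code yields a bare word, which will not compile
say '@items[0] ~ ' ~ $two ~ ' ~ @items[1]';        # @items[0] ~  and  ~ @items[1]

# .raku yields a quoted, escaped literal that can be pasted into source code
say '@items[0] ~ ' ~ $two.raku ~ ' ~ @items[1]';   # @items[0] ~ " and " ~ @items[1]

# Hypothetical awkward value: the embedded quotes are escaped, so it still round-trips
my $awkward = 'say "hi"';
say EVAL($awkward.raku) eq $awkward;               # True
```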

Now his code worked. His last step was to find a way to reuse it to benefit from this initial extra work. He made a very rudimentary caching setup (rudimentary because it's not theoretically threadsafe, but even in this case, since values are only added, and will be identically produced, there's not a huge problem). This is what he came up with (declarator pod and type information removed):

sub format-list (+@items, :$language = 'en', :$type = 'and', :$length = 'standard') {
    state %formatters;
    my $code = "$language/$type/$length";
    
    # Get a formatter, generating it if it's not been requested before
    my &formatter = %formatters{$code}
                //= generate-list-formatter($language, $type, $length);
    
    formatter @items;
}

sub generate-list-formatter($language, $type, $length --> Sub ) {
    # Get CLDR information
    my $format = cldr{$language}.list-patterns{$type}{$length};
    my ($start, $middle, $end, $two) = $format<start middle end two>.map: *.substr(3,*-3).raku;
    
    # Generate code (:s interpolates only the $-sigiled variables)
    my $code = q:s:to/FORMATCODE/;
        sub format-list(+@items) {
            if    @items  > 2 { @items[0] ~ $start ~ @items[1..*-2].join($middle) ~ $end ~ @items[*-1] }
            elsif @items == 2 { @items[0] ~ $two ~ @items[1] }
            elsif @items == 1 { @items[0] }
            else              { '' }
        }
        FORMATCODE
        
    # compile and return
    use MONKEY-SEE-NO-EVAL;
    EVAL $code;
}

And there he was! His function was all finished. He wrapped it up into a module and sent it off to the other elves for testing:

format-list <apples bananas kiwis>, :language<en>;      # apples, bananas, and kiwis
format-list <apples bananas>, :language<en>, :type<or>; # apples or bananas
format-list <manzanas plátanos>, :language<es>;         # manzanas y plátanos
format-list <انارها زردآلو تاریخ>, :language<fa>;       # انارها، زردآلو، و تاریخ

Hooray!

Tomorrow, though, another elf took up his work and decided to go even crazier! Stay tuned for more of the antics from Santa's elves.

If there's anything that Santa and his elves ought to know, it's how to make a list. After all, they're reading lists that children send in, and Santa maintains his very famous list. Another thing we know is that Santa and his elves are quite multilingual.

What you probably didn't know is that the elves stay on top of the latest and greatest technology. Being well-known avid Raku programmers, the elves were excited to hear about RakuAST and decided to see how they might be able to use it. Their first step was to see how well it could generate a list. What follows is the story of how they upgraded their current technology to use RakuAST.

Background

The current code that the elves had is quite simple. The list pattern data in CLDR is of a format like {0}, {1} (generating apples, bananas given those two fruits). The code is also heavily commented, so it should be mostly straightforward to see what's happening.

sub format-list (
    +@list,                     #= The list of items to format (each will be stringified with .Str)
    :$language = user-language, #= The locale to use for formatting (defaults to user-language)
    :$length   = 'standard',    #= The length of the format (standard, short, or narrow);
    :$type     = 'and'          #= The type of the list (and, unit, or);
) is export {
        
    # Sub to clip pattern strings. Doesn't seem safe, but is 
    # as all list patterns begin with '{0}' and end with '{1}'
    my sub clip ($pattern) { $pattern.substr: 3, *-3 }
    
    # Obtain the patterns object from CLDR
    use Intl::CLDR;
    my $patterns = cldr{$language}.list-patterns{$type}{$length};

    # Format the string based on its length
    if @list > 2 {
        ~ @list.head
        ~   $patterns.start.&clip                    # first two joined by the start pattern
        ~ @list[1..*-2].join($patterns.middle.&clip) # second to penultimate joined by the middle
        ~   $patterns.end.&clip                      # last two joined by the end pattern
        ~ @list.tail
    }
    elsif @list == 2 { @list.join: $patterns.two.&clip }
    elsif @list == 1 { @list.head                      }
    else             { ''                              }
}

But this method was slow. Each time a list had to be formatted, the code had to query the CLDR database and reclip the format strings. One elf had the idea that they could actually create a formatter object, and while that might take a bit more time initially, for subsequent runs, it would be much faster.

This is what the creative elf came up with:

sub format-list (+@items, :$language = 'en', :$type = 'and', :$length = 'standard') {
    state %formatters;
    my $code = "$language/$type/$length";
    
    # Get a formatter, generating it if it's not been requested before
    my &formatter = %formatters{$code}
                //= generate-list-formatter($language, $type, $length);
    formatter @items;
}

sub generate-list-formatter($language, $type, $length) {
    use Intl::CLDR;
    my $pattern           = cldr{$language}.list-patterns{$type}{$length};
    my $two-infix         = $pattern.two.substr(    3, *-3).raku;
    my $more-first-infix  = $pattern.start.substr(  3, *-3).raku;
    my $more-middle-infix = $pattern.middle.substr( 3, *-3).raku;
    my $more-final-infix  = $pattern.end.substr(    3, *-3).raku;
    
    my $code = qq:to/FORMATTER/;
        sub format(+@list) \{
            if @list > 2 \{
                ~ @list.head
                ~   $more-first-infix
                ~ \@list[1..*-2].join($more-middle-infix)
                ~   $more-final-infix
                ~ @list.tail
            }
            elsif @list == 2 \{ @list.join: $two-infix }
            elsif @list == 1 \{ @list.head             }
            else             \{ ''                     }
        }
        FORMATTER
        
    use MONKEY-SEE-NO-EVAL;
    EVAL $code;
}

While the caching technique is rudimentary and technically not thread-safe, it works (a different elf will later revisit the code to make it so). Now, when creating all the lists for, say, children in Georgia, the data for Georgian list formatters in CLDR will only need to be accessed a single time. For the next half a million or so calls, the code will be run practically as fast as if it had been hard coded (since, in effect, it has been).
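
One way the cache could be made thread-safe is to guard it with a Lock. This is only a sketch of the idea, not the version the other elf eventually wrote; cached-formatter, build-english, and the stand-in formatter are all hypothetical, with the CLDR lookup replaced by a plain join so the example is self-contained:

```raku
my %formatters;
my $lock = Lock.new;

# Hypothetical helper: return the cached formatter for $key, building it at most once
sub cached-formatter(Str $key, &build) {
    $lock.protect: -> { %formatters{$key} //= build() }
}

# Stand-in for generate-list-formatter; counts how often it actually runs
my $builds = 0;
sub build-english { $builds++; sub (+@list) { @list.join(', ') } }

my &fmt-a = cached-formatter 'en/and/standard', &build-english;
my &fmt-b = cached-formatter 'en/and/standard', &build-english;

say fmt-a(<a b c>);      # a, b, c
say $builds;             # 1
say &fmt-a === &fmt-b;   # True
```

Holding the lock while the formatter is built keeps the code simple, at the cost of making other threads wait during the first request for each key.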

The problem is how the generate-list-formatter code works. The code block uses a heredoc-style :to string, but it's interpolated.
There are numerous ways to accomplish this but all of them require having to use proper escapes. That's.... risky.

Another elf, having seen the performance improvements that this new EVAL code brought, wanted to find a way to avoid the risky string evaluation. She had heard about the new RakuAST and decided to give it a whirl. While it initially looked more daunting, she quickly realized that RakuAST was very powerful.

What is RakuAST

RakuAST is an object-based representation of Raku's abstract syntax tree, or roughly what you might get if you parsed Raku's code into its individual elements. For instance, a string literal might be represented as 'foo' in code, but once parsed, becomes a string literal. That string literal, by the way, can be created by using RakuAST::StrLiteral.new(…). Remember how the elf had to worry about how the string might be interpolated? By creating the string literal directly via a RakuAST node, that whole process is safely bypassed. No RakuAST::StrLiteral node can be created that will result in a string injection!

Every single construct in the Raku language has an associated RakuAST node. When creating nodes, you might frequently pass in another node, which means you can build up code objects in a piece-by-piece fashion, and again, without ever worrying about string interpolation, escaping, or injection attacks.
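
As a rough illustration of that claim, the sketch below wraps a deliberately nasty string in a literal node. It assumes a Rakudo recent enough to expose RakuAST behind use experimental :rakuast, and that nodes offer the DEPARSE and EVAL methods as they do in current releases; the $tricky value is made up for the example.

```raku
use experimental :rakuast;   # assumption: a Rakudo release that ships RakuAST
use MONKEY-SEE-NO-EVAL;

# A value that would break out of a naively interpolated double-quoted string
my $tricky = q[" ~ die("injected") ~ "];

# The node holds the value as data; it is never re-parsed as source text
my $node = RakuAST::StrLiteral.new($tricky);

say $node.DEPARSE;             # the literal rendered back as source code
say $node.EVAL eq $tricky;     # evaluating the node yields the original string
```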

So let's see how the elf eventually created the safer RakuAST version of the formatter method.

The elf works her AST off

To ease her transition into RakuAST, the elf decided to go from the simplest to the most complex part of the code. The simplest is the value for the final else block:

my $none = RakuAST::StrLiteral.new('');

Okay. That was easy. Now she wanted to tackle the single element value.
In the original code, that was @list.head. Although we don't normally think of it as such, . is a special postfix operator for method calling. Operators are applied by creating a RakuAST::Apply___fix node, where ___ is the type of operator. Depending on the node, there are different arguments. In the case of RakuAST::ApplyPostfix, the arguments are operand (the list), and postfix, which is the actual operator. These aren't as simple as typing in some plain text, but when looking at the code the elf came up with, it's quite clear what's going on:

my $operand = RakuAST::Var::Lexical.new('@list');
my $postfix = RakuAST::Call::Method.new(
                  name => RakuAST::Name.from-identifier('head')
              );
my $one = RakuAST::ApplyPostfix.new(:$operand, :$postfix);

The operand isn't a literal, but a variable.
Specifically, it's a lexical variable, so we create a node that will reference it. The method-call operator needs a name too, so we create one with RakuAST::Name.from-identifier.

This involves a lot of assignment statements. Sometimes that can be helpful, but for something this simple, the elf decided it was easier to write it as one line:

my $one = RakuAST::ApplyPostfix.new(
    operand => RakuAST::Var::Lexical.new('@list'),
    postfix => RakuAST::Call::Method.new(
        name => RakuAST::Name.from-identifier('head')
    )
);

Alright, so the first two cases are done. How might she create the result for when the list has two items? Almost exactly like the last time, except now she'll provide an argument. While you might think it would be as simple as adding args => RakuAST::StrLiteral.new($two-infix), it's actually a tiny bit more complicated because in Raku, argument lists are handled somewhat specially, so we actually need a RakuAST::ArgList node.
So the equivalent of @list.join($two-infix) is

my $two = RakuAST::ApplyPostfix.new(
    operand => RakuAST::Var::Lexical.new('@list'),
    postfix => RakuAST::Call::Method.new(
        name => RakuAST::Name.from-identifier('join'),
        args => RakuAST::ArgList.new(
            RakuAST::StrLiteral.new($two-infix)
        )
    )
);

The RakuAST::ArgList takes in a list of arguments — be they positional or named (named applied by way of a RakuAST::FatComma).

Finally, the elf decided to tackle what likely would be the most complicated bit: the code for 3 or more items. This code makes multiple method calls (including a chained one), as well as combining everything with a chained infix operator.

The method calls were fairly straightforward, but she thought about how the multiple ~ operators would be handled. As it turns out, it would actually require being set up as if (($a ~ $b) ~ $c) ~ $d, etc., and the elf didn't really like the idea of ultimately having to indent her code that much. She also thought about just using join on a list that she could make, but she already knew how to do method calls, so she thought she'd try something cool: reduction operators (think [~] $a, $b, $c, $d for the previous).
This uses the RakuAST::Term::Reduce node that takes a simple list of arguments. For the * - 2 syntax, to avoid getting too crazy, she treated it as if it had been written as the functionally identical @list - 2.

Because that reduction bit has so many elements, she ended up breaking things into pieces: the initial item, the special first infix, a merged set of the second to penultimate items joined with the common infix, the special final infix, and the final item.
For a list like [1,2,3,4,5] in English, that amounts to 1 (initial item), , (first infix), 2, 3, 4 (second to penultimate, joined with , ), , and (final infix) and 5 (final item). In other languages, the first and repeated infixes may be different, and in others, all three may be identical.

# @list.head
my $more-first-item = RakuAST::ApplyPostfix.new(
    operand => RakuAST::Var::Lexical.new('@list'),
    postfix => RakuAST::Call::Method.new(
        name => RakuAST::Name.from-identifier('head')
    )
);
                      
# @list[1 .. @list - 2].join($more-middle-infix)
my $more-mid-items = RakuAST::ApplyPostfix.new(
    # @list[1 .. @list - 2]
    operand => RakuAST::ApplyPostfix.new(
        operand => RakuAST::Var::Lexical.new('@list'),
        postfix => RakuAST::Postcircumfix::ArrayIndex.new(
            # (1 .. @list - 2)
            RakuAST::SemiList.new(
                RakuAST::ApplyInfix.new(
                    left  => RakuAST::IntLiteral.new(1),
                    infix => RakuAST::Infix.new('..'),
                    # @list - 2
                    right => RakuAST::ApplyInfix.new(
                        left  => RakuAST::Var::Lexical.new('@list'),
                        infix => RakuAST::Infix.new('-'),
                        right => RakuAST::IntLiteral.new(2)
                    )
                )
            )
        )
    ),
    # .join($more-middle-infix)
    postfix => RakuAST::Call::Method.new(
        name => RakuAST::Name.from-identifier('join'),
        args => RakuAST::ArgList.new(
            RakuAST::StrLiteral.new($more-middle-infix)
        )
    )
);
                         
# @list.tail
my $more-final-item = RakuAST::ApplyPostfix.new(
    operand => RakuAST::Var::Lexical.new('@list'),
    postfix => RakuAST::Call::Method.new(
        name => RakuAST::Name.from-identifier('tail')
    )
);


my $more = RakuAST::Term::Reduce.new(
    infix => RakuAST::Infix.new('~'),
    args  => RakuAST::ArgList.new(
        $more-first-item,
        RakuAST::StrLiteral.new($more-first-infix),  # plain strings must become literal nodes
        $more-mid-items,
        RakuAST::StrLiteral.new($more-final-infix),
        $more-final-item,
    )
);

As one can note, as RakuAST code starts getting more complex, it can be extremely helpful to store interim pieces into variables. For complex programs, some RakuAST users will create functions that do some of the verbose stuff for them. For instance, one might get tired of the code for an infix, and write a sub like

sub rast-infix($left, $infix, $right) {
    RakuAST::ApplyInfix.new:
        left  => $left,
        infix => RakuAST::Infix.new($infix),
        right => $right
}

to enable code like rast-infix($value, '+', $value) which ends up being much less bulky. Depending on what they're doing, they might make a sub just for adding two values, or maybe making a list more compactly.

In any case, the hard working elf had now programmatically defined all of the formatter code. All that was left was for her to piece together the number logic and she'd be done. That logic was, in practice, quite simple:

if    @list  > 2 { $more }
elsif @list == 2 { $two  }
elsif @list == 1 { $one  }
else             { $none }

In practice, there was still a bit of a learning curve. Why? As it turns out, the [els]if statements are actually officially expressions, and need to be wrapped up in an expression block. That's easy enough, she could just use RakuAST::Statement::Expression. Her conditions end up being coded as

# @list > 2
my $more-than-two = RakuAST::Statement::Expression.new(
    expression => RakuAST::ApplyInfix.new(
        left  => RakuAST::Var::Lexical.new('@list'),
        infix => RakuAST::Infix.new('>'),
        right => RakuAST::IntLiteral.new(2)
    )
);

# @list == 2
my $exactly-two = RakuAST::Statement::Expression.new(
    expression => RakuAST::ApplyInfix.new(
        left  => RakuAST::Var::Lexical.new('@list'),
        infix => RakuAST::Infix.new('=='),
        right => RakuAST::IntLiteral.new(2)
    )
);

# @list == 1
my $exactly-one = RakuAST::Statement::Expression.new(
    expression => RakuAST::ApplyInfix.new(
        left  => RakuAST::Var::Lexical.new('@list'),
        infix => RakuAST::Infix.new('=='),
        right => RakuAST::IntLiteral.new(1)
    )
);

That was simple enough.
But now she realized that the then statements were not just the simple code she had made, but were actually a sort of block! She would need to wrap them with a RakuAST::Block. A block has a required RakuAST::Blockoid element, which in turn has a required RakuAST::StatementList element, and this in turn will contain a list of statements, the simplest of which is a RakuAST::Statement::Expression that she had already seen. She decided to try out the technique of writing a helper sub to do this:

sub wrap-in-block($expression) {
    RakuAST::Block.new(
        RakuAST::Blockoid.new(
            RakuAST::StatementList.new(
                RakuAST::Statement::Expression.new: 
                    :$expression
            )
        )
    )
}

$more = wrap-in-block $more;
$two  = wrap-in-block $two;
$one  = wrap-in-block $one;
$none = wrap-in-block $none;

Phew, that was a pretty easy way to handle some otherwise very verbose coding. Who knew Raku hid away so much complex stuff in such simple syntax?! Now that she had both the if and then statements finished, she was ready to finish the full conditional:

my $if = RakuAST::Statement::If.new(
    condition => $more-than-two,
    then => $more,
    elsifs => [
        RakuAST::Statement::Elsif.new(
            condition => $exactly-two,
            then => $two
            ),
        RakuAST::Statement::Elsif.new(
            condition => $exactly-one,
            then => $one
            )
    ],
    else => $none
);

All that was left was for her to wrap it up into a Routine and she'd be ready to go! She decided to put it into a PointyBlock, since that's a sort of anonymous function that still takes arguments. Her fully-wrapped code block ended up as:

my $code = RakuAST::PointyBlock.new(
    signature => RakuAST::Signature.new(
        parameters => (
            RakuAST::Parameter.new(
                target => RakuAST::ParameterTarget::Var.new('@list')
            ),
        ),
    ),
    body => RakuAST::Blockoid.new(
        RakuAST::StatementList.new(
            RakuAST::Statement::Expression.new(
                expression => $if
            )
        )
    )
);

Working with RakuAST, she really got a feel for how things worked internally in Raku. It was easy to see that a runnable code block like a pointy block consisted of a signature and a body. That signature had a list of parameters, and the body a list of statements. Seems obvious, but it can be enlightening to see it spread out like she had it.

The final step was for her to actually evaluate this (now much safer!) code. For that, nothing changed. In fact, the entire rest of her block was simply

sub generate-list-formatter($language, $type, $length) {
    use Intl::CLDR;
    my $pattern           = cldr{$language}.list-patterns{$type}{$length};
    my $two-infix         = $pattern.two.substr:    3, *-3;
    my $more-first-infix  = $pattern.start.substr:  3, *-3;
    my $more-middle-infix = $pattern.middle.substr: 3, *-3;
    my $more-final-infix  = $pattern.end.substr:    3, *-3;

    ...

    use MONKEY-SEE-NO-EVAL;
    EVAL $code;
}

Was her code necessarily faster than the older method? Not necessarily. It didn't require a parse phase, which probably saved a bit, but once compiled, the speed would be the same.

So why would she bother doing all this extra work when some string manipulation could have produced the same result? A number of reasons, actually. To begin, she learned the innards of RakuAST, which helped her learn the innards of Raku a bit better. But for us non-elf programmers, RakuAST is important for many other reasons. For instance, at every stage of this process, everything was fully introspectable! If your mind jumped to writing optimizers, besides being a coding masochist, you've actually thought about something that will likely come about.

Macros are another big feature that's coming in Raku and will rely heavily on RakuAST. Rather than just do text replacement in the code like macros in many other languages, macros will run off of RakuAST nodes. This means an errant quote will never cause problems, and it will likely enable far more complex macro development.

The future

So what is the status of RakuAST? When can you use it?
As of today, you will need to build the rakuast branch of Rakudo to use it. Very shortly (expected within the next few weeks), that work will be merged into the main branch, and it is expected to be usable with an experimental-use feature guard. When that happens, I'll update this post with instructions. In the meantime, check out the Rakudo Weekly where lizmat keeps us updated on progress.

@MasterDuke17

“generate an list”

@MasterDuke17

‘:$language = user-language’ should probably quote ‘user-language’? And then it looks like a ‘$lang’ is used in the function, not ‘$language’.

@MasterDuke17

“actual create” -> “actually create”

@MasterDuke17

“:$language ‘en’” is missing an ‘=‘

@MasterDuke17

“creating a the”

@MasterDuke17

“what the <…> would be handled” should probably have “how” instead of “what”.

@MasterDuke17

“intending her code that much”

@MasterDuke17

“the that reduction bit has some many elements”

@MasterDuke17

“But now sure realized”

@alabamenhu
Author

Thanks for the comments! I'll be updating those. user-language doesn't go in quotes, but I didn't discuss the module it's from; that's been "fixed" by discussing it in the new first day post.
