uzluisf/iters.md

## iters.md

      
    Raw
  

              iters.md
            
          
    Iteration, Iterables, and Iterators in Raku.

What's iteration?

As general concept, iteration can be defined as the process
of doing some repeated action over something in order to generate a sequence
of outcomes. At a more detailed and programmatic level, iteration is
the process of visiting all the elements of an iterable object using
another object known as an iterator.
What's an iterable?

Simply put, an iterable is anything that can be iterated over, meaning we can
visit each of its elements one by one in a sequential manner. For example,
lists, arrays, hashes, etc. in Raku are iterables. In general, a Raku type can
be thought as iterable if it composes the
Iterable role and implements the
iterator method
which is supposed to return an Iterator
object.
For instance, arrays are iterable data structures because the
Array class inherits from the
List class which in turn composes the
Iterable role. If you'd like to know if a given object implements a given
role, you'd need to ask if the object conforms to that role with the does
method. Thus, for the Iterable role:
my @nums = 1, 2, 3;

@nums.^name;         #=> Array
@nums.does(Iterable) #=> True

In fact, we don't need to instantiate the type object (e.g., List, Array,
Int, etc.) to check if it composes the role, we can ask it about it directly:
List.does(Iterable);  #=> True
Array.does(Iterable); #=> True
Int.does(Iterable);   #=> False
Str.does(Iterable);   #=> False

As you can see, unlike in other programming languages, Raku Strings
cannot be iterated over (e.g., with a for loop); the Str type doesn't
compose the Iterable role. However, they can be iterated over by using
the comb routine which searches the
input string for matches and returns a Seq of
non-overlapping matches limited to a number of matches. For instance:
say "foobar".comb.raku;       # OUTPUT: "("f", "o", "o", "b", "a", "r").Seq␤"
say "foobar".comb(2).raku;    # OUTPUT: "("fo", "ob", "ar").Seq␤"
say "foobar".comb(2, 1).raku; # OUTPUT: "("fo",).Seq␤"
say "foobar".comb(/o/).raku;  # OUTPUT: "("o", "o").Seq␤"

.say  for "foobar".comb;      # OUTPUT: "f␤o␤o␤b␤a␤r␤"

What's an iterator?

An iterator is an object that enables us to iterate over an Iterable object.
In Raku, a type is called an iterator if it composes the
Iterator role and implements
the pull-one method
which is supposed to either return the next value in the collection of values
or return the sentinel value IterationEnd to indicate that all values have
been exhausted and nothing else can be produced from the collection.
We know arrays are iterables, but are they also iterators? The answer is no.
This is clear from the fact that the Array class doesn't compose the
Iterator role and therefore neither does an object of its type:
my @nums = 1, 2, 3;
@nums.does(Iterator); #=> False

Well then, what does compose the Iterator role? We know that Array
composes the Iterable role and thus implements the iterator method which
is supposed to return an Iterator. Therefore, calling the iterator method on
an iterable should provide us with an iterator. This method must be invoked on
an object instance instead of a type object so we'll use the
object instance @nums of Array from before:
my @nums = 1, 2, 3;

my $iter-nums = @nums.iterator;

$iter-nums.does(Iterator); #=> True

As we can see, calling the iterator method on an iterable object gives us
back an Iterator object. What can it be used for?
An iterable object only tells us that it can be iterated over
(e.g., with a for loop). In hindsight, an iterable is an object capable of
returning an iterator but it doesn't specifically know how to move from one
value to the next. This is what the iterator object is for; it's an
object that returns a single value at a given time and then increments the
internal pointer to the next value. This action of moving from one value to
the next and letting us know when no more values can be consumed from the
collection underlies the process of iterating over each element of the
iterable object.
Flow control constructs such as the for loop are able to iterate
through the data of iterable objects sequentially by calling the iterator
method on the objects. Usually the users don't need to tell iterables when
to behave as iterables because the method is invoked implicitly in the
appropriate context


by statements such as a for loop which calls .iterator on the variable it
precedes, and then run a block once for every item.


when the created object is assigned to a Positional variable such as
@-sigiled variables.


From the documentation:

Users usually don't have to care about iterators,
their usage is hidden behind iteration APIs such as for @list { }, map,
grep, head, tail, skip and list indexing with .[$idx].

An example

my @nums = 1, 2, 3;

for @nums -> $num { say $num }
# OUTPUT: "1␤2␤3␤"
In the example above, the for loop iterates over each element of the array
and the block is executed at each turn of the loop. However, could we consume
the iterable on our own, without a built-in loop? The answer is yes and
we just need to get the iterator from the iterable and then repeatedly call
the pull-one method on the iterator ourselves. For example:
my @nums = 1, 2, 3;

my $iter-nums = @nums.iterator;

say $iter-nums.pull-one; #=> "1␤"
say $iter-nums.pull-one; #=> "2␤"
say $iter-nums.pull-one; #=> "3␤"
We've already consumed the three elements in the array. What happens if we
invoke the pull-one method one more time?
my @nums = 1, 2, 3;

my $iter-nums = @nums.iterator;

say $iter-nums.pull-one;      #=> "1␤"
say $iter-nums.pull-one;      #=> "2␤"
say $iter-nums.pull-one;      #=> "3␤"
say $iter-nums.pull-one.raku; #=> IterationEnd

As you can see the sentinel value IterationEnd is returned after another
pull-one invocation once the last value has been returned, thus to visit
the entire iterable there will always be one more invocation of pull-one
than there is data in the iterable. While it's the case that .pull-one
can be invoked on many iterators once they've been exhausted, the design
dictates that once an iterator has produced the IterationEnd sentinel,
.pull-one should not be called anymore. Any subsequent calls may produce
undefined results, or even throw an exception. On the one hand, this should
make iterators easier to implement. On the other hand, it can make them
more difficult to implement.
This behavior is unlike in other programming languages that raise their
respective exceptions once the iterable has been exhausted. For example,
Python throws a
StopIteration exception
whenever an iterable is consumed.
In Raku, if we'd like to test if we've consumed all the elements in an iterable,
we must do an identity comparison of the sentinel value IterationEnd
(using =:=) with the output of
the pull-one method directly:
my @nums = 1, 2, 3;
my $iter-nums = @nums.iterator;

say $iter-nums.pull-one;
say $iter-nums.pull-one;
say $iter-nums.pull-one;

if $iter-nums.pull-one =:= IterationEnd {
    say "All the iterable's elements have been consumed."
}

# OUTPUT: "1␤2␤3␤All the iterable's elements have been consumed.␤"

Instead of comparing against pull-one's return value directly, we could
bind its return value to a variable and compare it to IterationEnd instead.
Note that an assignment won't work; just a binding will do. It's also worth
noting that the only defined use for the sentinel value is the identity
comparison and any other behavior is undefined and implementation dependent.
We already know how to check if we've consumed all the elements from an iterable
so we could emulate printing out an iterable's elements with a for loop by
using a while loop instead. For example:
my @nums = 1, 2, 3;
my $iter-nums = @nums.iterator;

loop {
    my $item := $iter-nums.pull-one;
    last if $item =:= IterationEnd;
    say $item;
}

We loop indefinitely, printing out values along the way, until we
reach IterationEnd in which case we exit the loop with last. By this point,
we've consumed the iterable's elements and if we wish to do the same once more,
we'd need to get a new iterator. The takeaway is that Raku iterators
can’t be reset, meaning they return IterationEnd once they’re exhausted.
As mentioned above, any subsequent calls to an exhausted iterator isn't part
of the contract and thus it may produce undefined results.
Iterables and Iterators

Regarding the relationship between iterables and iterators, the Raku
documentation states:

While Iterable provides the high-level interface loops will be working with,
Iterator provides the lower-level functions that will be called in every
iteration of the loop.

The summary for this section is that iterating as a protocol and as the
action of visiting all items in a complex object or data structure is
composed of two parts:  1) an iterable (the object being iterated over)
and 2) the iterator (the pointer-like object that moves over the iterable).
In the sections below, we delve more deeply into the Iterable and Iterator
roles, how to compose them into user-defined classes, and how to implement
the two main methods that are of interest to us, namely the iterator
and pull-one methods, and that the composed classes must implement.
The  Iterable role

The primary purpose of the Iterable role is to promise that an iterator
method is available in the object or data structure that composes it. As mentioned
previously, this method is used:


by statements such as a for loop which calls .iterator on the variable it
precedes, and then run a block once for every item.


when the creator object is assigned to a Positional variable such as
@-sigiled variables.


A simple example of composing the Iterable role into a class and implementing
the iterator method looks as follows:
subset Strand of Str where { .match(/^^ <[ACGT]>+ $$/) && .chars %% 3 }

class DNA does Iterable {
    has Strand $.chain;
 
    method iterator {
        $!chain.comb(3).iterator;
    }

};

my $dna-chain := DNA.new(chain => 'ACGTACGTT');

for $dna-chain -> $codons {
    say $codons.join('');
}

# OUTPUT: "ACG␤TAC␤GTT␤"

Here we have the DNA class that mixes the Iterable role and offer a new
way of iterating over what is essentially a string constrained to the four
DNA letters, and whose length is a multiple of 3. In the looping part, the
for actually hooks to the iterator method and then the letters are
printed in groups of 3.
In this example, we bind the DNA instance object to the variable
$dna-chain. We could've assigned the newly created object to a Positional
variable in which case the iterator would be called on the iterable object
and its consumed values would be assigned to the variable. For instance:
my @codons = DNA.new(chain => 'ACGTACGTT');

for @codons -> $codon {
    say $codon.join('');
}

# OUTPUT: "ACG␤TAC␤GTT␤"

Note that @codons isn't a DNA object; it's just a plain array that contains
the consumed values from the iterable object DNA.new(chain => 'ACGTACGTT').
What would happen if we assign a DNA object directly to a scalar variable?
Could we still loop over that variable with the built-in for loop?
my $chain = DNA.new(chain => 'ACGTACGTT');

.say for $chain;
# OUTPUT: "<anon|34>.new␤"

Well, that's not what we expected. What if we invoke the iterator method
on the scalar variable?
my $chain = DNA.new(chain => 'ACGTACGTT');

.say for $chain.iterator;
# OUTPUT: "<anon|34>.new␤"

We get the same output. Previously we mentioned that the iterator method
is invoked by statements such as for loop and other iterating constructs
such as assigning to a Positional variable. So the question is
when does for call the iterator method? From
raiph on StackOverflow:

As I understand, if the (single)
argument to an iterating feature is a Scalar container, then it uses the value
in the container and does not call .iterator. Otherwise, it calls
.iterator on it, evaluating it first if it's an expression or routine call.

Hence in our example with my $chain = DNA.new(chain => 'ACGTACGTT');,
we rightfully creates a Scalar container. When we try to iterate over it
with for, the scalar returns its only item, the object itself, and
no iterator method is called on it. Why does binding instead of assignment
work?
my $binded-chain := DNA.new(chain => 'ACGTACGTT');
my $assign-chain  = DNA.new(chain => 'ACGTACGTT');

say $binded-chain.VAR.WHAT; (DNA)
say $assign-chain.VAR.WHAT; (Scalar)

By binding, we bind the value directly to the variable and no container is
used to interface between the value and the variable. This isn't what happen
with assignment; instead of directly binding the value to the variable,
a container (a Scalar object) is created for you and that is bound to
the lexpad entry of the variable. The takeaway is that you must be
mindful to what you assign your iterables to. More about
containers and
containers in Perl 6.
Nonetheless, we could still manually iterate over the DNA object stored
in the scalar variable. It's just a matter of getting our hands on an
iterator by calling .iterator on the object and then accessing each value
by pulling one with the pull-one method. Thus:
my $chain = DNA.new(chain => 'ACGTACGTT');

my $iter = $chain.iterator;

loop {
    my $codon := $iter.pull-one;   # must bind for identity comparison
    last if $codon =:= IterationEnd;
    say $codon.join('');
}

# OUTPUT: "ACG␤TAC␤GTT␤"

Certainly this isn't the most Raku-y code but it helps us to see that, even after
being containerized, the iterable's elements are still accessible, albeit
not through the elegance of a for loop.
Lastly we know that the iterator method is invoked when assigning to
a Positional variable so we could postfix it with [] and let the
for loop know that the scalar value is indeed iterable:
my $chain = DNA.new(chain => 'ACGTACGTT');

for $chain[] -> $codon {
    say $codon.join('');
}
# OUTPUT: "ACG␤TAC␤GTT␤"

The Iterator role

As mentioned above, the Iterator role provides the lower-level subroutines
that will be called in every iteration of the loop. One of these is the
pull-one method, which either returns the next element from the collection,
or the sentinel value IterationEnd if no more elements are available
(i.e., when the collection is exhausted/consumed). In addition to this method,
this role provides a few other methods to allow for a finer operation
of iteration in several contexts such as adding or eliminating items,
consuming them all at once, skipping over them to access other items, etc.
Except for .pull-one which must be implemented in the composed class,
the Iterator role provide default implementations, in terms of pull-one,
for these methods.
A simple example of composing the Iterator role into a class and implementing
the pull-one method looks as follows:
# Countdown from a positive integer N to 0.
class CountDown does Iterator {
    has $.current;
    
    method new( $current where { $_ >= 0 } ) {
        self.bless(:$current)
    }
    
    method pull-one( --> Mu ) {
        my $result = $!current--;
        
        if $result == -1 {
            # return the sentinel value after the last value is consumed.
            return IterationEnd
        }
        elsif $result <= -2 {
            # calling .pull-one again after it returns IterationEnd
            # is undefined so we choose to throw an exception. 
            die "Countdown already consumed."  
        }
        else {
            return $result;
        }
    }
}
 
sub loop-over( Iterable:D $sequence ) {
    my $iterator = $sequence.iterator;
 
    loop {
        my $item := $iterator.pull-one;
        last if $item =:= IterationEnd;
        say $item;
    }
    
}
 
loop-over Seq.new(CountDown.new(5));
# OUTPUT: "5␤4␤3␤2␤1␤0␤"

The CountDown class composes the Iterator role and implements a simple
pull-one method that will iterate over a collection of positive descending
integers until it reaches zero. The call to pull-one after all the collection's
elements are consumed returns the sentinel value IterationEnd. Normally any calls
afterwards is undefined but here we choose to throw an exception. A for loop won't
be able to iterate over CountDown object since it doesn't implement the
Iterable role. Therefore, we use the helper loop-over subroutine to
do so. As you might've noticed, the subroutine's signature requires an
Iterable so whenever we pass a CountDown object to loop-over
we create a Seq from the object in order to make it iterable as shown
in the example.
Even when we compose the Iterator role and implement
the pull-one method for finer control on how an object's values must
be looped over, we still might need compose the Iterable role and
implement the iterator method  because this is what advertises to constructs
such as the for loop that the object in question is iterable. Thus, we
can get rid of the helper subroutine and do away with creating an iterable
with Seq by simply composing the Iterable role into CountDown as follows
(note that we're using simple logic inside the pull-one method so we moved
from regular if/elsif/else conditions to simpler statement modifier ifs):
# Countdown from a positive integer N to 0.
class CountDown does Iterable does Iterator {
    has $.current;
    
    method new( $current where { $_ >= 0 } ) {
        self.bless(:$current)
    }
    
    method iterator() { self }
    
    method pull-one( --> Mu ) {
        my $result = $!current--;
        return IterationEnd               if $result == -1;
        die "Countdown already consumed." if $result <= -2;
        return $result                    if $result >= 0; # this `if` isn't necessary, only
                                                           # used to make things look balanced
    }
}

for CountDown.new(5) -> $count {
    say $count
}
# OUTPUT: "5␤4␤3␤2␤1␤0␤"

my @counts = CountDown.new(5);
say @counts.join(', ');
# OUTPUT: "5, 4, 3, 2, 1, 0␤"

There's not doubt this looks a lot simpler. As you might've noticed, we implement
the iterator method from the Iterable role by simply returning self,
the object itself. The reason for this is that an iterating construct can
call the pull-one method on it to access every element in turn. This is
possible because the object will be both Iterable and an Iterator.
The relationship between an iterable and an iterator is a close one, and
the following bears some repetition:

An iterable describes a collection of items and advertises that these items
can be iterated over, i.e., they can be visited one at a time until all
of them are exhausted. An iterator is an entity which does what an iterable
advertises can be done on it. More specifically, an iterator tracks which item
to get at a given time, moves from the current to the next until the last one,
and then stops.

More examples

Infinite iterator

An infinite iterator is an iterator capable of producing an infinite number
of iterations. For example:
class CountUp does Iterable does Iterator {
    has $!current = 0;
    
    method iterator { self }
    
    method pull-one( --> Mu ) {
        my $val = $!current;
        $!current += 1;
        return $val;
    }
}

The pull-one method doesn't have a condition that must be met in order
to return IterationEnd, thus the iterator continues forever.
Using a CountUp object with a for loop isn't an option; it won't ever
stop, at least not before it runs out of resources. However, we can still
consume values from the iterable object by pulling them out manually
with .pull-one:
my $x = CountUp.new;
my $it = $x.iterator;

say $it.pull-one; #=> 0
say $it.pull-one; #=> 1
say $it.pull-one; #=> 2
...
say $it.pull-one; #=> 99
say $it.pull-one; #=> 100
...

This could continue forever and it shows that with an iterator that iterates
over an infinite set, we can have an infinite number of items without having
to store all of them in memory; we just need to produce them as we
need them. However, we don't need to bother implementing this ourselves since
Raku already offers elegant ways of producing iterable, lazy sequence of values
with the Seq class.
Conclusion