Skip to content

Instantly share code, notes, and snippets.

@uzluisf
Created January 13, 2020 16:53
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save uzluisf/6faff852ace828a9d283d9aaa944e76d to your computer and use it in GitHub Desktop.
Save uzluisf/6faff852ace828a9d283d9aaa944e76d to your computer and use it in GitHub Desktop.
Iterators and Iterables in Raku [Draft]

Iteration, Iterables, and Iterators in Raku.

What's iteration?

As general concept, iteration can be defined as the process of doing some repeated action over something in order to generate a sequence of outcomes. At a more detailed and programmatic level, iteration is the process of visiting all the elements of an iterable object using another object known as an iterator.

What's an iterable?

Simply put, an iterable is anything that can be iterated over, meaning we can visit each of its elements one by one in a sequential manner. For example, lists, arrays, hashes, etc. in Raku are iterables. In general, a Raku type can be thought as iterable if it composes the Iterable role and implements the iterator method which is supposed to return an Iterator object.

For instance, arrays are iterable data structures because the Array class inherits from the List class which in turn composes the Iterable role. If you'd like to know if a given object implements a given role, you'd need to ask if the object conforms to that role with the does method. Thus, for the Iterable role:

my @nums = 1, 2, 3;

@nums.^name;         #=> Array
@nums.does(Iterable) #=> True

In fact, we don't need to instantiate the type object (e.g., List, Array, Int, etc.) to check if it composes the role, we can ask it about it directly:

List.does(Iterable);  #=> True
Array.does(Iterable); #=> True
Int.does(Iterable);   #=> False
Str.does(Iterable);   #=> False

As you can see, unlike in other programming languages, Raku Strings cannot be iterated over (e.g., with a for loop); the Str type doesn't compose the Iterable role. However, they can be iterated over by using the comb routine which searches the input string for matches and returns a Seq of non-overlapping matches limited to a number of matches. For instance:

say "foobar".comb.raku;       # OUTPUT: "("f", "o", "o", "b", "a", "r").Seq␤"
say "foobar".comb(2).raku;    # OUTPUT: "("fo", "ob", "ar").Seq␤"
say "foobar".comb(2, 1).raku; # OUTPUT: "("fo",).Seq␤"
say "foobar".comb(/o/).raku;  # OUTPUT: "("o", "o").Seq␤"

.say  for "foobar".comb;      # OUTPUT: "f␤o␤o␤b␤a␤r␤"

What's an iterator?

An iterator is an object that enables us to iterate over an Iterable object.

In Raku, a type is called an iterator if it composes the Iterator role and implements the pull-one method which is supposed to either return the next value in the collection of values or return the sentinel value IterationEnd to indicate that all values have been exhausted and nothing else can be produced from the collection.

We know arrays are iterables, but are they also iterators? The answer is no. This is clear from the fact that the Array class doesn't compose the Iterator role and therefore neither does an object of its type:

my @nums = 1, 2, 3;
@nums.does(Iterator); #=> False

Well then, what does compose the Iterator role? We know that Array composes the Iterable role and thus implements the iterator method which is supposed to return an Iterator. Therefore, calling the iterator method on an iterable should provide us with an iterator. This method must be invoked on an object instance instead of a type object so we'll use the object instance @nums of Array from before:

my @nums = 1, 2, 3;

my $iter-nums = @nums.iterator;

$iter-nums.does(Iterator); #=> True

As we can see, calling the iterator method on an iterable object gives us back an Iterator object. What can it be used for?

An iterable object only tells us that it can be iterated over (e.g., with a for loop). In hindsight, an iterable is an object capable of returning an iterator but it doesn't specifically know how to move from one value to the next. This is what the iterator object is for; it's an object that returns a single value at a given time and then increments the internal pointer to the next value. This action of moving from one value to the next and letting us know when no more values can be consumed from the collection underlies the process of iterating over each element of the iterable object.

Flow control constructs such as the for loop are able to iterate through the data of iterable objects sequentially by calling the iterator method on the objects. Usually the users don't need to tell iterables when to behave as iterables because the method is invoked implicitly in the appropriate context

  • by statements such as a for loop which calls .iterator on the variable it precedes, and then run a block once for every item.

  • when the created object is assigned to a Positional variable such as @-sigiled variables.

From the documentation:

Users usually don't have to care about iterators, their usage is hidden behind iteration APIs such as for @list { }, map, grep, head, tail, skip and list indexing with .[$idx].

An example

my @nums = 1, 2, 3;

for @nums -> $num { say $num }
# OUTPUT: "1␤2␤3␤"

In the example above, the for loop iterates over each element of the array and the block is executed at each turn of the loop. However, could we consume the iterable on our own, without a built-in loop? The answer is yes and we just need to get the iterator from the iterable and then repeatedly call the pull-one method on the iterator ourselves. For example:

my @nums = 1, 2, 3;

my $iter-nums = @nums.iterator;

say $iter-nums.pull-one; #=> "1␤"
say $iter-nums.pull-one; #=> "2␤"
say $iter-nums.pull-one; #=> "3␤"

We've already consumed the three elements in the array. What happens if we invoke the pull-one method one more time?

my @nums = 1, 2, 3;

my $iter-nums = @nums.iterator;

say $iter-nums.pull-one;      #=> "1␤"
say $iter-nums.pull-one;      #=> "2␤"
say $iter-nums.pull-one;      #=> "3␤"
say $iter-nums.pull-one.raku; #=> IterationEnd

As you can see the sentinel value IterationEnd is returned after another pull-one invocation once the last value has been returned, thus to visit the entire iterable there will always be one more invocation of pull-one than there is data in the iterable. While it's the case that .pull-one can be invoked on many iterators once they've been exhausted, the design dictates that once an iterator has produced the IterationEnd sentinel, .pull-one should not be called anymore. Any subsequent calls may produce undefined results, or even throw an exception. On the one hand, this should make iterators easier to implement. On the other hand, it can make them more difficult to implement.

This behavior is unlike in other programming languages that raise their respective exceptions once the iterable has been exhausted. For example, Python throws a StopIteration exception whenever an iterable is consumed.

In Raku, if we'd like to test if we've consumed all the elements in an iterable, we must do an identity comparison of the sentinel value IterationEnd (using =:=) with the output of the pull-one method directly:

my @nums = 1, 2, 3;
my $iter-nums = @nums.iterator;

say $iter-nums.pull-one;
say $iter-nums.pull-one;
say $iter-nums.pull-one;

if $iter-nums.pull-one =:= IterationEnd {
    say "All the iterable's elements have been consumed."
}

# OUTPUT: "1␤2␤3␤All the iterable's elements have been consumed.␤"

Instead of comparing against pull-one's return value directly, we could bind its return value to a variable and compare it to IterationEnd instead. Note that an assignment won't work; just a binding will do. It's also worth noting that the only defined use for the sentinel value is the identity comparison and any other behavior is undefined and implementation dependent.

We already know how to check if we've consumed all the elements from an iterable so we could emulate printing out an iterable's elements with a for loop by using a while loop instead. For example:

my @nums = 1, 2, 3;
my $iter-nums = @nums.iterator;

loop {
    my $item := $iter-nums.pull-one;
    last if $item =:= IterationEnd;
    say $item;
}

We loop indefinitely, printing out values along the way, until we reach IterationEnd in which case we exit the loop with last. By this point, we've consumed the iterable's elements and if we wish to do the same once more, we'd need to get a new iterator. The takeaway is that Raku iterators can’t be reset, meaning they return IterationEnd once they’re exhausted. As mentioned above, any subsequent calls to an exhausted iterator isn't part of the contract and thus it may produce undefined results.

Iterables and Iterators

Regarding the relationship between iterables and iterators, the Raku documentation states:

While Iterable provides the high-level interface loops will be working with, Iterator provides the lower-level functions that will be called in every iteration of the loop.

The summary for this section is that iterating as a protocol and as the action of visiting all items in a complex object or data structure is composed of two parts: 1) an iterable (the object being iterated over) and 2) the iterator (the pointer-like object that moves over the iterable).

In the sections below, we delve more deeply into the Iterable and Iterator roles, how to compose them into user-defined classes, and how to implement the two main methods that are of interest to us, namely the iterator and pull-one methods, and that the composed classes must implement.

The Iterable role

The primary purpose of the Iterable role is to promise that an iterator method is available in the object or data structure that composes it. As mentioned previously, this method is used:

  • by statements such as a for loop which calls .iterator on the variable it precedes, and then run a block once for every item.

  • when the creator object is assigned to a Positional variable such as @-sigiled variables.

A simple example of composing the Iterable role into a class and implementing the iterator method looks as follows:

subset Strand of Str where { .match(/^^ <[ACGT]>+ $$/) && .chars %% 3 }

class DNA does Iterable {
    has Strand $.chain;
 
    method iterator {
        $!chain.comb(3).iterator;
    }

};

my $dna-chain := DNA.new(chain => 'ACGTACGTT');

for $dna-chain -> $codons {
    say $codons.join('');
}

# OUTPUT: "ACG␤TAC␤GTT␤"

Here we have the DNA class that mixes the Iterable role and offer a new way of iterating over what is essentially a string constrained to the four DNA letters, and whose length is a multiple of 3. In the looping part, the for actually hooks to the iterator method and then the letters are printed in groups of 3.

In this example, we bind the DNA instance object to the variable $dna-chain. We could've assigned the newly created object to a Positional variable in which case the iterator would be called on the iterable object and its consumed values would be assigned to the variable. For instance:

my @codons = DNA.new(chain => 'ACGTACGTT');

for @codons -> $codon {
    say $codon.join('');
}

# OUTPUT: "ACG␤TAC␤GTT␤"

Note that @codons isn't a DNA object; it's just a plain array that contains the consumed values from the iterable object DNA.new(chain => 'ACGTACGTT').

What would happen if we assign a DNA object directly to a scalar variable? Could we still loop over that variable with the built-in for loop?

my $chain = DNA.new(chain => 'ACGTACGTT');

.say for $chain;
# OUTPUT: "<anon|34>.new␤"

Well, that's not what we expected. What if we invoke the iterator method on the scalar variable?

my $chain = DNA.new(chain => 'ACGTACGTT');

.say for $chain.iterator;
# OUTPUT: "<anon|34>.new␤"

We get the same output. Previously we mentioned that the iterator method is invoked by statements such as for loop and other iterating constructs such as assigning to a Positional variable. So the question is when does for call the iterator method? From raiph on StackOverflow:

As I understand, if the (single) argument to an iterating feature is a Scalar container, then it uses the value in the container and does not call .iterator. Otherwise, it calls .iterator on it, evaluating it first if it's an expression or routine call.

Hence in our example with my $chain = DNA.new(chain => 'ACGTACGTT');, we rightfully creates a Scalar container. When we try to iterate over it with for, the scalar returns its only item, the object itself, and no iterator method is called on it. Why does binding instead of assignment work?

my $binded-chain := DNA.new(chain => 'ACGTACGTT');
my $assign-chain  = DNA.new(chain => 'ACGTACGTT');

say $binded-chain.VAR.WHAT; (DNA)
say $assign-chain.VAR.WHAT; (Scalar)

By binding, we bind the value directly to the variable and no container is used to interface between the value and the variable. This isn't what happen with assignment; instead of directly binding the value to the variable, a container (a Scalar object) is created for you and that is bound to the lexpad entry of the variable. The takeaway is that you must be mindful to what you assign your iterables to. More about containers and containers in Perl 6.

Nonetheless, we could still manually iterate over the DNA object stored in the scalar variable. It's just a matter of getting our hands on an iterator by calling .iterator on the object and then accessing each value by pulling one with the pull-one method. Thus:

my $chain = DNA.new(chain => 'ACGTACGTT');

my $iter = $chain.iterator;

loop {
    my $codon := $iter.pull-one;   # must bind for identity comparison
    last if $codon =:= IterationEnd;
    say $codon.join('');
}

# OUTPUT: "ACG␤TAC␤GTT␤"

Certainly this isn't the most Raku-y code but it helps us to see that, even after being containerized, the iterable's elements are still accessible, albeit not through the elegance of a for loop.

Lastly we know that the iterator method is invoked when assigning to a Positional variable so we could postfix it with [] and let the for loop know that the scalar value is indeed iterable:

my $chain = DNA.new(chain => 'ACGTACGTT');

for $chain[] -> $codon {
    say $codon.join('');
}
# OUTPUT: "ACG␤TAC␤GTT␤"

The Iterator role

As mentioned above, the Iterator role provides the lower-level subroutines that will be called in every iteration of the loop. One of these is the pull-one method, which either returns the next element from the collection, or the sentinel value IterationEnd if no more elements are available (i.e., when the collection is exhausted/consumed). In addition to this method, this role provides a few other methods to allow for a finer operation of iteration in several contexts such as adding or eliminating items, consuming them all at once, skipping over them to access other items, etc. Except for .pull-one which must be implemented in the composed class, the Iterator role provide default implementations, in terms of pull-one, for these methods.

A simple example of composing the Iterator role into a class and implementing the pull-one method looks as follows:

# Countdown from a positive integer N to 0.
class CountDown does Iterator {
    has $.current;
    
    method new( $current where { $_ >= 0 } ) {
        self.bless(:$current)
    }
    
    method pull-one( --> Mu ) {
        my $result = $!current--;
        
        if $result == -1 {
            # return the sentinel value after the last value is consumed.
            return IterationEnd
        }
        elsif $result <= -2 {
            # calling .pull-one again after it returns IterationEnd
            # is undefined so we choose to throw an exception. 
            die "Countdown already consumed."  
        }
        else {
            return $result;
        }
    }
}
 
sub loop-over( Iterable:D $sequence ) {
    my $iterator = $sequence.iterator;
 
    loop {
        my $item := $iterator.pull-one;
        last if $item =:= IterationEnd;
        say $item;
    }
    
}
 
loop-over Seq.new(CountDown.new(5));
# OUTPUT: "5␤4␤3␤2␤1␤0␤"

The CountDown class composes the Iterator role and implements a simple pull-one method that will iterate over a collection of positive descending integers until it reaches zero. The call to pull-one after all the collection's elements are consumed returns the sentinel value IterationEnd. Normally any calls afterwards is undefined but here we choose to throw an exception. A for loop won't be able to iterate over CountDown object since it doesn't implement the Iterable role. Therefore, we use the helper loop-over subroutine to do so. As you might've noticed, the subroutine's signature requires an Iterable so whenever we pass a CountDown object to loop-over we create a Seq from the object in order to make it iterable as shown in the example.

Even when we compose the Iterator role and implement the pull-one method for finer control on how an object's values must be looped over, we still might need compose the Iterable role and implement the iterator method because this is what advertises to constructs such as the for loop that the object in question is iterable. Thus, we can get rid of the helper subroutine and do away with creating an iterable with Seq by simply composing the Iterable role into CountDown as follows (note that we're using simple logic inside the pull-one method so we moved from regular if/elsif/else conditions to simpler statement modifier ifs):

# Countdown from a positive integer N to 0.
class CountDown does Iterable does Iterator {
    has $.current;
    
    method new( $current where { $_ >= 0 } ) {
        self.bless(:$current)
    }
    
    method iterator() { self }
    
    method pull-one( --> Mu ) {
        my $result = $!current--;
        return IterationEnd               if $result == -1;
        die "Countdown already consumed." if $result <= -2;
        return $result                    if $result >= 0; # this `if` isn't necessary, only
                                                           # used to make things look balanced
    }
}

for CountDown.new(5) -> $count {
    say $count
}
# OUTPUT: "5␤4␤3␤2␤1␤0␤"

my @counts = CountDown.new(5);
say @counts.join(', ');
# OUTPUT: "5, 4, 3, 2, 1, 0␤"

There's not doubt this looks a lot simpler. As you might've noticed, we implement the iterator method from the Iterable role by simply returning self, the object itself. The reason for this is that an iterating construct can call the pull-one method on it to access every element in turn. This is possible because the object will be both Iterable and an Iterator.

The relationship between an iterable and an iterator is a close one, and the following bears some repetition:

An iterable describes a collection of items and advertises that these items can be iterated over, i.e., they can be visited one at a time until all of them are exhausted. An iterator is an entity which does what an iterable advertises can be done on it. More specifically, an iterator tracks which item to get at a given time, moves from the current to the next until the last one, and then stops.

More examples

Infinite iterator

An infinite iterator is an iterator capable of producing an infinite number of iterations. For example:

class CountUp does Iterable does Iterator {
    has $!current = 0;
    
    method iterator { self }
    
    method pull-one( --> Mu ) {
        my $val = $!current;
        $!current += 1;
        return $val;
    }
}

The pull-one method doesn't have a condition that must be met in order to return IterationEnd, thus the iterator continues forever. Using a CountUp object with a for loop isn't an option; it won't ever stop, at least not before it runs out of resources. However, we can still consume values from the iterable object by pulling them out manually with .pull-one:

my $x = CountUp.new;
my $it = $x.iterator;

say $it.pull-one; #=> 0
say $it.pull-one; #=> 1
say $it.pull-one; #=> 2
...
say $it.pull-one; #=> 99
say $it.pull-one; #=> 100
...

This could continue forever and it shows that with an iterator that iterates over an infinite set, we can have an infinite number of items without having to store all of them in memory; we just need to produce them as we need them. However, we don't need to bother implementing this ourselves since Raku already offers elegant ways of producing iterable, lazy sequence of values with the Seq class.

Conclusion

@lizmat
Copy link

lizmat commented Jan 14, 2020

s/This isn't what happen/This isn't what happens/

Re: However, we don't need to bother implementing this ourselves since Raku already offers elegant ways of producing iterable, lazy sequence of values with the Seq class.

I'm not sure that makes sense in this context. A lazy Iterator is just an instantiated Iterator that returns True with its is-lazy method. As long as Raku knows an Iterator is lazy, it won't try to be eager and hang because it tries to produce all possible values. So adding a:

method is-lazy(--> True) { }

is enough to make an Iterator appear lazy, and thus prevent hanging.

@7stud
Copy link

7stud commented Feb 28, 2024

s/In general, a Raku type can be thought as iterable/
  In general, a Raku type can be thought of as iterable/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment