Skip to content

Instantly share code, notes, and snippets.

@uzluisf
Created December 13, 2020 14:55
Show Gist options
  • Save uzluisf/6a045c5031c9f8d6ef7cb53f15366f36 to your computer and use it in GitHub Desktop.
Save uzluisf/6a045c5031c9f8d6ef7cb53f15366f36 to your computer and use it in GitHub Desktop.
=begin pod :kind("Language") :subkind("Language") :category("tutorial")
=TITLE Iterating
=SUBTITLE Functionalities available for visiting all items in a complex data structure
Similar to other many programming languages, Raku makes a fine distinction among
the terms I<iteration>, I<iterable>, and I<iterator>. Familiarizing yourself
with them is key for understanding the Raku constructs that implement these
concepts:
=item An B<iteration> is a mechanism by which a block of code is executed repeatedly.
=item An B<iterable> is an object that produce an iterator.
=item An B<iterator> is an object that performs an iteration. In other words, an
iterator knows how to compute the next value during an iteration.
Thus, an B<iterator> contains the state that moves as it works through the
available values. On the contrary, an B<iterable> is the originating container
that has no need to contain the state corresponding to the process of iteration.
Raku provides support for both iterables and iterators via the
L<C<Iterable>|/type/Iterable> and L<C<Iterator>|/type/Iterator> roles.
=head1 The C<Iterable> role
The L<C<Iterable>|/type/Iterable> role is relatively simple and it requires that
the composed class provides an C<iterator> method, which must return
an L<C<Iterator>|/type/Iterator> object. The L<C<iterator>|/type/Iterable#method_iterator>
method is the construct actually used by statements such as C<for>, which
invokes C<iterator> on the variable it precedes, and then run a block once for
every item. Similarly other statements, such as array assignment, will make the
C<Iterable>-composed class behave in the same way.
Any object that can return an C<Iterator> is an iterable. Most container types
in Raku conform to the C<Iterable> API. To check if a type object or instance
object composes the C<Iterable> role, and thus implement the
C<iterator> method, you can use the L<C<does>|/type/Mu#routine_does> method.
=begin code
say Seq.does(Iterable) # OUTPUT: «True␤»
say List.does(Iterable) # OUTPUT: «True␤»
say Map.does(Iterable) # OUTPUT: «True␤»
say Str.does(Iterable) # OUTPUT: «False␤»
say Int.does(Iterable) # OUTPUT: «False␤»
=end code
The following example illustrates the composition of the C<Iterable> role
into a user-defined C<DNA> class, and the required implementation of the
C<iterator> method:
=begin code
class DNA does Iterable {
has $.chain;
method new(Str $chain where { .match(/^^ <[ACGT]>+ $$/) && .chars %% 3 }) {
self.bless(:$chain);
}
method iterator(DNA:D:) {
$.chain.split('', :skip-empty).rotor(3).iterator
}
}
=end code
The C<$.chain.split('', :skip-empty).rotor(3).iterator> statement in the
C<iterator> method simply splits the DNA strand by the empty string, and group
the resulting values into lists of trinucleotides. The final result is a
L<C<Seq>|/type/Seq>, which itself implements the C<Iterable> role and has an
C<iterator> method that we end up invoking on it.
Instance objects of the C<DNA> class can be iterated over with C<for>, C<map>,
C<grep>, and array assignment, in the same manner as objects of iterable types,
such as L<C<Seq>|/type/Seq>, L<C<List>|/type/List> and L<C<Array>|/type/Array>,
can.
=begin code :skip-test<compile time error; undeclared name>
my \dna = DNA.new('ACGTACGTTACT');
.join.say for dna;
# OUTPUT: «ACG␤TAC␤GTT␤ACT»
my @longer-chain = DNA.new('ACGTACGTTACT');
say @longer-chain.raku;
# OUTPUT: «[("A", "C", "G"), ("T", "A", "C"), ("G", "T", "T"), ("A", "C", "T")]␤»
=end code
=head1 The C<Iterator> role
The L<C<Iterator>|/type/Iterator> role is a bit more complex than C<Iterable>.
An C<Iterator> object must have at least a L<C<pull-one>|/type/Iterator#method_pull-one>
method that returns either a sentinel value
L<C<IterationEnd>|/type/Iterator#index-entry-IterationEnd> or the value
extracted from the collection during the iteration.
In addition to C<pull-one>, the C<Iterator> role provides a
L<series of methods|/type/Iterator#Methods>, which allows for a finer operation
of iteration in several contexts: adding or eliminating items, or skipping over
them to access other items. However, the role provides a default implementation
for all the other methods, so the only one that has to be defined is precisely
C<pull-one>, of which only a stub is provided by the role.
While C<Iterable> provides the high-level interface loops will be working with,
C<Iterator> provides the lower-level functions that will be called in every
iteration of the loop.
Let's extend the C<DNA> class with this role. The demands for our iterable
C<DNA> class are simple enough, however to motivate the discussion of the C<Iterator>
role, we'll step it up a notch. When iterating over a C<DNA> instance object, we
want the reverse of any codon whose index is divisible by 2, and a rotated version
of any codon whose index is divisible by 3. Whenever the codon's index is both
divisible by 2 and 3, the case for divisible by 2 takes precedence. Otherwise,
return the codon as-is.
Here we need lower-level functions that will be called in every iteration
of the loop and will make those decisions for us. Thus, we need to provide
our own user-defined iterator:
=begin code
class DNA does Iterable {
has $.chain;
method new(Str $chain where { .match(/^^ <[ACGT]>+ $$/) && .chars %% 3 }) {
self.bless(:$chain);
}
my class DNAIterator does Iterator {
has $.dna is required;
has $!index = 0;
method pull-one(--> Mu){
return IterationEnd unless 3 * $!index < $!dna.chain.chars;
my $codon = do given $!dna.chain.comb.rotor(3)[$!index] {
when $!index %% 2 { $_.reverse }
when $!index %% 3 { $_.tail, |$_.head(2) }
default { $_ }
}
$!index += 1;
return $codon.list;
}
}
method iterator(DNA:D:) {
DNAIterator.new: dna => self
}
}
=end code
In the updated example above, we create our own C<DNAIterator> by composing the
C<Iterator> role into it and implementing the C<pull-one> method. We've
L<C<my>|/language/variables#The_my_declarator>-declared C<DNAIterator> within
C<DNA>, however the C<DNA> class is almost unchanged, aside from the fact we replaced
the body of C<iterator> with an instance object of the new iterator initialized
to the C<DNA> class (hence C«dna => self»).
Running an example:
=begin code :skip-test<compile time error; undeclared name>
my @dna = DNA.new('ACGTACGTTACTTGCACCGTT');
for @dna.kv -> \k, \codon {
printf("%-5s\t%s\n", k, codon.join)
}
# OUTPUT:
# 0 GCA
# 1 TAC
# 2 TTG
# 3 TAC
# 4 CGT
# 5 ACC
# 6 TTG
=end code
=head1 Composing both the C<Iterable> and C<Iterator> roles
Both the C<Iterable> and C<Iterator> roles can be composed into a single
class, and their required methods implemented:
=begin code
class DNA does Iterable does Iterator {
has $.chain;
has $!index = 0;
method new(Str $chain where { .match(/^^ <[ACGT]>+ $$/) && .chars %% 3 }) {
self.bless(:$chain);
}
method iterator(DNA:D:) { self }
method pull-one(--> Mu) {
return IterationEnd unless 3 * $!index < $!chain.chars;
my $codon = do given $!chain.comb.rotor(3)[$!index] {
when $!index %% 2 { $_.reverse }
when $!index %% 3 { $_.tail, |$_.head(2) }
default { $_ }
}
$!index += 1;
return $codon.list;
}
}
=end code
There are two main differences between this implementation and the
implementation from the previous section:
=item The C<iterator> method returns a single instance of the class per instance
object. In the previous section, the C<iterator> method returns a brand new
C<DNAIterator> object on demand per instance object.
=item The class itself is an iterator. Unlike iterables that are I<stateless>,
iterators are I<stateful> (i.e., once an item has been consumed from them, it's
gone), an instance object of the class will return the same iterator (i.e., the
class itself) which won't produce more values once they've been consumed. One
way to mitigate this is to let an array consume the instance object's values,
and then use that array as needed. Another alternative is to reset C<$!index> to
0 once all the values are consumed, however this would violate the iterator
protocol.
=begin code :skip-test<compile time error; undeclared name>
my \dna = DNA.new('ACGTACGTT');
say dna.map(*.join).join('|'); # OUTPUT: «GCA|TAC|TTG␤»
say dna.map(*.join).join('|'); # OUTPUT: «␤»
=end code
Keeping the aforementioned caveats in mind, implementing this low-level
interface simplifies the implementation of the C<Iterable> interface. Now the
iterator will be the object itself, since we can invoke C<pull-one> on it to
access every member in turn; C<iterator> will thus return just C<self>, which is
possible since the object will be both an C<Iterable> and an C<Iterator>.
This need not always be the case, and in most cases C<iterator> will have to
build an iterator type to be returned (that will, for instance, keep track of
the iteration state, which we are doing now in the main class), such as we did
in the previous section; however, this example shows the minimal code needed to
build a class that fulfills both the iterator and iterable roles.
=head1 Manual iteration
Iterating constructs, such as C<for> loops, use iterators to iterate through
collections. However, the programmer can also request values from an iterator
one at a time without the aid of any iterating construct.
You can get an iterator from any iterable, and you can use an iterator to manually
loop over the iterable it came from. Thus, the way to manually iterate through an
iterable is to get a hold of an iterator object by invoking the C<iterator> method
and call the C<pull-one> method on said iterator to request a new item until
C<IterationEnd> is returned.
=begin code
my @languages = <perl python haskell ruby>;
my $iter = @languages.iterator;
say $iter.pull-one; #=> «perl␤»
say $iter.pull-one; #=> «python␤»
say $iter.pull-one; #=> «haskell␤»
say $iter.pull-one; #=> «ruby␤»
say $iter.pull-one; #=> «IterationEnd␤»
=end code
After all the iterator's values are consumed, the returned value is the constant
C<IterationEnd>. Once C<IterationEnd> has been generated, it's forbidden to make
attempts to fetch more data, and behavior for doing so is undefined. The only
valid use of the sentinel value C<IterationEnd> in a program is identity
comparison (using L<C<=:=>|/routine/=:=>) with the result of a method in the
iterator API. Any other behavior is undefined and implementation dependent.
Thus, the manual iteration can be rewritten as a loop:
=begin code
my @languages = <perl python haskell ruby>;
my $iter = @languages.iterator;
loop {
my \value = $iter.pull-one;
last if value =:= IterationEnd;
say value;
}
=end code
Since C<IterationEnd> is a constant, it's important that the variable we compare
against it be bound, and not assigned. Hence, the use of the sigilless variable
C<value> in the previous example.
=head1 How to iterate: contextualizing and topic variables
Loops, such as C<for>, place the item produced in every iteration into the
L<topic variable C<$_>|/language/variables#index-entry-topic_variable>, or
capture them into the variables that are declared along with the block. These
variables can be directly used inside the loop, without needing to declare them,
by using the
L<C<^> twigil|/syntax/$CIRCUMFLEX_ACCENT#(Traps_to_avoid)_twigil_^>.
Implicit iteration occurs when using the L<sequence operator|/language/operators#index-entry-..._operators>:
say 1,1,1, { $^a² + 2*$^b + $^c } … * > 300; # OUTPUT: «(1 1 1 4 7 16 46 127 475)
The generating block is being run once while the condition to finish the
sequence, in this case the term being bigger than 300, is not met. This has the
side effect of both running a loop and creating a list.
This can be done more systematically through the use of
L<C<gather/take> blocks|/syntax/gather take>, which are a different kind of
iterating construct that instead of running in sink context, returns an item
every iteration. This
L<Advent Calendar tutorial|https://perl6advent.wordpress.com/2009/12/23/day-23-lazy-fruits-from-the-gather-of-eden/>
explains use cases for this kind of loops; in fact, C<gather> is not so much a
looping construct, but a statement prefix that collects the items produced by
C<take> and creates a list out of them.
=head1 Classical loops and why we do not like them
Classical C<for> loops, with a loop variable being incremented, can be done in
Raku through the L<C<loop> keyword|/language/control#loop>. Other loops, such
as L<repeat|/language/control#repeat/while,_repeat/until> and
L<while|/language/control#while,_until>, are also possible.
However, in general, they are discouraged. Raku is a functional and concurrent
language; when coding in Raku, you should look at loops in a functional way:
processing, one by one, the items produced by an iterator, that is, feeding an
item to a block without any kind of secondary effects. This functional view
allows also easy parallelization of the operation via the
L<C<hyper>|/routine/hyper> or L<C<race>|/routine/race> auto-threading methods.
If you feel more comfortable with your good old loops, the language allows you
to use them. However, it is considered better practice in Raku to try and use, whenever
possible, functional and concurrent iterating constructs.
I<Note:> Since version 6.d loops can produce a list of values from the values of
last statements.
=end pod
# vim: expandtab softtabstop=4 shiftwidth=4 ft=perl6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment