@donaldh
Last active September 6, 2018 15:48

Golfing Faster FASTA

After reading Timotimo’s excellent Faster FASTA Please blog post, I wanted to measure how a few other Perl 6 approaches to the same problem perform.

IO.slurp.lines

This is my baseline. It incorporates Timotimo’s performance improvements to the original solution, in a somewhat simplified form.

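my %seqs;      # sequence id => concatenated sequence
my $s = '';    # sequence accumulated for the current record
my $id;        # id of the current record, undefined before the first header
for 'genome.fa'.IO.slurp(:enc<latin1>).lines -> $line {
    if $line.starts-with('>') {
        # A header line starts a new record, so store the previous one first.
        if $id {
            %seqs{$id} = $s;
            $id = Nil;
        }
        $id = $line.substr(1);
        $s = '';
    } else {
        $s ~= $line;
    }
}
# Store the final record, which has no following header to trigger the save.
if $id {
    %seqs{$id} = $s;
}
# BEGIN now is evaluated at compile time, so this covers the whole run.
say "Took { now - BEGIN now } seconds";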

: Took 3.58698513 seconds

IO.lines

What if we avoid using slurp? Hopefully IO.lines will manage to be faster.

my %seqs;
my $s = '';
my $id;
for 'genome.fa'.IO.lines(:enc<latin1>) -> $line {
    if $line.starts-with('>') {
        if $id {
            %seqs{$id} = $s;
            $id = Nil;
        }
        $id = $line.substr(1);
        $s = '';
    } else {
        $s ~= $line;
    }
}
if $id {
    %seqs{$id} = $s;
}
say "Took { now - BEGIN now } seconds";

: Took 4.71259838 seconds

After a few runs, IO.lines averages out just a bit slower than slurping the file before iterating. However, it avoids holding the whole file in memory, so it should scale better to much larger files.
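
The two versions above differ only in where the lines come from. As a rough sketch of that point, the loop can be wrapped in a helper that accepts any sequence of lines. The name parse-fasta and the in-memory sample text are assumptions made for illustration; they are not part of the original benchmark and nothing here was timed.

```perl6
# Hypothetical helper: the accumulation loop from above, taking any
# sequence of lines through a sigilless parameter so it is consumed lazily.
sub parse-fasta(\source) {
    my %seqs;
    my $id;
    my $s = '';
    for source -> $line {
        if $line.starts-with('>') {
            %seqs{$id} = $s if $id.defined;   # store the finished record
            $id = $line.substr(1);
            $s  = '';
        } else {
            $s ~= $line;
        }
    }
    %seqs{$id} = $s if $id.defined;           # store the last record
    %seqs;
}

# Made-up sample data, so no file is needed to try it.
my $text = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n";
my %seqs = parse-fasta($text.lines);
say %seqs<seq1>;   # ACGTTTGA
say %seqs<seq2>;   # GGCC
```

Under the same assumption, either 'genome.fa'.IO.slurp(:enc<latin1>).lines or 'genome.fa'.IO.lines(:enc<latin1>) could be passed as the argument, which keeps the comparison down to the line source alone.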

Split and Skip

This is my baseline for the second implementation in Timotimo’s post.

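# Split into records on '>', drop the empty chunk before the first header,
# then turn each record into header => joined sequence lines.
my %seqs = slurp('genome.fa', :enc<latin1>).split('>').skip(1).map: {
    .head => .skip(1).join given .split("\n").cache;
}
say "Took { now - BEGIN now } seconds";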

: Took 7.4847424 seconds

Racing Split and Skip

There is some potential for speedup using a .race here:

my %seqs = slurp('genome.fa', :enc<latin1>).split('>').skip(1).race.map: {
    .head => .skip(1).join given .split("\n").cache;
}
say "Took { now - BEGIN now } seconds";

: Took 4.2423127 seconds

Hyper Split and Skip

How does .hyper compare to .race?

my %seqs = slurp('genome.fa', :enc<latin1>).split('>').skip(1).hyper.map: {
    .head => .skip(1).join given .split("\n").cache;
}
say "Took { now - BEGIN now } seconds";

: Took 5.2303269 seconds
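
The difference between the two is ordering: .hyper returns results in input order, while .race makes no such promise. Since the results here end up in a hash, the order does not matter. The sketch below illustrates this on a tiny made-up FASTA string. The parse-record helper and the :batch(1) setting are assumptions chosen to make the example small, not the settings used for the timings above.

```perl6
# Made-up sample data; no file is read.
my $fasta = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n>seq3\nAT\nCG\n";

# Hypothetical helper: first line is the id, the rest is the sequence.
sub parse-record(Str $chunk) {
    my @lines = $chunk.split("\n");
    @lines.head => @lines.skip(1).join;
}

# skip(1) drops the empty chunk before the first '>'.
my @records = $fasta.split('>').skip(1);

# :batch(1) hands out one record per work item (the default batch is larger).
my @hyper = @records.hyper(:batch(1)).map(&parse-record);
my @race  = @records.race(:batch(1)).map(&parse-record);

say @hyper.map(*.key);        # (seq1 seq2 seq3), input order is kept
say @race.map(*.key).sort;    # sorted, because race may return any order

# Either result builds the same hash.
my %seqs = @race;
say %seqs<seq1>;              # ACGTTTGA
```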
