@masak
Created August 5, 2012 20:20
Proceedings

What are macros?

Macros are code templates. Just like HTML templates allow you to specify a mostly-constant HTML document with some interesting values inserted here and there, a macro allows you to specify a mostly-constant chunk of code with some interesting parametric chunks of code here and there.

Perl 6 allows you to dynamically prepare the template that gets inserted into the mainline code. That's right, the macro runs right in the middle of compile-time, much like a BEGIN block. This creates some interesting challenges having to do with crossing from compiling the program to running it, and back.

Much of the unique power of macros stems from straddling this boundary, essentially allowing you to program your program.

Before we dive into macros, let's make sure we understand lexical scoping and ordinary routines fully.

Lexical scoping

Lexical scoping is the idea that a variable declaration is scoped to the block it appears in:

{
    # $var not defined here
    my $var;
    # $var defined here
}
# $var not defined here

What's especially powerful about this is that the scope information is available to the parser. We don't have to wait until runtime to get scoping errors. (This is what we mean by lexical scoping, as opposed to dynamic scoping.)

Routines are shaped funny

We all know the "shape" of arrays and hashes: arrays hold sequences and let us access them with an integer to get stuff out. Hashes hold mappings and let us access them with a string to get stuff out.

What is the shape of a routine?

Well, routines remember computations and let us access them with a bunch of parameters. Essentially, they are shaped like little programs. They can contain anything, including access to outside variables and their own variable declarations.

A closure is a function value that uses at least one variable defined outside of itself. Like so:

sub outer {
    my $x = 42;
    sub inner {
        say $x;  # 42
    }
    inner();
}

Together, those outside variables make up the environment of the closure. I do this a lot in my Perl 6 programming, because the environment of inner subs contains the parameters of outer subs, and so the inner subs need far fewer parameters.

It's so straightforward it doesn't even feel strange. But it actually is kinda strange and wonderful. It gets more obviously strange and wonderful when we allow the closure to escape the scope of its environment.

Note that an anonymous function is not the same as a closure. An anonymous function is just a function literal that lacks a name. (Why are we so obsessed with functions having names? We don't go around calling integer literals "anonymous integers".) The confusion arises because we often use anonymous functions for their capacity to generate closures, like here:

sub counter-constructor(Int $start) {
    my $counter = $start;
    return sub { $counter++ };
}

(The sub is included to make this more readable to Perl 5 people. It's not really necessary in Perl 6.)

We've now returned the closure out of its environment. The environment lives on because it is referenced by the closure.

my $c1 = counter-constructor(5);
say $c1();   # 5
say $c1();   # 6
say $c1();   # 7

And we can prove to ourselves that each invocation to the outer function yields a unique closure with its separate environment:

my $c2 = counter-constructor(42);
say $c2();   # 42
say $c1();   # 8
say $c2();   # 43

Because a closure behaves like this, a variable that's part of the closure's environment, like $counter, has to be allocated on the heap rather than on the stack. Put differently, the presence of closures in a programming language necessitates garbage collection. See also the funarg problem.

Note that the $counter variable is completely encapsulated inside the outer function. We can provide piecemeal access to it in exactly the same way as we can with objects. Closures and objects are equal in power. Which carries us into the next section.

A koan

Because closures and objects are equal in power, they can be defined in terms of one another, like so:

  • An object can be made out of a closure. Data hiding comes from declaring variables in the closure's environment. Behavior comes from calling the closure. We can emulate method dispatch by passing the method name as a first parameter.
  • A closure is a kind of function object with its environment stored as data, and one method: apply.
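The first bullet can be made concrete with a small sketch. This is in Python (the idea is language-agnostic, and all names here are made up): data hiding comes from a closed-over variable, and method dispatch is emulated by passing the method name as the first argument:

```python
def make_counter(start):
    # "Instance state" lives in the closure's environment.
    count = start

    def dispatch(method, *args):
        # Emulate method dispatch by switching on the method name.
        nonlocal count
        if method == "increment":
            count += 1
            return count
        elif method == "value":
            return count
        raise AttributeError(f"no such method: {method}")

    return dispatch

counter = make_counter(5)
counter("increment")   # returns 6
counter("increment")   # returns 7
counter("value")       # returns 7
```

Note that `count` is fully encapsulated: the only way to reach it is through the dispatch function, which is exactly the kind of data hiding an object would provide.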

This duality has been immortalized in a koan by Anton van Straaten:

The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said "Master, I have heard that objects are a very good thing - is this true?" Qc Na looked pityingly at his student and replied, "Foolish pupil - objects are merely a poor man's closures."

Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire "Lambda: The Ultimate..." series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.

On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened.

For the purposes of this text, a closure is a callable thing with internal state, just like an object. An object's private environment is its class, and a closure's private environment is the totality of the variables defined outside of itself.

ASTs are closures

AST objects are closures. They are not merely like closures; they are closures. They are a representation of executable code that (potentially) uses variables declared outside of itself.

If our actions are limited to the following, we can work with ASTs while preserving their environments:

  • Extracting a sub-AST out of an AST.
  • Inserting an AST into another.
  • Inserting synthetic AST nodes into an existing AST.

If we manage to talk about an AST without an environment (by creating one from scratch, for example), we could make that AST do things and participate in code, as long as we don't refer to any outside variables.

The five stages of macro

Let's look at the lifetime of an ordinary subroutine through compilation and running. For simplicity, let's assume it's called exactly once.

  • α: The subroutine is parsed.
  • β: The subroutine call is parsed. (This may happen before the subroutine is parsed, actually, because subs can be post-declared. Never mind.)
  • γ: The subroutine call is run.
  • δ: The subroutine runs.

Macros are more entwined in the process of parsing than that, and so for macros we can identify five stages:

  • a: The macro and the quasi are parsed.
  • b: The macro call is parsed. Immediately as the macro call has been parsed, we invoke the macro.
  • c: The macro runs. As part of this, the quasi gets incarnated and now has no holes anymore. An AST is returned from the macro.
  • d: Back in parse mode, this AST is inserted into the call site in lieu of the macro call.
  • e: At some point in the distant future, when compiling is over, the inserted macro code is run.

The steps b and d correspond to the relatively uninteresting step β. The runtime step c, corresponding to step γ, is sandwiched between the parse-time steps b and d. In short, subroutines have a clear separation of parse-time and runtime. Macros deliberately mix runtime with parse-time.

Hygienic macros

According to Wikipedia, "Hygienic macros are macros whose expansion is guaranteed not to cause the accidental capture of identifiers." That is, we don't want variables (or other names) to collide between the macro and the rest of the program. If we make it so that they don't, we've successfully made the macro hygienic.

Hygiene is a big deal for all languages with macros in them, since the lack of hygiene can cause weird behaviors due to unintentional collisions. We'll get back to various techniques used to achieve hygiene.

How closures cause hygiene

Let's reach for an example to illustrate how hygiene just falls out naturally when we treat ASTs as closures.

macro foo($ast) {
    my $value = "in macro";
    quasi {
        say $value;
        {{{$ast}}};
    }
}

my $value = "in mainline";
foo say $value;

Keeping in mind that ASTs retain a "link" to their point of origin, we step through the stages of a macro:

  • a
    • The macro and the quasi are parsed.
    • $value in the quasi block is recognized to refer to the declared variable in the macro block.
  • b
    • The macro call is parsed. Immediately as the macro call has been parsed, we invoke the macro.
    • An AST is created out of say $value as a natural effect of parsing. This AST is rooted in the mainline, so the $value variable refers to the one in the mainline.
  • c
    • The macro runs. As part of this, the quasi gets incarnated and now has no holes anymore.
    • say $value gets inserted, but retains its link to the mainline.
    • An AST is returned from the macro.
    • This AST retains its link to the macro block.
  • d
    • Back in parse mode, this AST is inserted into the call site in lieu of the macro call.
    • Because of how it was constructed, the AST as a whole links to the macro block, but a part of it links to the mainline.
  • e
    • At some point in the distant future, when compiling is over, the inserted macro code is run.
    • And voila, it prints in macro and then in mainline.

This is perhaps the smallest example that shows how things stay out of the way of each other. Each step is simple and fully general. It works out similarly even for more composed cases, when the going gets really tough.

Other approaches to hygiene

Wikipedia lists five ways to achieve hygiene in macros:

  • Obfuscation. Using strange names that won't collide with anything else.
  • Temporary symbol creation. Also known as "gensym".
  • Read-time uninterned symbol. Essentially giving symbols inside of a macro their own namespace.
  • Packages. Keeping the macro's symbols in a separate package.
  • Hygienic transformation. The macro processor does gensym for you.
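To make the "gensym" technique concrete, here is a minimal sketch (illustrative Python, not any particular macro processor's implementation): every call produces a fresh, unique name, so a temporary symbol introduced by a macro expansion cannot collide with any user identifier:

```python
import itertools

_ids = itertools.count()

def gensym(prefix="g"):
    # Each generated name embeds a strictly increasing counter,
    # so no two calls can ever return the same symbol name.
    return f"__{prefix}{next(_ids)}__"

a = gensym("tmp")   # e.g. "__tmp0__"
b = gensym("tmp")   # e.g. "__tmp1__"
assert a != b
```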

None of these ways rely on ASTs-as-closures. And yet that seems to be all that's required to solve the problem of macro hygiene.

When I have this working — and, after thinking about this for over half a year, I don't see any reason it shouldn't — I'm going to edit the Wikipedia page to include Closures as a sixth option.

Conclusion

Lexical scoping and all of its consequences may be the best idea in computer science, ever. Closures are a natural consequence of taking both lexical scoping and first-class function values seriously.

Function values are shaped not just according to the computation they perform, but also according to the environment they perform it in. This may sound like a weakness, but it's actually a great strength. We can use it to achieve encapsulation and data hiding, just like with objects.

The work on macros in Rakudo is coming along fine. I feel I have gained a deeper appreciation of lexical scoping and closures because of it. And there's more to come.

Exploits

This talk takes the following thesis as a starting point, and explores it:

Every feature in a system is a potential source of exploits.

In the context of this talk, let's define "exploit" as "use outside of the intended parameters". Those intended parameters, depending on the system, could be set by the system's originator, its user base, or just society at large. Exploits don't have to be malicious — if I build a castle out of sugar cubes, that's using sugar cubes outside of their intended parameters (sweetening stuff), but it isn't malicious.

Features

The accumulated potential for exploits grows with the number of features. If features are so bad, maybe we shouldn't add so many to our systems? The problem is that we are rather fond of features. They're the whole point of our systems. Maybe some features can be dropped. Most probably can't. Sometimes the willingness to drop a feature shifts when the feature is considered from an exploitation point of view.

However, one thing we sometimes can do is think about which features can be unified. Let's think of "unification" in this case as taking code paths that belonged to individual features and reducing them into a single code path. In some sense, that allows you to retain your features but expose them as aspects of a single underlying feature. Done correctly, this can reduce the exposure to exploits. Unification can also have the advantage of making the domain model conceptually simpler, and the resulting win in manageability can lead to a net decrease in bugs.

C. A. R. Hoare has a relevant quote:

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. It demands the same skill, devotion, insight, and even inspiration as the discovery of the simple physical laws which underlie the complex phenomena of nature.

I'd like to add to this that what I've learned in recent years is that it doesn't take a lot to make a system cross the "complicated" threshold.

Bounded contexts

Everything we've said so far could make it seem like the prevalence of exploits grows roughly linearly with the amount of features. But it's much worse than that.

Every combination of features in a system is a potential source of exploits.

By sheer combinatorics, we're fucked.

This is why we talk of corner cases — because features interact, whether or not we anticipate that they will.

I've always liked this tweet, because it proposes such an exploit:

When I start a band, I'm gonna call it "Podcasts", just to fuck with iTunes' directory structure.

Features tend to interact because they work on the same data. Try to keep this to a minimum. In fact, whenever possible, put bundles of features which do not interact as far away from each other as possible. Put unrelated code paths in different bounded contexts, and strongly limit interaction across such contexts. In some sense, each bounded context becomes a system on its own, with fewer unexpected corner cases. It's not a matter of removing all contact between different bounded contexts; rather, such contact should take place over a predefined protocol, such as inter-context message passing. This allows each context to maintain its own data without being entangled with unrelated concerns.

Positive bias

But a very real source of exploits in software is positive bias, often referred to as confirmation bias: a widespread weakness in human information processing whereby we forget to eliminate hypotheses. We tend to think that finding positive examples of a hypothesis is enough, but in reality it's finding negative examples that allows us to distinguish between hypotheses and reach conclusions.

This excerpt from Chapter 8 of the fanfic Harry Potter and the Methods of Rationality illustrates this phenomenon better than I can.

The boy's expression grew more intense. "This is a game based on a famous experiment called the 2-4-6 task, and this is how it works. I have a rule - known to me, but not to you - which fits some triplets of three numbers, but not others. 2-4-6 is one example of a triplet which fits the rule. In fact... let me write down the rule, just so you know it's a fixed rule, and fold it up and give it to you. Please don't look, since I infer from earlier that you can read upside-down."

The boy said "paper" and "mechanical pencil" to his pouch, and she shut her eyes tightly while he wrote.

"There," said the boy, and he was holding a tightly folded piece of paper. "Put this in your pocket," and she did.

"Now the way this game works," said the boy, "is that you give me a triplet of three numbers, and I'll tell you 'Yes' if the three numbers are an instance of the rule, and 'No' if they're not. I am Nature, the rule is one of my laws, and you are investigating me. You already know that 2-4-6 gets a 'Yes'. When you've performed all the further experimental tests you want - asked me as many triplets as you feel necessary - you stop and guess the rule, and then you can unfold the sheet of paper and see how you did. Do you understand the game?"

"Of course I do," said Hermione.

"Go."

"4-6-8" said Hermione.

"Yes," said the boy.

"10-12-14", said Hermione.

"Yes," said the boy.

Hermione tried to cast her mind a little further afield, since it seemed like she'd already done all the testing she needed, and yet it couldn't be that easy, could it?

"1-3-5."

"Yes."

"Minus 3, minus 1, plus 1."

"Yes."

Hermione couldn't think of anything else to do. "The rule is that the numbers have to increase by two each time."

"Now suppose I tell you," said the boy, "that this test is harder than it looks, and that only 20% of grownups get it right."

Hermione frowned. What had she missed? Then, suddenly, she thought of a test she still needed to do.

"2-5-8!" she said triumphantly.

"Yes."

"10-20-30!"

"Yes."

"The real answer is that the numbers have to go up by the same amount each time. It doesn't have to be 2."

"Very well," said the boy, "take the paper out and see how you did."

Hermione took the paper out of her pocket and unfolded it.

Three real numbers in increasing order, lowest to highest.

Hermione's jaw dropped. She had the distinct feeling of something terribly unfair having been done to her, that the boy was a dirty rotten cheating liar, but when she cast her mind back she couldn't think of any wrong responses that he'd given.

"What you've just discovered is called 'positive bias'," said the boy. "You had a rule in your mind, and you kept on thinking of triplets that should make the rule say 'Yes'. But you didn't try to test any triplets that should make the rule say 'No'. In fact you didn't get a single 'No', so 'any three numbers' could have just as easily been the rule. It's sort of like how people imagine experiments that could confirm their hypotheses instead of trying to imagine experiments that could falsify them - that's not quite exactly the same mistake but it's close. You have to learn to look on the negative side of things, stare into the darkness. When this experiment is performed, only 20% of grownups get the answer right. And many of the others invent fantastically complicated hypotheses and put great confidence in their wrong answers since they've done so many experiments and everything came out like they expected."

We program in terms of scenarios and use cases that we imagine in our mind. Unless we explicitly train ourselves to think in terms of sad paths and possible vulnerabilities, we're unlikely to picture the "negative hypotheses" in our code, the things that might go wrong. The uses that might fall outside of our intended parameters.

Power features

Let's apply the danger of positive bias to so-called "power features" in programming languages. Perl is actually a good example here, because many of its power features are double-edged swords.

Somewhere inside the CPAN module SOAP::Lite, there is a method dispatch that (simplified) has this form:

$soap_object->$method_name(@parameters);

This makes a lot of sense, because SOAP is an RPC-like protocol in which the method name to call is passed from client to the server, and comes in as input to SOAP::Lite. We don't know the method name at compile time, so it makes sense to use Perl's built-in facility to dispatch on the name found in a variable, here $method_name.
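The same power feature exists in most dynamic languages. Here is a Python analogue (all names hypothetical, and only a rough parallel since Python lacks Perl's package-qualified method names) of dispatching on an untrusted method name, together with the kind of plain-identifier check that makes it safe:

```python
import re

class Service:
    def echo(self, message):
        return message

def dispatch(obj, method_name, *args):
    # Reject anything that isn't a plain identifier before
    # dispatching on untrusted input; without this check, the
    # caller gets to choose which code runs.
    if not re.fullmatch(r"\w+", method_name):
        raise ValueError(f"Denied access to method ({method_name})")
    return getattr(obj, method_name)(*args)

svc = Service()
dispatch(svc, "echo", "hi")   # returns "hi"
# dispatch(svc, "Some::Other::method") raises ValueError
```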

But with great power comes great exploitability. Here's a comment and a check from the SOAP::Lite module:

# check to avoid security vulnerability: Protected->Unprotected::method(@parameters)
# see for more details: http://www.phrack.org/phrack/58/p58-0x09
die "Denied access to method ($method_name)\n" unless $method_name =~ /^\w+$/;

The phrack.org link is dead, but the Internet Archive has our back. What follows is a simplified explanation of the attack.

The first thing that the script kiddie exploits is that $method_name could include not just a method name but also a package, such as HTTP::Daemon::ClientConn::send_file. This method was never meant to be called through the SOAP dispatch mechanism, but the semantics of Perl allow it. The programmer probably never conceived of the possibility before being hit with the exploit. (Notice how even the variable name, $method_name, helps us persist in our positive bias here?)

Here's a simple script which demonstrates the exploit.

use 5.014;
use strict;
use warnings;

package C;

sub foo {
    my ($self) = shift;
    say "foo called with arguments ",
        join ", ", map { qq["$_"] } @_;
}

package main;

my $method_name = shift @ARGV
    or die "Usage: $0 <method name> <method argument>*";
my @arguments = @ARGV;

C->$method_name(@arguments);

Running this script looks like this:

$ perl explain_exploit foo Hello world!
foo called with arguments "Hello", "world!"

But now let's call the same script with this first parameter:

$ perl explain_exploit HTTP::Daemon::ClientConn::send_file ...

And let's further assume that among the dependencies of SOAP::Lite, there is code that looks like this:

package HTTP::Daemon::ClientConn;

sub send_file {
    my($self, $file) = @_;
    if (!ref($file)) {
        open(F, $file) || return undef;
        # ...
    }
}

The second thing that the script kiddie exploits is that the above code contains an "unprotected open", the two-argument form where $file is allowed to contain not just the file name, but also the mode for the file to be opened in. Common modes are reading (<), writing (>), and appending (>>), but there's also piping (|), which executes a system command. This can be very useful.

One has to be careful, though, not to allow arbitrary user input to populate the parameter $file. (Notice, by the way, how the naming of the variable $file also assists our positive bias in thinking everything is OK?) The ability of the second argument of an open call to dictate the mode is the reason two-argument open is strongly discouraged nowadays.

The following should send chills down your spine:

$ perl explain_exploit HTTP::Daemon::ClientConn::send_file '|/bin/ps'
  PID TTY          TIME CMD
25500 pts/16   00:00:00 bash
28836 pts/16   00:00:00 perl
28837 pts/16   00:00:00 ps

The whole exploit is two injection attacks stacked on top of each other. The two exploitable features aren't even in the same module, and not even written by the same author. Perl's indirect method dispatch doesn't make it easy to separate contexts.

It's interesting to note how subverting a CPAN module this way is very close to being art. I don't condone computer intrusion in any way, but I do admire the thinking that went into this exploit. We should all think more like this; our software would be better for it.

Script kiddies and bug reports

It is said that every good bug report should contain these three things:

  • Steps to reproduce
  • What you observed
  • What you expected

These rules become self-evident when viewed from the perspective of a fourteen-year-old script kiddie:

  • Steps to reproduce: pics or it didn't happen
  • What you observed: I 'sploited it...
  • What you expected: ...and they didn't even see it coming

Rakudobugs

An ancient Greek myth tells of king Midas, who was granted the ability to turn everything he touched into gold. Although initially very excited about his powers, he quickly found the downsides of the gift, as he couldn't eat anything. Also, he inadvertently turned his daughter into a gold statue. There's probably some lesson in there, maybe that riches matter less in life than some other things, like food and love.

In 2008, I discovered that I have something of the Midas touch when it comes to Rakudo bugs. Actually, it seems to be me and software in general, but for some reason the effect is very strong with Rakudo. My journey was the opposite of king Midas's; no-one wants their software to break all the time, but I came to accept it and consider it an asset of sorts. Better for these bugs to hit me, I reasoned, than future users of Perl 6.

So I set forth and submitted rakudobugs to our RT instance as I found them. I quickly got the epithet "bug wrangler", and learned to streamline the bug submitting process as much as possible. Somehow my brain considers submitting a rakudobug as "zero work".

As of August 2012, I've submitted 1356 tickets in four years; slightly less than one per day. There are a total of 2874 tickets in the perl6 queue, so about 47% of them were submitted by me.

I submit a fair amount of bugs for others, but the majority of the bugs I submit are things I discover by actually using Perl 6. Surprisingly many of these are found as part of refactoring programs. After a refactor, there's an implicit expectation that things will work the same. If they don't, that's a bug (or a thinko on the part of the programmer).

Then again, sometimes all it takes is trying a feature in a new way. In a sense, I hope to be the jungle guide who blasts a path through Perl 6 use cases with a machete, classifying and containing interesting bugs as they attack me.

A very rewarding kind of bug is the "Null PMC access" error (generated by the Parrot VM, very much like a null reference exception on other VMs), along with segfaults. Both of these are by definition use outside of the parameters of the language implementation, which should never leak VM errors to the user. As such, these bugs count as "exploits" as we have defined the term.

Make no mistake: the people working on Rakudo Perl are really good developers. Incompetence is not the cause of these bugs; complexity is. If you want an example of a software design "so complicated that there are no obvious deficiencies", Perl 6 is it.

In fact, sometimes I've fantasized about constructing a huge multiplication table of all the features in Perl 6, and then just going through it cell by cell, trying every pair of features to see if that digs up new bugs. Though perhaps a simple script suggesting random combinations of features would be more apt.

Some rakudobug case studies

Let me show how bug golfing happens. The following instance is still fresh enough in my mind that I can give an account of my thought process.

A user, nebuchadnezzar, showed up on #perl6 and reported that the following example from the Perl 6 book didn't work:

class Rock     { }
class Paper    { }
class Scissors { }

multi wins(Scissors $, Paper    $) { +1 }
multi wins(Paper    $, Rock     $) { +1 }
multi wins(Rock     $, Scissors $) { +1 }
multi wins(::T      $, T        $) {  0 }
multi wins(         $,          $) { -1 }

sub play ($a, $b) {
    given wins($a, $b) {
        when +1 { say "Player One wins" }
        when  0 { say "Draw"            }
        when -1 { say "Player two wins" }
    }
}

play(Rock, Rock); # output: Player two wins

given wins(Rock, Rock) {
    when +1 {say "Player One wins"}
    when 0 {say "Draw"}
    when -1 { say "Player two wins"}
} # output: Draw

His running hypothesis was that '"given" in the subroutine does not seems to behave the same way as outside'. But the program is far too large for us to say anything sensible about it. So, we golf. (Watch as we get less and less code the more we zero in on the bug. This is very typical of this kind of exploration.)

By the way, a type capture, like ::T above, captures the type in the variable T, and allows you to do type matching on it later. So the signature of wins(::T $, T $) is to be read as "accepts two parameters with identical types".

The first variant I come up with is this:

class R {}
multi w(::T, T) { 0 }
multi w($, $) { -1 }
sub p($a, $b) { w $a, $b }
say p(R, R);
say w(R, R);

OUTPUT: -1\n0\n

Note, there is no given construct. So we can put that hypothesis aside. My new hypothesis is instead that it's the p subroutine that does it somehow. So I try with a "pointy block" instead of a subroutine.

class R {}
multi w(::T, T) { 0 }
multi w($, $) { -1 }
(-> $a, $b { say w $a, $b })(R, R);
say w R, R;

OUTPUT: -1\n0\n

So it wasn't the subroutines. My guess now is that it's parameter binding.

class R {}
multi w(::T, T) { 0 }
multi w($, $) { -1 }
my ($a, $b) = R, R;
say w $a, $b;

OUTPUT: -1\n

So it isn't parameter binding either. It's variables. Or rather, containers.

A container is the thing that allows you to assign new values to a variable, array element, or other similar mutable thing.

These two runs seem to corroborate the container hypothesis:

multi w(::T, T) { 0 }
multi w($, $) { -1 }
say w(|[1, 1]);
say w(1, 1)

OUTPUT: -1\n0\n

And then, finally, the run that exposes the bug:

sub w(::T, T) { 0 }
say w(|[1, 1])

OUTPUT: Nominal type check failed for parameter
        ''; expected Scalar but got Int instead
        in sub w

So the whole bug boils down to type captures and containers not playing well together.

The next one I had already golfed a fair bit when I presented it to the channel:

use Test;
class A {}
(-> &c, $m { A.new()(); CATCH { default { ok 1, $m } } })(A, "")

OUTPUT: (signal SEGV)

The last line is a pointy block which we instantly invoke. An instance of A gets created and immediately invoked, an operation it does not support, thus generating an exception. In the catch clause, we call ok, provided through the Test module. There's nothing weird going on here; so why does it segfault?

moritz manages to remove the dependency on Test:

class A {}
(-> &c, $m { A.new()(); CATCH { default { say $m } } } )(Mu.new, '')

OUTPUT: Null PMC access in find_method('gist')

A Null PMC access is slightly less dramatic than a segfault, but we're still chasing the same bug here.

At this point, I've convinced myself that the A.new()(); statement actually runs. This next run disproves that hypothesis:

use Test;
class A {}
(-> &c, $m { CATCH { default { ok 1, $m } } })(A, "")

OUTPUT: (signal SEGV)

Which is our first really big clue: the CATCH block triggers even without any other statements in the pointy block. Which means that the CATCH block catches something that is not in the pointy block as such.

Could it be that CATCH blocks (wrongly) trap binding errors? moritz tests:

sub f(&x) { CATCH { default { say "OH NOES" } } }; f Mu.new

OUTPUT: OH NOES\n

Yup. And that's the bug. When we tried to print $m, it had a Null PMC in it because it had not been initialized by the binder, which gave up on the first parameter:

(-> &c, $m { CATCH { default { say $m } } } )(Mu.new, '')

OUTPUT: Null PMC access in find_method('gist')

Clearly CATCH in a block shouldn't catch binder-generated exceptions, and that was the bug here.

I always liked this next one:

class B;

method foo() {
    use A; # A.pm just defines a grammar
}

OUTPUT: You can not add a Method to a module; use a class, role or grammar

This one happened in the interaction between the parser keeping track of whether it is inside a module, a class, or something else, and inclusion of new compilation units. The solution this time was simply to make the parser do a bit of extra bookkeeping when seeing a new compilation unit.

The next one presupposes the knowledge of named parameters (which bind to named arguments and are identified by their name rather than their position) and anonymous parameters (which have just a sigil).

What happens if we have an anonymous named parameter?

sub foo(:$) {}
say &foo.signature.perl

OUTPUT: :(Any $?)\n

That shows it as an anonymous positional parameter, which is wrong. No-one ever considered the possibility of an anonymous named parameter up until the point when this rakudobug was submitted.

We end this exposition with the infamous snowman-comet bug:

say "abc" ~~ m ☃.(.).☄

Now, regexes may be delimited with matching opener and closer characters. A Unicode snowman and comet are not matching opener and closer characters. (Though I admit it would be quite cute if they were.)

This only worked for regexes. Other quoting constructs were not susceptible to this. The bug did eventually get fixed, but not so much by finding the cause of it, as by building a next-generation regex engine.

Conclusion

Software is hard. When you can, avoid creating so many features. They will be used against you.

When you can, isolate features from each other so that they can't interact. Identify subsystems of features that can be thus isolated.

As a programmer, be wary of positive bias and the way it hides exploits from you when you code.

Due to positive bias, power features are sources of exploits. Script kiddies are inventive; they will chain exploits in order to take control of your environment.

Perl 5's indirect method call is a useful power feature. The data passed to it must be validated if it's user input.

Perl 5's two-argument open is a power feature, but it's unsafe. It has been obsoleted by three-argument open. Do not use two-argument open. Do not load modules that use it.

The number of corner cases grows with the square of the number of features. Ask yourself where your threshold of keeping track of such combinations lies.

Experience shows that these corner cases are not just a theoretical concern; they show up all the time.
