jkeenan/how-to-bisect-a-perl-debugger-problem.mkdn

## how-to-bisect-a-perl-debugger-problem.mkdn

      
    Raw
  

              how-to-bisect-a-perl-debugger-problem.mkdn
            
          
    How to Bisect a Problem in the Perl 5 Debugger

Bisection and the Perl 5 Core Distribution

Bisection is a developer's technique used to locate the point in a series of
commits to a repository where the behavior of an application changed.  While
this change could be a change for the better -- "When did this feature start
accidentally working?" -- the change being sought is usually a change for the
worse:  "When did this feature stop working correctly?".
Source code stored in a git repository can generally be bisected with some
variant of the git bisect command.  In the Perl 5 repository we have a
program, Porting/bisect.pl, which is essentially a giant wrapper around
git bisect intended to handle bisection situations specific to that
codebase.
For reasons I do not understand, the documentation for
Porting/bisect.pl is actually found in a different file:
Porting/bisect-runner.pl.  The synopsis for that documentation (as of
perl-5.27.3) contains a number of typical use cases:
$> perldoc Porting/bisect-runner.pl

...

SYNOPSIS
    # When did this become an error?
    .../Porting/bisect.pl -e 'my $a := 2;'
    # When did this stop being an error?
    .../Porting/bisect.pl --expect-fail -e '1 // 2'
    # When did this stop matching?
    .../Porting/bisect.pl --match '\b(?:PL_)hash_seed_set\b'
    # When did this start matching?
    .../Porting/bisect.pl --expect-fail --match '\buseithreads\b'
    # When did this test program stop working?
    .../Porting/bisect.pl -- ./perl -Ilib ../test_prog.pl
    # When did this first become valid syntax?
    .../Porting/bisect.pl --target=miniperl --end=v5.10.0 \
         --expect-fail -e 'my $a := 2;'
    # What was the last revision to build with these options?
    .../Porting/bisect.pl --test-build -Dd_dosuid

Bisection is not something anyone does on an every day basis.  Consequently,
every time I have to bisect some code I have to stop to think, "Which of
these cases covers my current condition?"  I eventually figured out that it
would be good to write down the answers to that question!  So what follows is
one instance of problem solving via bisection in the Perl 5 core distribution.
YMMV!
The Problem at Hand:  Regression in the Perl 5 Debugger

Is there anyone who loves the Perl 5 debugger?  Let's have a show of hands.
Ah, I didn't think so.  It looks ugly on the screen.  It's less user-friendly as a
learning tool than REPLs in, say, Python or Scala.  It's something which its
own documentation warns you not to use unless you have already run your code
with warnings enabled.  Its learning curve is much more difficult than simply
scattering print statements through your code.  And since you only use when
you've got a sticky problem, you don't use it often enough to master all its
functionality.
As a consequence, the debugger's own code doesn't get much love, either.  The
debugger is nothing but a Perl program found within the Perl 5 core
distribution -- lib/perl5db.pl, to be precise.  So bugs in the debugger are
reported to the main Perl 5 bug tracker, RT, where they
tend to linger un-worked-on for years.
Such was the case with RT #120174,
entitled "Debugger
command regression: Now requires more spaces".  This ticket was created by a
long-time British (I think) Perl hacker named Smylers on the anniversary of
Thelonious Monk's birth in October 2013.  Smyler reported that for a long time
he had been accustomed to using the p and x debugger commands -- used to
print and execute statements, respectively -- without a subsequent
wordspace, like this:
p@ARGV

x@ARGV

x\@ARGV

x\%INC

This syntax is one character shorter than the documented syntax for these
commands:
p @ARGV

x @ARGV

x \@ARGV

x \%INC

Smylers noted that the wordspace-less syntax worked in perl-5.14.3 but was
clearly not working in perl-5.19.5.  By "not working", Smyler meant that when
he tried the wordspace-less syntax, he got exceptions like this:
   DB<72> x\@ARGV
 Backslash found where operator expected at (eval 8)[lib/perl5db.pl:732] line 2, near "x\"
 at (eval 8)[lib/perl5db.pl:732] line 2.
   eval 'no strict; ($@, $!, $^E, $,, $/, $\\, $^W) = @DB::saved;package main; $^D = $^D | $DB::db_stop;
 x\\@ARGV;
 ' called at lib/perl5db.pl line 732
   DB::eval called at lib/perl5db.pl line 3091
   DB::DB called at -e line 1
syntax error at (eval 8)[lib/perl5db.pl:732] line 2, near "x\"

Smylers then added, "I haven't tried to narrow it down any further, mainly cos I
don't know how to come up with a non-interactive way of demonstrating the
change."
Neither, apparently, did anyone else.  Nor was anyone else moved to reply to
this bug report for nearly four years.  However, having too much time on my
hands this summer, I began to once again work my way through the RT "New"
queue, that is, RT tickets which had not received a single reply.  On
September 3, 2017, I confirmed Smyler's original report.  It then took me
several stages to, in Smyler's words, "...  come up with a
non-interactive way of demonstrating the change."
Stage 1:  Perlbrew

In hacking on the Perl 5 core distribution the command-line tool perlbrew
is your friend.  You can get it from CPAN at
App-perlbrew.  It was
created by a Perl hacker named Kang-min Liu, whose nationality or terrestrial
location, to be honest, I do not know.  With perlbrew you can have multiple
versions of perl installed beneath your home directory.  So if you want to
find out how something worked in, say, perl-5.14 back in 2011-12, you simply
say:
perlbrew use perl-5.14.4

And there it is!
$ perl -v | head -2 | tail -1
This is perl 5, version 14, subversion 4 (v5.14.4) built for x86_64-linux

So the first thing I did in studying Smyler's problem was to use perlbrew
to quickly move through Perl versions, firing up the debugger in each, until I
found:

The most recent production version in which the wordspace-less syntax
worked; and
The oldest production version in which the wordspace-less syntax no longer
worked.

I was quickly able to determine that the wordspace-less syntax was working in
perl-5.16.3 but was not working in perl-5.18.4.  This meant that the change of
behavior probably took place between 2012 and 2013, somewhere between this
commit:
commit 68cd360812f9eaa2d34c45c501e2fef87c44ccde
CommitDate: Tue May 1 19:08:57 2012 -0400

... and this commit:
commit 1799399c45baa7e5af45db33567d2a2de629d5e0
CommitDate: Fri Jun 7 11:51:45 2013 -0400

In this period of slightly over a year, there were more than 200 commits to
lib/perl5db.pl -- but thousands of commits to the entire core
distribution.  This was clearly going to need bisection.
Stage 2:  Which Syntax Should I Use for Porting/bisect.pl?

But which of the various invocations for Porting/bisect.pl shown above was
I to use?  Well, if any other Perl program terminated with syntax errors like
Backslash found where operator expected, I would know that the program, in
Unix terms, exited with a non-zero error number.  Any program that ended with
EXIT_FAILURE would be suitable for this invocation:
    .../Porting/bisect.pl -- ./perl -Ilib ../test_prog.pl

... as this version is designed to report the first commit at which
../test_prog.pl failed, i.e., exited with a non-zero error number.
The Perl 5 debugger, however, is not just any other Perl program.  When you
are debugging code and encounter an exception, the debugger traps that
exception but it itself does not die.  It waits for you to tell it what to
do next.  If, for example, you tell the debugger to quit (q), the debugger
program will exit successfully, i.e., it will exit 0.
Hence, we won't go anywhere with an invocation in which the debugger is itself
the final argument:
    .../Porting/bisect.pl -- ./perl -Ilib lib/perl5db.pl  # <-- WON'T WORK

So what we have to do is to write a program which:

simulates the action of the debugger's p and x commands;
throws exceptions when the simulated p and x commands are invoked
without a subsequent wordspace; and
exits with a non-zero error number that can be detected by
Porting/bisect.pl and cue the termination of the bisection process.

Stage 3:  Write a Program that Fails

What we do a program that exercises parts of an application's functionality?
We call it a test.
So, to exercise the debugger's p or x commands we can seek guidance by
looking at the way the Perl 5 debugger is itself tested within the core
distribution.  The debugger is tested via lib/perl5db.t.  To get a basic
understanding of how that program works, let's suppose that we have a little,
4-line Perl program whose code we want to step through with the debugger:
$> cat rt-120174

@abc = (1..4);
print "hello world\n";
%xyz = ( 'alpha' => 'beta', 'gamma' => 'delta' );
print "goodbye world\n";

We place this program in lib/perl5db/t.  We then add a block of code to
lib/perl5db.t which sucks in this 4-line program and iterates through an
array of debugger commands until we get success or failure.
{
    # perl 5 RT #120174 - 'p' command
    my $wrapper = DebugWrap->new(
        {
            cmds =>
            [
                'b 2',
                'c',
                'p@abc',
                'q',
            ],
            prog => '../lib/perl5db/t/rt-120174',
        }
    );

    ...
}

lib/perl5db.t contains a DebugWrap whose constructor takes a
hash reference with cmds and prog elements.  The cmds element holds a
list of debugger commands.  The prog element points to the chunk of code
we're going to step through.  In the example above, we:


Set a breakpoint for line 2


Instruct the program to continue until the breakpoint is reached


At the breakpoint print out @abc
Based on the 4-line program -- and assuming the wordspace-less syntax is
working -- we expect that output to be:
  1234

That is, the same as the output of this command in a regular Perl program:
  print "@abc\n";


Quit the debugger


The DebugWrap class provides a contents_like() method which asks if the
output of a debugger command matches a pattern specified as the method's first
argument.
{
    ...

    $wrapper->contents_like(
        qr/1234/,
        q/RT 120174: p command can be invoked without space after 'p'/,
    );
}

If you were to create branches from an "old" version of Perl and from Perl 5
blead, respectively, then add the tests described above to each branch, you
would see that:

In the "old" version p@abc works as Smylers was accustomed
to having it work, that is, it would print 1234 and the contents_like
unit test would PASS -- permitting lib/perl5db.t and make test to
pass as well.
In the blead-based version p@abc would throw exceptions and the
contents_like unit test would FAIL -- in turn causing lib/perl5db.t
and make test to fail.

Stage 4:  Write a Program that Fails that We Can Use during Bisection

If the unit test block and sample code above had been added to the Perl 5 core
distribution ten years ago, then we would have known when the behavior
changed.  But it wasn't, which means we have to hunt for the breaking commit.
We have to place our testing code and our 4-line sample code somewhere
outside the core distribution on our filesystem.
That's conceptually simple, but difficult in practice.  Why?  Because
lib/perl5db.t requires code which is found only within the core
distribution, specifically, t/test.pl.
BEGIN {
    chdir 't' if -d 't';
    @INC = '../lib';
    require './test.pl';
}

The DebugWrap package uses subroutines imported from t/test.pl.  We have
to jump through some hoops to be able to use the DebugWrap package to work
properly.  I had to write a program which I called adapted-perl5db.t which
I placed in a scratch directory.  This program starts by chdir-ing to my
git-checkout directory, and then requires t/test.pl from there.
BEGIN {
    chdir $ENV{PERL_WORKDIR};
    print STDERR "Working from: $ENV{PERL_WORKDIR}\n";
    chdir 't' if -d 't';
    @INC = '../lib';
    require './test.pl';
}

adapted-perl5db.t then copies much of the code from lib/perl5db.t but
throws out all the unit test blocks except the ones we want for testing our
current bug.
my $plancount = 4;
plan($plancount);

In package main within adapted-perl5db.t, I define the
following variables and subroutines:
my $passcount = 0;

my $myprog = '/home/jkeenan/learn/perl/p5p/120174-debugger/rt-120174';

sub capture_return_value {
    my $rv = shift;
    $rv = 0 unless $rv;
    $passcount += $rv;
}

I used subroutine capture_return_value() to capture the return value of a
given call to contents_like() -- 1 for PASS and "" for FAIL and
add that to a tally of passing tests.
{
    # perl 5 RT #120174 - 'p' command
    my $wrapper = DebugWrap->new(
        {
            cmds =>
            [
                'b 2',
                'c',
                'p@abc',
                'q',
            ],
            prog => $myprog,
        }
    );

    capture_return_value($wrapper->contents_like(
        qr/1234/,
        q/RT 120174: p command can be invoked without space after 'p'/,
    ));
}

If at the end of the test file $passcount is equal to $plancount, then
the file as a whole has passed and ought to exit with Unix-success, that is,
with a 0 exit value.
If, however, some of the unit tests have not succeeded, then $passcount
will be less than $plancount.  The file as a whole fails and ought to exit
with Unix-failure, that is, a non-zero exit value.  In adapted-perl5db.t we
have an END block which does the calculation for us.
END {
    my $netcount = $plancount - $passcount;
    1 while unlink ($rc_filename, $out_fn);
    exit $netcount;
}

So we now have a Perl program which can simulate individual Perl debugger
commands and test assertions as to the output of those commands.  We can
build perl at any given commit, run the test program at that commit and
determine whether the test program is "working" or not.
Stage 5:  At Long Last, Bisection

I generally perform bisection of the Perl 5 core distribution in a git
checkout different from my everyday checkout of blead.  Hence, this sequence
of commands:
$> export PERL_WORKDIR="$HOMEDIR/gitwork/perl2"
$> cd $PERL_WORKDIR
$> git clean -dfx
$> git checkout blead

Recall from early in this posting that I had determined that the behavior of
the Perl 5 debugger had changed somewhere between the 5.16.3 and 5.18.4
production releases.  Roughly speaking that change took place between commits
68cd360812 and 1799399c45.  Porting/bisect.pl can be called with --start
and --end command-line switches so as to shorten the time bisection takes.
That enabled me to compose and invoke this bisection command:
perl ./Porting/bisect.pl \
    --start 68cd360812 \
    --end   1799399c45 \
    -- ./perl -Ilib ~/learn/perl/p5p/120174-debugger/adapted-perl5db.t

This instance of Porting/bisect.pl took 21 minutes on my laptop.  Upon conclusion it displayed:
    HEAD is now at d478d7a Add more commands to the lookup table.
bad - non-zero exit from ./perl -Ilib /home/jkeenan/learn/perl/p5p/120174-debugger/adapted-perl5db.t
d478d7a00517925a72f7077951fbed1283fb6a07 is the first bad commit
commit d478d7a00517925a72f7077951fbed1283fb6a07
Author: Shlomi Fish <shlomif@shlomifish.org>
Date:   Sun Oct 14 12:56:53 2012 +0200

Add more commands to the lookup table.

:040000 040000 9182f304ff915f7da07be7ffdd7a0006aa128656 f0d83b81b7e7c82259a8cc87ab30e4e79af80a51 M  lib
bisect run success
That took 1267 seconds.

Verification of Bisection

Often times I'm not entirely sure of the results of running
Porting/bisect.pl.  So to confirm the results I checked out the commit
identified by the bisection, built perl at that commit, then ran the test
program at that commit.
$> git clean -dfx
$> git checkout d478d7a
$> sh ./Configure -des -Dusedevel
$> make test_prep
$> ./perl -Ilib /home/jkeenan/learn/perl/p5p/120174-debugger/adapted-perl5db.t 

As I expected, the test program threw exceptions and FAILed at that commit.
I then checked out the immediately preceding commit, built perl at
that commit, then ran the test program again.
$> git clean -dfx
$> git checkout d478d7a^                # <-- Note the ^
$> sh ./Configure -des -Dusedevel
$> make test_prep
$> ./perl -Ilib /home/jkeenan/learn/perl/p5p/120174-debugger/adapted-perl5db.t 

As I expected, the test program threw no exceptions and PASSed at this
commit.
Stage 6:  A Closer Look Means We Need More Bisection

Recall that in RT #120174 Smylers reported that in a newer perl he got
exceptions when using either the p or x debugger commands without a
subsequent wordspace.  It turns out that the p and x commands "broke" at
different commits.  At commit d478d7a, the x command "broke" but the p
command still "worked".  I still had the task of identifying the commit at
which the p command first "broke".
To learn that, I had to set up a second round of bisection.
$> export PERL_WORKDIR="$HOMEDIR/gitwork/perl2"
$> cd $PERL_WORKDIR
$> git clean -dfx
$> git checkout blead

perl ./Porting/bisect.pl \
    --start d478d7a005 \
    --end   1799399c45 \
    -- ./perl -Ilib ~/learn/perl/p5p/120174-debugger/p-only-adapted-perl5db.t

The end parameter is the same as in the previous bisection.  But now for
the start parameter I am providing the SHA which was the "first bad commit"
from the earlier bisection.  Upon concluding some 18 minutes later,
Porting/bisect.pl displayed this:
HEAD is now at c9a9a6c Move more commands to the dispatch.
good - zero exit from ./perl -Ilib
/home/jkeenan/learn/perl/p5p/120174-debugger/p-only-adapted-perl5db.t
8f144dfcb6b7d7a28faa9e82a687338b141c272f is the first bad commit
commit 8f144dfcb6b7d7a28faa9e82a687338b141c272f
Author: Shlomi Fish <shlomif@shlomifish.org>
Date:   Sun Oct 14 13:22:45 2012 +0200

    Low hanging fruit is now in %cmd_lookup.

:040000 040000 111452b264cefed7ee7ae68f4a222fba1ae0de02
ceae6a53cd1591c6bd4d36f46919377485f29709 M  lib
bisect run success
That took 1099 seconds.

This illustrates a point which we have to bear in mind whenever we are using
bisection as a diagnostic tool.  git bisect and, by extension,
Porting/bisect.pl will natively only tell us the first commit at which
certain source code has broken.  They cannot say anything as to whether or not
the code broke at other commits after the point of first breakage.  The
possibility of multiple points of breakage increases with the number of
commits in question and the extent to which the test suite fails to exercise
(or "cover") the source code in question.  (See my
CPAN module Devel::Git::MultiBisect
and my December 2016 talk at Philadelphia.pm for
another case where we had multiple points of breakage in the Perl 5 core
distribution.)
Where Do We Stand Now?

In this particular case of the Perl 5 debugger, the use of the p and x
commands without a subsequent wordspace has never been tested and never
been documented.  The fact that they worked at all constitutes an
accidental feature.  From 2012 to 2013 Israeli Perl hacker Shlomi Fish had
a Perl foundation grant to clean up the debugger code.  At the conclusion of
his work it was merged into Perl 5 blead by then-pumpking Rik Signes.  But
since the "feature" had never been documented, neither Shlomi nor Rik nor any
of the Perl 5 Porters with commit bits -- including myself -- ever caught this
change in the debugger's behavior.  Smylers didn't report the bug until five
months after the release of perl-5.18, and no one else paid it any mind for
nearly four more years.
As I write this in early September 2017, RT 120174 has not been resolved.  If
the wordspace-less syntax had been documented, then we would be obliged to
support it by writing patches to revert certain parts of Shlomi's 2012-13 work
and -- just as important -- to include tests of the syntax in
lib/perl5db.t.
Given that the wordspace-less syntax was never documented, however, the case
for making it work once again is exceedingly weak.  The point of Shlomi's
2012-13 work was to clean up the debugger's code in support of its documented
functionality.  Restoring previously undocumented functionality --
functionality which only one person on the planet has ever testified to using
-- will necessarily make the debugger's code more complex and difficult to
maintain going forward.  I do not believe that it is the job of the Perl 5
Porters or the Perl community more generally to support indefinitely any
accidental features that any Perl programmers have ever stumbled upon.  Hence,
I am hoping that the pumpking will rule in favor of not restoring accidental
features to the Perl 5 debugger.