Skip to content

Instantly share code, notes, and snippets.

@pervognsen
Last active August 5, 2021 07:58
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save pervognsen/a002735fd626a481222e5204ac15b11e to your computer and use it in GitHub Desktop.
Save pervognsen/a002735fd626a481222e5204ac15b11e to your computer and use it in GitHub Desktop.
// For N = 11, a call to perms(p, N, 0) only takes 0.36 seconds.
// Why would a debugger seemingly hang forever when you try to
// step over the outermost recursive call to perms in the loop?
// Hint: How is "step over" ("next" in gdb/lldb) implemented?
u64 perms(u64 p[N], int n, u64 s) {
if (n == 1) {
s += signature(p);
} else {
for (int i = 0; i < n; i++) {
swap(p[0], p[i]);
s = perms(p + 1, n - 1, s);
swap(p[0], p[i]);
}
}
return s;
}
@aolo2
Copy link

aolo2 commented Jul 16, 2021

There are at least two ways I know of [how "next" is implemented], but neither should hang on this.

The first method (used by lldb IIRC) involves actually interperting the machine code yourself (in lldb they convert it to LLVM IR and run that) to get the list of breakpoint locations. I always assumed that the point of this approach is to analyze the IR somehow, but that is as far as my understanding goes.

The second method (used by RemedyBG) is just to do singlesteps until the line number changes. But George explicitly mentioned that he skips over call, rep xxx etc (skipping over those is safe in terms of determining the line number as far as I understand).

@pervognsen
Copy link
Author

pervognsen commented Jul 16, 2021

I wrote "seemingly hang forever" but it's "just" intractably slow. For N = 12 there are about a hundred million recursive returns to the return address but only one of them corresponds to the outer call you wanted to step over. There are different ways you can implement the return filtering in a debugger but the most common technique I've seen is to put a breakpoint on the return address after the call. When the breakpoint triggers, the debugger will check the stack pointer against the value that was captured before entering the step-over call. If the pointers don't match, it means the breakpoint triggered for a nested return instead of the outer return. Thus the debugger asks the program to resume execution with the breakpoint still in place. If the stack pointer does match, it removes the breakpoint and hands control to the debugger user interface.

@aolo2
Copy link

aolo2 commented Jul 16, 2021

Oh, I see. That leads me to believe that "smart" singlestepping should have no problem with this. Lemme run remedy and check.

upd: nah, it's just as bad if not worse. I've got a long way to go with my understanding it seems..

@nakst
Copy link

nakst commented Jul 16, 2021

This gets really annoying sometimes when debugging remote targets.

Maybe if compilers modified the return address after returning from a function, e.g.

call some_function
push 0
add rsp,8

then debuggers could set a watch breakpoint to detect function exit?

@pervognsen
Copy link
Author

pervognsen commented Jul 17, 2021

You can generate a little return shim whose address you push on the stack before starting execution at the step-over call target's address. The shim just does int3 but has unwind info, a proper prologue and whatever else is needed so that generic platform-compliant stack trace logic will be able to walk past it while your debugger's stack tracer can hide it from the user. Another side benefit of this approach (which I haven't seen any debugger do) is that it will handle longjmp/non-local exits more sanely when you longjmp past a step-over stack pointer mark.

@aolo2
Copy link

aolo2 commented Jul 26, 2021

Ok, so I got a little further with my understanding of this (implemented the "effectively hangs" version myself), and I have a further question if that's ok.

Regarding Wons comments on "implementing CFA and strategically placing traps": is it worth it at all? My reasoning is as follows: you want to do a single step, which effectively means executing all instructions on a single line of source code. But how many instructions could that be, if we skip over call and rep whatevs? I mean yes, you could put all your code on one line, but in a real situation, how many instructions would a single source line map to? A few dozen?

That's is my argument for "single stepping until the heat death of the universe".

What are your thoughts on this?

@pervognsen
Copy link
Author

I've never bothered doing CFA but the strongest case for something like that is implementing efficient "step over" for inlined calls. I haven't thought it through in detail, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment