EDIT: Finally got around to making a blog so the canonical location of this post is now here.
$ cargo build --example basic --features usdt-probes
[...snip...]
error: could not compile `dropshot`
Caused by:
process didn't exit successfully: `rustc [...snip...]` (signal: 11, SIGSEGV: invalid memory reference)
Achievement unlocked: rustc
segfault.
Stack trace
fffffc7fce3fcbc0 librustc_driver-77cef3efbfa7284c.so`llvm::BranchProbabilityInfo::computeEestimateBlockWeight(llvm::Function const&, llvm::DominatorTree*, llvm::PostDominatorTree*)+0xd84()
fffffc7fce3fd370 librustc_driver-77cef3efbfa7284c.so`llvm::BranchProbabilityInfo::calculate(llvm::Function const&, llvm::LoopInfo const&, llvm::TargetLibraryInfo const*, llvm::DominatorTree*, llvm::PostDominatorTree*)+0x131()
fffffc7fce3fd3c0 librustc_driver-77cef3efbfa7284c.so`llvm::BranchProbabilityAnalysis::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)+0x134()
fffffc7fce3fd5f0 librustc_driver-77cef3efbfa7284c.so`llvm::detail::AnalysisPassModel<llvm::Function, llvm::BranchProbabilityAnalysis, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>::Invalidator>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)+0x2f()
fffffc7fce3fd6a0 librustc_driver-77cef3efbfa7284c.so`llvm::AnalysisManager<llvm::Function>::getResultImpl(llvm::AnalysisKey*, llvm::Function&)+0x2de()
fffffc7fce3fd6d0 librustc_driver-77cef3efbfa7284c.so`llvm::BlockFrequencyAnalysis::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)+0x3f()
fffffc7fce3fd710 librustc_driver-77cef3efbfa7284c.so`llvm::detail::AnalysisPassModel<llvm::Function, llvm::BlockFrequencyAnalysis, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>::Invalidator>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)+0x26()
fffffc7fce3fd7c0 librustc_driver-77cef3efbfa7284c.so`llvm::AnalysisManager<llvm::Function>::getResultImpl(llvm::AnalysisKey*, llvm::Function&)+0x2de()
fffffc7fce3fdc30 librustc_driver-77cef3efbfa7284c.so`llvm::AlwaysInlinerPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)+0xa2c()
fffffc7fce3fdc50 librustc_driver-77cef3efbfa7284c.so`llvm::detail::PassModel<llvm::Module, llvm::AlwaysInlinerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)+0x15()
fffffc7fce3fddc0 librustc_driver-77cef3efbfa7284c.so`llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)+0x4b5()
fffffc7fce3ff170 librustc_driver-77cef3efbfa7284c.so`LLVMRustOptimizeWithNewPassManager+0x7f2()
fffffc7fce3ff3a0 librustc_driver-77cef3efbfa7284c.so`rustc_codegen_llvm::back::write::optimize_with_new_llvm_pass_manager+0x372()
fffffc7fce3ff5b0 librustc_driver-77cef3efbfa7284c.so`rustc_codegen_llvm::back::write::optimize+0x388()
fffffc7fce3ff900 librustc_driver-77cef3efbfa7284c.so`rustc_codegen_ssa::back::write::execute_work_item::<rustc_codegen_llvm::LlvmCodegenBackend>+0x1f3()
fffffc7fce3ffdb0 librustc_driver-77cef3efbfa7284c.so`std::sys_common::backtrace::__rust_begin_short_backtrace::<<rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::ExtraBackendMethods>::spawn_named_thread<rustc_codegen_ssa::back::write::spawn_work<rustc_codegen_llvm::LlvmCodegenBackend>::{closure#0}, ()>::{closure#0}, ()>+0xf7()
fffffc7fce3fff60 librustc_driver-77cef3efbfa7284c.so`<<std::thread::Builder>::spawn_unchecked_<<rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::ExtraBackendMethods>::spawn_named_thread<rustc_codegen_ssa::back::write::spawn_work<rustc_codegen_llvm::LlvmCodegenBackend>::{closure#0}, ()>::{closure#0}, ()>::{closure#1} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0}+0xa9()
fffffc7fce3fffb0 libstd-ef15f81a900bedf3.so`std::sys::unix::thread::Thread::new::thread_start::h24133bfe318082b5+0x27()
fffffc7fce3fffe0 libc.so.1`_thrp_setup+0x6c(fffffc7fed642280)
fffffc7fce3ffff0 libc.so.1`_lwp_start()
Ok, so we're faulting somewhere in LLVM it seems like. From Cliff's initial investigation:
Anyway, yeah, something about the CFG construction there is generating either an empty basic block or a basic block ending in an unexpected type of instruction (something that is not an LLVM IR terminator instruction) and triggering https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/BasicBlock.h#L121
First order of business then is to just check if the IR is valid. LLVM has a pass to do just that and we can ask rustc
to run it first by passing -Z verify-llvm-ir=yes
(note we need to switch to nightly to use -Z
flags):
$ RUSTFLAGS="-Z verify-llvm-ir=yes" cargo +nightly build --example basic --features usdt-probes
Haha, nope:
Basic Block in function '_ZN8dropshot6server24http_request_handle_wrap28_$u7b$$u7b$closure$u7d$$u7d$17h503b14ddd4edd1deE' does not have terminator!
label %bb24
LLVM ERROR: Broken module found, compilation aborted!
# Demangle w/ rustfilt (c++filt works well enough too)
# Single quotes important to not misinterpret $ as shell vars!
$ rustfilt '_ZN8dropshot6server24http_request_handle_wrap28_$u7b$$u7b$closure$u7d$$u7d$17h503b14ddd4edd1deE'
dropshot::server::http_request_handle_wrap::{{closure}}
The IR generated for a closure in dropshot::server::http_request_handle_wrap
is invalid—some basic block is missing a terminator.
Ok, is it rustc generating the bad IR directly or the result of some transformation pass miscompiling it?
But first, let's cheat and just get the final failing rustc
command so we don't need to rebuild all the deps anytime we change RUSTFLAGS
. Re-running the failing cargo
command should just output the failing rustc
invocation:
$ cargo +nightly build --example basic --features usdt-probes
Compiling dropshot v0.6.1-dev (/src/dropshot/dropshot)
error: could not compile `dropshot`
Caused by:
process didn't exit successfully: `rustc [...snip...]` (signal: 11, SIGSEGV: invalid memory reference)
From this point we can just directly run the rustc
command as outputted with a few modifications:
- add
+nightly
otherwise therustc
wrapper will attempt to use the rust version mentioned inrust-toolchain.toml
- remove the
--error-format=json
and--json=...
flags for human-readable output - add
-Z verify-llvm-ir=yes
- change the
--emit
argument to--emit=llvm-ir
because that should be enough to trigger the issue and we'd like to look at the IR later
Stick this in a simple shell script to easily modify it and run it; call it repro.sh
. Verify it still fails as expected:
$ ./repro.sh
Basic Block in function '_ZN8dropshot6server24http_request_handle_wrap28_$u7b$$u7b$closure$u7d$$u7d$17h503b14ddd4edd1deE' does not have terminator!
label %bb24
LLVM ERROR: Broken module found, compilation aborted!
Now back to figuring out where this invalid IR is coming from. Even though we're doing a debug build, there are still some LLVM passes that get run. So if we want to verify the IR that rustc
directly generated, we need to make sure no LLVM passes are run at all (aside from the verify
pass itself). The way to do that is via -C no-prepopulate-passes
so let's edit our repro.sh
and run it again:
$ ./repro.sh
$ echo $?
0
Ok rustc
has been proven innocent. Looks like some LLVM pass generates invalid IR which really shouldn't happen!
Well, now what? Let's try to find out what pass is responsible!
Our first attempt is by asking LLVM to print the IR after each pass—maybe we'll get lucky and see the offending pass last. We do this by modifying repro.sh
again:
- remove
-C no-prepopulate-passes
&-Z verify-llvm-ir=yes
- add
-C llvm-args=--print-after-all
to print the IR after every pass - add
-C codegen-units=1 -Z no-parallel-llvm
to make the output a bit more readable
Alas, this doesn't go the way we want as we get the same segfault as before without any of the actual output we wanted :(
Ok, new attempt. Let's skip rustc
and see if we can just invoke the LLVM machinery directly via opt
. For that, let's first install it:
$ rustup component add --toolchain nightly llvm-tools-preview
It is not the most discoverable because it just gets plopped somewhere into rustc
's sysroot directory:
$ OPT=$(find $(rustc +nightly --print sysroot) -name opt)
We also need the actual IR to pass to opt
so let's go back and modify our repro.sh
to only pass -C no-prepopulate-passes
. We should find our initial rustc
generated IR. It's also worth remove the -C debuginfo=2
to make the IR a bit smaller:
$ ls ./target/debug/examples/basic*.ll
./target/debug/examples/basic-5f5f0491fbb5b7d3.ll
Let's try something simple first and just run the IR through opt
without any flags as a smoke test:
$ $OPT ./target/debug/examples/basic-5f5f0491fbb5b7d3.ll
opt: ./target/debug/examples/basic-5f5f0491fbb5b7d3.ll:425470:1: error: expected instruction opcode
bb25: ; preds = %bb24
^
😐 Wat. Looking at the IR around that line, we find this:
bb24:
; [...snip...]
%186 = invoke i64 asm sideeffect inteldialect "990: clr rax\0A\0A .pushsection set_dtrace_probes,\22aw\22,\22progbits\22\0A .balign 8\0A 991:\0A .4byte 992f-991b // length\0A .byte 1\0A .byte 0\0A .2byte 1\0A .8byte 990b // address\0A .asciz \22dropshot\22\0A .asciz \22request-start\22\0A // null-terminated strings for each argument\0A .balign 8\0A 992: .popsection\0A \0A .pushsection yeet_dtrace_probes\0A .8byte 991b\0A .popsection\0A \0A ", "=&{ax}"() #23
to label %bb25 unwind label %cleanup26, !srcloc !38
store i64 %186, i64* %is_enabled, align 8
bb25: ; preds = %bb24
%_78 = load i64, i64* %is_enabled, align 8
%187 = icmp eq i64 %_78, 0
br i1 %187, label %bb46, label %bb26
Well that looks awfully like the error the LLVM IR verifier was telling us about (%bb24
not having a terminator)!
So looks like our assumption about rustc
not being the one generating valid IR is wrong. Where did we go wrong?
Some light digging into the rustc
source reveals that using the new LLVM pass manager (default for LLVM >= 13 thus Rust >= 1.56) means -Z verify-llvm-ir=yes
is ignored when combined with -C no-prepopulate-passes
. Whelp :/
(To be clear, it's not really the new pass manager's fault but rather the way it is setup in LLVMRustOptimizeWithNewPassManager
).
So we've found one (minor) rustc
bug so far but that doesn't help solve our original question. No worries, we can either switch to the old pass manager (-Z new-llvm-pass-manager=no
) or just manually add the verifier pass (-C passes="verify"
), either way we get the same ol error:
$ ./repro.sh
Basic Block in function '_ZN8dropshot6server24http_request_handle_wrap28_$u7b$$u7b$closure$u7d$$u7d$17h503b14ddd4edd1deE' does not have terminator!
label %bb24
LLVM ERROR: Broken module found, compilation aborted!
This brings us back to the actual culprit: rustc
!
(We really should have suspected this after the brief foray with trying to print the resulting IR after every LLVM pass didn't give us anything: the IR we feed it was botched to begin with!).
So back to the invalid IR: at the end of bb24
we are using invoke
(a terminator) with our inline assembly from the usdt probes in dropshot
followed by a store
instruction. Clearly this is wrong because store
isn't a terminator and thus we shouldn't end a basic block with it. Let's see what the corresponding MIR (Rust's Mid-level IR) looks like.
Since the failing code is coming from dropshot
and not the basic example itself, we can't use our repro.sh
hack and so back we go to cargo
and RUSTFLAGS
:
Using -Z dump-mir='http_request_handle_wrap'
:
$ RUSTFLAGS="-Z dump-mir='http_request_handle_wrap'" cargo +nightly build --example basic --features usdt-probes
[...snip...]
(signal: 11, SIGSEGV: invalid memory reference)
$ ls mir_dump
mir_dump: No such file or directory
Ok, that isn't working as expected (-Z dump-mir=F
should print just the MIR for functions matches the filter F
and place it in a mir_dump
folder). A bit annoying and we can only shave so many yaks right now but nothing a bigger hammer can't fix (just use --emit=mir
to dump out all the mir into the target folder and find the corresponding one for dropshot
):
$ RUSTFLAGS="--emit=mir" cargo +nightly build --example basic --features usdt-probes
$ ls target/debug/deps/dropshot-*.mir
target/debug/deps/dropshot-30d947b7471013cc.mir
Ok, now this looks like reasonable:
bb24: {
[...snip...]
asm!("990: clr rax
.pushsection set_dtrace_probes,\"aw\",\"progbits\"
.balign 8
991:
.4byte 992f-991b // length
.byte 1
.byte 0
.2byte 1
.8byte 990b // address
.asciz \"dropshot\"
.asciz \"request-start\"
// null-terminated strings for each argument
.balign 8
992: .popsection
.pushsection yeet_dtrace_probes
.8byte 991b
.popsection
", out("ax") _77, options(NOMEM | PRESERVES_FLAGS | NOSTACK)) -> [return: bb25, unwind: bb217]; // scope 10 at dropshot/src/lib.rs:581:1: 581:41
}
bb25: {
_78 = _77; // scope 9 at dropshot/src/lib.rs:581:1: 581:41
switchInt(move _78) -> [0_u64: bb46, otherwise: bb26]; // scope 9 at dropshot/src/lib.rs:581:1: 581:41
}
Note in MIR, asm
itself is a terminator and so bb24
here appropriately says that under normal control flow to go to bb25
or if unwinding go to bb217
. In bb25
we see a simple statement, _78 = _77;
, which is assigning the output (_77
i.e. is_enabled
) from the asm
and this should correspond to the store
we saw in the LLVM IR.
So looks like something in the lowering from Rust MIR to LLVM IR is not quite right. Just eyeballing it, it seems like the store of the asm output is getting added to the wrong LLVM basic block. Wherein it should be part of the "normal" basic block taken by the invoke
(indicated by the to label %bb25
argument), instead it is incorrectly placed directly after the invoke
.
That seems to happens here in rustc
.
Now that we have a pretty good idea of why things fail we should be able to make a smaller repro (*):
#![feature(asm_unwind)]
fn main() {
let _x = String::from("string here just cause we need something with a non-trivial drop");
let foo: u64;
unsafe {
std::arch::asm!(
"mov {}, 1",
out(reg) foo,
options(may_unwind)
);
}
println!("{}", foo);
}
$ rustc +nightly asm-miscompile.rs
[1] 6057 segmentation fault (core dumped) rustc +nightly asm-miscompile.rs
Cool, we've got our smaller repro segfaulting, but is it the same issue? Let's take a look at what the LLVM IR says:
$ rustc +nightly asm-miscompile.rs --emit=llvm-ir -C no-prepopulate-passes
$ less asm-miscompile.ll
[...snip...]
bb1: ; preds = %start
%1 = invoke i64 asm sideeffect alignstack inteldialect unwind "mov ${0:q}, 1", "=&r,~{dirflag},~{fpsr},~{flags},~{memory}"()
to label %bb2 unwind label %cleanup, !srcloc !9
store i64 %1, i64* %foo, align 8
bb2:
[...snip...]
Would you look at that, we've got a store
as the last instruction in the basic block with the asm
.
Now that we have a small repro and know roughly where the issue is in rustc
we can try fixing it:
diff --git a/compiler/rustc_codegen_llvm/src/asm.rs b/compiler/rustc_codegen_llvm/src/asm.rs
index 03c390b4bd4..91d132eb343 100644
--- a/compiler/rustc_codegen_llvm/src/asm.rs
+++ b/compiler/rustc_codegen_llvm/src/asm.rs
@@ -290,6 +290,11 @@ fn codegen_inline_asm(
}
attributes::apply_to_callsite(result, llvm::AttributePlace::Function, &{ attrs });
+ // Switch to the 'normal' basic block if we did an `invoke` instead of a `call`
+ if let Some((dest, _, _)) = dest_catch_funclet {
+ self.switch_to_block(dest);
+ }
+
// Write results to outputs
for (idx, op) in operands.iter().enumerate() {
if let InlineAsmOperandRef::Out { reg, place: Some(place), .. }
A little waiting later and we can try compiling our small repro again:
$ rustc +stage1 asm-miscompile.rs
$ echo $?
0
🎉 Success! So what does the LLVM IR look like now?
$ rustc +stage1 asm-miscompile.rs --emit=llvm-ir -C no-prepopulate-passes
$ less asm-miscompile.ll
[...snip...]
bb1: ; preds = %start
%1 = invoke i64 asm sideeffect alignstack inteldialect unwind "mov ${0:q}, 1", "=&r,~{dirflag},~{fpsr},~{flags},~{memory}"()
to label %bb2 unwind label %cleanup, !srcloc !9
bb2:
store i64 %1, i64* %foo, align 8
[...snip...]
The store has moved down into %bb2
right where it should be.
(*) Note: we had to add options(may_unwind)
and an unused String
variable to actually get it to fail. Removing either of those will stop it from segfaulting. The difference being, the LLVM IR that rustc
generates. Without both pieces, rustc
just uses a simple call
instruction for the inline asm whereas in the broken case, it's using invoke
which is considered a terminator unlike call
. After the invoke
, the control transfer goes to either the 'normal' label or the 'unwind' label. By marking our asm with options(may_unwind)
we essentially tell rustc
to opt our inline assembly into participating in unwinding. The unused string is there so that there's actually something to cleanup in the case that we do unwind.
But, looking at the asm from the original failing code in dropshot
there's no mention of MAY_UNWIND
:
asm!("[...snip...]", out("ax") _77, options(NOMEM | PRESERVES_FLAGS | NOSTACK)) -> [return: bb25, unwind: bb217]; // scope 10 at dropshot/src/lib.rs:581:1: 581:41
usdt certainly isn't adding it so what gives?
Poking through rustc
, it seems like that may be due to the way MIR inlining is implemented. A cleanup block may be assigned if a terminator (like inline asm) gets inlined. The lowering pass will use the presence of such a cleanup target to ultimately decide whether to use call
or invoke
.
But that theory is quickly shot down because the MIR inliner is still disabled by default.
Just searching through rustc
for other places the unwind target of a terminator might be set yields something promising when it comes to generators. This would track as the original dropshot
failure case is in the context of an async method. Further confirmation that this is where our InlineAsm terminator is getting an unwind target set is that the posion_block
mentioned in that bit of code lines up. It has a single statement and if we hop back to the MIR of the failing dropshot
example, lo and behold:
asm!("[...snip...]", out("ax") _77, options(NOMEM | PRESERVES_FLAGS | NOSTACK)) -> [return: bb25, unwind: bb217];
[...snip...]
bb217 (cleanup): {
discriminant((*(_1.0: &mut [static generator@dropshot/src/server.rs:651:43: 741:2]))) = 2; // scope 0 at dropshot/src/server.rs:651:43: 741:2
resume; // scope 0 at dropshot/src/server.rs:651:43: 741:2
}
bb217
there is the unwind target and it matches exactly with the poison_block
as constructed in the rustc.
Armed with some new evidence, we can adapt our simple repro to more closely match the async scenario encountered in dropshot
:
extern crate futures; // 0.3.21
async fn bar() {
let foo: u64;
unsafe {
std::arch::asm!(
"mov {}, 1",
out(reg) foo,
);
}
println!("{}", foo);
}
fn main() {
futures::executor::block_on(bar());
}
(Segfaults on playground.)
Thus all the mysteries are solved:
- the MIR -> LLVM IR lowering for inline assembly outputted invalid LLVM IR when generated with an
invoke
instruction (Fix submitted here). - every
async fn
in rust is implemented as a generator and as part of that, terminators in the basic blocks of such a function are modified to include a cleanup target if they can unwind (*) so as to poison the generator (**). - We also discovered a bug with
-Z verify-llvm-ir=yes
and-C no-prepopulate-passes
along the way (Fix submitted here).
(*) note the segfault in the above playground goes away if you remove the println!
from foo
because that is the only part that may actually unwind.
(**) if you've ever seen a panic saying "async fn
resumed after panicking", this is why.
Update: The above bugs are fixed as of rustc 1.62.0-nightly (52ca603da 2022-04-12)
.
This really really really need a TL;DR at the top. I didn't follow how it's was rust fault if llvm segfault even with "wrong" input it's llvm that segfault not rustc, but I really understand half you write it's was just too much detail for my curiosity I skip most of it ^^