I've figured this bug out. I'm thinking out loud here trying to figure out what could be better about this debugging experience because this seemed way more circuitous (and time-consuming) than it should have been.
In my test suite, I have:
let testctx = test_setup().await.external_api;where the return value of test_setup().await is a struct into which I've stashed an HTTP server that I had successfully started (but it's not in the external_api field). The behavior I observed was that when I went to make a request against that HTTP server, I got ECONNREFUSED.
(I first confirmed with Wireshark that the client was connecting to the correct IP and port of the server I started, and also confirmed that the server was successfully responding to requests (by inserting a sleep after starting the server and hitting it with curl). So far, this is mostly reasonable.)
I then checked the log from the test and saw that the server was initiating a graceful shutdown early in the test process. I sort of suspected RAII + accidentally letting something go out of scope was causing this. I figured this should be completely obvious from a stack trace and/or core file. I saved a core file where the log message was being printed (using dtrace), but this was useless because the logging is async. (I already have a TODO to fix this.)
At this point I looked briefly at rust-usdt, which seems to be a thing, and would have helped, but since I had to change the program anyway to put a probe in, I just put a panic at the point where we generate the log message. Like I mentioned in chat, I found that there's no way to configure panic to generate a core file in the test suite. I did get a stack trace, from which I cannot for the life of me figure out where the code is that's triggering the shutdown.
I gave up and fell back to guessing, which really feels like defeat. Turns out it was easy enough to guess: I gather that because I was never assigning the result of test_setup().await, the compiler figured it could move the external_api field out of it, drop the containing struct -- including the other fields in the object, including my HTTP server, shutting it down.
There are a bunch of unfortunate things here:
- I need to fix logging in this program to not be async, for this and other reasons.
- It should would be nice to have USDT probes at log points like we did in Node!
- No core files from panic in tests?!
- For the HTTP server: there are some objects where it'd be nice to be able to say "I'll manage the lifecycle explicitly, thanks", if for no other reason than to assert program correctness. Basically, I don't want it to just shut down because it went out of scope -- I want to have to close it myself.
- The stack trace seemed to leave a lot to be desired.
Here's the stack trace:
thread 'test_disks' panicked at 'dropshot server shutting down without invoking close(): RecvError(())', dropshot/src/server.rs:144:13
stack backtrace:
0: backtrace::backtrace::libunwind::trace
at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
1: backtrace::backtrace::trace_unsynchronized
at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
2: std::sys_common::backtrace::_print_fmt
at src/libstd/sys_common/backtrace.rs:77
3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
at src/libstd/sys_common/backtrace.rs:59
4: core::fmt::write
at src/libcore/fmt/mod.rs:1052
5: std::io::Write::write_fmt
at src/libstd/io/mod.rs:1426
6: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:62
7: std::sys_common::backtrace::print
at src/libstd/sys_common/backtrace.rs:49
8: std::panicking::default_hook::{{closure}}
at src/libstd/panicking.rs:204
9: std::panicking::default_hook
at src/libstd/panicking.rs:224
10: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:472
11: rust_begin_unwind
at src/libstd/panicking.rs:380
12: std::panicking::begin_panic
13: std::panicking::begin_panic
14: core::result::Result<T,E>::expect
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/result.rs:991
15: dropshot::server::HttpServer::new::{{closure}}
at dropshot/src/server.rs:144
16: <std::future::GenFuture<T> as core::future::future::Future>::poll
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/future.rs:43
17: <hyper::server::shutdown::Graceful<I,S,F,E> as core::future::future::Future>::poll
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.13.4/src/server/shutdown.rs:71
18: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/future/future.rs:119
19: std::future::poll_with_tls_context
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/future.rs:98
20: dropshot::server::HttpServer::run::{{closure}}
at dropshot/src/server.rs:99
21: <std::future::GenFuture<T> as core::future::future::Future>::poll
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/future.rs:43
22: tokio::task::core::Core<T>::poll
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/core.rs:128
23: tokio::task::harness::Harness<T,S>::poll::{{closure}}::{{closure}}
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/harness.rs:119
24: core::ops::function::FnOnce::call_once
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/ops/function.rs:232
25: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panic.rs:318
26: std::panicking::try::do_call
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panicking.rs:305
27: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:86
28: std::panicking::try
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panicking.rs:281
29: std::panic::catch_unwind
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panic.rs:394
30: tokio::task::harness::Harness<T,S>::poll::{{closure}}
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/harness.rs:100
31: tokio::loom::std::causal_cell::CausalCell<T>::with_mut
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/loom/std/causal_cell.rs:41
32: tokio::task::harness::Harness<T,S>::poll
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/harness.rs:99
33: tokio::task::raw::poll
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/raw.rs:162
34: tokio::task::raw::RawTask::poll
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/raw.rs:113
35: tokio::task::Task<S>::run
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/mod.rs:381
36: tokio::runtime::basic_scheduler::SchedulerPriv::tick
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/basic_scheduler.rs:192
37: tokio::runtime::basic_scheduler::BasicScheduler<P>::block_on
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/basic_scheduler.rs:142
38: tokio::runtime::Runtime::block_on::{{closure}}
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/mod.rs:413
39: tokio::runtime::context::enter
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/context.rs:72
40: tokio::runtime::handle::Handle::enter
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/handle.rs:34
41: tokio::runtime::Runtime::block_on
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/mod.rs:410
42: test_disks::test_disks
at tests/test_disks.rs:44
43: test_disks::test_disks::{{closure}}
at tests/test_disks.rs:44
44: core::ops::function::FnOnce::call_once
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/ops/function.rs:232
45: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/liballoc/boxed.rs:1015
46: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:86
47: std::panicking::try
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panicking.rs:281
48: std::panic::catch_unwind
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panic.rs:394
49: test::run_test_in_process
at src/libtest/lib.rs:539
50: test::run_test::run_test_inner::{{closure}}
at src/libtest/lib.rs:452
My test is at frame 42. I think what happened is that in that function, the compiler generated a call to drop, which closed a sending channel, which caused a future that was waiting for a message on that channel to complete -- that's what's happening at frame 15. So the context where the drop happened is lost. This is a pretty classic async/await debugging issue.
With @cbiffle's help, I've got a better understanding of what likely happened.
drop.)rx.awaitreturned with an error indicating the sender was dropped, but at this point the cause (thedropcall) isn't on the stack any moreThis explains why
dropisn't on the stack, and it's a good example of how even with long stack traces in async/await, debugging can still be harder than with synchronous code.