I've figured this bug out. I'm thinking out loud here trying to figure out what could be better about this debugging experience because this seemed way more circuitous (and time-consuming) than it should have been.
In my test suite, I have:
let testctx = test_setup().await.external_api;where the return value of test_setup().await is a struct into which I've stashed an HTTP server that I had successfully started (but it's not in the external_api field). The behavior I observed was that when I went to make a request against that HTTP server, I got ECONNREFUSED.
(I first confirmed with Wireshark that the client was connecting to the correct IP and port of the server I started, and also confirmed that the server was successfully responding to requests (by inserting a sleep after starting the server and hitting it with curl). So far, this is mostly reasonable.)
I then checked the log from the test and saw that the server was initiating a graceful shutdown early in the test process. I sort of suspected RAII + accidentally letting something go out of scope was causing this. I figured this should be completely obvious from a stack trace and/or core file. I saved a core file where the log message was being printed (using dtrace), but this was useless because the logging is async. (I already have a TODO to fix this.)
At this point I looked briefly at rust-usdt, which seems to be a thing, and would have helped, but since I had to change the program anyway to put a probe in, I just put a panic at the point where we generate the log message. Like I mentioned in chat, I found that there's no way to configure panic to generate a core file in the test suite. I did get a stack trace, from which I cannot for the life of me figure out where the code is that's triggering the shutdown.
I gave up and fell back to guessing, which really feels like defeat. Turns out it was easy enough to guess: I gather that because I was never assigning the result of test_setup().await, the compiler figured it could move the external_api field out of it, drop the containing struct -- including the other fields in the object, including my HTTP server, shutting it down.
There are a bunch of unfortunate things here:
- I need to fix logging in this program to not be async, for this and other reasons.
- It should would be nice to have USDT probes at log points like we did in Node!
- No core files from panic in tests?!
- For the HTTP server: there are some objects where it'd be nice to be able to say "I'll manage the lifecycle explicitly, thanks", if for no other reason than to assert program correctness. Basically, I don't want it to just shut down because it went out of scope -- I want to have to close it myself.
- The stack trace seemed to leave a lot to be desired.
Here's the stack trace:
thread 'test_disks' panicked at 'dropshot server shutting down without invoking close(): RecvError(())', dropshot/src/server.rs:144:13
stack backtrace:
0: backtrace::backtrace::libunwind::trace
at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
1: backtrace::backtrace::trace_unsynchronized
at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
2: std::sys_common::backtrace::_print_fmt
at src/libstd/sys_common/backtrace.rs:77
3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
at src/libstd/sys_common/backtrace.rs:59
4: core::fmt::write
at src/libcore/fmt/mod.rs:1052
5: std::io::Write::write_fmt
at src/libstd/io/mod.rs:1426
6: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:62
7: std::sys_common::backtrace::print
at src/libstd/sys_common/backtrace.rs:49
8: std::panicking::default_hook::{{closure}}
at src/libstd/panicking.rs:204
9: std::panicking::default_hook
at src/libstd/panicking.rs:224
10: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:472
11: rust_begin_unwind
at src/libstd/panicking.rs:380
12: std::panicking::begin_panic
13: std::panicking::begin_panic
14: core::result::Result<T,E>::expect
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/result.rs:991
15: dropshot::server::HttpServer::new::{{closure}}
at dropshot/src/server.rs:144
16: <std::future::GenFuture<T> as core::future::future::Future>::poll
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/future.rs:43
17: <hyper::server::shutdown::Graceful<I,S,F,E> as core::future::future::Future>::poll
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.13.4/src/server/shutdown.rs:71
18: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/future/future.rs:119
19: std::future::poll_with_tls_context
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/future.rs:98
20: dropshot::server::HttpServer::run::{{closure}}
at dropshot/src/server.rs:99
21: <std::future::GenFuture<T> as core::future::future::Future>::poll
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/future.rs:43
22: tokio::task::core::Core<T>::poll
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/core.rs:128
23: tokio::task::harness::Harness<T,S>::poll::{{closure}}::{{closure}}
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/harness.rs:119
24: core::ops::function::FnOnce::call_once
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/ops/function.rs:232
25: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panic.rs:318
26: std::panicking::try::do_call
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panicking.rs:305
27: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:86
28: std::panicking::try
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panicking.rs:281
29: std::panic::catch_unwind
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panic.rs:394
30: tokio::task::harness::Harness<T,S>::poll::{{closure}}
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/harness.rs:100
31: tokio::loom::std::causal_cell::CausalCell<T>::with_mut
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/loom/std/causal_cell.rs:41
32: tokio::task::harness::Harness<T,S>::poll
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/harness.rs:99
33: tokio::task::raw::poll
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/raw.rs:162
34: tokio::task::raw::RawTask::poll
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/raw.rs:113
35: tokio::task::Task<S>::run
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/task/mod.rs:381
36: tokio::runtime::basic_scheduler::SchedulerPriv::tick
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/basic_scheduler.rs:192
37: tokio::runtime::basic_scheduler::BasicScheduler<P>::block_on
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/basic_scheduler.rs:142
38: tokio::runtime::Runtime::block_on::{{closure}}
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/mod.rs:413
39: tokio::runtime::context::enter
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/context.rs:72
40: tokio::runtime::handle::Handle::enter
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/handle.rs:34
41: tokio::runtime::Runtime::block_on
at /Users/dap/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/runtime/mod.rs:410
42: test_disks::test_disks
at tests/test_disks.rs:44
43: test_disks::test_disks::{{closure}}
at tests/test_disks.rs:44
44: core::ops::function::FnOnce::call_once
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/ops/function.rs:232
45: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/liballoc/boxed.rs:1015
46: __rust_maybe_catch_panic
at src/libpanic_unwind/lib.rs:86
47: std::panicking::try
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panicking.rs:281
48: std::panic::catch_unwind
at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panic.rs:394
49: test::run_test_in_process
at src/libtest/lib.rs:539
50: test::run_test::run_test_inner::{{closure}}
at src/libtest/lib.rs:452
My test is at frame 42. I think what happened is that in that function, the compiler generated a call to drop, which closed a sending channel, which caused a future that was waiting for a message on that channel to complete -- that's what's happening at frame 15. So the context where the drop happened is lost. This is a pretty classic async/await debugging issue.
Also noteworthy:
tests/test_disks.rs:44appears twice, once astest_disksand once astest_disks::{{closure}}because the#[tokio::test]macro wraps our async function with a sync function that creates a single-thread executor and usesblock_onto run our function.