@jlevon
Last active April 15, 2024 08:55
qemu-dma-bh-hang.txt
NB: this is qemu-6.2.0, so it's entirely possible this has since been fixed, but I've found no obvious fix so far.
This version does *not* have the fix 7d751201 ("hw/ide: reset: cancel async DMA operation before resetting state")
Most likely, though it's unclear, there was some ongoing IO issue at the iSCSI server end, causing requests
not to complete until libiscsi eventually timed them out.
Stuck here with BQL held:
```
#4 0x0000564b03027c3c in aio_poll (ctx=0x564b03ce9070, blocking=blocking@entry=true) at ../util/aio-posix.c:607
#5 0x0000564b02ee431c in bdrv_aio_cancel (acb=0x7fc5003e8750) at ../block/io.c:2909
#6 0x0000564b02ed7899 in blk_aio_cancel (acb=<optimized out>) at ../block/block-backend.c:1556
#7 0x0000564b02cc9343 in ide_bus_reset (bus=bus@entry=0x564b04957bb8) at ../hw/ide/core.c:2459
#8 0x0000564b02cc2d85 in ahci_reset_port (s=0x564b04953c20, port=<optimized out>) at ../hw/ide/ahci.c:651
```
The acb:
```
(gdb) print *acb
$1 = {aiocb_info = 0x564b036ee710 <dma_aiocb_info>, bs = 0x0, cb = 0x564b02cc7fb0 <ide_dma_cb>, opaque = 0x564b04957c40, refcnt = 2}
(gdb) print *acb->aiocb_info
$3 = {cancel_async = 0x564b02c2ee20 <dma_aio_cancel>, get_aio_context = 0x564b02c2ece0 <dma_get_aio_context>, aiocb_size = 160}
```
so ->refcnt stays at 2 and never drops far enough for the aio_poll() loop in bdrv_aio_cancel() to finish.
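To make the refcnt reasoning concrete, here is a toy model of the wait. It assumes the synchronous cancel takes its own reference and then polls while `refcnt > 1`; `ToyAIOCB`, `toy_poll()` and `toy_aio_cancel()` are hypothetical stand-ins, not QEMU code, and `max_polls` only exists so the model can report the hang instead of spinning forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for BlockAIOCB: only the reference count matters. */
typedef struct {
    int refcnt;
} ToyAIOCB;

/* Hypothetical stand-in for one aio_poll() pass. If `completes` is true,
 * the pending request finishes in this pass and drops its reference. */
void toy_poll(ToyAIOCB *acb, bool completes)
{
    if (completes && acb->refcnt > 1) {
        acb->refcnt--;
    }
}

/* Model of the synchronous cancel (assumption: poll while refcnt > 1).
 * completes_at: poll number on which the request completes, 0 = never.
 * Returns the number of polls needed, or -1 if max_polls passes went by
 * with the request's reference still held (the hang in the backtrace). */
int toy_aio_cancel(ToyAIOCB *acb, int completes_at, int max_polls)
{
    int polls = 0;

    assert(acb->refcnt >= 1);
    acb->refcnt++;                  /* the canceller's own reference */
    while (acb->refcnt > 1) {
        if (polls == max_polls) {
            return -1;              /* would block forever in real life */
        }
        polls++;
        toy_poll(acb, completes_at > 0 && polls == completes_at);
    }
    acb->refcnt--;                  /* drop the canceller's reference */
    return polls;
}
```

With `completes_at = 0` the model gives up with refcnt still at 2, which is the value seen in the dump above.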
At DMA layer:
```
(gdb) set $dma_cb = *(DMAAIOCB *)acb
(gdb) print $dma_cb
$4 = {common = {aiocb_info = 0x564b036ee710 <dma_aiocb_info>, bs = 0x0, cb = 0x564b02cc7fb0 <ide_dma_cb>, opaque = 0x564b04957c40, refcnt = 2}, ctx = 0x564b03ce9070, acb = 0x0, sg = 0x564b04957f68,
align = 512, offset = 10538872832, dir = DMA_DIRECTION_TO_DEVICE, sg_cur_index = 1, sg_cur_byte = 0, iov = {iov = 0x7fc500a0c4c0, niov = 0, {{nalloc = 1, local_iov = {iov_base = 0x0, iov_len = 0}}, {
__pad = "\001\000\000\000\000\000\000\000\000\000\000", size = 0}}}, bh = 0x7fc50013aa20, io_func = 0x564b02c2ed70 <dma_blk_write_io_func>, io_func_opaque = 0x564b04e2ce60}
```
Note that `dbs->bh` is set, which means it's stuck in the `reschedule_dma()` path, as the guest memory
wasn't mapped at the time of `dma_blk_cb()`.
But `dma_aio_cancel()` is supposed to handle that:
```
197 static void dma_aio_cancel(BlockAIOCB *acb)
198 {
199     DMAAIOCB *dbs = container_of(acb, DMAAIOCB, common);
200
201     trace_dma_aio_cancel(dbs);
202
203     assert(!(dbs->acb && dbs->bh));
204     if (dbs->acb) {
205         /* This will invoke dma_blk_cb. */
206         blk_aio_cancel_async(dbs->acb);
207         return;
208     }
209
210     if (dbs->bh) {
211         cpu_unregister_map_client(dbs->bh);
212         qemu_bh_delete(dbs->bh);
213         dbs->bh = NULL;
214     }
215     if (dbs->common.cb) {
216         dbs->common.cb(dbs->common.opaque, -ECANCELED);
217     }
218 }
```
I think ide_bus_reset() should have already called this via blk_aio_cancel(), so how come dbs->bh is still non-NULL?
Also, in this case (line 213), who is responsible for calling `qemu_aio_unref()`, which is normally handled by `dma_complete()`?
2024-04-15
Live instance: aio is stuck constantly re-doing reschedule_dma():
```
(gdb) bt
#0 reschedule_dma (opaque=0x7f6dec0d2e50) at ../softmmu/dma-helpers.c:96
#1 0x0000563247639b0d in aio_bh_call (bh=0x7f6dec5d8580) at ../util/async.c:169
#2 aio_bh_poll (ctx=ctx@entry=0x563249c0ec20) at ../util/async.c:169
#3 0x0000563247627bd2 in aio_poll (ctx=0x563249c0ec20, blocking=blocking@entry=true) at ../util/aio-posix.c:659
#4 0x00005632474e431c in bdrv_aio_cancel (acb=0x7f6dec0d2e50) at ../block/io.c:2909
#5 0x00005632474d7899 in blk_aio_cancel (acb=<optimized out>) at ../block/block-backend.c:1556
#6 0x00005632472c9343 in ide_bus_reset (bus=bus@entry=0x56324a85b868) at ../hw/ide/core.c:2459
#7 0x00005632472c2d85 in ahci_reset_port (s=0x56324a8597a0, port=<optimized out>) at ../hw/ide/ahci.c:651
```
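A toy model of what this backtrace seems to show: the retry BH runs, the mapping attempt fails again, and the BH gets re-armed, so the poll loop never runs out of work. This is a sketch under that assumption only; `ToyDMA` and the `toy_*` functions are hypothetical and do not reproduce the real `dma_blk_cb()` logic or say why the mapping fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for DMAAIOCB. */
typedef struct {
    bool bh_pending;    /* models dbs->bh != NULL */
    int map_ok_at;      /* attempt on which mapping succeeds, 0 = never */
    int attempts;       /* mapping attempts so far */
    bool io_submitted;  /* set once mapping worked and IO was issued */
} ToyDMA;

/* Models the mapping step: either submit IO or arm a retry BH. */
void toy_dma_blk_cb(ToyDMA *dbs)
{
    dbs->attempts++;
    if (dbs->map_ok_at > 0 && dbs->attempts >= dbs->map_ok_at) {
        dbs->io_submitted = true;
        return;
    }
    dbs->bh_pending = true;     /* nothing mapped: try again later */
}

/* Models the retry BH: clear the BH, then attempt the mapping again. */
void toy_reschedule_dma(ToyDMA *dbs)
{
    assert(dbs->bh_pending);
    dbs->bh_pending = false;
    toy_dma_blk_cb(dbs);
}

/* Models the BH-running part of the poll loop, bounded by max_passes so
 * the model terminates. Returns how many times the retry BH ran. */
int toy_run_bhs(ToyDMA *dbs, int max_passes)
{
    int passes = 0;

    while (dbs->bh_pending && passes < max_passes) {
        toy_reschedule_dma(dbs);
        passes++;
    }
    return passes;
}
```

If the mapping never succeeds (`map_ok_at = 0`), `toy_run_bhs()` uses up every pass it is given and `bh_pending` is still set at the end, matching a non-NULL `dbs->bh` at whatever moment gdb interrupts the process.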