1. 100% on laptop and *almost* always on QEMU.
For QEMU I use the following command with a minimal kernel:
$ qemu-system-x86_64 -machine q35,accel=kvm -cpu core2duo -smp cores=8,threads=2 -kernel arch/x86/boot/bzImage -append "ignore_loglevel threadirqs nokaslr"
I've added "threadirqs" because it triggers the issue in the VM more reliably (lots of threads to schedule at early stages). On a real machine the issue is reproducible without this option as well.
Minimal kernel config: [1]
I've also taken a crashdump of this kernel from the VM:
vmcore: [2]
vmlinux: [3]
pds.o: [4]
dmesg: [5]
vmcore/vmlinux can be opened with `crash --minimal`.
The very first hit:
[ 0.674269] pds: cpu #6 affinity check mask - coregroup 0x0000ffbf
[ 0.675738] pds: cpu #7 affinity check mask - smt 0x00000040
[ 0.673374] BUG: unable to handle kernel paging request at ffffffffa1d88a78
[ 0.677052] pds: cpu #7 affinity check mask - coregroup 0x0000ff7f
[ 0.675212] PGD 1a0a067 P4D 1a0a067 PUD 1a0b063 PMD 0
[ 0.675218] Oops: 0000 [#1] PREEMPT SMP PTI
[ 0.675222] CPU: 5 PID: 0 Comm: PDS/5 Tainted: G T 4.19.0-pf7+ #1
[ 0.678307] pds: cpu #8 affinity check mask - smt 0x00000200
[ 0.676998] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
[ 0.677323] RIP: 0010:__schedule+0x476/0x11a0
[ 0.677323] Code: 75 c8 44 89 e7 e8 aa 50 fe ff 3b 05 58 99 6c 00 41 89 c4 0f 83 bc 03 00 00 49 63 d4 4c 89 e8 48 8b 1c d5 00 e3 9$
[ 0.677323] RSP: 0000:ffffc90000093e20 EFLAGS: 00010283
[ 0.677323] RAX: 00000000ffffffff RBX: ffff8880078c0000 RCX: 000000000000000b
[ 0.677323] RDX: 000000000000000b RSI: 0000000000000000 RDI: ffffc90000093e88
[ 0.677323] RBP: ffffc90000093ec0 R08: 0000000000000030 R09: 0000000000000005
[ 0.677323] R10: 0000000000000002 R11: de4d430bcda58cba R12: 000000000000000b
[ 0.677323] R13: 0000000000015004 R14: ffff8880070b1380 R15: ffff888007189d40
[ 0.677323] FS: 0000000000000000(0000) GS:ffff888007740000(0000) knlGS:0000000000000000
[ 0.677323] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.677323] CR2: ffffffffa1d88a78 CR3: 0000000001a08000 CR4: 00000000000006a0
[ 0.677323] Call Trace:
[ 0.677323] schedule_idle+0x19/0x30
[ 0.677323] do_idle+0x14e/0x220
[ 0.677323] cpu_startup_entry+0x6a/0x70
[ 0.677323] start_secondary+0x199/0x1d0
[ 0.677323] secondary_startup_64+0xa4/0xb0
[ 0.677323] CR2: ffffffffa1d88a78
[ 0.681323] BUG: unable to handle kernel paging request at ffffffffa1d88a78
[ 0.677323] ---[ end trace 1476a55eda43f0f8 ]---
crash> dis -lr __schedule+0x476
/home/pf/work/devel/linux/pf-kernel/kernel/sched/pds.c: 2889
0xffffffff8137ec45 <__schedule+0x465>: movslq %r12d,%rdx
0xffffffff8137ec48 <__schedule+0x468>: mov %r13,%rax
0xffffffff8137ec4b <__schedule+0x46b>: mov -0x7e6f1d00(,%rdx,8),%rbx
/home/pf/work/devel/linux/pf-kernel/./arch/x86/include/asm/bitops.h: 335
0xffffffff8137ec53 <__schedule+0x473>: mov (%rax,%rbx,1),%eax
0xffffffff8137ec56 <__schedule+0x476>: bt %rax,0xa09e22(%rip) # 0xffffffff81d88a80 <sched_rq_queued_masks>
2887 for_each_cpu(i, &chk) {
2888 /* skip the cpu which has idle slibing cpu */
2889 if (cpumask_test_cpu(per_cpu(sched_sibling_cpu, i),
2890 &sched_rq_queued_masks[SCHED_RQ_EMPTY]))
2891 continue;
2892 pds_sg_balance_trigger(i);
2893 }
Maybe this will give you some clue.
2. No, mainline v4.19.7 with CFS is not affected.
[1] https://gist.github.com/53c0a88d5b1d4e9a8e9fcf4c23aaad46
[2] https://natalenko.name/myfiles/pds-mq_cmt_crash/vmcore.xz
[3] https://natalenko.name/myfiles/pds-mq_cmt_crash/vmlinux.xz
[4] https://natalenko.name/myfiles/pds-mq_cmt_crash/pds.o.xz
[5] https://gist.github.com/07c8375bf74e7262a5c0067660489650