pfactum/investigation.txt

## investigation.txt
1. 100% on laptop and *almost* always on QEMU.

For QEMU I use the following command with a minimal kernel:

$ qemu-system-x86_64 -machine q35,accel=kvm -cpu core2duo -smp cores=8,threads=2 -kernel arch/x86/boot/bzImage -append "ignore_loglevel threadirqs nokaslr"

I've added "threadirqs" because it triggers the issue in the VM more reliably (lots of threads to schedule at early stages). On a real machine the issue is also reproducible without this option.

Minimal kernel config: [1]

I've also taken a crash dump of this kernel from the VM:

vmcore: [2]
vmlinux: [3]
pds.o: [4]
dmesg: [5]

vmcore/vmlinux can be opened with `crash --minimal`.

The very first hit:

[    0.674269] pds: cpu #6 affinity check mask - coregroup 0x0000ffbf
[    0.675738] pds: cpu #7 affinity check mask - smt 0x00000040
[    0.673374] BUG: unable to handle kernel paging request at ffffffffa1d88a78
[    0.677052] pds: cpu #7 affinity check mask - coregroup 0x0000ff7f
[    0.675212] PGD 1a0a067 P4D 1a0a067 PUD 1a0b063 PMD 0
[    0.675218] Oops: 0000 [#1] PREEMPT SMP PTI
[    0.675222] CPU: 5 PID: 0 Comm: PDS/5 Tainted: G                T 4.19.0-pf7+ #1
[    0.678307] pds: cpu #8 affinity check mask - smt 0x00000200
[    0.676998] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
…
[    0.677323] RIP: 0010:__schedule+0x476/0x11a0
[    0.677323] Code: 75 c8 44 89 e7 e8 aa 50 fe ff 3b 05 58 99 6c 00 41 89 c4 0f 83 bc 03 00 00 49 63 d4 4c 89 e8 48 8b 1c d5 00 e3 9$
[    0.677323] RSP: 0000:ffffc90000093e20 EFLAGS: 00010283
[    0.677323] RAX: 00000000ffffffff RBX: ffff8880078c0000 RCX: 000000000000000b
[    0.677323] RDX: 000000000000000b RSI: 0000000000000000 RDI: ffffc90000093e88
[    0.677323] RBP: ffffc90000093ec0 R08: 0000000000000030 R09: 0000000000000005
[    0.677323] R10: 0000000000000002 R11: de4d430bcda58cba R12: 000000000000000b
[    0.677323] R13: 0000000000015004 R14: ffff8880070b1380 R15: ffff888007189d40
[    0.677323] FS:  0000000000000000(0000) GS:ffff888007740000(0000) knlGS:0000000000000000
[    0.677323] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.677323] CR2: ffffffffa1d88a78 CR3: 0000000001a08000 CR4: 00000000000006a0
[    0.677323] Call Trace:
[    0.677323]  schedule_idle+0x19/0x30
[    0.677323]  do_idle+0x14e/0x220
[    0.677323]  cpu_startup_entry+0x6a/0x70
[    0.677323]  start_secondary+0x199/0x1d0
[    0.677323]  secondary_startup_64+0xa4/0xb0
[    0.677323] CR2: ffffffffa1d88a78
[    0.681323] BUG: unable to handle kernel paging request at ffffffffa1d88a78
[    0.677323] ---[ end trace 1476a55eda43f0f8 ]---

crash> dis -lr __schedule+0x476
…
/home/pf/work/devel/linux/pf-kernel/kernel/sched/pds.c: 2889
0xffffffff8137ec45 <__schedule+0x465>:  movslq %r12d,%rdx
0xffffffff8137ec48 <__schedule+0x468>:  mov    %r13,%rax
0xffffffff8137ec4b <__schedule+0x46b>:  mov    -0x7e6f1d00(,%rdx,8),%rbx
/home/pf/work/devel/linux/pf-kernel/./arch/x86/include/asm/bitops.h: 335
0xffffffff8137ec53 <__schedule+0x473>:  mov    (%rax,%rbx,1),%eax
0xffffffff8137ec56 <__schedule+0x476>:  bt     %rax,0xa09e22(%rip)        # 0xffffffff81d88a80 <sched_rq_queued_masks>
…

2887     for_each_cpu(i, &chk) {
2888         /* skip the cpu which has idle slibing cpu */
2889         if (cpumask_test_cpu(per_cpu(sched_sibling_cpu, i),
2890                      &sched_rq_queued_masks[SCHED_RQ_EMPTY]))
2891             continue;
2892         pds_sg_balance_trigger(i);
2893     }

Maybe this will give you some clue.
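
A quick sanity check of the numbers above, as a sketch rather than a confirmed diagnosis. It assumes that `bt` with a 64-bit register operand and a memory operand reads the quadword at base + 8 * (bit offset / 64). Under that assumption, the bit index in RAX (0xffffffff, which is -1 when read as a 32-bit int) applied to sched_rq_queued_masks lands exactly on the faulting address in CR2. That would be consistent with per_cpu(sched_sibling_cpu, i) holding -1 for i = 11 (R12/RDX = 0xb), but it does not prove why the value is -1. The helper names below are made up for illustration.

```python
# Values copied from the oops and the crash disassembly above.
MASK_BASE = 0xFFFFFFFF81D88A80  # sched_rq_queued_masks, the bt memory operand
RAX = 0x00000000FFFFFFFF        # bit index used by bt at __schedule+0x476
CR2 = 0xFFFFFFFFA1D88A78        # faulting address reported by the oops


def bt_qword_address(base: int, bit_offset: int) -> int:
    """Address of the 64-bit word touched by `bt %reg64, mem`.

    Assumption: for a non-negative bit offset the CPU accesses
    base + 8 * (bit_offset // 64), truncated to 64 bits.
    """
    return (base + 8 * (bit_offset >> 6)) & 0xFFFFFFFFFFFFFFFF


def as_s32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


fault = bt_qword_address(MASK_BASE, RAX)
print(hex(fault))     # 0xffffffffa1d88a78
print(fault == CR2)   # True
print(as_s32(RAX))    # -1
```

If this reading is right, cpumask_test_cpu() was called with a CPU number of -1, so the bit test ran about 512 MiB past the start of the mask and hit an unmapped page.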

2. No, mainline v4.19.7 with CFS is not affected.

[1] https://gist.github.com/53c0a88d5b1d4e9a8e9fcf4c23aaad46
[2] https://natalenko.name/myfiles/pds-mq_cmt_crash/vmcore.xz
[3] https://natalenko.name/myfiles/pds-mq_cmt_crash/vmlinux.xz
[4] https://natalenko.name/myfiles/pds-mq_cmt_crash/pds.o.xz
[5] https://gist.github.com/07c8375bf74e7262a5c0067660489650