Last active
February 28, 2017 13:18
Stack trace and source code references from a RHEL 7.1 NFS lock bug
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa07045a0>] ? nfs_pageio_doio+0x50/0x50 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L394
Apr 18 21:01:26 tsm1 kernel: [<ffffffff8160954d>] io_schedule+0x9d/0x130 http://lxr.free-electrons.com/source/kernel/sched/core.c?v=3.10#L4512
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa07045ae>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L261
Apr 18 21:01:26 tsm1 kernel: [<ffffffff81607320>] __wait_on_bit+0x60/0x90 http://lxr.free-electrons.com/source/kernel/sched/wait.c#L387
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa07045a0>] ? nfs_pageio_doio+0x50/0x50 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L394
Apr 18 21:01:26 tsm1 kernel: [<ffffffff816073d7>] out_of_line_wait_on_bit+0x87/0xb0 http://lxr.free-electrons.com/source/kernel/wait.c?v=3.10#L209
Apr 18 21:01:26 tsm1 kernel: [<ffffffff81098260>] ? autoremove_wake_function+0x40/0x40 http://lxr.free-electrons.com/source/kernel/wait.c?v=3.10#L163
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa0705ad3>] nfs_wait_on_request+0x33/0x40 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L275
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa070a351>] nfs_updatepage+0x121/0x8a0 [nfs] http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L897
    -> nfs_writepage_setup @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L828
    -> nfs_setup_write_request via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L833
        func @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L811
    -> nfs_mark_uptodate via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L838
        func @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L215
    -> nfs_mark_request_dirty via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L839
        func @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L466
    -> nfs_unlock_and_release_request via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L840
        func @ http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L206
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa06fac11>] nfs_write_end+0x121/0x350 [nfs] http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L404
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811564e4>] generic_file_buffered_write+0x184/0x290 http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2393
    via http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2301
    via write_end: http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2357
    mapped as http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L537 to nfs_write_end
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811585e5>] __generic_file_aio_write+0x1d5/0x3e0 http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2432
    does not do O_DIRECT, but generic_file_buffered_write
Apr 18 21:01:26 tsm1 kernel: [<ffffffff8115884d>] generic_file_aio_write+0x5d/0xc0 http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2540
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa06f9d1b>] nfs_file_write+0xbb/0x1d0 [nfs] via http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L925 to
    http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L613
    does *not* do O_DIRECT, O_APPEND write from here, but rather: generic_file_aio_write
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c5e2d>] do_sync_write+0x8d/0xd0 http://lxr.free-electrons.com/source/fs/read_write.c?v=3.10#L383
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c65cd>] vfs_write+0xbd/0x1e0 http://lxr.free-electrons.com/source/fs/read_write.c?v=3.10#L430
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c60b4>] ? generic_file_llseek+0x24/0x30 http://lxr.free-electrons.com/source/fs/read_write.c?v=3.10#L137
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c7018>] SyS_write+0x58/0xb0
Apr 18 21:01:26 tsm1 kernel: [<ffffffff81613da9>] system_call_fastpath+0x16/0x1b http://lxr.free-electrons.com/source/arch/x86/kernel/entry_64.S?v=3.10#L635
I have a similar issue, and it has occurred several times: a process hangs while writing to NFS, enters D state (uninterruptible sleep), and cannot be killed.
The hanging process's stack is as follows:
[] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[] nfs_wait_on_request+0x33/0x40 [nfs]
[] nfs_updatepage+0x121/0x8a0 [nfs]
[] nfs_write_end+0x121/0x350 [nfs]
[] generic_file_buffered_write+0x184/0x290
[] __generic_file_aio_write+0x1d5/0x3e0
[] generic_file_aio_write+0x5d/0xc0
[] nfs_file_write+0xbb/0x1d0 [nfs]
[] do_sync_write+0x8d/0xd0
[] vfs_write+0xbd/0x1e0
[] SyS_write+0x7f/0xe0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff
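To compare stacks like the one above against the annotated trace in the gist, it can help to split each frame into symbol, offset, function size, and module. The following is a minimal sketch in Python; `FRAME_RE` and `parse_frame` are hypothetical names introduced here, and the pattern assumes frames look like `[<addr>] symbol+0xOFF/0xSIZE [module]`, with the address possibly reduced to `[]` as in the paste above.

```python
import re

# One frame of a kernel stack dump, for example:
#   "[] nfs_updatepage+0x121/0x8a0 [nfs]"
# A "? " before the symbol marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(
    r"\[(?:<[0-9a-f]+>)?\]\s+"                          # address, or bare "[]"
    r"(?P<unreliable>\?\s+)?"                           # optional "? " marker
    r"(?P<symbol>[\w.]+)"                               # function name
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"  # offset / function size
    r"(?:\s+\[(?P<module>\w+)\])?"                      # optional module name
)


def parse_frame(line):
    """Return a dict describing one stack frame, or None if the line has none."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),
        "unreliable": m.group("unreliable") is not None,
    }


if __name__ == "__main__":
    print(parse_frame("[] nfs_wait_on_request+0x33/0x40 [nfs]"))
```

Two traces can then be compared by their `(symbol, offset)` pairs; matching offsets suggest the same code path, though not necessarily the same kernel build.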
The process status is as follows:
Name: python
State: D (disk sleep)
Tgid: 12585
Ngid: 13108
Pid: 12585
PPid: 1
TracerPid: 0
Uid: 1001 1001 1001 1001
Gid: 1001 1001 1001 1001
FDSize: 64
Groups: 10 1001
VmPeak: 251223144 kB
VmSize: 251051100 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 23502332 kB
VmRSS: 148884 kB
VmData: 249587980 kB
VmStk: 136 kB
VmExe: 4 kB
VmLib: 409248 kB
VmPTE: 47444 kB
VmSwap: 22780916 kB
Threads: 1
SigQ: 10/256532
SigPnd: 0000000000040100
ShdPnd: 0000000000004322
SigBlk: 0000000000000000
SigIgn: 0000000001381000
SigCgt: 0000000180000202
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: ffffff,ffffffff
Cpus_allowed_list: 0-55
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 2022575
nonvoluntary_ctxt_switches: 134484
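The `SigPnd` and `ShdPnd` fields in the status above are hexadecimal bitmasks in which bit N-1 stands for signal N. The sketch below decodes them into names; `decode_sigmask` and the `LINUX_X86_SIGNALS` table are hypothetical helpers introduced here, and the table hardcodes the standard Linux x86-64 numbering instead of relying on the local `signal` module, so that the result does not depend on the machine doing the decoding.

```python
# Standard (non-realtime) signal numbers on Linux x86-64.
LINUX_X86_SIGNALS = {
    1: "SIGHUP", 2: "SIGINT", 3: "SIGQUIT", 4: "SIGILL", 5: "SIGTRAP",
    6: "SIGABRT", 7: "SIGBUS", 8: "SIGFPE", 9: "SIGKILL", 10: "SIGUSR1",
    11: "SIGSEGV", 12: "SIGUSR2", 13: "SIGPIPE", 14: "SIGALRM", 15: "SIGTERM",
    16: "SIGSTKFLT", 17: "SIGCHLD", 18: "SIGCONT", 19: "SIGSTOP", 20: "SIGTSTP",
    21: "SIGTTIN", 22: "SIGTTOU", 23: "SIGURG", 24: "SIGXCPU", 25: "SIGXFSZ",
    26: "SIGVTALRM", 27: "SIGPROF", 28: "SIGWINCH", 29: "SIGIO", 30: "SIGPWR",
    31: "SIGSYS",
}


def decode_sigmask(hex_mask):
    """Return the signal names set in a /proc/<pid>/status mask, lowest first."""
    mask = int(hex_mask, 16)
    names = []
    for signum in range(1, 65):
        if mask & (1 << (signum - 1)):
            # Signals above 31 are realtime signals; label them by number.
            names.append(LINUX_X86_SIGNALS.get(signum, f"SIG{signum}"))
    return names


if __name__ == "__main__":
    print("SigPnd:", decode_sigmask("0000000000040100"))
    print("ShdPnd:", decode_sigmask("0000000000004322"))
```

For the masks shown, this reports SIGKILL and SIGSTOP pending on the thread, and SIGINT, SIGABRT, SIGKILL, SIGUSR1, and SIGTERM pending process-wide. That fits the description of a process that cannot be killed: the signals are queued but are not acted on while the task remains in uninterruptible sleep.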
The kernel version is RHEL7.2.1511 3.10.0-327.22.2.el7.x86_64.
Someone opened an issue with Red Hat a week ago, but it has not been officially resolved yet:
https://access.redhat.com/solutions/2245341