-
-
Save Millnert/ecc10d8cc79c81b55d7f to your computer and use it in GitHub Desktop.
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa07045a0>] ? nfs_pageio_doio+0x50/0x50 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L394 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff8160954d>] io_schedule+0x9d/0x130 http://lxr.free-electrons.com/source/kernel/sched/core.c?v=3.10#L4512 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa07045ae>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L261 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff81607320>] __wait_on_bit+0x60/0x90 http://lxr.free-electrons.com/source/kernel/sched/wait.c#L387 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa07045a0>] ? nfs_pageio_doio+0x50/0x50 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L394 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff816073d7>] out_of_line_wait_on_bit+0x87/0xb0 http://lxr.free-electrons.com/source/kernel/wait.c?v=3.10#L209 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff81098260>] ? autoremove_wake_function+0x40/0x40 http://lxr.free-electrons.com/source/kernel/wait.c?v=3.10#L163 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa0705ad3>] nfs_wait_on_request+0x33/0x40 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L275 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa070a351>] nfs_updatepage+0x121/0x8a0 [nfs] http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L897 | |
-> nfs_writepage_setup @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L828 | |
-> nfs_setup_write_request via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L833 | |
func @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L811 | |
-> nfs_mark_uptodate via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L838 | |
func @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L215 | |
-> nfs_mark_request_dirty via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L839 | |
func @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L466 | |
-> nfs_unlock_and_release_request via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L840 | |
func @ http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L206 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa06fac11>] nfs_write_end+0x121/0x350 [nfs] http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L404 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811564e4>] generic_file_buffered_write+0x184/0x290 http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2393 | |
via http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2301 | |
via write_end: http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2357 | |
mapped as http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L537 to nfs_write_end | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811585e5>] __generic_file_aio_write+0x1d5/0x3e0 http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2432 | |
does not do O_DIRECT, but generic_file_buffered_write | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff8115884d>] generic_file_aio_write+0x5d/0xc0 http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2540 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa06f9d1b>] nfs_file_write+0xbb/0x1d0 [nfs] via http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L925 to | |
http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L613 | |
does *not* do O_DIRECT, O_APPEND write from here, but rather: generic_file_aio_write | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c5e2d>] do_sync_write+0x8d/0xd0 http://lxr.free-electrons.com/source/fs/read_write.c?v=3.10#L383 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c65cd>] vfs_write+0xbd/0x1e0 http://lxr.free-electrons.com/source/fs/read_write.c?v=3.10#L430 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c60b4>] ? generic_file_llseek+0x24/0x30 http://lxr.free-electrons.com/source/fs/read_write.c?v=3.10#L137 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c7018>] SyS_write+0x58/0xb0 | |
Apr 18 21:01:26 tsm1 kernel: [<ffffffff81613da9>] system_call_fastpath+0x16/0x1b http://lxr.free-electrons.com/source/arch/x86/kernel/entry_64.S?v=3.10#L635 |
I have 10 machines that every day have this error
Jul 10 14:28:35 hpcnode004 kernel: INFO: task srun:20481 blocked for more than 120 seconds.
Jul 10 14:28:35 hpcnode004 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 10 14:28:35 hpcnode004 kernel: srun D ffff8810115bbb30 0 20481 20435 0x00000084
Jul 10 14:28:35 hpcnode004 kernel: ffff8810115bb9d0 0000000000000082 ffff881014a49700 ffff8810115bbfd8
Jul 10 14:28:35 hpcnode004 kernel: ffff8810115bbfd8 ffff8810115bbfd8 ffff881014a49700 ffff88103fc147c0
Jul 10 14:28:35 hpcnode004 kernel: 0000000000000000 7fffffffffffffff ffffffffa0630bb0 ffff8810115bbb30
Jul 10 14:28:35 hpcnode004 kernel: Call Trace:
Jul 10 14:28:35 hpcnode004 kernel: [] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] schedule+0x29/0x70
Jul 10 14:28:35 hpcnode004 kernel: [] schedule_timeout+0x209/0x2d0
Jul 10 14:28:35 hpcnode004 kernel: [] ? dev_hard_start_xmit+0x171/0x3b0
Jul 10 14:28:35 hpcnode004 kernel: [] ? netif_skb_features+0xc7/0x210
Jul 10 14:28:35 hpcnode004 kernel: [] ? read_tsc+0x9/0x10
Jul 10 14:28:35 hpcnode004 kernel: [] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] io_schedule_timeout+0xae/0x130
Jul 10 14:28:35 hpcnode004 kernel: [] io_schedule+0x18/0x20
Jul 10 14:28:35 hpcnode004 kernel: [] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] __wait_on_bit+0x60/0x90
Jul 10 14:28:35 hpcnode004 kernel: [] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] out_of_line_wait_on_bit+0x87/0xb0
Jul 10 14:28:35 hpcnode004 kernel: [] ? wake_atomic_t_function+0x40/0x40
Jul 10 14:28:35 hpcnode004 kernel: [] nfs_wait_on_request+0x33/0x40 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] nfs_updatepage+0x121/0x8a0 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] nfs_write_end+0x121/0x350 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] generic_file_buffered_write+0x184/0x290
Jul 10 14:28:35 hpcnode004 kernel: [] __generic_file_aio_write+0x1d5/0x3e0
Jul 10 14:28:35 hpcnode004 kernel: [] ? inet_recvmsg+0x7b/0xa0
Jul 10 14:28:35 hpcnode004 kernel: [] generic_file_aio_write+0x5d/0xc0
Jul 10 14:28:35 hpcnode004 kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] do_sync_write+0x8d/0xd0
Jul 10 14:28:35 hpcnode004 kernel: [] vfs_write+0xbd/0x1e0
Jul 10 14:28:35 hpcnode004 kernel: [] SyS_write+0x7f/0xe0
Jul 10 14:28:35 hpcnode004 kernel: [] system_call_fastpath+0x16/0x1b
Jul 10 14:28:35 hpcnode004 kernel: INFO: task srun:20502 blocked for more than 120 seconds.
Jul 10 14:28:35 hpcnode004 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 10 14:28:35 hpcnode004 kernel: srun D ffff880e98787b30 0 20502 20436 0x00000084
Jul 10 14:28:35 hpcnode004 kernel: ffff880e987879d0 0000000000000086 ffff88102303c500 ffff880e98787fd8
Jul 10 14:28:35 hpcnode004 kernel: ffff880e98787fd8 ffff880e98787fd8 ffff88102303c500 ffff88103fc147c0
Jul 10 14:28:35 hpcnode004 kernel: 0000000000000000 7fffffffffffffff ffffffffa0630bb0 ffff880e98787b30
Jul 10 14:28:35 hpcnode004 kernel: Call Trace:
Jul 10 14:28:35 hpcnode004 kernel: [] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] schedule+0x29/0x70
Jul 10 14:28:35 hpcnode004 kernel: [] schedule_timeout+0x209/0x2d0
Jul 10 14:28:35 hpcnode004 kernel: [] ? dev_hard_start_xmit+0x171/0x3b0
Jul 10 14:28:35 hpcnode004 kernel: [] ? netif_skb_features+0xc7/0x210
Jul 10 14:28:35 hpcnode004 kernel: [] ? read_tsc+0x9/0x10
Jul 10 14:28:35 hpcnode004 kernel: [] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] io_schedule_timeout+0xae/0x130
Jul 10 14:28:35 hpcnode004 kernel: [] io_schedule+0x18/0x20
Jul 10 14:28:35 hpcnode004 kernel: [] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] __wait_on_bit+0x60/0x90
Jul 10 14:28:35 hpcnode004 kernel: [] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] out_of_line_wait_on_bit+0x87/0xb0
Jul 10 14:28:35 hpcnode004 kernel: [] ? wake_atomic_t_function+0x40/0x40
Jul 10 14:28:35 hpcnode004 kernel: [] nfs_wait_on_request+0x33/0x40 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] nfs_updatepage+0x121/0x8a0 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] nfs_write_end+0x121/0x350 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] generic_file_buffered_write+0x184/0x290
Jul 10 14:28:35 hpcnode004 kernel: [] __generic_file_aio_write+0x1d5/0x3e0
Jul 10 14:28:35 hpcnode004 kernel: [] ? inet_recvmsg+0x7b/0xa0
Jul 10 14:28:35 hpcnode004 kernel: [] generic_file_aio_write+0x5d/0xc0
Jul 10 14:28:35 hpcnode004 kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Jul 10 14:28:35 hpcnode004 kernel: [] do_sync_write+0x8d/0xd0
Jul 10 14:28:35 hpcnode004 kernel: [] vfs_write+0xbd/0x1e0
Jul 10 14:28:35 hpcnode004 kernel: [] SyS_write+0x7f/0xe0
I have a similar issue(and this issue has occured several times), a process hanging on NFS writing, the process came into D state(uninterruptable sleep state) and can't be killed.
-
The hanging process's stack is as follow:
[] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[] nfs_wait_on_request+0x33/0x40 [nfs]
[] nfs_updatepage+0x121/0x8a0 [nfs]
[] nfs_write_end+0x121/0x350 [nfs]
[] generic_file_buffered_write+0x184/0x290
[] __generic_file_aio_write+0x1d5/0x3e0
[] generic_file_aio_write+0x5d/0xc0
[] nfs_file_write+0xbb/0x1d0 [nfs]
[] do_sync_write+0x8d/0xd0
[] vfs_write+0xbd/0x1e0
[] SyS_write+0x7f/0xe0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff -
The process status is as follow:
Name: python
State: D (disk sleep)
Tgid: 12585
Ngid: 13108
Pid: 12585
PPid: 1
TracerPid: 0
Uid: 1001 1001 1001 1001
Gid: 1001 1001 1001 1001
FDSize: 64
Groups: 10 1001
VmPeak: 251223144 kB
VmSize: 251051100 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 23502332 kB
VmRSS: 148884 kB
VmData: 249587980 kB
VmStk: 136 kB
VmExe: 4 kB
VmLib: 409248 kB
VmPTE: 47444 kB
VmSwap: 22780916 kB
Threads: 1
SigQ: 10/256532
SigPnd: 0000000000040100
ShdPnd: 0000000000004322
SigBlk: 0000000000000000
SigIgn: 0000000001381000
SigCgt: 0000000180000202
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: ffffff,ffffffff
Cpus_allowed_list: 0-55
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 2022575
nonvoluntary_ctxt_switches: 134484 -
The kernel version is RHEL7.2.1511 3.10.0-327.22.2.el7.x86_64
-
Someone already created an issue on RedHat a week ago, but not yet solved officially.
https://access.redhat.com/solutions/2245341
Issue remains on RHEL7.2 after NFS-utils update.
Feb 17 16:37:00 host kernel: INFO: task bowtie2-align-s:26812 blocked for more than 120 seconds.
Feb 17 16:37:00 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 17 16:37:00 host kernel: bowtie2-align-s D ffff88081ac53b00 0 26812 26805 0x00000080
Feb 17 16:37:00 host kernel: ffff88081ac539a0 0000000000000086 ffff8808263d6780 ffff88081ac53fd8
Feb 17 16:37:00 host kernel: ffff88081ac53fd8 ffff88081ac53fd8 ffff8808263d6780 ffff88082f094780
Feb 17 16:37:00 host kernel: 0000000000000000 7fffffffffffffff ffffffffa057dbc0 ffff88081ac53b00
Feb 17 16:37:00 host kernel: Call Trace:
Feb 17 16:37:00 host kernel: [] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Feb 17 16:37:00 host kernel: [] schedule+0x29/0x70
Feb 17 16:37:00 host kernel: [] schedule_timeout+0x209/0x2d0
Feb 17 16:37:00 host kernel: [] ? native_sched_clock+0x35/0x80
Feb 17 16:37:00 host kernel: [] ? read_tsc+0x9/0x10
Feb 17 16:37:00 host kernel: [] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Feb 17 16:37:00 host kernel: [] io_schedule_timeout+0xae/0x130
Feb 17 16:37:00 host kernel: [] io_schedule+0x18/0x20
Feb 17 16:37:00 host kernel: [] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Feb 17 16:37:00 host kernel: [] __wait_on_bit+0x60/0x90
Feb 17 16:37:00 host kernel: [] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Feb 17 16:37:00 host kernel: [] out_of_line_wait_on_bit+0x87/0xb0
Feb 17 16:37:00 host kernel: [] ? wake_atomic_t_function+0x40/0x40
Feb 17 16:37:00 host kernel: [] nfs_wait_on_request+0x33/0x40 [nfs]
Feb 17 16:37:00 host kernel: [] nfs_updatepage+0x121/0x8a0 [nfs]
Feb 17 16:37:00 host kernel: [] nfs_write_end+0x121/0x350 [nfs]
Feb 17 16:37:00 host kernel: [] generic_file_buffered_write+0x184/0x290
Feb 17 16:37:00 host kernel: [] __generic_file_aio_write+0x1d5/0x3e0
Feb 17 16:37:00 host kernel: [] generic_file_aio_write+0x5d/0xc0
Feb 17 16:37:00 host kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Feb 17 16:37:00 host kernel: [] do_sync_write+0x8d/0xd0
Feb 17 16:37:00 host kernel: [] vfs_write+0xbd/0x1e0
Feb 17 16:37:00 host kernel: [] SyS_write+0x7f/0xe0
Feb 17 16:37:00 host kernel: [] symev_write+0x43/0x80 [symev_rh_ES_7_3_10_0_123_el7_x86_64]