@Millnert
Last active February 28, 2017 13:18
Stack trace and source code references from a RHEL 7.1 NFS lock bug
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa07045a0>] ? nfs_pageio_doio+0x50/0x50 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L394
Apr 18 21:01:26 tsm1 kernel: [<ffffffff8160954d>] io_schedule+0x9d/0x130 http://lxr.free-electrons.com/source/kernel/sched/core.c?v=3.10#L4512
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa07045ae>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L261
Apr 18 21:01:26 tsm1 kernel: [<ffffffff81607320>] __wait_on_bit+0x60/0x90 http://lxr.free-electrons.com/source/kernel/sched/wait.c#L387
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa07045a0>] ? nfs_pageio_doio+0x50/0x50 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L394
Apr 18 21:01:26 tsm1 kernel: [<ffffffff816073d7>] out_of_line_wait_on_bit+0x87/0xb0 http://lxr.free-electrons.com/source/kernel/wait.c?v=3.10#L209
Apr 18 21:01:26 tsm1 kernel: [<ffffffff81098260>] ? autoremove_wake_function+0x40/0x40 http://lxr.free-electrons.com/source/kernel/wait.c?v=3.10#L163
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa0705ad3>] nfs_wait_on_request+0x33/0x40 [nfs] http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L275
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa070a351>] nfs_updatepage+0x121/0x8a0 [nfs] http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L897
-> nfs_writepage_setup @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L828
-> nfs_setup_write_request via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L833
func @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L811
-> nfs_mark_uptodate via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L838
func @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L215
-> nfs_mark_request_dirty via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L839
func @ http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L466
-> nfs_unlock_and_release_request via http://lxr.free-electrons.com/source/fs/nfs/write.c?v=3.10#L840
func @ http://lxr.free-electrons.com/source/fs/nfs/pagelist.c?v=3.10#L206
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa06fac11>] nfs_write_end+0x121/0x350 [nfs] http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L404
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811564e4>] generic_file_buffered_write+0x184/0x290 http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2393
via http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2301
via write_end: http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2357
mapped as http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L537 to nfs_write_end
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811585e5>] __generic_file_aio_write+0x1d5/0x3e0 http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2432
Not the O_DIRECT path; the write goes through generic_file_buffered_write.
Apr 18 21:01:26 tsm1 kernel: [<ffffffff8115884d>] generic_file_aio_write+0x5d/0xc0 http://lxr.free-electrons.com/source/mm/filemap.c?v=3.10#L2540
Apr 18 21:01:26 tsm1 kernel: [<ffffffffa06f9d1b>] nfs_file_write+0xbb/0x1d0 [nfs] via http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L925 to
http://lxr.free-electrons.com/source/fs/nfs/file.c?v=3.10#L613
Does *not* take the O_DIRECT or O_APPEND branches here, but rather calls generic_file_aio_write.
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c5e2d>] do_sync_write+0x8d/0xd0 http://lxr.free-electrons.com/source/fs/read_write.c?v=3.10#L383
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c65cd>] vfs_write+0xbd/0x1e0 http://lxr.free-electrons.com/source/fs/read_write.c?v=3.10#L430
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c60b4>] ? generic_file_llseek+0x24/0x30 http://lxr.free-electrons.com/source/fs/read_write.c?v=3.10#L137
Apr 18 21:01:26 tsm1 kernel: [<ffffffff811c7018>] SyS_write+0x58/0xb0
Apr 18 21:01:26 tsm1 kernel: [<ffffffff81613da9>] system_call_fastpath+0x16/0x1b http://lxr.free-electrons.com/source/arch/x86/kernel/entry_64.S?v=3.10#L635
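
For comparing traces like the one above across hosts, it can help to reduce each frame line to its symbol, offset, and module. The following is a minimal sketch, not part of the original gist: the names `FRAME_RE`, `Frame`, and `parse_frame` are hypothetical, and the regular expression assumes the `[<address>] symbol+0xoffset/0xsize [module]` layout seen in these log lines. A leading `?` is treated as the kernel's marker for a frame it could not verify while unwinding.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: [<address>] [? ]symbol+0xoffset/0xsize [module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


class Frame(NamedTuple):
    address: int
    symbol: str
    offset: int            # byte offset into the function
    size: int              # total size of the function in bytes
    module: Optional[str]  # None for symbols built into the kernel image
    reliable: bool         # False when the line carries a "?" marker


def parse_frame(line: str) -> Optional[Frame]:
    """Return the parsed frame, or None for annotation lines without a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        address=int(m.group("addr"), 16),
        symbol=m.group("symbol"),
        offset=int(m.group("offset"), 16),
        size=int(m.group("size"), 16),
        module=m.group("module"),
        reliable=m.group("unreliable") is None,
    )
```

Lines that are only source-code annotations (for example the `-> nfs_writepage_setup @ ...` notes) do not match and yield `None`, so they can be filtered out before comparing two traces symbol by symbol.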

gniuk commented Jan 12, 2017

I have a similar issue (and it has occurred several times): a process hangs while writing to NFS, enters the D state (uninterruptible sleep), and can't be killed.

  1. The hanging process's stack is as follows:
    [] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
    [] nfs_wait_on_request+0x33/0x40 [nfs]
    [] nfs_updatepage+0x121/0x8a0 [nfs]
    [] nfs_write_end+0x121/0x350 [nfs]
    [] generic_file_buffered_write+0x184/0x290
    [] __generic_file_aio_write+0x1d5/0x3e0
    [] generic_file_aio_write+0x5d/0xc0
    [] nfs_file_write+0xbb/0x1d0 [nfs]
    [] do_sync_write+0x8d/0xd0
    [] vfs_write+0xbd/0x1e0
    [] SyS_write+0x7f/0xe0
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff

  2. The process status is as follows:
    Name: python
    State: D (disk sleep)
    Tgid: 12585
    Ngid: 13108
    Pid: 12585
    PPid: 1
    TracerPid: 0
    Uid: 1001 1001 1001 1001
    Gid: 1001 1001 1001 1001
    FDSize: 64
    Groups: 10 1001
    VmPeak: 251223144 kB
    VmSize: 251051100 kB
    VmLck: 0 kB
    VmPin: 0 kB
    VmHWM: 23502332 kB
    VmRSS: 148884 kB
    VmData: 249587980 kB
    VmStk: 136 kB
    VmExe: 4 kB
    VmLib: 409248 kB
    VmPTE: 47444 kB
    VmSwap: 22780916 kB
    Threads: 1
    SigQ: 10/256532
    SigPnd: 0000000000040100
    ShdPnd: 0000000000004322
    SigBlk: 0000000000000000
    SigIgn: 0000000001381000
    SigCgt: 0000000180000202
    CapInh: 0000000000000000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000
    CapBnd: 0000001fffffffff
    Seccomp: 0
    Cpus_allowed: ffffff,ffffffff
    Cpus_allowed_list: 0-55
    Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
    Mems_allowed_list: 0-1
    voluntary_ctxt_switches: 2022575
    nonvoluntary_ctxt_switches: 134484

  3. The kernel version is RHEL7.2.1511 3.10.0-327.22.2.el7.x86_64

  4. Someone already opened an issue with Red Hat a week ago, but it has not been officially resolved yet.
    https://access.redhat.com/solutions/2245341
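
The `SigPnd` and `ShdPnd` fields in the status dump above are hexadecimal bitmasks in which bit N-1 stands for signal N. A minimal sketch for decoding them follows; the function names and the small name table are hypothetical helpers, and the table assumes x86-64 Linux signal numbering and covers only the signals that appear in this dump.

```python
from typing import List

# Assumption: x86-64 Linux signal numbers; only the signals seen in this dump.
SIGNAL_NAMES = {
    2: "SIGINT",
    6: "SIGABRT",
    9: "SIGKILL",
    10: "SIGUSR1",
    15: "SIGTERM",
    19: "SIGSTOP",
}


def pending_signals(mask_hex: str) -> List[int]:
    """Decode a /proc/<pid>/status signal mask into signal numbers."""
    mask = int(mask_hex, 16)
    # Bit 0 corresponds to signal 1, bit 1 to signal 2, and so on.
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def describe(mask_hex: str) -> List[str]:
    """Map each pending signal to a name, falling back to the bare number."""
    return [SIGNAL_NAMES.get(n, str(n)) for n in pending_signals(mask_hex)]


if __name__ == "__main__":
    print("SigPnd:", describe("0000000000040100"))
    print("ShdPnd:", describe("0000000000004322"))
```

Decoded this way, both masks include signal 9 (SIGKILL), which is consistent with the report that the process cannot be killed: a task in uninterruptible sleep does not act on pending signals until it wakes up.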
