@Supermathie
Created December 16, 2013 23:34
bug 1043693
Description of problem:
A specific customer workload causes abnormally high system load on RHEL6, whereas the same workload did not on RHEL4. The abnormal load is related to the dnotify/inotify subsystems.
Version-Release number of selected component (if applicable):
Tests run on 2.6.32-279.2.1.el6.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Run the test case: ./msys_sim -c 50 -m 1024 -d /mnt/tmp
2. Wait; load goes up over time
3. See http://i.imgur.com/LJFbt99.png
Actual results:
1. Latency across the entire system is seriously affected
2. CPU system % goes up constantly
3. The kernel spends nearly all of its time contending for a spin lock
4. The rest of the time the kernel is in __fsnotify_update_child_dentry_flags
Expected results:
1. OS should cruise along smoothly (http://i.imgur.com/TLElwsw.png)
Additional info:
* mount line for filesystem used for test is:
/dev/mapper/sysvg-msys /mnt/tmp ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
* Test case has been generated as a mockup of a real production system
* The high load is generated inside the close() call on the inotify fd, or presumably during the equivalent teardown in the dnotify (DN_CREATE) case. strace output from a system that has been running this test case for a while:
inotify_init() = 4 <0.000020>
inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.040385>
write(1, "Child [032] sleeping\n", 21) = 21 <0.000903>
read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.mQgUSh\0\0", 512) = 32 <0.023423>
inotify_rm_watch(4, 1) = 0 <0.000012>
close(4) = 0 <0.528736>
* The problem can be avoided by keeping a single inotify instance alive instead of re-initializing it every time; this avoids the teardown
* The problem can also be avoided by using dnotify with DN_MULTISHOT, again avoiding the teardown
* Unfortunately, neither of these workarounds is usable in the production application
* The test case generates 256K files in a single directory. Modifying the test case to use two-level buckets instead of a single directory reduces the user% consumed but DOES NOT AFFECT the system% CPU
* Strangely, calling mount with MS_REMOUNT seems to clean up the problem - the system% drops down to zero