@Supermathie
Created December 16, 2013 23:34
bug 1043693
Description of problem:
A specific customer workload causes abnormally high system load on RHEL6, whereas the same workload did not on RHEL4. The abnormal load is related to the dnotify/inotify subsystems.
Version-Release number of selected component (if applicable):
Tests run on 2.6.32-279.2.1.el6.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Run the test case: ./msys_sim -c 50 -m 1024 -d /mnt/tmp
2. Wait; load goes up over time
3. See http://i.imgur.com/LJFbt99.png
Actual results:
1. Latency across the entire system is seriously affected
2. CPU system % goes up constantly
3. The kernel spends nearly all of its time contending for a spin lock
4. The rest of the time the kernel is in __fsnotify_update_child_dentry_flags
Expected results:
1. OS should cruise along smoothly (http://i.imgur.com/TLElwsw.png)
Additional info:
* mount line for filesystem used for test is:
/dev/mapper/sysvg-msys /mnt/tmp ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
* Test case has been generated as a mockup of a real production system
* The high load is generated inside the close() call on the inotify fd, or presumably during the equivalent teardown in the dnotify (DN_CREATE) case. strace output from a system that has been running this test case for a while:
inotify_init() = 4 <0.000020>
inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.040385>
write(1, "Child [032] sleeping\n", 21) = 21 <0.000903>
read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.mQgUSh\0\0", 512) = 32 <0.023423>
inotify_rm_watch(4, 1) = 0 <0.000012>
close(4) = 0 <0.528736>
* The problem can be avoided by keeping a single inotify instance alive instead of re-initializing it every time; this avoids the teardown
* The problem can also be avoided by using dnotify with DN_MULTISHOT, again avoiding the teardown
* Unfortunately, neither of these workarounds is usable in the production application
* The test case generates 256K files in a single directory. Modifying the test case to use two-level buckets instead of a single directory reduces the user% consumed but DOES NOT AFFECT the system% CPU
* Strangely, calling mount with MS_REMOUNT seems to clean up the problem - the system% drops down to zero