Yeah, the "FG wakeup overhead per log file sync" derived metric definitely shows that your foregrounds wait for log file syncs way longer than the corresponding LGWR write duration. That metric divides "redo synch time overhead (usec)" by "redo synch writes" to get an average extra latency per foreground log file sync. Normally it's just microseconds, but in the output you sent it was close to 900 milliseconds per sync in some cases.
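For reference, here's a rough way to eyeball the same ratio straight from V$SESSTAT. This is not what Snapper itself runs; it gives a cumulative average since logon rather than a per-snapshot delta, and it assumes the 12c-style "redo synch time overhead (usec)" statistic is present:
SELECT ROUND(
         MAX(DECODE(sn.name, 'redo synch time overhead (usec)', ss.value)) /
         NULLIF(MAX(DECODE(sn.name, 'redo synch writes', ss.value)), 0) / 1000, 2
       ) AS fg_wakeup_overhead_ms_per_sync
  FROM v$sesstat ss, v$statname sn
 WHERE sn.statistic# = ss.statistic#
   AND ss.sid = 1717   -- the SID of interest
   AND sn.name IN ('redo synch time overhead (usec)', 'redo synch writes');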
I've slightly edited one of the snapper output sections and removed some less interesting lines:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID @INST, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1717 @3, GEN3_MGOWE, STAT, user commits , 6, .49, , , , , .08 per execution
1717 @3, GEN3_MGOWE, STAT, messages sent , 6, .49, , , , , .13 per execution
1717 @3, GEN3_MGOWE, STAT, calls to kcmgas , 6, .49, , , , , .27 per execution
1717 @3, GEN3_MGOWE, STAT, redo entries , 64, 5.21, , , , , 1.89 per execution
1717 @3, GEN3_MGOWE, STAT, redo size , 22532, 1.83k, , , , , 5.63k bytes per user commit
1717 @3, GEN3_MGOWE, STAT, redo synch time , 1190, 96.85, , , , , 3.35 per execution
1717 @3, GEN3_MGOWE, STAT, redo synch time (usec) , 11902840, 968.69k, , , , , 33.53k per execution
1717 @3, GEN3_MGOWE, STAT, redo synch time overhead (usec) , 7933229, 645.63k, , , , , 274.72ms FG wakeup overhead per log file sync
1717 @3, GEN3_MGOWE, STAT, redo synch time overhead count ( 2ms) , 2, .16, , , , , .06 per execution
1717 @3, GEN3_MGOWE, STAT, redo synch time overhead count (inf) , 4, .33, , , , , .01 per execution
1717 @3, GEN3_MGOWE, STAT, redo synch writes , 6, .49, , , , , .1 per execution
1717 @3, GEN3_MGOWE, STAT, redo synch long waits , 4, .33, , , , , .02 per execution
1717 @3, GEN3_MGOWE, STAT, redo write info find , 6, .49, , , , , .08 per execution
1717 @3, GEN3_MGOWE, STAT, undo change vector size , 5664, 460.95, , , , , 101.03 per execution
1717 @3, GEN3_MGOWE, STAT, table fetch by rowid , 6, .49, , , , , 3.33 per execution
1717 @3, GEN3_MGOWE, STAT, rows fetched via callback , 6, .49, , , , , .35 per execution
1717 @3, GEN3_MGOWE, WAIT, log file sync , 12202261, 993.06ms, 99.3%, [WWWWWWWWWW], 6, .49, 2.03s average wait
1717 @3, GEN3_MGOWE, WAIT, SQL*Net message to client , 98, 7.98us, .0%, [ ], 60, 4.88, 1.63us average wait
1717 @3, GEN3_MGOWE, WAIT, SQL*Net message from client , 94111, 7.66ms, .8%, [W ], 60, 4.88, 1.57ms average wait
-- End of Stats snap 247, end=2020-02-05 17:41:57, seconds=12.3
LGWR (as seen in another trace file) was almost completely idle during the problem time.
Also, from the wait events and stats we can see that LGWR did all the work by itself; the LG* slaves weren't active (other than briefly polling the current SCN in one snapshot).
1954 @3, (LG00) , WAIT, LGWR worker group idle , 12000000, 962.4ms, 96.2%, [WWWWWWWWWW], 0, 0, 12s average wait
2198 @3, (LG01) , WAIT, LGWR worker group idle , 12000000, 962.4ms, 96.2%, [WWWWWWWWWW], 0, 0, 12s average wait
Since we don't see any FG metrics like "redo synch polls" and "redo synch poll writes", it looks like you're using the plain old post/wait mechanism with LGWR. This looks like a case of a missed post to me.
As I understand it, this is not production, so you could use X$TRACE to dig deeper into the POST/WAIT messages. X$TRACE is actually always enabled anyway (just not for all events), so it should be safe for prod too; at least I haven't crashed anything with it yet.
You'd need to enable KST event 10005 at level 4 for just one test session and for LGWR. I've used the Oracle PID below: 64 is the Oracle PID (not SPID) that my session is using. You'd need to do the same for LGWR.
SQL> ALTER TRACING ENABLE "10005:4:64";
Tracing altered.
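If it helps, here's one way to look up the Oracle PID values to plug into that command (just a sketch; 1717 is the example SID from the Snapper output above, and LGWR is matched by its process name):
SELECT s.sid, s.program, p.pname, p.pid AS oracle_pid, p.spid
  FROM v$session s, v$process p
 WHERE p.addr = s.paddr
   AND (s.sid = 1717 OR p.pname = 'LGWR');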
And since X$TRACE is a circular buffer, you'd need to quickly dump the X$TRACE buckets to disk or just CTAS it somewhere whenever you see a long log file sync. You could use a PL/SQL loop that polls ASH or V$SESSION and does an INSERT ... SELECT from X$TRACE whenever it sees anyone waiting on "log file sync" for longer than 1 second. You could also make the X$TRACE buffers bigger to have a bit more time to react.
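A minimal sketch of such a loop, run as SYS (X$ views aren't visible to regular users); the xtrace_dump table name and the 1-second threshold are just for illustration:
-- one-time setup: an empty copy of X$TRACE to dump into
CREATE TABLE xtrace_dump AS SELECT * FROM x$trace WHERE 1 = 0;

-- poll V$SESSION and dump the circular buffer when a long log file sync shows up
DECLARE
  l_waiters NUMBER;
BEGIN
  LOOP
    SELECT COUNT(*) INTO l_waiters
      FROM v$session
     WHERE event = 'log file sync'
       AND state = 'WAITING'
       AND wait_time_micro > 1000000;   -- stuck for over 1 second

    IF l_waiters > 0 THEN
      INSERT INTO xtrace_dump SELECT * FROM x$trace;   -- grab it before it wraps
      COMMIT;
      EXIT;   -- stop after the first capture; remove this to keep collecting
    END IF;

    DBMS_LOCK.SLEEP(0.1);   -- or DBMS_SESSION.SLEEP in newer releases
  END LOOP;
END;
/
You'd leave this running in a separate SYS session while reproducing the problem; keep the polling interval short, since the X$TRACE buffer wraps quickly.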
The output would give you a timeline of session wait events and POSTs sent and received. You can use the "relative_time_sec" column to see when the POSTs were actually sent by your FG and received by LGWR ... and vice versa.
SQL> @xt sid=509
relative_time_sec event file.c FUNCTION sid data
------------------- ------- -------------- ------------------------ ------- ------------------------------------------------------------------------------------------------------------------------------------
634363269.049747 10005 ksl2.c@3349 kslwtbctx 509 KSL WAIT BEG [SQL*Net message from client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2764 seq_num=2766 snap_id=1
634363279.784638 10005 ksl2.c@3492 kslwtectx 509 KSL WAIT END [SQL*Net message from client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2764 seq_num=2766 snap_id=1
634363279.784641 10005 ksl2.c@3496 kslwtectx 509 KSL WAIT END wait times (usecs) - snap=10734887, exc=10734887, tot=10734887
634363279.784850 10005 ksl2.c@19892 ksl_update_post_stats 509 KSL POST SENT postee=20 num=3 loc='ksa2.h LINE:292 ID:ksasnd' flags=0x0 id1=0 id2=0
634363279.784983 10005 ksl2.c@3349 kslwtbctx 509 KSL WAIT BEG [Disk file operations I/O] 8/0x8 0/0x0 8/0x8 wait_id=2765 seq_num=2767 snap_id=1
634363279.785058 10005 ksl2.c@3492 kslwtectx 509 KSL WAIT END [Disk file operations I/O] 8/0x8 0/0x0 8/0x8 wait_id=2765 seq_num=2767 snap_id=1
634363279.785060 10005 ksl2.c@3496 kslwtectx 509 KSL WAIT END wait times (usecs) - snap=77, exc=77, tot=77
634363279.785207 10005 ksl2.c@3349 kslwtbctx 509 KSL WAIT BEG [log file sync] 4172/0x104c 143746492/0x89165bc 0/0x0 wait_id=2766 seq_num=2768 snap_id=1
634363279.785888 10005 ksl2.c@18157 ksliwat 509 KSL FACILITY WAIT fac#=3 time_waited_usecs=678
634363279.785895 10005 ksl2.c@18276 ksliwat 509 KSL POST RCVD poster=22 num=8332 loc='ksl2.h LINE:4384 ID:kslpsr' id1=162 id2=0 name=EV type=0 fac#=3 posted=0x3 may_be_posted=1
634363279.785904 10005 ksl2.c@3492 kslwtectx 509 KSL WAIT END [log file sync] 4172/0x104c 143746492/0x89165bc 0/0x0 wait_id=2766 seq_num=2768 snap_id=1
634363279.785905 10005 ksl2.c@3496 kslwtectx 509 KSL WAIT END wait times (usecs) - snap=685, exc=685, tot=685
634363279.785948 10005 ksl2.c@3349 kslwtbctx 509 KSL WAIT BEG [SQL*Net message to client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2767 seq_num=2769 snap_id=1
634363279.785952 10005 ksl2.c@3492 kslwtectx 509 KSL WAIT END [SQL*Net message to client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2767 seq_num=2769 snap_id=1
634363279.785953 10005 ksl2.c@3496 kslwtectx 509 KSL WAIT END wait times (usecs) - snap=6, exc=6, tot=6
634363279.785964 10005 ksl2.c@3349 kslwtbctx 509 KSL WAIT BEG [SQL*Net message from client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2768 seq_num=2770 snap_id=1
The KSL POST SENT postee=20 shows you the Oracle PID of LGWR in the above example. My FG session sent a wakeup message to LGWR.
The KSL POST RCVD poster=22 line shows that the "OK" reply came back from another process (Oracle PID 22). That was the LG00 log writer slave in my case. The LGnn slaves can post back to the foregrounds for scalability (so that LGWR itself wouldn't spend so much time in semaphore operation system calls).
At the end of the day, this would still just confirm that LGWR misses a post for some reason; you'd still need to find which bug causes it.
This doesn't seem to be Exadata, right? LGWR and commit processing with Exafusion & Smart Fusion Block Transfer have become more complex (and faster): the GC layer doesn't have to wait for log file sync completion before shipping a modified block to a remote instance, but commits on that remote instance still have to confirm that the log write on the originating instance has completed. So there are additional potential synchronization/wait points in that configuration. This feature should be Exadata-only, but maybe some artifacts from that implementation affect the non-Exa config too.
One thing of interest: the FG snapper output shows a short "IPC group service call" wait event right after a long log file sync ended, and that's related to the Exafusion KSIPC layer.
P.S. For people wondering, I talk about X$TRACE and advanced wait event & post-wait topics in this hacking session: https://www.youtube.com/watch?v=mkmvZv58W6w
I don't see a clear, direct connection between the OS directory entry cache and log file sync duration (when it looked like LGWR was idle). Of course, sometimes there are surprises and it all makes sense in hindsight, but that would be a pretty complex chain of unfortunate events then :)
1) Dentry cache size (and any slowness) would only affect system calls that try to find and open files, etc. Once a file is open, dentries should not be in play, as further system calls refer to the file via its file descriptor, which points directly to the inode.
2) Or perhaps a large dentry cache just consumes lots of kernel (SLAB/SLUB allocator) memory, and when some limit is reached, other system calls that also hope to quickly allocate a standard-sized chunk from SLAB end up waiting (or even trying to reclaim other kernel memory!).
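If you want to sanity-check the second hypothesis, something like this shows how much kernel memory the dentry slab cache is holding (slabtop and /proc/sys/fs/dentry-state should be present on typical Linux distros; treat this as a sketch):
sudo slabtop -o -s c | head -15     # top slab caches by cache size, look for "dentry"
cat /proc/sys/fs/dentry-state       # nr_dentry, nr_unused, ...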
You could systematically measure any such waiting by running pSnapper on LGWR and one of the test FG session processes at the same time. You can sample just those 2 PIDs at first; sampling everything would make pSnapper slower.
You can run something like this in a loop (change 123,456 to your PIDs of interest); it would show whether those processes were in an uninterruptible sleep (D state) while executing a semop system call.
sudo psn -a -G pid,syscall,wchan,kstack -p 123,456
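For example, a simple wrapper loop (again, 123,456 are placeholder PIDs):
while true; do
    date
    sudo psn -a -G pid,syscall,wchan,kstack -p 123,456
    sleep 1
done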
P.S. For others' benefit: if you haven't used the Linux Process Snapper yet, you should! ;-)
Examples here: https://tanelpoder.com/psnapper/