With csysdig
:
Viewing: System Calls For: whole machine
Source: Live System Filter: (proc.name=rsync ) and (syscall.type exists)
CALLS/S TOT TIME AVG TIME SYSCALL
924.00 2.00s 1.08ms lstat
1.50 0ns 0ns sysdigevent
With strace
:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.076780 30 2524 lstat
------ ----------- ----------- --------- --------- ----------------
100.00 0.076780 2524 total
real 0m5.451s
I don't trust the output above, 30 usecs/call
is way too fast for a network roundtrip.
And it doesn't agree with what strace T
shows:
lstat("file1", {st_mode=S_IFREG|0644, st_size=32820, ...}) = 0 <0.005523>
lstat("file2", {st_mode=S_IFREG|0644, st_size=20816, ...}) = 0 <0.001529>
lstat("file3", {st_mode=S_IFREG|0644, st_size=1828312, ...}) = 0 <0.001991>
lstat("file4", {st_mode=S_IFREG|0644, st_size=1823258, ...}) = 0 <0.001326>
lstat("file5", {st_mode=S_IFREG|0644, st_size=32820, ...}) = 0 <0.006562>
lstat("file6", {st_mode=S_IFREG|0644, st_size=22578, ...}) = 0 <0.002151>
lstat("file7", {st_mode=S_IFREG|0644, st_size=32835, ...}) = 0 <0.001705>
lstat("file8", {st_mode=S_IFREG|0644, st_size=25493, ...}) = 0 <0.001492>
lstat("file9", {st_mode=S_IFREG|0644, st_size=1783930, ...}) = 0 <0.001974>
That's about ~2 ms per call, completely different to what -c
shows. No idea why. Now I do: strace -c
measures only CPU time, not wall time, and I should have used strace -c -w
instead.
With csysdig
:
Viewing: System Calls For: whole machine
Source: Live System Filter: (proc.name=glusterfsd ) and (syscall.type exists)
CALLS/S TOT TIME AVG TIME SYSCALL
12.74K 35ms 1.37us lgetxattr
5.50K 21ms 1.90us read
5.09K 11ms 1.06us lstat
2.96K 27ms 4.52us write
2.55K 5.78s 1.10ms futex
1.70K 5.65ms 1.62us llistxattr
871.00 1.92s 1.10ms poll
414.00 2.14ms 2.58us open
414.00 360us 434ns close
12.50 0ns 0ns sysdigevent
6.00 6.00s 500ms select
2.00 2.00s 500ms nanosleep
With strace
:
(This strace is useless, see comment about strace above.)
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
41.39 3.876000 215333 18 select
30.61 2.866877 217 13227 2237 futex
12.22 1.144000 228800 5 nanosleep
11.45 1.072000 178667 6 3 restart_syscall
4.32 0.404365 141 2876 poll
0.01 0.000667 0 43137 25905 lgetxattr
0.01 0.000472 0 18611 read
0.00 0.000417 0 9996 write
0.00 0.000226 0 17231 lstat
0.00 0.000138 0 5753 llistxattr
0.00 0.000077 0 1349 open
0.00 0.000000 0 1349 close
------ ----------- ----------- --------- --------- ----------------
100.00 9.365239 113558 28145 total
real 0m5.988s
I'm seeing this exact issue with gluster, rsync and xfs.
Here's the craziest finding so far. If while rsync is running (or before), I run
/bin/ls
orfind
on the same gluster dirs, it immediately speeds up rsync by a factor of 100 or maybe even 1000. It's absolutely insane.I'm stracing the rsync run, and the slow lstat calls flood at an incredible speed as soon as ls or find run. Several hundred of files per minute (excruciatingly slow) becomes thousands or even tens of thousands of files a second.
Filed this:
http://lists.gluster.org/pipermail/gluster-users/2018-February/033432.html
http://lists.gluster.org/pipermail/gluster-users/2018-February/033433.html