Skip to content

Instantly share code, notes, and snippets.

@nh2
Last active February 4, 2018 21:35
Show Gist options
  • Save nh2/1836415489e2132cf85ed3832105fcc1 to your computer and use it in GitHub Desktop.
Save nh2/1836415489e2132cf85ed3832105fcc1 to your computer and use it in GitHub Desktop.
rsync being slow on lstat() syscalls on GlusterFS (on XFS, not sure if it matters)

rsync syscalls

With csysdig:

Viewing: System Calls For: whole machine
Source: Live System Filter: (proc.name=rsync ) and (syscall.type exists)
  CALLS/S  TOT TIME  AVG TIME SYSCALL
   924.00     2.00s    1.08ms lstat
     1.50       0ns       0ns sysdigevent

With strace:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.076780          30      2524           lstat
------ ----------- ----------- --------- --------- ----------------
100.00    0.076780                  2524           total

real	0m5.451s

strace -c is fishy is the wrong invocation

I don't trust the output above, 30 usecs/call is way too fast for a network roundtrip.

And it doesn't agree with what strace T shows:

lstat("file1", {st_mode=S_IFREG|0644, st_size=32820,   ...}) = 0 <0.005523>
lstat("file2", {st_mode=S_IFREG|0644, st_size=20816,   ...}) = 0 <0.001529>
lstat("file3", {st_mode=S_IFREG|0644, st_size=1828312, ...}) = 0 <0.001991>
lstat("file4", {st_mode=S_IFREG|0644, st_size=1823258, ...}) = 0 <0.001326>
lstat("file5", {st_mode=S_IFREG|0644, st_size=32820,   ...}) = 0 <0.006562>
lstat("file6", {st_mode=S_IFREG|0644, st_size=22578,   ...}) = 0 <0.002151>
lstat("file7", {st_mode=S_IFREG|0644, st_size=32835,   ...}) = 0 <0.001705>
lstat("file8", {st_mode=S_IFREG|0644, st_size=25493,   ...}) = 0 <0.001492>
lstat("file9", {st_mode=S_IFREG|0644, st_size=1783930, ...}) = 0 <0.001974>

That's about ~2 ms per call, completely different to what -c shows. No idea why. Now I do: strace -c measures only CPU time, not wall time, and I should have used strace -c -w instead.

glusterfsd syscalls

With csysdig:

Viewing: System Calls For: whole machine
Source: Live System Filter: (proc.name=glusterfsd ) and (syscall.type exists)
  CALLS/S  TOT TIME  AVG TIME SYSCALL
   12.74K      35ms    1.37us lgetxattr
    5.50K      21ms    1.90us read
    5.09K      11ms    1.06us lstat
    2.96K      27ms    4.52us write
    2.55K     5.78s    1.10ms futex
    1.70K    5.65ms    1.62us llistxattr
   871.00     1.92s    1.10ms poll
   414.00    2.14ms    2.58us open
   414.00     360us     434ns close
    12.50       0ns       0ns sysdigevent
     6.00     6.00s     500ms select
     2.00     2.00s     500ms nanosleep

With strace:

(This strace is useless, see comment about strace above.)

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 41.39    3.876000      215333        18           select
 30.61    2.866877         217     13227      2237 futex
 12.22    1.144000      228800         5           nanosleep
 11.45    1.072000      178667         6         3 restart_syscall
  4.32    0.404365         141      2876           poll
  0.01    0.000667           0     43137     25905 lgetxattr
  0.01    0.000472           0     18611           read
  0.00    0.000417           0      9996           write
  0.00    0.000226           0     17231           lstat
  0.00    0.000138           0      5753           llistxattr
  0.00    0.000077           0      1349           open
  0.00    0.000000           0      1349           close
------ ----------- ----------- --------- --------- ----------------
100.00    9.365239                113558     28145 total

real	0m5.988s
@archon810
Copy link

archon810 commented Feb 4, 2018

I'm seeing this exact issue with gluster, rsync and xfs.

Here's the craziest finding so far. If while rsync is running (or before), I run /bin/ls or find on the same gluster dirs, it immediately speeds up rsync by a factor of 100 or maybe even 1000. It's absolutely insane.

I'm stracing the rsync run, and the slow lstat calls flood at an incredible speed as soon as ls or find run. Several hundred of files per minute (excruciatingly slow) becomes thousands or even tens of thousands of files a second.

Filed this:
http://lists.gluster.org/pipermail/gluster-users/2018-February/033432.html
http://lists.gluster.org/pipermail/gluster-users/2018-February/033433.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment