@joshtrichards
Last active October 2, 2023 12:24
WebDAV Client Small Files Performance Testing

Rankings

Results, ranked so far (timed with the Unix time command where possible; the rest are based on wall-clock observation cross-verified with log files):

  1. Cyberduck: 12 seconds (!!!)
  2. cURL + NC's Bulk File API: 50 seconds
  3. Duck.sh: 1 minute 10 seconds
  4. cURL (parallel mode aka -Z): 2 minutes 20 seconds
  5. Windows 10: 4 minutes 45 seconds
  6. Rclone: 7 minutes 41 seconds
  7. cURL w/o parallel mode: 8 minutes
  8. davfs2: ~17 minutes
  9. WinSCP: ~26 minutes (ouch)
  • Average: about 7 minutes 33 seconds across the nine figures listed above (both rclone and unoptimized cURL are close to average)
  • Median: 4 minutes 45 seconds (Windows 10)
  • Best: 12 seconds (Cyberduck)
  • Worst: 26 minutes (WinSCP)
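
The summary figures can be recomputed from the ranked list with a few lines of bash. This is a minimal sketch that uses the rounded timings exactly as listed (the davfs2 and WinSCP entries are approximate), so treat the result as approximate too. The function names are my own, purely for illustration.

```shell
#!/bin/bash
# Timings in seconds, in ranked order, taken from the rounded figures in the list
# (the davfs2 and WinSCP entries are approximate in the source).
times=(12 50 70 140 285 461 480 1020 1560)

# Integer mean of all arguments, in seconds.
mean_seconds() {
  local sum=0 t
  for t in "$@"; do sum=$((sum + t)); done
  echo $((sum / $#))
}

# Median of all arguments (middle value after a numeric sort; assumes an odd count).
median_seconds() {
  printf '%s\n' "$@" | sort -n | sed -n "$(( ($# + 1) / 2 ))p"
}

# Format a number of seconds as "Xm Ys".
fmt_mmss() { echo "$(( $1 / 60 ))m $(( $1 % 60 ))s"; }

echo "mean:   $(fmt_mmss "$(mean_seconds "${times[@]}")")"
echo "median: $(fmt_mmss "$(median_seconds "${times[@]}")")"
```

Adding a new client to the comparison only requires appending its time in seconds to the array (and switching the median helper to handle an even count).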

The basics of the setup

  • 1000 files
  • single folder
  • Nextcloud v26
    • NC v26.0.1 Community Docker (nextcloud:26-apache)
    • Redis 7 (redis:7.0-bullseye)
    • MariaDB 10.6 (mariadb:10.6)
  • HTTP (no TLS)
  • No reverse proxy (RP)
  • Test case: upload all files into an empty folder on the destination

A couple of upfront notes:

  • The absolute numbers aren't the most important thing here since differences between environments will cause them to vary somewhat
  • The relative numbers are relevant as they suggest differences among implementations and - more importantly - opportunities for optimization
  • WebDAV Server implementations, configurations, and capabilities certainly matter a great deal and can impact performance dramatically, but nothing server-side can overcome clients that are not well optimized

Some observations so far:

  • Some clients perform well while heavily saturating the backend, others perform poorly while doing the same, and still others seem to find a better mix overall
  • While performance matters on both ends - client and server - WebDAV is a protocol highly sensitive to client-side design choices no matter how reasonable the server is
  • Some use cases are easier to optimize for than others, either within the client implementation or by the user through client configuration (though it's not always obvious what should be adjusted without getting into the weeds a bit)

Clients evaluated (so far):

  • cURL v7.81.0
  • rclone v1.53.3, v1.62.2, v1.63.0-beta-6867, v1.63.0-beta-6974
  • Cyberduck v8.5.9
  • Duck.sh v8.5.9 (Cyberduck's CLI)
  • Windows 10 Pro (v22H2 / 19045.2846)
  • WinSCP v5.21.7
  • davfs2 v1.61

Upcoming:

  • Mountain Duck
  • ??? <-- insert your favorite client/implementation here
  • Other server implementations
  • Nextcloud official Desktop client
  • Nextcloud official Android client
  • Nextcloud official iOS client

Test files

#!/bin/bash

for i in {0001..1000}
do
  echo "some some some some some some some some some some some text" > "file_${i}.txt"
done
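
Before each run it can be worth confirming that the source folder really holds the full set of files. The helpers below are a small sketch for that; the function names are hypothetical and the file_*.txt pattern simply mirrors the generator script.

```shell
#!/bin/bash
# Count the generated test files in a directory (default: current directory).
count_test_files() {
  find "${1:-.}" -maxdepth 1 -type f -name 'file_*.txt' | wc -l | tr -d ' '
}

# Total size in bytes of those files (fails if no file matches the pattern).
total_test_bytes() {
  cat "${1:-.}"/file_*.txt | wc -c | tr -d ' '
}

# Example (run in the folder where the generator script was executed):
#   count_test_files .
#   total_test_bytes .
```

The same count can be compared against the destination folder after an upload to confirm that everything actually arrived.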

Observations

General

  • The performance differences among client implementations (and, in some cases, client configuration parameters) are not minor and often exceed an order of magnitude.
  • Unless you are only transferring a file or two (or upload/write performance and the CPU/memory impact on the backend environment are irrelevant in your use case), doing some optimization in terms of client choice or configuration is probably worthwhile

Key differences among clients so far that seem to play a role in performance:

  • Frequency of / choices regarding how/when to call MKCOL
  • Frequency of / choices regarding how/when to call PROPFIND
  • Depth header decisions (how far PROPFIND requests recurse)
  • Default use cases (and user awareness of trade-offs)
  • Authenticated session management (or lack thereof)
  • Parallel jobs
  • Third-party WebDAV libraries
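
To make the first few items concrete, here is a rough sketch of the individual WebDAV calls a client ends up issuing, expressed with cURL. It is not taken from any of the tested clients: BASE_URL, DAV_USER, DAV_PASS and the function names are placeholders, and the run wrapper with DRY_RUN=1 only prints the command instead of sending it.

```shell
#!/bin/bash
# Placeholders, not values from the tests: adjust to your own server and account.
BASE_URL="${BASE_URL:-http://nc-test.local:3680/remote.php/dav/files/USERNAME}"
DAV_USER="${DAV_USER:-USERNAME}"
DAV_PASS="${DAV_PASS:-PASSWORD}"

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create the destination collection. Ideally done once, not once per file.
dav_mkcol() {
  run curl -sS -u "$DAV_USER:$DAV_PASS" -X MKCOL "$BASE_URL/$1/"
}

# Query a single resource without listing its children (Depth: 0).
dav_propfind() {
  run curl -sS -u "$DAV_USER:$DAV_PASS" -X PROPFIND -H "Depth: 0" "$BASE_URL/$1"
}

# Upload one local file into a remote folder with a plain PUT (what curl -T does).
dav_put() {
  run curl -sS -u "$DAV_USER:$DAV_PASS" -T "$1" "$BASE_URL/$2/"
}

# Example: one MKCOL up front, then only PUTs.
#   dav_mkcol my_test && for f in file_*.txt; do dav_put "$f" my_test; done
```

A client that wraps every dav_put in its own dav_mkcol and dav_propfind triples the request count for the same 1000 files, which is the kind of difference the list above refers to.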

Additional speculation:

  • XML processing efficiency

Notes arising from testing specific clients

Rclone

Used the copy command, but also tried many variations of --transfers, --checkers, etc. with no serious variance in performance

  • Flags such as --use-server-modtime don't appear to do anything in the WebDAV backend
  • Using --no-check-dest doesn't cut down on MKCOL and PROPFIND calls before/after every upload
  • rclone asks the server to recreate the parent directory (via a MKCOL) prior to every individual file upload, which seems excessive and unnecessary - and it's not a light call to make to the backend

Perhaps some flags would change behavior? Not sure.

cURL (parallel mode)

time curl -Z --parallel-max 8 -u USERNAME:PASSWORD -T "file_[0001-1000].txt" "http://nc-test.local:3680/remote.php/dav/files/USERNAME/curl_multi_test/"
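
For clients or shells where the time command is awkward to use, a small wrapper gives comparable wall-clock numbers. This is only a sketch: wall_seconds is a made-up helper name, the resolution is whole seconds, and the serial invocation in the comments is an assumption (the parallel command minus -Z and --parallel-max), not necessarily the exact command used for the ranking.

```shell
#!/bin/bash
# Print the wall-clock seconds a command takes; the command's own output is discarded.
wall_seconds() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# Example invocations (not executed here). USERNAME, PASSWORD and the target
# folder names are placeholders.
# wall_seconds curl -Z --parallel-max 8 -u USERNAME:PASSWORD -T "file_[0001-1000].txt" \
#   "http://nc-test.local:3680/remote.php/dav/files/USERNAME/curl_multi_test/"
# wall_seconds curl -u USERNAME:PASSWORD -T "file_[0001-1000].txt" \
#   "http://nc-test.local:3680/remote.php/dav/files/USERNAME/curl_serial_test/"
```

Uploading into a fresh, empty folder for each run keeps the comparison in line with the test case described in the setup.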

duck.sh

I expected this to be comparable in performance to Cyberduck since they seem to use the same backend code, but that was not the case. Perhaps some flags would change behavior? Not sure.


Tips for Isolating Performance Issues

  • Check transaction times on server-side logs to determine what clients are doing differently in terms of WebDAV queries
  • Wireshark or a similar tool is needed to get into the details of session state, HTTP modes, etc.
  • Server-side performance baselines for single file transfers can be assessed in a quick-and-dirty manner with cURL since all WebDAV transactions can be implemented around it
  • For transactions involving many files that are either small or a mixed size range (and certainly mounts), clients that manage session state and transactions in parallel will nearly always perform better
  • Match your use case to the tool (i.e. pick a WebDAV client focused on your use case or make sure you're using its available configuration options in a way appropriate to your use case)
  • Some clients (and use cases) do things to make things appear "fast" from the user experience side of things, but the data isn't fully in sync on the server side and thus I don't consider it "uploaded" (e.g. davfs2) until the data is safely stored where I expect it to be (i.e. on the server)
  • TLS (HTTPS) will impact performance a great deal if you're a home/small organization user hosting things on a Raspberry Pi or on a materially older or particularly low-end x86 CPU that lacks AES-NI. This can certainly be overcome by swapping to more capable hardware, but other options include HTTPS offloading or switching to HTTP-only restricted to running solely over a VPN implementation that performs well even on low-end hardware, such as WireGuard. It may also be possible to force the use of ChaCha20-Poly1305 for HTTPS transactions
  • Intermediate proxy (e.g. an RP) settings likely can impact performance a great deal
  • Try to avoid complexity: e.g. having lots of small file transactions via WebDAV that go to a server that has an underlying volume hosted over an sshfs mount is going to inherently not just make things more fragile, but impact performance server-side no matter what clients do (but that's no excuse for clients to be so widely varied in optimization as the above testing has shown!)
  • Seemingly tiny transactions (like extra MKCOLs) that add ~250 ms each (an educated guess, not a precise figure) add up over 1000s of files: 1000 extra calls at that cost is roughly 250 seconds
  • Be careful with extraneous server-side logging and client-side debugging/verbosity options impacting performance - particularly if you're looking to do absolute performance comparisons/assessments
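
As a starting point for the first tip, the request mix a client generates can be tallied straight from the web server's access log. The sketch below assumes Apache's common/combined log format (the request line is the first double-quoted field) and that the log has been saved to a file; count_methods is a hypothetical helper name.

```shell
#!/bin/bash
# Count requests per HTTP/WebDAV method in an access log, most frequent first.
# Assumes Apache common/combined log format, where the request line is the
# first double-quoted field, e.g. "PUT /remote.php/dav/... HTTP/1.1".
count_methods() {
  awk -F'"' '{ split($2, req, " "); count[req[1]]++ }
             END { for (m in count) print count[m], m }' "$1" | sort -rn
}

# Example (access.log is a placeholder path):
#   count_methods access.log
```

Running it once per client against a log that covers only that client's upload makes differences in MKCOL and PROPFIND usage easy to spot side by side.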

TODO

  • Test additional client implementations
  • Dig deeper into the 90th percentile high performing ones
  • Dig deeper into the lowest percentile low performing ones
  • Submit suggestions to maintainers where appropriate
  • Submit PRs to maintainers where an option