@indygreg
Slow readdir() or lstat() behavior for parallel directory walks
(This is the content of https://bugreport.apple.com/web/?problemID=45648013.)
Area:
Something not on this list
Summary: Calling readdir() from multiple threads apparently acquires a global kernel lock, making directory traversal operations from multiple processes extremely slow as the number of parallel I/O operations increases.
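(Illustration, not part of the original report: a minimal threads-based sketch of the same pattern. CPython releases the GIL around the underlying opendir(3)/readdir(3) calls, so the threads really do issue the directory reads concurrently and any serialization happens in the kernel, not in the interpreter. The full multi-process reproducer appears below.)

#!/usr/bin/env python
# Hypothetical sketch: the same recursive walk driven by threads
# instead of processes.
import os
import sys
from concurrent.futures import ThreadPoolExecutor

def walk(path):
    # os.listdir() exercises readdir(); os.path.isdir() stat()s each entry.
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            walk(full)

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(walk, sys.argv[1]) for _ in range(100)]
        for f in futures:
            f.result()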
Steps to Reproduce:
I have a Gist at https://gist.github.com/indygreg/a50e187f5372807cdcab5ac12bc2feea that demonstrates the issue using Python.
Expected Results:
It would be nice if read-only parallel I/O scaled linearly (within reason). Other filesystems (like EXT4) don't exhibit excessive kernel CPU time performing the same type of I/O operations in parallel.
In addition, it would be useful if the APFS developer documentation documented which I/O operations are subject to global locks so developers know how to optimize parallel I/O under APFS.
Actual Results:
Parallel I/O performing readdir() results in excessive CPU time being spent in the kernel acquiring locks.
Version/Build:
I've reproduced on macOS 10.13.6 and 10.14 on a MacBookPro15,1. Others have reproduced on other devices running macOS. I assume the issue is intrinsic to APFS.
Performance on 10.14 is noticeably better than on 10.13. But performance still lags, especially when compared to Linux/EXT4.
Configuration:
Nothing special. Was able to reproduce on a fresh 2018 MacBook Pro straight from Apple.
#!/usr/bin/env python
# Any copyright is dedicated to the Public Domain.
# http://creativecommons.org/publicdomain/zero/1.0/

import argparse
import multiprocessing
import os
import time


def walk(path):
    # Recursively list a directory tree. os.listdir() exercises readdir()
    # and os.path.isdir() performs a stat() on each entry.
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            walk(full)


# The __main__ guard lets multiprocessing's 'spawn' start method (the
# default on modern macOS Pythons) re-import this module safely.
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', '--jobs', default=multiprocessing.cpu_count(),
                        type=int,
                        help='Number of parallel processes')
    parser.add_argument('-l', '--limit', default=100,
                        type=int,
                        help='Number of recursive walks to perform')
    parser.add_argument('path',
                        help='Directory to walk')

    args = parser.parse_args()

    # Schedule N independent walks of the same tree across a process pool.
    pool = multiprocessing.Pool(processes=args.jobs)

    t_start = time.time()

    for _ in range(args.limit):
        pool.apply_async(walk, (args.path,))

    pool.close()
    pool.join()

    duration = time.time() - t_start

    print('ran %d walks across %d processes in %.3fs' % (
        args.limit, args.jobs, duration))
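A typical invocation, assuming the script is saved as walk.py (the path is illustrative): python walk.py -j 8 -l 100 /usr/share

For comparison, the following variant is a sketch, not part of the original report: it performs the same walk with os.scandir(), whose DirEntry.is_dir() can usually be answered from the d_type field that readdir() already returned, avoiding the extra per-entry stat() that os.path.isdir() incurs.

import os

def walk_scandir(path):
    # DirEntry.is_dir(follow_symlinks=False) can typically be answered from
    # the dirent's d_type field with no additional stat()/lstat() per entry.
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                walk_scandir(entry.path)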
@jessegrosjean
Have you got any response to this or found any workarounds?
