Last active May 19, 2022
Slow readdir() or lstat() behavior for parallel directory walks
(This is the content of
Something not on this list
Summary: Calling readdir() concurrently from multiple threads or processes appears to acquire a global kernel lock, so directory traversal becomes extremely slow as the number of parallel I/O operations increases.
Steps to Reproduce:
I have a Gist at that demonstrates the issue using Python.
Expected Results:
It would be nice if read-only parallel I/O scaled linearly (within reason). Other filesystems (like EXT4) don't exhibit excessive kernel CPU time performing the same type of I/O operations in parallel.
In addition, it would be useful if the APFS developer documentation documented which I/O operations are subject to global locks so developers know how to optimize parallel I/O under APFS.
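One way to check the scaling expectation is to time a fixed number of walks at several worker counts and compare each against the single-worker run. The following is a minimal sketch and not part of the reproduction script: the names time_walks and scaling_table and the executor_cls parameter are hypothetical, introduced here for illustration. With linear scaling, the speedup at N workers would approach N.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def walk(path):
    """Recursively list every directory under path; return the entry count."""
    count = 0
    for entry in os.listdir(path):
        count += 1
        full = os.path.join(path, entry)
        # Skip symlinked directories to avoid cycles.
        if os.path.isdir(full) and not os.path.islink(full):
            count += walk(full)
    return count


def time_walks(path, jobs, walks=8, executor_cls=ProcessPoolExecutor):
    """Run `walks` full traversals on `jobs` workers; return wall-clock seconds."""
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as pool:
        # list() forces every traversal to finish before the clock stops.
        list(pool.map(walk, [path] * walks))
    return time.perf_counter() - start


def scaling_table(path, job_counts=(1, 2, 4), walks=8,
                  executor_cls=ProcessPoolExecutor):
    """Return {jobs: (seconds, speedup relative to the first job count)}."""
    results = {}
    baseline = None
    for jobs in job_counts:
        seconds = time_walks(path, jobs, walks, executor_cls)
        if baseline is None:
            baseline = seconds
        speedup = baseline / seconds if seconds > 0 else float('inf')
        results[jobs] = (seconds, speedup)
    return results
```

On macOS, where Python 3.8 and later start worker processes with spawn, scaling_table should be called from under an if __name__ == '__main__': guard. Passing executor_cls=ThreadPoolExecutor runs the same measurement with threads instead of processes.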
Actual Results:
Parallel I/O performing readdir() results in excessive CPU time being spent in the kernel acquiring locks.
I've reproduced on macOS 10.13.6 and 10.14 on a MacBook Pro 15.1. Others have reproduced on other devices running macOS. I assume the issue is intrinsic to APFS.
Performance on 10.14 is noticeably better than on 10.13, but it still lags, especially when compared to Linux/EXT4.
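To see how much of the cost lands in the kernel, the user and system CPU time of a parallel walk can be compared directly. This is a rough sketch under stated assumptions, not the method used for the numbers above: cpu_split is a hypothetical helper, and it uses threads rather than processes so that os.times() for the current process covers all of the work. A system figure far above the user figure would be consistent with time spent in the kernel.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def walk(path):
    """Recursively list every directory under path."""
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        # Skip symlinked directories to avoid cycles.
        if os.path.isdir(full) and not os.path.islink(full):
            walk(full)


def cpu_split(path, jobs=4, walks=8):
    """Walk `path` `walks` times on `jobs` threads.

    Returns the user, system, and elapsed seconds consumed by this process
    (all threads included) while the walks ran.
    """
    before = os.times()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() forces every traversal to finish before the second sample.
        list(pool.map(walk, [path] * walks))
    after = os.times()
    return {
        'user': after.user - before.user,
        'system': after.system - before.system,
        'elapsed': after.elapsed - before.elapsed,
    }
```

For example, cpu_split('/some/large/tree', jobs=8) returns a dict whose 'system' value can be compared against 'user' across different values of jobs; the path here is only a placeholder.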
Nothing special. Was able to reproduce on a fresh 2018 MacBook Pro straight from Apple.
#!/usr/bin/env python
# Any copyright is dedicated to the Public Domain.

import argparse
import multiprocessing
import os
import sys
import time


def walk(path):
    # Recursively list every directory below path.
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            walk(full)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', '--jobs', default=multiprocessing.cpu_count(),
                        type=int, help='Number of parallel processes')
    parser.add_argument('-l', '--limit', default=100, type=int,
                        help='Number of recursive walks to perform')
    parser.add_argument('path', help='Directory to walk')
    args = parser.parse_args()

    pool = multiprocessing.Pool(processes=args.jobs)

    t_start = time.time()
    for _ in range(args.limit):
        pool.apply_async(walk, (args.path,))
    # Wait for every queued walk to finish before stopping the clock.
    pool.close()
    pool.join()
    t_end = time.time()

    duration = t_end - t_start
    print('ran %d walks across %d processes in %.3fs' % (
        args.limit, args.jobs, duration))