Slow readdir() or lstat() behavior for parallel directory walks
(This is the content of https://bugreport.apple.com/web/?problemID=45648013.)
Area: Something not on this list
Summary: Calling readdir() concurrently apparently acquires a global kernel lock, so directory traversal from multiple threads or processes becomes extremely slow as the number of parallel I/O operations increases.
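The contention is also reachable from threads within a single process. A minimal sketch of a threaded variant (the thread count and target directory are arbitrary, not from the report; os.scandir() is backed by readdir(), so this exercises the same kernel path):

import os
from concurrent.futures import ThreadPoolExecutor

def walk(path):
    # os.scandir() calls readdir() under the hood, so concurrent walks
    # hit the same kernel code path from threads instead of processes.
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            walk(entry.path)

with ThreadPoolExecutor(max_workers=8) as pool:
    for _ in range(100):
        pool.submit(walk, '/usr/share')  # any large directory tree works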
Steps to Reproduce:
I have a Gist at https://gist.github.com/indygreg/a50e187f5372807cdcab5ac12bc2feea that demonstrates the issue using Python (the script is reproduced below).
Expected Results:
It would be nice if read-only parallel I/O scaled linearly (within reason). Other filesystems (like EXT4) don't exhibit excessive kernel CPU time performing the same type of I/O operations in parallel.
In addition, it would be useful if the APFS developer documentation stated which I/O operations are subject to global locks so developers know how to optimize parallel I/O under APFS.
Actual Results:
Parallel I/O performing readdir() results in excessive CPU time being spent in the kernel acquiring locks.
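The user/kernel split can be observed from the script itself. As a sketch (this snippet is not part of the original reproduction; it uses the standard resource module and would run in the parent after pool.join()), printing the workers' accumulated CPU time shows how much of the work was kernel time:

# Sketch: after pool.join() in the script below, report CPU time
# accumulated by the (already waited-for) worker processes. A
# disproportionately large system component is the symptom described above.
import resource

usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print('workers used %.2fs user CPU, %.2fs system CPU'
      % (usage.ru_utime, usage.ru_stime))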
Version/Build:
I've reproduced the issue on macOS 10.13.6 and 10.14 on a MacBook Pro 15.1. Others have reproduced it on other devices running macOS. I assume the issue is intrinsic to APFS.
Performance on 10.14 is noticeably better than on 10.13, but it still lags, especially compared to Linux/EXT4.
Configuration:
Nothing special. I was able to reproduce on a fresh 2018 MacBook Pro straight from Apple.
#!/usr/bin/env python
# Any copyright is dedicated to the Public Domain.
# http://creativecommons.org/publicdomain/zero/1.0/

import argparse
import multiprocessing
import os
import time


def walk(path):
    # Recursively descend the tree. os.listdir() calls readdir() and
    # os.path.isdir() calls stat()/lstat(), so this exercises exactly
    # the operations named in the summary.
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            walk(full)


if __name__ == '__main__':
    # The __main__ guard is required for multiprocessing on macOS,
    # where workers may be started via spawn rather than fork.
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', '--jobs', default=multiprocessing.cpu_count(),
                        type=int,
                        help='Number of parallel processes')
    parser.add_argument('-l', '--limit', default=100,
                        type=int,
                        help='Number of recursive walks to perform')
    parser.add_argument('path',
                        help='Directory to walk')

    args = parser.parse_args()

    pool = multiprocessing.Pool(processes=args.jobs)

    t_start = time.time()

    # Queue N independent walks of the same tree; the pool fans them
    # out across the worker processes.
    for _ in range(args.limit):
        pool.apply_async(walk, (args.path,))

    pool.close()
    pool.join()

    duration = time.time() - t_start

    print('ran %d walks across %d processes in %.3fs' % (
        args.limit, args.jobs, duration))
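To reproduce, save the script (the file name and target directory below are illustrative) and point it at a reasonably large directory tree:

python walk.py -j 8 -l 100 /usr/share

Per the report above, increasing -j on APFS drives kernel CPU time up rather than scaling wall time down.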