pkgsrc builds on macOS have historically been significantly less reliable than on any other operating system.
In my most recent builds there were over 400 instances of the error:
pkg_add: no pkg found for '<pkgname>', sorry.
This never happens on any other OS, even though I use NFS for the packages directory on illumos, Linux, NetBSD, and macOS alike.
To reproduce, set up two separate chroots, each with the packages directory NFS mounted.
In the first session, repeatedly update the directory:
$ while true; do
    dd if=/dev/random of=/Volumes/data/packages/Darwin/12.3/arm64/All/blah.tgz count=1000
    rm -f /Volumes/data/packages/Darwin/12.3/arm64/All/blah.tgz
  done
Then attempt to simply install a package in the other session:
$ pkg_add digest-20220214
pkg_add: no pkg found for 'digest-20220214', sorry.
pkg_add: 1 package addition failed
dtruss(1) sheds some light:
$ dtruss pkg_add digest-20220214
...
open_nocancel("/Volumes/data/packages/Darwin/12.3/arm64/All\0", 0x1100004, 0x0) = 3 0
fstatfs64(0x3, 0x16FCAED50, 0x0) = 0 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8112 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8168 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8112 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8176 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8136 0
getdirentries64(0x3, 0x147809000, 0x2000) = 8128 0
getdirentries64(0x3, 0x147809000, 0x2000) = -1 Err#2
close_nocancel(0x3) = 0 0
write_nocancel(0x2, "pkg_add: \0", 0x9) = 9 0
write_nocancel(0x2, "no pkg found for 'digest-20220214', sorry.\0", 0x2A) = 42 0
write_nocancel(0x2, "\n\0", 0x1) = 1 0
write_nocancel(0x2, "pkg_add: \0", 0x9) = 9 0
write_nocancel(0x2, "1 package addition failed\0", 0x19) = 25 0
write_nocancel(0x2, "\n\0", 0x1) = 1 0
Using dtrace to get the user stack at the point of failure:
$ dtrace -n 'syscall::getdirentries64:return/errno != 0/ {ustack();}' -c "pkg_add digest-20220214"
dtrace: description 'syscall::getdirentries64:return' matched 1 probe
pkg_add: no pkg found for 'digest-20220214', sorry.
pkg_add: 1 package addition failed
dtrace: pid 14747 has exited
CPU ID FUNCTION:NAME
2 855 getdirentries64:return
libsystem_kernel.dylib`__getdirentries64+0x8
libsystem_c.dylib`readdir+0x2c
pkg_add`fetchListFile+0x4c
pkg_add`find_best_package_int+0x114
pkg_add`find_best_package+0x78
pkg_add`find_archive+0x118
pkg_add`pkg_do+0x4c
pkg_add`pkg_perform+0x38
pkg_add`main+0x314
dyld`start+0x8bc
getdirentries64() is returning ENOENT because the in-progress readdir() is invalidated when the directory is updated.
fetchListFile() is just doing a bog-standard opendir()/readdir()/closedir() loop, the same as many other pieces of software and manual page examples, with no ENOENT handling whatsoever.
Why is this only a problem on macOS?
I looked around in lots of third-party code, as well as manual page examples, and none of them retry a readdir() loop after checking for ENOENT.
The macOS manual page doesn't even list ENOENT as a valid errno!
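If software did want to cope with this, one possible shape is to restart the whole scan when readdir() fails with ENOENT. The sketch below is only an assumption about what such handling could look like, not something pkg_add or any library does today; dir_contains() and SCAN_MAX_ATTEMPTS are hypothetical names, and the retry bound is arbitrary.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

#define SCAN_MAX_ATTEMPTS 5	/* arbitrary bound, chosen for illustration */

/*
 * Hypothetical helper: report whether `name` exists in directory `path`.
 * Returns 1 if found, 0 if not found, -1 on error with errno set.
 * If readdir() fails with ENOENT partway through, the scan is restarted
 * from a fresh opendir(), up to SCAN_MAX_ATTEMPTS times.
 */
static int
dir_contains(const char *path, const char *name)
{
	int attempt;

	for (attempt = 0; attempt < SCAN_MAX_ATTEMPTS; attempt++) {
		DIR *dir;
		struct dirent *de;
		int found = 0, saved;

		/* A missing directory is a real error, so do not retry it. */
		if ((dir = opendir(path)) == NULL)
			return -1;

		for (;;) {
			errno = 0;
			if ((de = readdir(dir)) == NULL)
				break;
			if (strcmp(de->d_name, name) == 0) {
				found = 1;
				break;
			}
		}
		saved = errno;	/* closedir() may overwrite errno */
		closedir(dir);

		if (found)
			return 1;
		if (saved == 0)
			return 0;	/* clean end of directory */
		if (saved != ENOENT) {
			errno = saved;
			return -1;
		}
		/* ENOENT mid-scan: the listing was invalidated, start over. */
	}

	errno = ENOENT;
	return -1;
}
```

Restarting from scratch means any partial results must be thrown away, and on a directory that changes constantly the retries can still run out, so this reduces the failure rate rather than eliminating it.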
A stupid workaround is to increase the NFS sizes:
$ mount_nfs -o rwsize=1048576,dsize=1048576
which doesn't make the problem go away but does make it significantly less likely.