So, ps is running somewhat slowly...
asgard # time ps -ef | wc -l
173
real 0m10.195s
user 0m0.023s
sys 0m9.779s
Further investigation suggests that it's slow only while trying to print lines for certain processes. Further, further investigation suggests that the processes which cause the epic slowdown appear with two question marks in the TTY column instead of only one. Thus:
slow:
asgard # time ps -fp 28507
UID PID PPID C STIME TTY TIME CMD
leftwing 28507 28504 0 12:39:37 ?? 0:00 -bash
real 0m0.749s
user 0m0.003s
sys 0m0.720s
fast:
asgard # time ps -fp 2433
UID PID PPID C STIME TTY TIME CMD
daemon 2433 1 0 Dec 08 ? 0:03 /usr/lib/nfs/nfsmapid
real 0m0.009s
user 0m0.002s
sys 0m0.007s
what are we doing when we're slow...
asgard # dtrace -qn 'syscall:::entry { @[probefunc] = count(); }' -c "ps -fp 28507"
UID PID PPID C STIME TTY TIME CMD
leftwing 28507 28504 0 12:39:37 ?? 0:00 -bash
lwp_exit 1
sigsuspend 1
writev 1
getpeername 2
getsockname 2
mprotect 2
pipe 2
pwrite 2
recv 2
semsys 2
send 2
access 3
accept 4
forksys 4
getgid 4
getsockopt 4
open64 4
setpgrp 4
sigpending 4
sigprocmask 4
umask 4
rexit 5
sysi86 5
setsockopt 6
exece 7
readv 7
getrlimit 8
schedctl 8
connect 10
systeminfo 10
priocntlsys 12
so_socket 12
munmap 13
readlink 16
waitsys 16
getuid 19
lstat 21
setcontext 21
llseek 25
lwp_park 26
sysconfig 29
lseek 30
sigaction 30
portfs 32
pollsys 33
doorfs 45
write 48
getpid 49
lwp_sigmask 62
openat 63
fstat64 66
nanosleep 71
sigwait 76
mmapobj 77
memcntl 85
read 88
clock_gettime 98
mmap 105
resolvepath 107
fcntl 118
brk 126
getdents 126
stat64 130
fstat 181
gtime 183
pread 185
open 199
p_online 256
close 266
ioctl 755
stat 921
and when we're fast:
asgard # dtrace -qn 'syscall:::entry { @[probefunc] = count(); }' -c "ps -fp 2433"
UID PID PPID C STIME TTY TIME CMD
daemon 2433 1 0 Dec 08 ? 0:03 /usr/lib/nfs/nfsmapid
access 1
exece 1
fcntl 1
lwp_exit 1
mmapobj 1
rexit 1
sysi86 1
fstat64 2
getrlimit 2
mprotect 2
pwrite 2
systeminfo 2
doorfs 3
getpid 3
getuid 3
llseek 3
lwp_park 3
memcntl 3
setcontext 4
sigaction 4
munmap 5
pollsys 5
sigwait 5
sysconfig 5
lseek 6
readlink 14
resolvepath 17
mmap 18
lstat 19
stat 19
brk 20
gtime 22
read 22
lwp_sigmask 25
fstat 27
close 32
open 36
write 38
pread 185
p_online 256
ioctl 291
So, clearly there are some additional stat()s going on when we're slow:
asgard # dtrace -qn 'syscall::stat:entry { @[ustack()] = count(); }' -c "ps -fp 28507"
UID PID PPID C STIME TTY TIME CMD
leftwing 28507 28504 0 12:39:37 ?? 0:00 -bash
...
(snip)
...
libc.so.1`_syscall6+0x1b
libc.so.1`srch_dir+0x3db
libc.so.1`srch_dir+0x4e0
libc.so.1`srch_dir+0x4e0
libc.so.1`_ttyname_common+0x2a2
libc.so.1`_ttyname_dev+0x26
ps`gettty+0x3d
ps`prfind+0x4b
ps`print_proc+0x27f
ps`stdmain+0x1221
ps`0x40317c
239
libc.so.1`_syscall6+0x1b
libc.so.1`srch_dir+0x3db
libc.so.1`_ttyname_common+0x2a2
libc.so.1`_ttyname_dev+0x26
ps`gettty+0x3d
ps`prfind+0x4b
ps`print_proc+0x27f
ps`stdmain+0x1221
ps`0x40317c
247
libc.so.1`_syscall6+0x1b
libc.so.1`srch_dir+0x3db
libc.so.1`srch_dir+0x4e0
libc.so.1`_ttyname_common+0x2a2
libc.so.1`_ttyname_dev+0x26
ps`gettty+0x3d
ps`prfind+0x4b
ps`print_proc+0x27f
ps`stdmain+0x1221
ps`0x40317c
264
So now that I look at it, all of the slow/double question mark processes are processes that do have ttys and also happen to live in non-global zones....
asgard # time ps -z global | wc -l
79
real 0m0.017s
user 0m0.005s
sys 0m0.018s
asgard # time ps -z tutor | wc -l
68
real 0m10.222s
user 0m0.021s
sys 0m9.810s
looking into ps.c... it looks like if a process is identified as having a pty, we'll go and look up the number of that pty by searching for the appropriate device nodes in places listed in /etc/ttysrch. On my box, that file contains:
/dev/pts
/dev/vt
/dev/term
/dev/zcons
As we don't find it there, we do an exhaustive search (hence all the stat() through /dev), decide we can't find the pty, give up and return "?", which is then displayed as the second ? in the TTY column.
If run from within the zone, ps is able to see the device nodes in the in-zone /dev/pts and everything works fine.