You can attach strace to a specific PID to see which system calls that process is making, e.g.:
strace -fp <pid>
You might see something like:
select(9, [3 5 8], [], [], {0, 999999}) = 0 (Timeout)
In this case, 3, 5 and 8 are the file descriptors select() is watching for readability, and the 9 is ([highest FD] + 1).
{0, 999999} is the timeout, a timeval struct of seconds and microseconds, so select() will wait just under one second before timing out.
= 0 (Timeout) is the return value of select(), indicating that none of the file descriptors became ready to read before the timeout expired.
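To make the timeout case concrete, here is a minimal Python sketch of the same situation: select() watching a socket on which nothing arrives, then the same socket once data is available. The socket pair and the 0.2 second timeout are assumptions chosen for illustration and are not taken from the traced process.

```python
import select
import socket

# A connected pair of sockets; nothing has been written yet,
# so neither end is readable.
left, right = socket.socketpair()

# Wait up to 0.2 s for `left` to become readable. On timeout all three
# returned lists are empty, which corresponds to "= 0 (Timeout)" in strace.
readable, writable, errored = select.select([left], [], [], 0.2)
timed_out = not (readable or writable or errored)
print("timed out:", timed_out)

# Once the peer sends something, the same call returns immediately
# with `left` in the readable list.
right.send(b"x")
readable_after, _, _ = select.select([left], [], [], 0.2)
print("readable after send:", readable_after == [left])

left.close()
right.close()
```

Running this script under strace should show a similar call, though depending on the platform and libc the kernel-level name may be select or pselect6.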
You can also restrict the trace to specific system calls with the -e flag:
strace -fp <pid> -e pread,select
It is also possible to exclude certain system calls (the backslash stops the shell from treating ! as history expansion):
strace -fp <pid> -e \!futex
Aggregating over the system calls can also be useful. The -c flag does this: start the trace, leave it running for as long as desired, then stop it (e.g. with Ctrl-C), at which point strace prints summary statistics (calls, errors and time) per system call.
strace -cfp <pid>
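If you have already saved ordinary (non -c) strace output, a rough per-call count can be rebuilt afterwards. The sketch below is only an approximation of the "calls" column that -c reports. It assumes lines in the format shown earlier, optionally prefixed with "[pid N]" as -f produces when following several processes. The futex line in the sample is hypothetical and is there only to give a second call name.

```python
import re
from collections import Counter

# Optional "[pid N] " prefix, then the system call name followed by "(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")

def count_syscalls(lines):
    """Count how often each system call name starts a trace line."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal, exit and "<... resumed>" lines are skipped
            counts[match.group(1)] += 1
    return counts

sample = [
    "select(9, [3 5 8], [], [], {0, 999999}) = 0 (Timeout)",
    "[pid 2947] select(9, [3 5 8], [], [], {0, 999999}) = 0 (Timeout)",
    "futex(0x7f0000000000, FUTEX_WAIT_PRIVATE, 0, NULL) = 0",  # hypothetical
    "+++ exited with 0 +++",
]

counts = count_syscalls(sample)
for name, calls in counts.most_common():
    print(f"{calls:6d} {name}")
```

Unlike -c, this counts calls only and says nothing about time spent, so it is best used for a quick look at which calls dominate a saved trace.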
The next step is to figure out what these file descriptors refer to.
As root, run:
lsof -p <pid> -ad <file_handles>
This shows what each descriptor is attached to, for example a socket on which the process is waiting for a response. Multiple file handles can be given as a comma-separated list:
[root@ops-2-portal ~]# lsof -p 2947 -ad 3,5,8
COMMAND    PID   USER    FD  TYPE  DEVICE    SIZE  NODE  NAME
mongrel_r  2947  deploy  3u  IPv4  57390385        TCP   *:vcom-tunnel (LISTEN)
mongrel_r  2947  deploy  5u  IPv4  57390749        TCP   ops-2-portal:42717 (LISTEN)
mongrel_r  2947  deploy  8u  IPv4  58983912        TCP   ops-2-portal:35191->ops-2-websvc:7077 (ESTABLISHED)
As you can see, select() was watching these file handles for incoming data. FD 8 shows that this mongrel has a TCP connection established to ops-2-websvc:7077, and the select() timeout means that no data arrived on it (or on the two listening sockets) within the wait.
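When lsof is not installed, similar information can usually be read from /proc on Linux, where each entry under /proc/<pid>/fd is a symlink to whatever the descriptor refers to. The following is a minimal sketch under that assumption. The function name describe_fds is made up for this example, and reading another user's process still requires root.

```python
import os

def describe_fds(pid, fds):
    """Map each descriptor number to the target of /proc/<pid>/fd/<fd>."""
    targets = {}
    for fd in fds:
        try:
            # Files resolve to a path; sockets look like "socket:[inode]".
            targets[fd] = os.readlink(f"/proc/{pid}/fd/{fd}")
        except OSError:
            # Descriptor closed, permission denied, or no /proc here.
            targets[fd] = None
    return targets

if __name__ == "__main__":
    # Inspect this script's own standard streams as a harmless demo.
    for fd, target in describe_fds(os.getpid(), [0, 1, 2]).items():
        print(fd, target)
```

For the example above, the equivalent call would be describe_fds(2947, [3, 5, 8]). Sockets appear only as inode numbers, so lsof remains the more convenient tool for seeing the remote endpoint.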
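Pyflame and flamegraph.pl can be used together to profile a running Python process. In the following command, pyflame samples PID 27316 for 60 seconds and flamegraph.pl renders the collected stacks as an SVG flame graph: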
sudo pyflame -p 27316 -s 60 | ./flamegraph.pl > flame.svg