Using strace and lsof

Using strace and lsof to debug blocked processes

You can attach strace to a running process by pid to see which system calls it is making (-p attaches to the pid, -f also follows its threads and any child processes), e.g.:

strace -fp <pid>

You might see something like:

select(9, [3 5 8], [], [], {0, 999999})   = 0 (Timeout)

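In this case, 3, 5 and 8 are the file descriptors select() is watching for readable data, and the two empty sets that follow are the write and exception sets. The 9 is the nfds argument, which is ([highest FD] + 1).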

{0, 999999} is the timeout, a struct timeval of {seconds, microseconds}: 0 seconds and 999999 microseconds, so select() will wait just under one second before timing out.

= 0 (Timeout) is the return value of select(), indicating that the timeout expired before any of the file descriptors became ready to read from.
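The breakdown above can be sketched as a pair of small shell helpers that pull the read set and the timeout out of a single strace line. The function names select_read_fds and select_timeout_usec are hypothetical (they are not part of strace), and the patterns assume the older output format shown above. Newer strace versions may print the timeout with named fields, in which case the second helper would need adjusting.

```shell
#!/bin/sh
# Hypothetical helpers (not part of strace): pick apart one select() line
# in the format shown above, e.g.
#   select(9, [3 5 8], [], [], {0, 999999})   = 0 (Timeout)

# Print the read-set descriptors as a comma-separated list, e.g. 3,5,8.
# The leading .* tolerates the "[pid N] " prefix that strace -f adds.
select_read_fds() {
  sed -n 's/.*select([0-9]*, \[\([0-9 ]*\)].*/\1/p' | tr ' ' ','
}

# Print the timeout in microseconds: tv_sec * 1000000 + tv_usec.
select_timeout_usec() {
  sed -n 's/.*{\([0-9]*\), \([0-9]*\)}.*/\1 \2/p' |
    awk '{ print $1 * 1000000 + $2 }'
}
```

Piping the sample line into select_read_fds gives a list that can be passed directly to lsof's -d option.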

The next step is to figure out what these specific file descriptors refer to.

As root, run:

lsof -p <pid> -ad <file_handles>

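to see what each descriptor refers to, for example a socket on which the process is waiting for a response. The -a option ANDs the -p and -d selections together, and -d accepts several file descriptors separated by commas: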

[root@ops-2-portal ~]# lsof -p 2947 -ad 3,5,8
COMMAND    PID  USER   FD   TYPE   DEVICE SIZE NODE NAME
mongrel_r 2947 deploy    3u  IPv4 57390385       TCP *:vcom-tunnel (LISTEN)
mongrel_r 2947 deploy    5u  IPv4 57390749       TCP ops-2-portal:42717 (LISTEN)
mongrel_r 2947 deploy    8u  IPv4 58983912       TCP ops-2-portal:35191->ops-2-websvc:7077 (ESTABLISHED)

As you can see, FDs 3 and 5 are listening sockets, and FD 8 is an established TCP connection to ops-2-websvc:7077. select() was waiting for data on all three and timed out, so you can determine that this mongrel has a connection open to ops-2-websvc:7077 but is not receiving any data on it.
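As a convenience, the lsof invocation used above can be assembled from a pid and a list of descriptors. This is only a sketch: lsof_cmd is a hypothetical helper name, and it prints the command rather than running it, so it can be reviewed before being executed as root.

```shell
#!/bin/sh
# Hypothetical helper: print the lsof command for a pid and a list of fds.
# Usage: lsof_cmd <pid> <fd> [<fd> ...]
lsof_cmd() {
  pid=$1
  shift
  fds=$(printf '%s,' "$@")   # join the fds with commas, e.g. "3,5,8,"
  # -a ANDs the -p and -d selections; ${fds%,} drops the trailing comma.
  printf 'lsof -p %s -a -d %s\n' "$pid" "${fds%,}"
}

lsof_cmd 2947 3 5 8
```

The printed command uses -a -d, which is the same as the combined -ad form shown earlier.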

@sfgeorge

commented Oct 11, 2018

Extremely helpful. Thank you, Tony!

@Thymopat

commented Dec 15, 2018

Thanks for providing this useful gistfile!
