Using strace and lsof

Using strace and lsof to debug blocked processes

You can attach strace to a running process by its pid to see which system calls it is making, e.g.:

strace -fp <pid>
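If the raw `strace -fp` output is too noisy, a narrower invocation can help. This is a minimal sketch assuming Linux strace; `trace_fds` is a hypothetical wrapper name, while the flags themselves are standard strace options.

```shell
#!/usr/bin/env bash
# Sketch: attach to a running process and show only descriptor-related calls.
# trace_fds is a hypothetical wrapper name; the strace flags are standard.
#   -f            also trace the process's threads and any children it forks
#   -tt           prefix each line with a wall-clock time (with microseconds)
#   -T            append the time spent inside each syscall, in angle brackets
#   -e trace=desc only show file-descriptor-related calls (select, poll, read, ...)
#   -p PID        attach to an already running process (needs root or ptrace rights)
trace_fds() {
  local pid="$1"
  strace -f -tt -T -e trace=desc -p "$pid"
}
```

For example, running `trace_fds 2947` as root would attach to the mongrel shown further down; pressing Ctrl-C detaches strace and leaves the process running.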

You might see something like:

select(9, [3 5 8], [], [], {0, 999999})   = 0 (Timeout)

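In this case, 3, 5 and 8 are the file descriptors select() is watching for readability, and 9 is the nfds argument, which must be the highest descriptor plus one (8 + 1). The two empty lists are the write and exception sets, so nothing is being watched for writability or errors.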

{0, 999999} is the struct timeval timeout (0 seconds, 999999 microseconds), so select will wait just under one second before timing out.

= 0 (Timeout) is the return value of select: none of the file descriptors became ready for reading before the timeout expired.
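To make these fields concrete, here is a small bash sketch that pulls them out of a strace select() line. `decode_select` is a hypothetical helper, and its regular expression assumes the two timeout formats strace commonly prints (`{0, 999999}` in older versions, `{tv_sec=0, tv_usec=999999}` in newer ones); other output variants may need the pattern adjusted.

```shell
#!/usr/bin/env bash
# Sketch: decode a select() line as printed by strace.
# decode_select is a hypothetical helper. The regex accepts the older
# "{0, 999999}" timeout format and the newer "{tv_sec=0, tv_usec=999999}".
decode_select() {
  local line="$1"
  local re='^select\(([0-9]+), \[([0-9 ]*)], \[([0-9 ]*)], \[([0-9 ]*)], \{(tv_sec=)?([0-9]+), (tv_usec=)?([0-9]+)\}\) += (-?[0-9]+)'
  if [[ ! $line =~ $re ]]; then
    echo "not a select() line with a timeout" >&2
    return 1
  fi
  # Groups: 1=nfds, 2/3/4=read/write/except sets, 6=seconds, 8=microseconds, 9=return value
  local nfds="${BASH_REMATCH[1]}" readfds="${BASH_REMATCH[2]}"
  local sets="${BASH_REMATCH[2]} ${BASH_REMATCH[3]} ${BASH_REMATCH[4]}"
  local sec="${BASH_REMATCH[6]}" usec="${BASH_REMATCH[8]}" ret="${BASH_REMATCH[9]}"

  # nfds should equal the highest descriptor in any of the three sets, plus one.
  local fd highest=-1
  for fd in $sets; do
    if (( fd > highest )); then highest=$fd; fi
  done
  echo "watching for reads: $readfds"
  echo "nfds=$nfds (highest fd $highest + 1 = $((highest + 1)))"

  # struct timeval: whole seconds plus microseconds; 10# forces base 10.
  printf 'timeout=%d.%06ds\n' "$((10#$sec))" "$((10#$usec))"

  if (( ret < 0 )); then
    echo "result: select() failed"
  elif (( ret == 0 )); then
    echo "result: timed out, no descriptor was ready"
  else
    echo "result: $ret descriptor(s) ready"
  fi
}
```

Passing it the select() line above reports nfds=9 as highest fd 8 + 1, a 0.999999 s timeout, and a timed-out result. A positive return value would instead mean that many descriptors were ready.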

Next, find out what these file descriptors actually refer to.

As root, run:

lsof -p <pid> -ad <file_handles>

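to see what those descriptors refer to, for example a socket the process is waiting on. The -a flag ANDs the -p and -d selections, so lsof lists only the named descriptors of that process. You can pass several descriptors separated by commas: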

[root@ops-2-portal ~]# lsof -p 2947 -ad 3,5,8
mongrel_r 2947 deploy    3u  IPv4 57390385       TCP *:vcom-tunnel (LISTEN)
mongrel_r 2947 deploy    5u  IPv4 57390749       TCP ops-2-portal:42717 (LISTEN)
mongrel_r 2947 deploy    8u  IPv4 58983912       TCP ops-2-portal:35191->ops-2-websvc:7077 (ESTABLISHED)

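As you can see, select() was watching all three of these descriptors: two listening sockets and, on FD 8, an established TCP connection from this mongrel to ops-2-websvc:7077. Because select() timed out, no data arrived on that connection during the wait, so the mongrel appears to be blocked waiting for a response from ops-2-websvc.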
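On Linux, much of the same information can also be read straight from /proc, which helps on hosts without lsof. This is a sketch under that assumption: `fd_targets` and `socket_queues` are hypothetical helper names, `readlink` resolves each /proc/<pid>/fd entry, and `ss` (from iproute2) shows a socket's receive queue.

```shell
#!/usr/bin/env bash
# Sketch: resolve a process's file descriptors without lsof, using Linux /proc.
# fd_targets and socket_queues are hypothetical helper names for this example.

# Print what each descriptor of <pid> points at, e.g. "8 -> socket:[<inode>]".
# Reading another user's /proc/<pid>/fd requires root, just like lsof.
fd_targets() {
  local pid="$1"; shift
  local fd
  for fd in "$@"; do
    printf '%s -> %s\n' "$fd" "$(readlink "/proc/$pid/fd/$fd")"
  done
}

# Show established TCP connections to a remote port, with their queue sizes.
# A Recv-Q of 0 means the kernel holds no unread data for that socket,
# i.e. the peer has not sent anything the process has yet to read.
socket_queues() {
  local port="$1"
  ss -tnp state established "( dport = :$port )"
}
```

For the mongrel above, `fd_targets 2947 3 5 8` would print three `socket:[inode]` entries, and `socket_queues 7077` would show whether any bytes from ops-2-websvc are sitting unread. A Recv-Q of 0 together with a timed-out select() is consistent with the remote side simply not answering.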




sfgeorge commented Oct 11, 2018

Extremely helpful. Thank you, Tony!



Thymopat commented Dec 15, 2018

Thanks for providing this useful gistfile!



toplyf commented Jun 8, 2020

Fxuking helpful!
