Last active June 3, 2024 15:34
Using strace and lsof

Using strace and lsof to debug blocked processes

You can attach strace to a running process by pid to see which system calls it is making (-p attaches to the pid, -f also follows its threads and child processes), e.g.:

strace -fp <pid>

You might see something like:

select(9, [3 5 8], [], [], {0, 999999})   = 0 (Timeout)

In this case, 3, 5 and 8 are the file descriptors in select()'s read set, i.e. the ones it is watching for incoming data. The 9 is the nfds argument, which is always the highest FD in any of the sets plus one.

{0, 999999} is a struct timeval (seconds, microseconds): 0 seconds plus 999999 microseconds, so select() gives up after just under one second.

= 0 (Timeout) is the return value of select(): zero file descriptors became ready to read before the timeout expired.
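The same timeout behaviour can be reproduced in a few lines. This is only a sketch, assuming Python's select module as a stand-in for the C call; the socket pair and the 0.05 second timeout are made up for illustration and have nothing to do with the mongrel process above.

```python
import select
import socket

# A connected pair of sockets stands in for the process's client connection.
ours, peer = socket.socketpair()

# In C, nfds is passed explicitly as (highest FD + 1); Python computes it
# internally, so it is calculated here only to show the relationship.
nfds = ours.fileno() + 1

# Nothing has been sent yet, so select() waits out the timeout and returns
# an empty read list. strace would report this as "= 0 (Timeout)".
readable, _, _ = select.select([ours], [], [], 0.05)
timed_out = (readable == [])

# Once the peer writes a byte, the same call returns with the socket ready.
peer.sendall(b"x")
readable_after, _, _ = select.select([ours], [], [], 0.05)
got_data = (readable_after == [ours])

print(nfds, timed_out, got_data)

ours.close()
peer.close()
```

A process stuck in a loop of timed-out select() calls is in the first state: it is asking the kernel whether anything has arrived, and the answer keeps being no.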

The next step is to figure out what these file descriptors actually refer to.
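If you are doing this often, the descriptor numbers can be pulled out of the strace line mechanically. The following is a rough sketch, not part of strace itself: parse_select_line and SELECT_RE are hypothetical names, and the pattern assumes the select() line looks like the one above (it also accepts the {tv_sec=0, tv_usec=999999} style that newer strace versions print). It joins the read set with commas, which is the form lsof's -d option takes.

```python
import re

# Matches both "{0, 999999}" and "{tv_sec=0, tv_usec=999999}" timeouts.
SELECT_RE = re.compile(
    r"select\((\d+), \[([\d ]*)\], \[([\d ]*)\], \[([\d ]*)\], "
    r"\{(?:tv_sec=)?(\d+), (?:tv_usec=)?(\d+)\}\)\s*=\s*(-?\d+)"
)

def parse_select_line(line):
    """Return the interesting fields of a select() line, or None if no match."""
    m = SELECT_RE.search(line)
    if m is None:
        return None
    nfds, read_set, _write_set, _except_set, sec, usec, ret = m.groups()
    return {
        "nfds": int(nfds),
        "read_fds": [int(fd) for fd in read_set.split()],
        "timeout_s": int(sec) + int(usec) / 1_000_000,
        "returned": int(ret),
    }

line = "select(9, [3 5 8], [], [], {0, 999999})   = 0 (Timeout)"
info = parse_select_line(line)

# Comma-separated list suitable for passing to lsof -d.
fd_list = ",".join(str(fd) for fd in info["read_fds"])
print(fd_list)
```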

As root, run:

lsof -p <pid> -ad <file_handles>

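to see what each descriptor refers to, for example a socket on which the process is waiting for a response. The -a flag ANDs the two filters together, so only the listed descriptors of that pid are shown. To list several file descriptors, separate them with commas: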

[root@ops-2-portal ~]# lsof -p 2947 -ad 3,5,8
mongrel_r 2947 deploy    3u  IPv4 57390385       TCP *:vcom-tunnel (LISTEN)
mongrel_r 2947 deploy    5u  IPv4 57390749       TCP ops-2-portal:42717 (LISTEN)
mongrel_r 2947 deploy    8u  IPv4 58983912       TCP ops-2-portal:35191->ops-2-websvc:7077 (ESTABLISHED)

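As you can see, select() was waiting for data on these file descriptors. FDs 3 and 5 are listening sockets, while FD 8 shows that this mongrel has an established TCP connection to ops-2-websvc:7077. Since select() keeps timing out, no data is arriving on that connection, so the process is most likely stuck waiting for a response from ops-2-websvc.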
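If lsof is not installed, a similar lookup can be done on Linux by reading the symlinks under /proc/<pid>/fd. This is a minimal sketch and not a replacement for lsof: describe_fds is a hypothetical helper name, it only works where /proc is mounted, and inspecting another user's process still needs root.

```python
import os

def describe_fds(pid, fds):
    """Map each FD number to the target of /proc/<pid>/fd/<fd> (Linux only)."""
    targets = {}
    for fd in fds:
        try:
            targets[fd] = os.readlink(f"/proc/{pid}/fd/{fd}")
        except OSError as exc:
            # Covers a missing FD, no permission, or a system without /proc.
            targets[fd] = f"<unavailable: {exc.strerror}>"
    return targets

# Demonstrate on this process itself, using a pipe as a known descriptor.
read_end, write_end = os.pipe()
result = describe_fds(os.getpid(), [read_end])
print(result)

os.close(read_end)
os.close(write_end)
```

Sockets show up as socket:[inode] rather than as host and port. The inode can be matched against the inode column of /proc/net/tcp to find the addresses, which is the kind of detail lsof resolves for you.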



Extremely helpful. Thank you, Tony!


ghost commented Dec 15, 2018

Thanks for providing this useful gistfile!


toplyf commented Jun 8, 2020

Fxuking helpful!


Great note, man
