Checking for qpid deadlock https://issues.apache.org/jira/browse/QPID-7317
These steps can be used to check for a qpid deadlock when tasks are stuck in Katello
- In the dynflow console, click the step that the task is stuck on and find the pulp worker it is assigned to in "worker_name" or "queue"
- e.g.
queue: reserved_resource_worker-1@foo.example.com.dq
- run
ps -awfux | grep celery | grep reserved_resource_worker-X
where X is the worker number, e.g.
ps -awfux | grep celery | grep reserved_resource_worker-1
- Check the child process (usually the bottom one) and get that pid
- for example:
apache 6978 0.0 1.2 664360 56164 ? Ssl 09:19 0:01 /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
apache 7025 42.8 3.7 1062004 167124 ? Sl 09:19 9:14 \_ /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h -A pulp.server.async.app -c 1 --events --umask 18 --pidfile=/var/run/pulp/reserved_resource_worker-1.pid --heartbeat-interval=30
- here, the pid of the child process for that pulp worker is 7025
- run
top -p <pid-from-step-3> -H
and check the number of threads. There should be 4 threads running if that worker is processing tasks.
If you have fewer than 4 threads:
Unfortunately, fewer than 4 threads does not conclusively show that you are hitting https://issues.apache.org/jira/browse/QPID-7317, because a worker can also have fewer than 4 threads under normal conditions. However, this is an easy first check: if you see 4 threads, you are not running into this bug. To check for this bug conclusively, we must look at a core dump of the child worker process.
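The thread-count check above can also be scripted. The following is a minimal sketch, not part of the original procedure. It assumes a Linux host with /proc mounted, and that the parent worker pid is stored in the pidfile path shown in the example ps output (/var/run/pulp/reserved_resource_worker-X.pid). The function names are hypothetical, and treating more than 4 threads the same as exactly 4 is also an assumption.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# count_threads PID
# Print the number of threads of PID by counting entries in /proc/PID/task.
# Prints 0 if the process does not exist.
count_threads() {
    n=0
    for t in /proc/"$1"/task/*; do
        [ -e "$t" ] && n=$((n + 1))
    done
    echo "$n"
}

# worker_child_pid X
# Print the pid of the child celery process of reserved_resource_worker-X.
# Assumes the pidfile location from the example ps output and that the
# child of interest is the last one listed by pgrep.
worker_child_pid() {
    parent=$(cat "/var/run/pulp/reserved_resource_worker-$1.pid") || return 1
    pgrep -P "$parent" | tail -n 1
}

# interpret_thread_count N
# Apply the rule from the text: 4 threads means the worker is not hitting
# the bug; fewer than 4 is inconclusive and needs a core dump.
interpret_thread_count() {
    if [ "$1" -lt 4 ]; then
        echo "inconclusive"
    else
        echo "not QPID-7317"
    fi
}
```

For worker 1, this could be used as `pid=$(worker_child_pid 1); interpret_thread_count "$(count_threads "$pid")"`. Reading the pidfile typically requires running as root or as the apache user.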