@Agoniii
Agoniii / horovod_threadinfo_node2.log
Created July 10, 2018 01:26
Hang when running tensorflow_mnist.py on CPU machines: output of 'gdb -p <pid>' followed by 'thread apply all bt' on each of the two nodes
Thread 110 (Thread 0x7f0fdc885700 (LWP 111590)):
#0 0x00007f101dd53945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1 0x00007f0fdc8fe36f in th_worker(void*) () from /home/jinglu/anaconda2/lib/python2.7/site-packages/numexpr/interpreter.so
#2 0x00007f101dd4fe25 in start_thread () from /usr/lib64/libpthread.so.0
#3 0x00007f101d37434d in clone () from /usr/lib64/libc.so.6
Thread 109 (Thread 0x7f0fdc084700 (LWP 111591)):
#0 0x00007f101dd53945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1 0x00007f0fdc8fe36f in th_worker(void*) () from /home/jinglu/anaconda2/lib/python2.7/site-packages/numexpr/interpreter.so
@Agoniii
Agoniii / horovod_threadinfo_node1.log
Last active July 10, 2018 01:24
Hang when running tensorflow_mnist.py on CPU machines: output of 'gdb -p <pid>' followed by 'thread apply all bt' on each of the two nodes
Thread 111 (Thread 0x7fb847ac5700 (LWP 24445)):
#0 0x00007fb888f89945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1 0x00007fb847b3e36f in th_worker(void*) () from /home/jinglu/anaconda2/lib/python2.7/site-packages/numexpr/interpreter.so
#2 0x00007fb888f85e25 in start_thread () from /usr/lib64/libpthread.so.0
#3 0x00007fb8885aa34d in clone () from /usr/lib64/libc.so.6
Thread 110 (Thread 0x7fb8472c4700 (LWP 24446)):
#0 0x00007fb888f89945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1 0x00007fb847b3e36f in th_worker(void*) () from /home/jinglu/anaconda2/lib/python2.7/site-packages/numexpr/interpreter.so
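
With more than a hundred threads per node, most of them idle numexpr workers parked in pthread_cond_wait, it can help to collapse threads that share the same innermost frames so the few unusual stacks stand out. The following is a minimal sketch of that idea, not part of the original logs. It assumes the 'Thread N (Thread 0x... (LWP n)):' and '#k 0xADDR in func () from lib' line shapes seen above. The function name group_threads_by_stack and its depth parameter are hypothetical, and the sample text is the first two threads of the node2 excerpt.

```python
import re
from collections import defaultdict

# Thread header, e.g. "Thread 110 (Thread 0x7f0fdc885700 (LWP 111590)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
# Frame line, e.g. "#1 0x00007f0fdc8fe36f in th_worker(void*) () from /path/lib.so"
# Only the first whitespace-free token after "in" is kept as the function name,
# so C++ names containing spaces would be cut short (acceptable for a sketch).
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def group_threads_by_stack(text, depth=2):
    """Map each distinct tuple of the innermost `depth` function names
    to the list of gdb thread numbers that share it."""
    stacks = {}      # gdb thread number -> list of function names, innermost first
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            stacks[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(2))

    groups = defaultdict(list)
    for thread_id, frames in stacks.items():
        groups[tuple(frames[:depth])].append(thread_id)
    return dict(groups)


# First two threads of the node2 excerpt, copied from the log above.
SAMPLE = """\
Thread 110 (Thread 0x7f0fdc885700 (LWP 111590)):
#0 0x00007f101dd53945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1 0x00007f0fdc8fe36f in th_worker(void*) () from /home/jinglu/anaconda2/lib/python2.7/site-packages/numexpr/interpreter.so
#2 0x00007f101dd4fe25 in start_thread () from /usr/lib64/libpthread.so.0
#3 0x00007f101d37434d in clone () from /usr/lib64/libc.so.6
Thread 109 (Thread 0x7f0fdc084700 (LWP 111591)):
#0 0x00007f101dd53945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1 0x00007f0fdc8fe36f in th_worker(void*) () from /home/jinglu/anaconda2/lib/python2.7/site-packages/numexpr/interpreter.so
"""

if __name__ == "__main__":
    for signature, thread_ids in group_threads_by_stack(SAMPLE).items():
        print(len(thread_ids), "thread(s):", " <- ".join(signature), thread_ids)
```

Run against the full log of each node, the small groups (rather than the large block of identical numexpr worker stacks) are the ones worth reading first when looking for the thread that is actually blocking the job.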