Tensorflow Internals Debugging Techniques

Machine Setup August 2016

Linux Ubuntu 2016.

  • GTX 1080
  • CUDA SDK 8.0
  • cuDNN 5.1

Enable core dumps

ulimit -c unlimited
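
The same limit can also be checked and raised from inside a Python process, which may help when you do not control the shell that launches the test. A minimal sketch using the standard-library resource module (Unix only); the helper name enable_core_dumps is illustrative and not part of TensorFlow:

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit only up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
```

If the hard limit itself is 0, this cannot lift it; in that case the limit has to be changed in the parent shell or system configuration.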

Enable CUDA Dumps

Test core C++ tests (small only)

bazel test --test_size_filters=small -c dbg --config=cuda //tensorflow/core/...

Run gdb on a test.

To find the right binary, I first ran bazel test -s and then looked through its output to see which executable test-setup.sh was invoking.

gdb bazel-out/local-dbg/bin/tensorflow/core/ops_math_ops_test.runfiles/org_tensorflow/tensorflow/core/ops_math_ops_test
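
As an alternative to reading the bazel test -s output, the output tree can be searched for the test binary by name. A rough sketch, assuming the binary has already been built and that its file name matches the test target; find_test_binaries is a hypothetical helper, not a Bazel feature:

```python
import os


def find_test_binaries(root, name):
    """Return sorted paths under `root` named `name` that are executable."""
    matches = []
    # os.walk does not descend into symlinked subdirectories by default,
    # but it does report symlinked files, which is what runfiles trees contain.
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            path = os.path.join(dirpath, name)
            if os.access(path, os.X_OK):
                matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # "bazel-out" and the test name are taken from the gdb command above.
    for path in find_test_binaries("bazel-out", "ops_math_ops_test"):
        print(path)
```

Several matches may be printed (for example one per configuration directory); pick the one under the debug configuration before passing it to gdb.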

Enable .bazelrc

Build with debug information

bazel build -c opt --config cuda -c dbg --strip=never  //tensorflow/tools/pip_package:build_pip_package

Run gdb

Set up TensorFlow for development

mkdir _python_build
cd _python_build
ln -s ../bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow/* .
ln -s ../tensorflow/tools/pip_package/* .
python setup.py develop
cd _python_
gdb python 
set env PYTHONPATH . 
run <path to test> <TestName>
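
The interactive gdb steps above can also be assembled into a single command line, using gdb's -ex option to queue commands and --args to pass the program and its arguments. A small sketch; gdb_command is a hypothetical helper, and the test path and test name in the example are borrowed from the cwise_ops_test commands elsewhere in these notes:

```python
import shlex
import sys


def gdb_command(test_path, test_name, python=sys.executable, pythonpath="."):
    """Build an argv list that starts gdb on Python and runs one test."""
    return [
        "gdb",
        "-ex", "set env PYTHONPATH " + pythonpath,  # same as typing it at (gdb)
        "-ex", "run",                               # start the program right away
        "--args", python, test_path, test_name,
    ]


if __name__ == "__main__":
    argv = gdb_command(
        "tensorflow/python/kernel_tests/cwise_ops_test.py",
        "SelectOpTest.testScalar",
    )
    # Print a shell-quoted command that can be pasted into a terminal.
    print(" ".join(shlex.quote(a) for a in argv))
```

Drop the "-ex", "run" pair if you want to set breakpoints before the program starts.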

Run cuda-gdb straight from bazel test

bazel test --run_under="/usr/local/cuda/bin/cuda-gdb --args $(which python) -m unittest" -s --test_output=streamed --verbose_failures -c opt -c dbg --config cuda --strip=never --test_arg SelectOpTest.testScalar //tensorflow/python/kernel_tests:cwise_ops_test

Run a Python test

bazel test --test_output=streamed --verbose_failures -c opt --config cuda //tensorflow/python/kernel_tests:cwise_ops_test

TensorFlow's C++ code executes in the same process as the Python code that calls it (or, if you are using the distributed version, in the same process as one of the Python programs that created a tf.GrpcServer).
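
One way to see this on Linux is to list the shared libraries mapped into the Python process. The sketch below parses the text of /proc/<pid>/maps; mapped_libraries is a hypothetical helper, and the default filter string "tensorflow" is only an assumption about how the native library's path is named:

```python
def mapped_libraries(maps_text, needle="tensorflow"):
    """Return sorted, de-duplicated mapped file paths containing `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if needle in path:
                paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    import os

    if os.path.exists("/proc/self/maps"):  # Linux only
        with open("/proc/self/maps") as f:
            for path in mapped_libraries(f.read(), needle=".so"):
                print(path)
```

Run in a session after importing TensorFlow, with the default filter, it should show the native library that gdb needs to resolve symbols from; an empty result suggests the library has not been loaded yet.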

The simplest interface between Python and C++ is the pure-C API in tensor_c_api.h. To intercept one of these calls, you can attach gdb to the process ID of the Python interpreter that is running TensorFlow, and create a breakpoint on one of these functions.

For example, using an interactive Python session, in the first terminal enter:

$ python
>>> import tensorflow as tf
>>> import os
>>> os.getpid()
14680

Then, in another terminal, start gdb:

$ gdb -p 14680
(gdb) break TF_NewSession
Breakpoint 1 at 0x7f15f450a4d0
(gdb) continue

Back in the Python interpreter, create a new session:

>>> sess = tf.Session()

The interpreter will pause, and your debugger will print something like the following:

Breakpoint 1, 0x00007f15f450a4d0 in TF_NewSession () from [...]/tensorflow/python/
(gdb) backtrace
#0  0x00007f15f450a4d0 in TF_NewSession () from [...]/tensorflow/python/
#1  0x00007f15f3ac5cdb in _wrap_TF_NewSession () from [...]/tensorflow/python/
#2  0x000000000049968d in PyEval_EvalFrameEx ()
#3  0x00000000004a090c in PyEval_EvalCodeEx ()
#4  0x0000000000499a52 in PyEval_EvalFrameEx ()

You can now use the full power of gdb to debug TensorFlow.
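
For a script rather than an interactive session, the same attach workflow can be approximated by printing the PID and pausing until the debugger is ready. A minimal sketch; wait_for_debugger is a hypothetical helper, not a TensorFlow or gdb API:

```python
import os
import sys


def wait_for_debugger(prompt=input, out=sys.stdout):
    """Print this process's PID and block until the user presses Enter."""
    pid = os.getpid()
    out.write("PID %d: attach with `gdb -p %d`, then press Enter\n" % (pid, pid))
    out.flush()
    prompt()  # blocks here while you attach, set breakpoints, and `continue`
    return pid
```

Call wait_for_debugger() just before the line that creates the session, attach from a second terminal, set the breakpoint, enter continue in gdb, and then press Enter in the first terminal.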


@Nayana-ibm Nayana-ibm commented Oct 27, 2016

@Mistobaan We are running the TensorFlow tests with bazel test and are seeing test failures in TensorFlow's Python module.
The document above describes debugging TensorFlow from an interactive Python session.
Could you please guide us on how to debug Python tests (for example, bazel test //tensorflow/python/kernel_tests:sparse_split_op_test)?


@loyvon loyvon commented Sep 3, 2017

I tried this gdb debugging process on macOS 10.12.6. After I attached lldb/gdb to the Python process, the process was stopped.


@ydp ydp commented Apr 3, 2018

With the latest code, the breakpoint should be set on TF_NewDeprecatedSession:

break TF_NewDeprecatedSession

@adikshit adikshit commented Feb 22, 2019

Is this only valid for CUDA? I don't have a CUDA-based system, but I was hoping the steps would be more or less the same.
Here are my steps:

bazel build -c opt --config=monolithic -c dbg --strip=never  //tensorflow/tools/pip_package:build_pip_package
mkdir _python_build
cd _python_build
ln -s ../bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow/* .
ln -s ../tensorflow/tools/pip_package/* .
python develop

I was able to attach to the process using the commands suggested above.
But when I try to add breakpoints, I get this message:

(gdb) break TF_NewDeprecatedSession
Function "TF_NewDeprecatedSession" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (TF_NewDeprecatedSession) pending.
(gdb) break TF_NewSession
Function "TF_NewSession" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (TF_NewSession) pending.

Any suggestions?


@wangxiang2713 wangxiang2713 commented Mar 12, 2020

Can I add a breakpoint with break tensorflow/c/? It fails to work. How can I debug the C++ code step by step?


@wieczyk wieczyk commented May 12, 2020

-c dbg does not convince bazel to stop adding -O2 -g0 to the compiler command line.


@tanzhenyu tanzhenyu commented Aug 26, 2021

It's odd that -c opt and -c dbg coexist. Does the latter override the former?
