@Mistobaan
Last active June 29, 2023 06:32
TensorFlow Internals Debugging Techniques

Machine Setup August 2016

Linux Ubuntu 2016.

  • 1080 GTX
  • CUDA SDK 8.0
  • CuDNN 5.1

Enable core dumps

ulimit -c unlimited
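
`ulimit -c unlimited` only affects the current shell and the processes it starts, and the kernel's core pattern decides where a dump actually lands. A minimal sketch for checking both, assuming a Linux system that exposes /proc/sys/kernel/core_pattern:

```shell
# Raise the core-file limit for this shell; this can fail when the hard
# limit is lower, so report the problem instead of aborting.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core file limit"

# Show the limit that child processes (bazel, python, gdb) will inherit.
core_limit="$(ulimit -c)"
echo "core file size limit: ${core_limit}"

# The kernel decides the dump's name and location; a leading '|' means
# dumps are piped to a helper program rather than written to a file.
if [ -r /proc/sys/kernel/core_pattern ]; then
  echo "core pattern: $(cat /proc/sys/kernel/core_pattern)"
fi
```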

Enable CUDA Dumps
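
A minimal sketch, assuming a driver and CUDA toolkit that support GPU core dumps through the environment variables described in the cuda-gdb documentation. The /tmp file name is only an illustration:

```shell
# Ask the CUDA driver to write a GPU core dump when a kernel raises an exception.
export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1

# Optional: choose the dump file name; %h and %p expand to host name and PID.
# The /tmp location is an assumption, pick any writable directory.
export CUDA_COREDUMP_FILE="/tmp/cuda_core.%h.%p"

echo "GPU core dumps enabled: ${CUDA_ENABLE_COREDUMP_ON_EXCEPTION}"
```

The resulting file can then be loaded from inside cuda-gdb with `target cudacore <file>`. Tests started through bazel do not necessarily see the calling shell's environment, so the variables may need to be forwarded with `--test_env`.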

Test core C++ tests (small only)

bazel test --test_size_filters=small -c dbg --config=cuda //tensorflow/core/...

Run gdb on a test.

To find the right binary, I first had to run bazel test -s and then look through its output to see which file the test-setup.sh script was invoking.

gdb bazel-out/local-dbg/bin/tensorflow/core/ops_math_ops_test.runfiles/org_tensorflow/tensorflow/core/ops_math_ops_test
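
For C++ test targets the binary can often be guessed from the label instead. The sketch below assumes a label with an explicit `:name` part and relies on the bazel-bin convenience symlink, which points at the output of the most recent build configuration. `label_to_binary` is a hypothetical helper, not part of bazel:

```shell
# Hypothetical helper: map a label such as //pkg/path:name to the path
# of the built binary under the bazel-bin symlink.
label_to_binary() {
  label="${1#//}"        # drop the leading //
  pkg="${label%%:*}"     # package path before the colon
  name="${label##*:}"    # target name after the colon
  printf 'bazel-bin/%s/%s\n' "$pkg" "$name"
}

binary="$(label_to_binary //tensorflow/core:ops_math_ops_test)"
echo "$binary"
# From the workspace root, the binary can then be passed to gdb: gdb "$binary"
```

If the guessed path does not exist, the `bazel test -s` output remains the reliable source.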

Enable .bazelrc

Build with debug information

bazel build -c opt --config cuda -c dbg --strip=never  //tensorflow/tools/pip_package:build_pip_package

Run gdb

Set up TensorFlow for development

mkdir _python_build
cd _python_build
ln -s ../bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow/* .
ln -s ../tensorflow/tools/pip_package/* .
python setup.py develop
cd _python_
gdb python
(gdb) set env PYTHONPATH .
(gdb) run <path to test> <TestName>
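
The same session can be started in one step by passing the gdb commands on the command line. This is a sketch only: `debug_py_test` is a hypothetical wrapper, and the `GDB` variable is an assumption added here so that cuda-gdb can be swapped in:

```shell
# Hypothetical wrapper: start python under gdb with PYTHONPATH set for
# the debugged process, and run it immediately.
# Everything after --args is handed to python unchanged.
debug_py_test() {
  "${GDB:-gdb}" -ex "set env PYTHONPATH ." -ex run --args python "$@"
}

# Usage, from inside _python_build (placeholders as in the steps above):
#   debug_py_test <path to test> <TestName>
#   GDB=/usr/local/cuda/bin/cuda-gdb debug_py_test <path to test> <TestName>
```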

Run cuda-gdb straight from bazel test

bazel test --run_under="/usr/local/cuda/bin/cuda-gdb --args $(which python) -m unittest" -s --test_output=streamed --verbose_failures -c opt -c dbg --config cuda --strip=never --test_arg SelectOpTest.testScalar //tensorflow/python/kernel_tests:cwise_ops_test

Run a Python test

bazel test --test_output=streamed --verbose_failures -c opt --config cuda //tensorflow/python/kernel_tests:cwise_ops_test

TensorFlow's C++ code executes in the same process as the Python code that calls it (or, if you are using the distributed version, in the same process as one of the Python programs that created a tf.GrpcServer).

The simplest interface between Python and C++ is the pure-C API in tensor_c_api.h. To intercept one of these calls, you can attach gdb to the process ID of the Python interpreter that is running TensorFlow, and create a breakpoint on one of these functions.

For example, using an interactive Python session, in the first terminal enter:

$ python
>>> import tensorflow as tf
>>> import os
>>> os.getpid()
14680

Then, in another terminal, start gdb:

$ gdb -p 14680
[...]
(gdb) break TF_NewSession
Breakpoint 1 at 0x7f15f450a4d0
(gdb) continue
Continuing.

Back in the Python interpreter, create a new session:

>>> sess = tf.Session()

The interpreter will pause, and your debugger will print something like the following:

Breakpoint 1, 0x00007f15f450a4d0 in TF_NewSession () from [...]/tensorflow/python/_pywrap_tensorflow.so
(gdb) backtrace
#0  0x00007f15f450a4d0 in TF_NewSession () from [...]/tensorflow/python/_pywrap_tensorflow.so
#1  0x00007f15f3ac5cdb in _wrap_TF_NewSession () from [...]/tensorflow/python/_pywrap_tensorflow.so
#2  0x000000000049968d in PyEval_EvalFrameEx ()
#3  0x00000000004a090c in PyEval_EvalCodeEx ()
#4  0x0000000000499a52 in PyEval_EvalFrameEx ()
[...]

You can now use the full power of gdb to debug TensorFlow.
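
The attach steps above can be wrapped in a small helper. This is a sketch under assumptions: `attach_and_break` and the `GDB` override are hypothetical names, and `set breakpoint pending on` is included so that gdb accepts the breakpoint even when the TensorFlow shared library has not been loaded yet:

```shell
# Hypothetical helper: attach to a running Python interpreter and stop
# at a C API entry point. The symbol defaults to the one used above.
attach_and_break() {
  pid="$1"
  symbol="${2:-TF_NewSession}"
  "${GDB:-gdb}" -p "$pid" \
    -ex "set breakpoint pending on" \
    -ex "break $symbol" \
    -ex continue
}

# Usage, with the PID printed by os.getpid():
#   attach_and_break 14680
#   attach_and_break 14680 TF_NewDeprecatedSession
```

On Ubuntu, attaching to a process that gdb did not start may be refused unless gdb runs as root or kernel.yama.ptrace_scope is set to 0.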

@Nayana-ibm

@Mistobaan We are running the TensorFlow tests with bazel test and are seeing test case failures in the Python module of TensorFlow.
The document above describes debugging TensorFlow from an interactive Python session.
Could you please guide us on how to debug Python tests (for example, bazel test //tensorflow/python/kernel_tests:sparse_split_op_test)?

@loyvon

loyvon commented Sep 3, 2017

I tried this debugging process with gdb on macOS 10.12.6. After I attached lldb/gdb to the Python process, the process was stopped.

@ydp

ydp commented Apr 3, 2018

With the latest code, the breakpoint should be set on TF_NewDeprecatedSession instead:

break TF_NewDeprecatedSession

@adikshit

Is this only valid for CUDA? I don't have a CUDA-based system, but I was hoping the steps would be more or less the same.
Here are my steps:

bazel build -c opt --config=monolithic -c dbg --strip=never  //tensorflow/tools/pip_package:build_pip_package
mkdir _python_build
cd _python_build
ln -s ../bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow/* .
ln -s ../tensorflow/tools/pip_package/* .
python setup.py develop

I was able to attach to the process using the commands suggested above.
But when I try to add breakpoints, I get this message:

(gdb) break TF_NewDeprecatedSession
Function "TF_NewDeprecatedSession" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (TF_NewDeprecatedSession) pending.
(gdb) break TF_NewSession
Function "TF_NewSession" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (TF_NewSession) pending.

Any suggestions?

@wangxiang2713

Hello,
Can I add a breakpoint with break tensorflow/c/c_api.cc:544? It fails to work. How can I debug the C++ code step by step?

@wieczyk

wieczyk commented May 12, 2020

-c dbg does not stop bazel from adding -O2 -g0 to the compiler command line.

@tanzhenyu

It's odd that -c opt and -c dbg co-exist. Does the latter override the former?
