@ydp
Forked from Mistobaan/TENSORFLOW_DEBUG.md
Created April 3, 2018 10:05
TensorFlow Internals Debugging Techniques

Machine Setup August 2016

Linux Ubuntu 2016.

  • GTX 1080
  • CUDA SDK 8.0
  • CuDNN 5.1

Enable core dumps

ulimit -c unlimited
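
`ulimit -c unlimited` only affects the current shell and the processes it starts. When a test is launched some other way, the same limit can be raised from inside the Python process before TensorFlow is imported. A minimal sketch using the standard `resource` module (Unix only); the function name `enable_core_dumps` is hypothetical, and the soft limit can only go as high as the hard limit the process inherited:

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the inherited hard limit.

    Hypothetical helper, not part of TensorFlow. An unprivileged process
    cannot exceed its hard limit, so this only matches
    `ulimit -c unlimited` when the hard limit is already unlimited.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    unlimited = soft == resource.RLIM_INFINITY
    print("core limit:", "unlimited" if unlimited else soft)
```

Call it at the top of the test script, before any code that might crash.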

Enable CUDA Dumps
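
The original notes leave this section empty. One mechanism described in the CUDA-GDB documentation is the `CUDA_ENABLE_COREDUMP_ON_EXCEPTION` environment variable, which asks the driver to write a GPU core dump when a kernel hits an exception; `CUDA_COREDUMP_FILE` sets the output file name. The sketch below is an assumption about what this section intended, not something the source confirms, and the helper name and default file name are hypothetical. It builds an environment for a child process instead of modifying the current one:

```python
import os


def cuda_coredump_env(dump_file="tf_gpu_core", base_env=None):
    """Return a copy of the environment with GPU core dumps requested.

    Hypothetical helper. The two variable names come from the CUDA-GDB
    documentation; whether a given driver honours them is not checked here.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_ENABLE_COREDUMP_ON_EXCEPTION"] = "1"
    env["CUDA_COREDUMP_FILE"] = dump_file
    return env


if __name__ == "__main__":
    env = cuda_coredump_env()
    # Pass `env` to subprocess.run([...], env=env) when launching the test.
    print(env["CUDA_ENABLE_COREDUMP_ON_EXCEPTION"], env["CUDA_COREDUMP_FILE"])
```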

Test core C++ tests (small only)

bazel test --test_size_filters=small -c dbg --config=cuda //tensorflow/core/...

Run gdb on a test.

To find the right binary, I first ran bazel test -s and then looked through its output to see which executable the test-setup.sh wrapper was invoking.

gdb bazel-out/local-dbg/bin/tensorflow/core/ops_math_ops_test.runfiles/org_tensorflow/tensorflow/core/ops_math_ops_test
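
The path above follows a regular pattern, so it can be derived from the test label as a first guess before reading through the `bazel test -s` output. A sketch under the assumption that the layout matches the one shown here (`local-dbg` output directory, `org_tensorflow` workspace name); both vary across Bazel versions and configurations, and `binary_path_for_label` is a hypothetical helper, not a Bazel API:

```python
import posixpath


def binary_path_for_label(label, config_dir="local-dbg",
                          workspace="org_tensorflow"):
    """Map a label like //pkg/path:target to its runfiles binary path.

    Assumes the directory layout seen in these notes.
    """
    if not label.startswith("//") or ":" not in label:
        raise ValueError("expected a label of the form //package:target")
    package, target = label[2:].split(":", 1)
    runfiles = posixpath.join(
        "bazel-out", config_dir, "bin", package, target + ".runfiles")
    return posixpath.join(runfiles, workspace, package, target)


if __name__ == "__main__":
    print(binary_path_for_label("//tensorflow/core:ops_math_ops_test"))
```

If the printed path does not exist, fall back to the `bazel test -s` output, which shows the command Bazel actually ran.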

Enable .bazelrc

Build with debug information

bazel build -c dbg --config=cuda --strip=never //tensorflow/tools/pip_package:build_pip_package

Run gdb

Set up TensorFlow for development

mkdir _python_build
cd _python_build
ln -s ../bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow/* .
ln -s ../tensorflow/tools/pip_package/* .
python setup.py develop
cd _python_
gdb python
(gdb) set env PYTHONPATH .
(gdb) run <path to test> <TestName>

Run cuda-gdb straight from bazel test

bazel test --run_under="/usr/local/cuda/bin/cuda-gdb --args $(which python) -m unittest" -s --test_output=streamed --verbose_failures -c dbg --config=cuda --strip=never --test_arg SelectOpTest.testScalar //tensorflow/python/kernel_tests:cwise_ops_test

Run a Python test

bazel test --test_output=streamed --verbose_failures -c opt --config cuda //tensorflow/python/kernel_tests:cwise_ops_test

TensorFlow's C++ code executes in the same process as the Python code that calls it (or, if you are using the distributed version, in the same process as one of the Python programs that created a tf.GrpcServer).

The simplest interface between Python and C++ is the pure-C API in tensor_c_api.h. To intercept one of these calls, you can attach gdb to the process ID of the Python interpreter that is running TensorFlow, and create a breakpoint on one of these functions.

For example, using an interactive Python session, in the first terminal enter:

$ python
>>> import tensorflow as tf
>>> import os
>>> os.getpid()
14680

Then, in another terminal, start gdb:

$ gdb -p 14680
[...]
(gdb) break TF_NewSession
Breakpoint 1 at 0x7f15f450a4d0
(gdb) continue
Continuing.

Back in the Python interpreter, create a new session:

>>> sess = tf.Session()

The interpreter will pause, and your debugger will print something like the following:

Breakpoint 1, 0x00007f15f450a4d0 in TF_NewSession () from [...]/tensorflow/python/_pywrap_tensorflow.so
(gdb) backtrace
#0  0x00007f15f450a4d0 in TF_NewSession () from [...]/tensorflow/python/_pywrap_tensorflow.so
#1  0x00007f15f3ac5cdb in _wrap_TF_NewSession () from [...]/tensorflow/python/_pywrap_tensorflow.so
#2  0x000000000049968d in PyEval_EvalFrameEx ()
#3  0x00000000004a090c in PyEval_EvalCodeEx ()
#4  0x0000000000499a52 in PyEval_EvalFrameEx ()
[...]

You can now use the full power of gdb to debug TensorFlow.
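
For a script, as opposed to an interactive session, the attach step can be prepared from inside the process. The sketch below only builds and prints the gdb command line for the current PID; `gdb_attach_command` is a hypothetical helper, `-p` and `-ex` are standard gdb options, and `shlex.join` needs Python 3.8 or later:

```python
import os
import shlex


def gdb_attach_command(pid=None, symbol="TF_NewSession"):
    """Build a gdb command line that attaches to `pid` and breaks on `symbol`.

    Hypothetical helper. The default symbol is the one used in the
    walkthrough above; other TensorFlow versions may export different names.
    """
    if pid is None:
        pid = os.getpid()
    argv = ["gdb", "-p", str(pid),
            "-ex", "break " + symbol,
            "-ex", "continue"]
    return shlex.join(argv)


if __name__ == "__main__":
    print("Run this in another terminal:")
    print(gdb_attach_command())
```

Print the command before the first TensorFlow call of interest and pause the script (for example with `input()`) until gdb is attached. Pass a different `symbol` if the default breakpoint is never hit.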

ydp commented Apr 3, 2018

With the latest code, the breakpoint should be set on TF_NewDeprecatedSession instead:

break TF_NewDeprecatedSession
