Based on the Summary Trace API:
import os
import tensorflow as tf

# Verify that a GPU is visible before tracing
device_name = tf.test.gpu_device_name()
if not tf.test.is_gpu_available():
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

# TensorBoard's profile plugin reads traces from <logdir>/plugins/profile
os.makedirs(os.path.join(args.exp_dir, 'plugins/profile'), exist_ok=True)
tf.summary.trace_on(graph=True, profiler=True)

# Run a short training job (3 epochs) just to collect the trace
tracing_params = params.copy()
tracing_params.epochs = 3
tracing_estimator = tf.estimator.Estimator(model_fn, params=tracing_params, config=config)
tracing_train_spec = tf.estimator.TrainSpec(lambda: input_fn(tracing_params), hooks=[auc_hook, loss_hook])
tracing_eval_spec = tf.estimator.EvalSpec(lambda: input_fn(tracing_params, valid=True))
tf.estimator.train_and_evaluate(tracing_estimator, train_spec=tracing_train_spec, eval_spec=tracing_eval_spec)
tf.summary.trace_export("profiling", profiler_outdir=args.exp_dir)
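The `os.makedirs` call above matters because TensorBoard's profile plugin scans `<logdir>/plugins/profile/<run>/` for trace files. As a minimal pure-Python sketch (the helper name `list_profile_runs` and the run name `run_2024` are my own, for illustration), this is the layout the plugin would pick up:

```python
import os
import tempfile

def list_profile_runs(logdir):
    """Return the run subdirectories under <logdir>/plugins/profile,
    which is where TensorBoard's profile plugin looks for traces."""
    profile_root = os.path.join(logdir, 'plugins', 'profile')
    if not os.path.isdir(profile_root):
        return []
    return sorted(
        d for d in os.listdir(profile_root)
        if os.path.isdir(os.path.join(profile_root, d))
    )

# Demo with a temporary log directory and a hypothetical run name.
with tempfile.TemporaryDirectory() as logdir:
    os.makedirs(os.path.join(logdir, 'plugins', 'profile', 'run_2024'))
    print(list_profile_runs(logdir))  # ['run_2024']
```

If the directory does not exist when `trace_export` runs, TensorBoard may simply show no profile runs, which is easy to mistake for an empty trace.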
This records both CPU and GPU ops, but TensorBoard shows only one device: the CPU.
Using ProfilerHook, on the other hand,
requires disabling eager execution:
tf.compat.v1.disable_eager_execution()

# ProfilerHook dumps a Chrome-trace timeline file every save_steps steps
profiler_hook = tf.estimator.ProfilerHook(save_steps=500, output_dir=os.path.join(args.exp_dir, 'profile'))
train_spec = tf.estimator.TrainSpec(lambda: input_fn(params), hooks=[profiler_hook])
eval_spec = tf.estimator.EvalSpec(lambda: input_fn(params, valid=True), throttle_secs=20, start_delay_secs=0)
tf.estimator.train_and_evaluate(estimator, train_spec=train_spec, eval_spec=eval_spec)
Note that the timeline files created by ProfilerHook cannot be inspected in TensorBoard; they have to be loaded in chrome://tracing instead.