@AlejoCR
Created August 28, 2019 04:30
==21650== NVPROF is profiling process 21650, command: /root/.virtualenvs/cv/bin/python -m cProfile -o out.prof retrain/label_image.py --graph=retrain/output_graph.pb --labels=retrain/output_labels.txt --input_layer=Placeholder --output_layer=final_result --image=test-images/lego1.jpeg
==21650== Warning: Unified Memory Profiling is not supported on the underlying platform. System requirements for unified memory can be found at: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements
==21650== Profiling application: /root/.virtualenvs/cv/bin/python -m cProfile -o out.prof retrain/label_image.py --graph=retrain/output_graph.pb --labels=retrain/output_labels.txt --input_layer=Placeholder --output_layer=final_result --image=test-images/lego1.jpeg
==21650== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 58.41% 815.49ms 128 6.3710ms 789.76us 25.076ms maxwell_gcgemm_64x64_nt
5.90% 82.383ms 43 1.9159ms 315.48us 3.1324ms maxwell_gcgemm_32x32_nt
4.31% 60.214ms 34 1.7710ms 391.21us 9.5360ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
3.52% 49.210ms 43 1.1444ms 283.86us 11.849ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
3.41% 47.667ms 94 507.09us 9.5310us 6.5893ms void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
2.77% 38.702ms 31 1.2484ms 865.60us 4.6935ms maxwell_scudnn_128x64_relu_small_nn
2.24% 31.237ms 23 1.3581ms 237.61us 5.1656ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
2.21% 30.828ms 14 2.2020ms 803.09us 5.9564ms maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
2.04% 28.478ms 291 97.862us 781ns 1.9287ms [CUDA memcpy HtoD]
2.00% 27.904ms 38 734.33us 211.78us 7.6406ms maxwell_scudnn_128x64_relu_interior_nn
1.45% 20.178ms 20 1.0089ms 162.04us 3.4523ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
1.44% 20.137ms 12 1.6781ms 727.10us 4.8328ms maxwell_gcgemm_64x32_nt
1.21% 16.844ms 56 300.79us 140.68us 1.5008ms void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.11% 15.516ms 13 1.1935ms 178.13us 6.2719ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
1.11% 15.439ms 1 15.439ms 15.439ms 15.439ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=512, int=6, int=8, int=3, int=3, int=5, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=512, int=6, int=8, int=3, int=3, int=5, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
0.71% 9.9113ms 8 1.2389ms 137.19us 2.8095ms maxwell_scudnn_128x32_relu_interior_nn
0.66% 9.2331ms 8 1.1541ms 429.44us 3.8745ms maxwell_scudnn_128x128_relu_interior_nn
0.63% 8.7713ms 9 974.59us 362.20us 1.4958ms void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
0.56% 7.8533ms 24 327.22us 117.50us 605.75us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
0.54% 7.5676ms 100 75.676us 23.647us 260.27us void fft2d_c2r_32x32<float, bool=1, bool=0, unsigned int=0, bool=0, bool=0>(float*, float2 const *, int, int, int, int, int, int, int, int, int, float, float, cudnn::reduced_divisor, bool, float*, float*, int2, int, int)
0.48% 6.6918ms 9 743.53us 508.92us 2.2855ms maxwell_scudnn_128x128_relu_small_nn
0.42% 5.8852ms 154 38.215us 17.137us 112.56us void fft2d_r2c_32x32<float, bool=0, unsigned int=0, bool=0>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
0.31% 4.3325ms 94 46.089us 6.7200us 635.12us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
0.31% 4.2700ms 94 45.425us 7.2920us 620.43us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
0.28% 3.9582ms 63 62.829us 4.4790us 598.25us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
0.21% 2.9790ms 1 2.9790ms 2.9790ms 2.9790ms void tensorflow::_GLOBAL__N__71_tmpxft_0000409a_00000000_8_resize_bilinear_op_gpu_cu_compute_72_cpp1_ii_f402459c::ResizeBilinearKernel<float>(int, float const *, float, float, int, int, int, int, int, int, float*)
0.21% 2.9213ms 54 54.098us 26.979us 74.221us void fft2d_c2r_32x32<float, bool=0, bool=0, unsigned int=0, bool=0, bool=0>(float*, float2 const *, int, int, int, int, int, int, int, int, int, float, float, cudnn::reduced_divisor, bool, float*, float*, int2, int, int)
0.20% 2.8421ms 3 947.37us 665.85us 1.4427ms void fft2d_r2c_32x32<float, bool=0, unsigned int=1, bool=1>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
0.19% 2.7086ms 1 2.7086ms 2.7086ms 2.7086ms void tensorflow::functor::SwapDimension1And2InTensor3UsingTiles<unsigned int, int=1024, int=1024, int=2, bool=0>(unsigned int const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::SwapDimension1And2InTensor3UsingTiles<unsigned int, int=1024, int=1024, int=2, bool=0>*)
0.17% 2.3575ms 2 1.1788ms 964.71us 1.3928ms maxwell_scudnn_128x32_relu_small_nn
0.17% 2.3476ms 1 2.3476ms 2.3476ms 2.3476ms void fft2d_r2c_32x32<float, bool=0, unsigned int=5, bool=1>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
0.17% 2.3200ms 4 579.99us 169.54us 1.0956ms void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, cudnnNanPropagation_t=0>, int=0, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, cudnnNanPropagation_t=0>, int=0, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
0.13% 1.7544ms 2 877.19us 870.39us 883.98us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=6, int=7, int=3, int=3, int=5, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=6, int=7, int=3, int=3, int=5, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
0.10% 1.4648ms 46 31.844us 5.9390us 67.710us [CUDA memcpy DtoD]
0.09% 1.2835ms 24 53.478us 20.885us 140.52us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.07% 982.16us 1 982.16us 982.16us 982.16us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const , Eigen::GpuDevice>, long>(float, int=1)
0.05% 726.63us 14 51.902us 14.272us 102.24us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
0.05% 661.27us 2 330.63us 329.80us 331.47us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_right<float, float, Eigen::internal::scalar_product_op<float, float>>, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, int>, int=16, Eigen::MakePointer> const > const > const , Eigen::GpuDevice>, int>(float, int=1)
0.04% 507.98us 2 253.99us 2.0310us 505.95us [CUDA memcpy DtoH]
0.03% 449.08us 5 89.815us 78.179us 103.02us void fft1d_r2c_32<float, float, float2, bool=0, bool=1>(float2*, float const *, int, int3, int3, int2, int2)
0.03% 441.27us 96 4.5960us 1.6140us 136.20us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
0.02% 335.32us 1 335.32us 335.32us 335.32us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_right<float, float, Eigen::internal::scalar_difference_op<float, float>>, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, int>, int=16, Eigen::MakePointer> const > const > const , Eigen::GpuDevice>, int>(float, int=1)
0.02% 244.43us 5 48.886us 40.731us 53.386us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=1>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.01% 83.336us 1 83.336us 83.336us 83.336us void tensorflow::functor::RowReduceKernel<float*, tensorflow::TransformOutputIterator<float, float, tensorflow::functor::DividesBy<float, float>, long>, tensorflow::functor::Sum<float>>(float*, float, int, int, float, std::iterator_traits<tensorflow::functor::RowReduceKernel<float*, tensorflow::TransformOutputIterator<float, float, tensorflow::functor::DividesBy<float, float>, long>, tensorflow::functor::Sum<float>>>::value_type)
0.00% 57.397us 1 57.397us 57.397us 57.397us void gemv2N_kernel_val<int, int, float, float, float, int=128, int=32, int=4, int=4, int=1, cublasGemvParams<cublasGemvTensor<float const >, cublasGemvTensor<float>, float>>(float, float, float const )
0.00% 21.565us 6 3.5940us 2.2390us 6.0430us [CUDA memset]
0.00% 3.9060us 1 3.9060us 3.9060us 3.9060us void tensorflow::functor::RowReduceKernel<cub::TransformInputIterator<float, tensorflow::_GLOBAL__N__63_tmpxft_00002a79_00000000_8_softmax_op_gpu_cu_compute_72_cpp1_ii_c381a214::SubtractAndExpFunctor<float, float>, cub::CountingInputIterator<int, long>, long>, float*, cub::Sum>(float, float, int, int, float, std::iterator_traits<tensorflow::functor::RowReduceKernel<cub::TransformInputIterator<float, tensorflow::_GLOBAL__N__63_tmpxft_00002a79_00000000_8_softmax_op_gpu_cu_compute_72_cpp1_ii_c381a214::SubtractAndExpFunctor<float, float>, cub::CountingInputIterator<int, long>, long>, float*, cub::Sum>>::value_type)
0.00% 3.4900us 1 3.4900us 3.4900us 3.4900us void tensorflow::_GLOBAL__N__63_tmpxft_00002a79_00000000_8_softmax_op_gpu_cu_compute_72_cpp1_ii_c381a214::GenerateNormalizedProb<float, float>(float const *, float const *, float const , tensorflow::_GLOBAL__N__63_tmpxft_00002a79_00000000_8_softmax_op_gpu_cu_compute_72_cpp1_ii_c381a214::GenerateNormalizedProb<float, float>*, int, int, bool)
0.00% 3.0210us 1 3.0210us 3.0210us 3.0210us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const , Eigen::GpuDevice>, long>(float, int=1)
0.00% 2.7610us 1 2.7610us 2.7610us 2.7610us void tensorflow::functor::RowReduceKernel<float const *, float*, cub::Max>(float const *, float*, int, int, cub::Max, std::iterator_traits<tensorflow::functor::RowReduceKernel<float const *, float*, cub::Max>>::value_type)
API calls: 60.45% 19.5511s 8 2.44388s 48.126us 19.5505s cudaStreamCreateWithFlags
30.81% 9.96617s 5 1.99323s 11.250us 7.30106s cudaFree
3.67% 1.18846s 156 7.6183ms 64.116us 705.81ms cuEventSynchronize
1.71% 554.17ms 1335 415.11us 32.396us 265.07ms cudaLaunchKernel
1.48% 478.13ms 1 478.13ms 478.13ms 478.13ms cuDevicePrimaryCtxRetain
0.39% 127.57ms 46 2.7732ms 47.865us 124.55ms cudaMemcpyAsync
0.22% 69.847ms 691 101.08us 4.3230us 62.548ms cudaEventRecord
0.21% 66.377ms 1 66.377ms 66.377ms 66.377ms cuMemAlloc
0.17% 56.141ms 172 326.40us 10.625us 51.862ms cudaBindTexture
0.16% 51.184ms 894 57.252us 2.6570us 20.350ms cuEventRecord
0.14% 45.703ms 4 11.426ms 1.0539ms 34.539ms cuMemHostAlloc
0.13% 43.121ms 2 21.561ms 66.512us 43.055ms cuMemcpyDtoHAsync
0.11% 35.435ms 1061 33.398us 468ns 18.956ms cudaGetLastError
0.08% 25.086ms 325 77.187us 2.0310us 13.448ms cuEventCreate
0.07% 23.363ms 2 11.681ms 2.5338ms 20.829ms cuCtxSynchronize
0.06% 20.069ms 289 69.444us 35.730us 1.0197ms cuMemcpyHtoDAsync
0.05% 15.720ms 2 7.8598ms 75.002us 15.645ms cudaMemcpy
0.01% 4.4175ms 10 441.75us 48.283us 1.3596ms cudaMalloc
0.01% 3.9770ms 567 7.0140us 3.6980us 54.324us cudaStreamWaitEvent
0.01% 3.0328ms 715 4.2410us 1.9270us 60.522us cuEventQuery
0.01% 2.1989ms 11 199.90us 18.854us 1.6367ms cuStreamCreate
0.00% 1.4825ms 1 1.4825ms 1.4825ms 1.4825ms cudaHostAlloc
0.00% 1.4552ms 4 363.81us 38.699us 1.2778ms cudaStreamCreateWithPriority
0.00% 1.4166ms 295 4.8020us 1.0410us 89.533us cuDeviceGetAttribute
0.00% 1.2984ms 172 7.5480us 4.0100us 81.252us cudaUnbindTexture
0.00% 1.1075ms 291 3.8050us 2.3960us 53.491us cuStreamWaitEvent
0.00% 817.99us 314 2.6050us 1.2500us 50.938us cuEventDestroy
0.00% 738.14us 156 4.7310us 2.7610us 24.271us cuEventElapsedTime
0.00% 308.71us 4 77.176us 27.188us 214.33us cudaMemsetAsync
0.00% 266.00us 44 6.0450us 3.6450us 40.105us cudaEventCreateWithFlags
0.00% 247.51us 3 82.502us 80.314us 86.825us cudaGetDeviceProperties
0.00% 245.32us 40 6.1320us 2.3440us 16.511us cudaDeviceGetAttribute
0.00% 243.65us 24 10.152us 7.5000us 30.417us cudaEventCreate
0.00% 236.93us 24 9.8720us 5.7300us 43.855us cudaEventDestroy
0.00% 180.47us 1 180.47us 180.47us 180.47us cuDeviceGetProperties
0.00% 177.82us 2 88.908us 86.512us 91.304us cuMemsetD32
0.00% 117.35us 2 58.673us 3.7500us 113.60us cudaGetDeviceCount
0.00% 114.85us 4 28.711us 6.6670us 48.178us cuDeviceTotalMem
0.00% 92.346us 5 18.469us 5.2090us 45.626us cudaGetDevice
0.00% 88.285us 3 29.428us 18.751us 46.200us cuMemGetInfo
0.00% 86.513us 3 28.837us 3.1780us 72.137us cuInit
0.00% 86.043us 7 12.291us 5.1560us 36.302us cuCtxSetCurrent
0.00% 60.627us 1 60.627us 60.627us 60.627us cudaDeviceGetStreamPriorityRange
0.00% 53.803us 18 2.9890us 1.3020us 7.7080us cuDeviceGetCount
0.00% 44.481us 1 44.481us 44.481us 44.481us cudaHostGetDevicePointer
0.00% 32.605us 4 8.1510us 2.5000us 19.219us cuDriverGetVersion
0.00% 30.834us 1 30.834us 30.834us 30.834us cuDeviceGetPCIBusId
0.00% 30.522us 4 7.6300us 2.4480us 10.834us cuDeviceGetName
0.00% 24.271us 4 6.0670us 4.1150us 8.2290us cudaSetDevice
0.00% 19.741us 5 3.9480us 1.6140us 7.0840us cuDeviceGet
0.00% 9.7930us 3 3.2640us 1.3550us 4.4790us cuDeviceGetUuid
0.00% 6.2500us 1 6.2500us 6.2500us 6.2500us cuDeviceComputeCapability
0.00% 2.8650us 1 2.8650us 2.8650us 2.8650us cuDevicePrimaryCtxGetState
0.00% 1.7710us 1 1.7710us 1.7710us 1.7710us cuCtxGetCurrent
==21650== NVTX result:
==21650== Thread "<unnamed>" (id = 2271260688)
==21650== Domain "<unnamed>"
==21650== Range "Const: file_reader/filename"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.2600us 1 5.2600us 5.2600us 5.2600us Const: file_reader/filename
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "DecodeJpeg: jpeg_reader"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 8.7798ms 1 8.7798ms 8.7798ms 8.7798ms DecodeJpeg: jpeg_reader
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "NoOp: _SOURCE"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 10.208us 1 10.208us 10.208us 10.208us NoOp: _SOURCE
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "ReadFile: file_reader"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 15.557ms 1 15.557ms 15.557ms 15.557ms ReadFile: file_reader
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "_Send: _send_jpeg_reader_0"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 30.470us 1 30.470us 30.470us 30.470us _Send: _send_jpeg_reader_0
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Thread "<unnamed>" (id = 4071789040)
==21650== Domain "<unnamed>"
==21650== Range "Add: import/final_retrain_ops/Wx_plus_b/add"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 26.149ms 1 26.149ms 26.149ms 26.149ms Add: import/final_retrain_ops/Wx_plus_b/add
GPU activities: 100.00% 3.0210us 1 3.0210us 3.0210us 3.0210us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 209.95us 1 209.95us 209.95us 209.95us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 82.837ms 1 82.837ms 82.837ms 82.837ms Add: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 345.84us 1 345.84us 345.84us 345.84us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 452.46us 1 452.46us 452.46us 452.46us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 142.61us 1 142.61us 142.61us 142.61us Add: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 313.60us 1 313.60us 313.60us 313.60us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 73.387us 1 73.387us 73.387us 73.387us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 105.47us 1 105.47us 105.47us 105.47us Add: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 635.12us 1 635.12us 635.12us 635.12us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 72.397us 1 72.397us 72.397us 72.397us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 155.21us 1 155.21us 155.21us 155.21us Add: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 200.89us 1 200.89us 200.89us 200.89us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 65.158us 1 65.158us 65.158us 65.158us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 110.21us 1 110.21us 110.21us 110.21us Add: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 431.21us 1 431.21us 431.21us 431.21us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 72.762us 1 72.762us 72.762us 72.762us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 100.42us 1 100.42us 100.42us 100.42us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 39.115us 1 39.115us 39.115us 39.115us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 77.971us 1 77.971us 77.971us 77.971us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 113.81us 1 113.81us 113.81us 113.81us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 29.636us 1 29.636us 29.636us 29.636us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 91.304us 1 91.304us 91.304us 91.304us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.294us 1 87.294us 87.294us 87.294us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 39.063us 1 39.063us 39.063us 39.063us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 61.668us 1 61.668us 61.668us 61.668us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 123.34us 1 123.34us 123.34us 123.34us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 40.626us 1 40.626us 40.626us 40.626us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 95.471us 1 95.471us 95.471us 95.471us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 141.31us 1 141.31us 141.31us 141.31us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 60.991us 1 60.991us 60.991us 60.991us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 79.846us 1 79.846us 79.846us 79.846us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 150.63us 1 150.63us 150.63us 150.63us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 59.533us 1 59.533us 59.533us 59.533us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 107.61us 1 107.61us 107.61us 107.61us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 106.41us 1 106.41us 106.41us 106.41us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 20.052us 1 20.052us 20.052us 20.052us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 83.648us 1 83.648us 83.648us 83.648us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 81.460us 1 81.460us 81.460us 81.460us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 38.491us 1 38.491us 38.491us 38.491us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.730us 1 55.730us 55.730us 55.730us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.595us 1 78.595us 78.595us 78.595us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 29.585us 1 29.585us 29.585us 29.585us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 54.272us 1 54.272us 54.272us 54.272us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 77.346us 1 77.346us 77.346us 77.346us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 39.532us 1 39.532us 39.532us 39.532us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.053us 1 55.053us 55.053us 55.053us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 101.83us 1 101.83us 101.83us 101.83us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 40.991us 1 40.991us 40.991us 40.991us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 69.012us 1 69.012us 69.012us 69.012us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.710us 1 87.710us 87.710us 87.710us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 57.919us 1 57.919us 57.919us 57.919us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 63.127us 1 63.127us 63.127us 63.127us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 75.679us 1 75.679us 75.679us 75.679us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 57.814us 1 57.814us 57.814us 57.814us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 52.241us 1 52.241us 52.241us 52.241us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 74.429us 1 74.429us 74.429us 74.429us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 38.282us 1 38.282us 38.282us 38.282us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 51.720us 1 51.720us 51.720us 51.720us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 79.637us 1 79.637us 79.637us 79.637us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 37.969us 1 37.969us 37.969us 37.969us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.210us 1 55.210us 55.210us 55.210us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 72.815us 1 72.815us 72.815us 72.815us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 29.584us 1 29.584us 29.584us 29.584us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 51.928us 1 51.928us 51.928us 51.928us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 70.314us 1 70.314us 70.314us 70.314us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 37.657us 1 37.657us 37.657us 37.657us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 51.512us 1 51.512us 51.512us 51.512us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 96.044us 1 96.044us 96.044us 96.044us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 38.803us 1 38.803us 38.803us 38.803us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 68.022us 1 68.022us 68.022us 68.022us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 75.366us 1 75.366us 75.366us 75.366us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 57.502us 1 57.502us 57.502us 57.502us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 53.074us 1 53.074us 53.074us 53.074us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 81.408us 1 81.408us 81.408us 81.408us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 58.751us 1 58.751us 58.751us 58.751us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.074us 1 58.074us 58.074us 58.074us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 108.54us 1 108.54us 108.54us 108.54us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 38.386us 1 38.386us 38.386us 38.386us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 60.418us 1 60.418us 60.418us 60.418us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 84.065us 1 84.065us 84.065us 84.065us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 56.668us 1 56.668us 56.668us 56.668us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 57.293us 1 57.293us 57.293us 57.293us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 101.83us 1 101.83us 101.83us 101.83us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 42.137us 1 42.137us 42.137us 42.137us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 69.012us 1 69.012us 69.012us 69.012us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 90.262us 1 90.262us 90.262us 90.262us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 58.959us 1 58.959us 58.959us 58.959us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 61.355us 1 61.355us 61.355us 61.355us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 98.231us 1 98.231us 98.231us 98.231us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 10.781us 1 10.781us 10.781us 10.781us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 65.470us 1 65.470us 65.470us 65.470us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 106.83us 1 106.83us 106.83us 106.83us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 29.897us 1 29.897us 29.897us 29.897us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 74.689us 1 74.689us 74.689us 74.689us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 180.58us 1 180.58us 180.58us 180.58us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 19.897us 1 19.897us 19.897us 19.897us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 151.31us 1 151.31us 151.31us 151.31us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 96.981us 1 96.981us 96.981us 96.981us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 18.855us 1 18.855us 18.855us 18.855us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 64.481us 1 64.481us 64.481us 64.481us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 130.89us 1 130.89us 130.89us 130.89us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 30.626us 1 30.626us 30.626us 30.626us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 66.408us 1 66.408us 66.408us 66.408us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 103.86us 1 103.86us 103.86us 103.86us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 22.760us 1 22.760us 22.760us 22.760us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 71.460us 1 71.460us 71.460us 71.460us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 582.36us 1 582.36us 582.36us 582.36us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 21.875us 1 21.875us 21.875us 21.875us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 93.648us 1 93.648us 93.648us 93.648us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 100.21us 1 100.21us 100.21us 100.21us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 23.385us 1 23.385us 23.385us 23.385us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 68.699us 1 68.699us 68.699us 68.699us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 99.222us 1 99.222us 99.222us 99.222us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 13.073us 1 13.073us 13.073us 13.073us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 68.023us 1 68.023us 68.023us 68.023us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 105.94us 1 105.94us 105.94us 105.94us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 22.397us 1 22.397us 22.397us 22.397us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 70.366us 1 70.366us 70.366us 70.366us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 104.59us 1 104.59us 104.59us 104.59us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 29.011us 1 29.011us 29.011us 29.011us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 73.908us 1 73.908us 73.908us 73.908us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 85.939us 1 85.939us 85.939us 85.939us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 30.313us 1 30.313us 30.313us 30.313us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.543us 1 58.543us 58.543us 58.543us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 82.189us 1 82.189us 82.189us 82.189us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 23.959us 1 23.959us 23.959us 23.959us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 56.876us 1 56.876us 56.876us 56.876us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 77.710us 1 77.710us 77.710us 77.710us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 23.542us 1 23.542us 23.542us 23.542us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 53.283us 1 53.283us 53.283us 53.283us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 127.76us 1 127.76us 127.76us 127.76us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 28.074us 1 28.074us 28.074us 28.074us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 72.033us 1 72.033us 72.033us 72.033us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 92.241us 1 92.241us 92.241us 92.241us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 24.740us 1 24.740us 24.740us 24.740us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 63.439us 1 63.439us 63.439us 63.439us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 93.804us 1 93.804us 93.804us 93.804us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 26.564us 1 26.564us 26.564us 26.564us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 62.710us 1 62.710us 62.710us 62.710us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 94.221us 1 94.221us 94.221us 94.221us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 25.157us 1 25.157us 25.157us 25.157us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.595us 1 58.595us 58.595us 58.595us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 81.981us 1 81.981us 81.981us 81.981us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 15.469us 1 15.469us 15.469us 15.469us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.179us 1 58.179us 58.179us 58.179us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 88.596us 1 88.596us 88.596us 88.596us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 22.970us 1 22.970us 22.970us 22.970us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 59.012us 1 59.012us 59.012us 59.012us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 86.148us 1 86.148us 86.148us 86.148us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 27.970us 1 27.970us 27.970us 27.970us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 61.512us 1 61.512us 61.512us 61.512us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 76.981us 1 76.981us 76.981us 76.981us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 28.282us 1 28.282us 28.282us 28.282us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 50.626us 1 50.626us 50.626us 50.626us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.856us 1 78.856us 78.856us 78.856us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 24.063us 1 24.063us 24.063us 24.063us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.678us 1 55.678us 55.678us 55.678us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 77.241us 1 77.241us 77.241us 77.241us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 23.855us 1 23.855us 23.855us 23.855us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 54.532us 1 54.532us 54.532us 54.532us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 90.471us 1 90.471us 90.471us 90.471us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 29.323us 1 29.323us 29.323us 29.323us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 51.459us 1 51.459us 51.459us 51.459us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.387us 1 78.387us 78.387us 78.387us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 26.512us 1 26.512us 26.512us 26.512us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 54.845us 1 54.845us 54.845us 54.845us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 93.075us 1 93.075us 93.075us 93.075us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 24.897us 1 24.897us 24.897us 24.897us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 70.314us 1 70.314us 70.314us 70.314us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 71.460us 1 71.460us 71.460us 71.460us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 26.303us 1 26.303us 26.303us 26.303us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 50.938us 1 50.938us 50.938us 50.938us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 80.627us 1 80.627us 80.627us 80.627us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 16.043us 1 16.043us 16.043us 16.043us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 54.533us 1 54.533us 54.533us 54.533us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.075us 1 78.075us 78.075us 78.075us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 20.418us 1 20.418us 20.418us 20.418us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 53.335us 1 53.335us 53.335us 53.335us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 76.877us 1 76.877us 76.877us 76.877us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 27.292us 1 27.292us 27.292us 27.292us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.575us 1 55.575us 55.575us 55.575us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 81.304us 1 81.304us 81.304us 81.304us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 28.491us 1 28.491us 28.491us 28.491us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 54.532us 1 54.532us 54.532us 54.532us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 73.647us 1 73.647us 73.647us 73.647us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 27.553us 1 27.553us 27.553us 27.553us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 51.355us 1 51.355us 51.355us 51.355us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 70.679us 1 70.679us 70.679us 70.679us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 29.063us 1 29.063us 29.063us 29.063us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 50.314us 1 50.314us 50.314us 50.314us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.138us 1 87.138us 87.138us 87.138us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 29.063us 1 29.063us 29.063us 29.063us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 61.824us 1 61.824us 61.824us 61.824us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 82.867us 1 82.867us 82.867us 82.867us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 30.625us 1 30.625us 30.625us 30.625us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.543us 1 58.543us 58.543us 58.543us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.190us 1 87.190us 87.190us 87.190us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 31.720us 1 31.720us 31.720us 31.720us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 62.242us 1 62.242us 62.242us 62.242us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 85.940us 1 85.940us 85.940us 85.940us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 31.459us 1 31.459us 31.459us 31.459us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 62.033us 1 62.033us 62.033us 62.033us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 69.898us 1 69.898us 69.898us 69.898us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 19.739us 1 19.739us 19.739us 19.739us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 49.585us 1 49.585us 49.585us 49.585us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 70.210us 1 70.210us 70.210us 70.210us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 19.322us 1 19.322us 19.322us 19.322us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 48.751us 1 48.751us 48.751us 48.751us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 74.377us 1 74.377us 74.377us 74.377us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 28.595us 1 28.595us 28.595us 28.595us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.418us 1 55.418us 55.418us 55.418us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 259.85us 1 259.85us 259.85us 259.85us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 28.073us 1 28.073us 28.073us 28.073us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 219.38us 1 219.38us 219.38us 219.38us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 92.606us 1 92.606us 92.606us 92.606us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 11.198us 1 11.198us 11.198us 11.198us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 64.689us 1 64.689us 64.689us 64.689us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 102.92us 1 102.92us 102.92us 102.92us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 31.043us 1 31.043us 31.043us 31.043us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 78.544us 1 78.544us 78.544us 78.544us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 582.62us 1 582.62us 582.62us 582.62us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 28.543us 1 28.543us 28.543us 28.543us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.439us 1 58.439us 58.439us 58.439us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 81.930us 1 81.930us 81.930us 81.930us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 21.303us 1 21.303us 21.303us 21.303us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 57.606us 1 57.606us 57.606us 57.606us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 88.960us 1 88.960us 88.960us 88.960us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 6.7200us 1 6.7200us 6.7200us 6.7200us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 63.126us 1 63.126us 63.126us 63.126us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 90.158us 1 90.158us 90.158us 90.158us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 11.094us 1 11.094us 11.094us 11.094us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 61.148us 1 61.148us 61.148us 61.148us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 110.78us 1 110.78us 110.78us 110.78us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 12.606us 1 12.606us 12.606us 12.606us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 78.491us 1 78.491us 78.491us 78.491us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 82.502us 1 82.502us 82.502us 82.502us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 12.656us 1 12.656us 12.656us 12.656us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.960us 1 58.960us 58.960us 58.960us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 73.595us 1 73.595us 73.595us 73.595us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 13.855us 1 13.855us 13.855us 13.855us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 52.137us 1 52.137us 52.137us 52.137us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 237.09us 1 237.09us 237.09us 237.09us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 18.751us 1 18.751us 18.751us 18.751us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 209.38us 1 209.38us 209.38us 209.38us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 89.116us 1 89.116us 89.116us 89.116us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 14.010us 1 14.010us 14.010us 14.010us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 61.147us 1 61.147us 61.147us 61.147us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 75.991us 1 75.991us 75.991us 75.991us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 9.2180us 1 9.2180us 9.2180us 9.2180us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 53.543us 1 53.543us 53.543us 53.543us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 81.304us 1 81.304us 81.304us 81.304us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 9.0630us 1 9.0630us 9.0630us 9.0630us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 59.532us 1 59.532us 59.532us 59.532us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 96.565us 1 96.565us 96.565us 96.565us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 7.0840us 1 7.0840us 7.0840us 7.0840us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 68.647us 1 68.647us 68.647us 68.647us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 68.700us 1 68.700us 68.700us 68.700us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 10.989us 1 10.989us 10.989us 10.989us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 48.959us 1 48.959us 48.959us 48.959us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 75.731us 1 75.731us 75.731us 75.731us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 13.229us 1 13.229us 13.229us 13.229us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.626us 1 55.626us 55.626us 55.626us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 75.367us 1 75.367us 75.367us 75.367us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 12.501us 1 12.501us 12.501us 12.501us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 53.334us 1 53.334us 53.334us 53.334us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 73.596us 1 73.596us 73.596us 73.596us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 13.385us 1 13.385us 13.385us 13.385us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 54.429us 1 54.429us 54.429us 54.429us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 86.096us 1 86.096us 86.096us 86.096us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 17.292us 1 17.292us 17.292us 17.292us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 61.720us 1 61.720us 61.720us 61.720us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 71.929us 1 71.929us 71.929us 71.929us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 14.532us 1 14.532us 14.532us 14.532us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 50.053us 1 50.053us 50.053us 50.053us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 77.606us 1 77.606us 77.606us 77.606us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 8.7500us 1 8.7500us 8.7500us 8.7500us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 56.460us 1 56.460us 56.460us 56.460us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 113.60us 1 113.60us 113.60us 113.60us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 10.156us 1 10.156us 10.156us 10.156us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 75.314us 1 75.314us 75.314us 75.314us cudaLaunchKernel
==21650== Range "Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 110.73us 1 110.73us 110.73us 110.73us Add: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm
GPU activities: 100.00% 7.7610us 1 7.7610us 7.7610us 7.7610us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 86.148us 1 86.148us 86.148us 86.148us cudaLaunchKernel
==21650== Range "AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/AvgPool_0a_3x3/AvgPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 13.021ms 1 13.021ms 13.021ms 13.021ms AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/AvgPool_0a_3x3/AvgPool
GPU activities: 100.00% 872.58us 1 872.58us 872.58us 872.58us void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 118.75us 1 118.75us 118.75us 118.75us cudaLaunchKernel
==21650== Range "AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/AvgPool_0a_3x3/AvgPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 445.69us 1 445.69us 445.69us 445.69us AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/AvgPool_0a_3x3/AvgPool
GPU activities: 100.00% 1.3128ms 1 1.3128ms 1.3128ms 1.3128ms void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 147.87us 1 147.87us 147.87us 147.87us cudaLaunchKernel
==21650== Range "AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/AvgPool_0a_3x3/AvgPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 152.45us 1 152.45us 152.45us 152.45us AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/AvgPool_0a_3x3/AvgPool
GPU activities: 100.00% 1.4958ms 1 1.4958ms 1.4958ms 1.4958ms void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 85.523us 1 85.523us 85.523us 85.523us cudaLaunchKernel
==21650== Range "AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/AvgPool_0a_3x3/AvgPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 184.12us 1 184.12us 184.12us 184.12us AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/AvgPool_0a_3x3/AvgPool
GPU activities: 100.00% 980.96us 1 980.96us 980.96us 980.96us void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 75.627us 1 75.627us 75.627us 75.627us cudaLaunchKernel
==21650== Range "AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/AvgPool_0a_3x3/AvgPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 217.25us 1 217.25us 217.25us 217.25us AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/AvgPool_0a_3x3/AvgPool
GPU activities: 100.00% 1.0451ms 1 1.0451ms 1.0451ms 1.0451ms void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 96.721us 1 96.721us 96.721us 96.721us cudaLaunchKernel
==21650== Range "AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/AvgPool_0a_3x3/AvgPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 138.86us 1 138.86us 138.86us 138.86us AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/AvgPool_0a_3x3/AvgPool
GPU activities: 100.00% 1.0442ms 1 1.0442ms 1.0442ms 1.0442ms void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 67.033us 1 67.033us 67.033us 67.033us cudaLaunchKernel
==21650== Range "AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/AvgPool_0a_3x3/AvgPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 121.98us 1 121.98us 121.98us 121.98us AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/AvgPool_0a_3x3/AvgPool
GPU activities: 100.00% 1.0434ms 1 1.0434ms 1.0434ms 1.0434ms void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 56.303us 1 56.303us 56.303us 56.303us cudaLaunchKernel
==21650== Range "AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/AvgPool_0a_3x3/AvgPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 138.49us 1 138.49us 138.49us 138.49us AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/AvgPool_0a_3x3/AvgPool
GPU activities: 100.00% 362.20us 1 362.20us 362.20us 362.20us void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 70.262us 1 70.262us 70.262us 70.262us cudaLaunchKernel
==21650== Range "AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/AvgPool_0a_3x3/AvgPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 138.13us 1 138.13us 138.13us 138.13us AvgPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/AvgPool_0a_3x3/AvgPool
GPU activities: 100.00% 614.18us 1 614.18us 614.18us 614.18us void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::averpooling_func<float>, int=2, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 69.169us 1 69.169us 69.169us 69.169us cudaLaunchKernel
==21650== Range "Cast: ArithmeticOptimizer/ReorderCastLikeAndValuePreserving_float_Cast"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 245.06us 1 245.06us 245.06us 245.06us Cast: ArithmeticOptimizer/ReorderCastLikeAndValuePreserving_float_Cast
GPU activities: 100.00% 982.16us 1 982.16us 982.16us 982.16us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 141.98us 1 141.98us 141.98us 141.98us cudaLaunchKernel
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 320.71ms 1 320.71ms 320.71ms 320.71ms ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/concat
GPU activities: 100.00% 184.48us 4 46.121us 24.115us 67.710us [CUDA memcpy DtoD]
API calls: 100.00% 124.75ms 4 31.186ms 47.865us 124.55ms cudaMemcpyAsync
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 336.99us 1 336.99us 336.99us 336.99us ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/concat
GPU activities: 100.00% 203.44us 4 50.860us 44.585us 67.033us [CUDA memcpy DtoD]
API calls: 100.00% 268.39us 4 67.098us 51.043us 104.01us cudaMemcpyAsync
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 294.07us 1 294.07us 294.07us 294.07us ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/concat
GPU activities: 100.00% 201.99us 4 50.496us 44.949us 66.877us [CUDA memcpy DtoD]
API calls: 100.00% 254.49us 4 63.621us 50.470us 91.669us cudaMemcpyAsync
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 350.06us 1 350.06us 350.06us 350.06us ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/concat
GPU activities: 100.00% 131.46us 3 43.820us 16.615us 65.106us [CUDA memcpy DtoD]
API calls: 100.00% 265.74us 3 88.578us 63.752us 132.56us cudaMemcpyAsync
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 373.13us 1 373.13us 373.13us 373.13us ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/concat
GPU activities: 100.00% 136.25us 4 34.063us 32.449us 35.157us [CUDA memcpy DtoD]
API calls: 100.00% 303.29us 4 75.822us 55.158us 115.37us cudaMemcpyAsync
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 304.75us 1 304.75us 304.75us 304.75us ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/concat
GPU activities: 100.00% 132.82us 4 33.204us 32.188us 34.168us [CUDA memcpy DtoD]
API calls: 100.00% 256.73us 4 64.181us 50.731us 98.440us cudaMemcpyAsync
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 319.59us 1 319.59us 319.59us 319.59us ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/concat
GPU activities: 100.00% 131.67us 4 32.917us 32.345us 33.438us [CUDA memcpy DtoD]
API calls: 100.00% 276.41us 4 69.103us 53.439us 89.481us cudaMemcpyAsync
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 290.37us 1 290.37us 290.37us 290.37us ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/concat
GPU activities: 100.00% 130.68us 4 32.670us 31.876us 33.229us [CUDA memcpy DtoD]
API calls: 100.00% 248.81us 4 62.201us 52.918us 84.585us cudaMemcpyAsync
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 246.36us 1 246.36us 246.36us 246.36us ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/concat
GPU activities: 100.00% 46.929us 3 15.643us 5.9390us 29.792us [CUDA memcpy DtoD]
API calls: 100.00% 204.02us 3 68.005us 52.918us 89.533us cudaMemcpyAsync
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 402.30us 1 402.30us 402.30us 402.30us ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/concat
GPU activities: 100.00% 84.115us 6 14.019us 8.4370us 16.199us [CUDA memcpy DtoD]
API calls: 100.00% 351.31us 6 58.551us 48.126us 85.888us cudaMemcpyAsync
==21650== Range "ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/concat"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 464.02us 1 464.02us 464.02us 464.02us ConcatV2: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/concat
GPU activities: 100.00% 80.993us 6 13.498us 7.9160us 15.522us [CUDA memcpy DtoD]
API calls: 100.00% 393.97us 6 65.661us 49.741us 122.97us cudaMemcpyAsync
==21650== Range "Const: PermConstNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0630us 1 4.0630us 4.0630us 4.0630us Const: PermConstNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/final_retrain_ops/biases/final_biases"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5310us 1 4.5310us 4.5310us 4.5310us Const: import/final_retrain_ops/biases/final_biases
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/final_retrain_ops/weights/final_weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/final_retrain_ops/weights/final_weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Conv2d_1a_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6360us 1 4.6360us 4.6360us 4.6360us Const: import/module/InceptionV3/Conv2d_1a_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Conv2d_2a_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module/InceptionV3/Conv2d_2a_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Conv2d_2b_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0100us 1 4.0100us 4.0100us 4.0100us Const: import/module/InceptionV3/Conv2d_2b_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Conv2d_3b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3240us 1 4.3240us 4.3240us 4.3240us Const: import/module/InceptionV3/Conv2d_3b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Conv2d_4a_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9580us 1 3.9580us 3.9580us 3.9580us Const: import/module/InceptionV3/Conv2d_4a_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0100us 1 4.0100us 4.0100us 4.0100us Const: import/module/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4800us 1 4.4800us 4.4800us 4.4800us Const: import/module/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6350us 1 4.6350us 4.6350us 4.6350us Const: import/module/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5840us 1 4.5840us 4.5840us 4.5840us Const: import/module/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 13.594us 1 13.594us 13.594us 13.594us Const: import/module/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9590us 1 3.9590us 3.9590us 3.9590us Const: import/module/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0100us 1 4.0100us 4.0100us 4.0100us Const: import/module/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0630us 1 4.0630us 4.0630us 4.0630us Const: import/module/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1660us 1 4.1660us 4.1660us 4.1660us Const: import/module/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6870us 1 4.6870us 4.6870us 4.6870us Const: import/module/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5840us 1 4.5840us 4.5840us 4.5840us Const: import/module/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6880us 1 4.6880us 4.6880us 4.6880us Const: import/module/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.7400us 1 4.7400us 4.7400us 4.7400us Const: import/module/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.8440us 1 4.8440us 4.8440us 4.8440us Const: import/module/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4280us 1 4.4280us 4.4280us 4.4280us Const: import/module/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5310us 1 4.5310us 4.5310us 4.5310us Const: import/module/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1140us 1 4.1140us 4.1140us 4.1140us Const: import/module/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0100us 1 4.0100us 4.0100us 4.0100us Const: import/module/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6350us 1 4.6350us 4.6350us 4.6350us Const: import/module/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2180us 1 4.2180us 4.2180us 4.2180us Const: import/module/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 7.0320us 1 7.0320us 7.0320us 7.0320us Const: import/module/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6360us 1 4.6360us 4.6360us 4.6360us Const: import/module/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6880us 1 4.6880us 4.6880us 4.6880us Const: import/module/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0100us 1 4.0100us 4.0100us 4.0100us Const: import/module/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5310us 1 4.5310us 4.5310us 4.5310us Const: import/module/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4800us 1 4.4800us 4.4800us 4.4800us Const: import/module/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6870us 1 4.6870us 4.6870us 4.6870us Const: import/module/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 14.375us 1 14.375us 14.375us 14.375us Const: import/module/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9060us 1 3.9060us 3.9060us 3.9060us Const: import/module/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0110us 1 4.0110us 4.0110us 4.0110us Const: import/module/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5830us 1 4.5830us 4.5830us 4.5830us Const: import/module/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0620us 1 4.0620us 4.0620us 4.0620us Const: import/module/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.8960us 1 4.8960us 4.8960us 4.8960us Const: import/module/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4800us 1 4.4800us 4.4800us 4.4800us Const: import/module/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9580us 1 3.9580us 3.9580us 3.9580us Const: import/module/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5310us 1 4.5310us 4.5310us 4.5310us Const: import/module/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9580us 1 3.9580us 3.9580us 3.9580us Const: import/module/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5310us 1 4.5310us 4.5310us 4.5310us Const: import/module/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.5210us 1 5.5210us 5.5210us 5.5210us Const: import/module/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.7400us 1 4.7400us 4.7400us 4.7400us Const: import/module/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9590us 1 3.9590us 3.9590us 3.9590us Const: import/module/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1660us 1 4.1660us 4.1660us 4.1660us Const: import/module/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9580us 1 3.9580us 3.9580us 3.9580us Const: import/module/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0630us 1 4.0630us 4.0630us 4.0630us Const: import/module/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0100us 1 4.0100us 4.0100us 4.0100us Const: import/module/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/weights"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6360us 1 4.6360us 4.6360us 4.6360us Const: import/module/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/weights
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.2090us 1 5.2090us 5.2090us 5.2090us Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9580us 1 3.9580us 3.9580us 3.9580us Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5310us 1 4.5310us 4.5310us 4.5310us Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0620us 1 4.0620us 4.0620us 4.0620us Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1140us 1 4.1140us 4.1140us 4.1140us Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.3650us 1 5.3650us 5.3650us 5.3650us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0100us 1 4.0100us 4.0100us 4.0100us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9590us 1 3.9590us 3.9590us 3.9590us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6360us 1 4.6360us 4.6360us 4.6360us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5320us 1 4.5320us 4.5320us 4.5320us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 8.1260us 1 8.1260us 8.1260us 8.1260us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.0530us 1 5.0530us 5.0530us 5.0530us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5840us 1 4.5840us 4.5840us 4.5840us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0630us 1 4.0630us 4.0630us 4.0630us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5840us 1 4.5840us 4.5840us 4.5840us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9580us 1 3.9580us 3.9580us 3.9580us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5830us 1 4.5830us 4.5830us 4.5830us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.0520us 1 5.0520us 5.0520us 5.0520us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.4690us 1 5.4690us 5.4690us 5.4690us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5830us 1 4.5830us 4.5830us 4.5830us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6360us 1 4.6360us 4.6360us 4.6360us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.7400us 1 4.7400us 4.7400us 4.7400us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0100us 1 4.0100us 4.0100us 4.0100us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5830us 1 4.5830us 4.5830us 4.5830us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1660us 1 4.1660us 4.1660us 4.1660us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.8440us 1 4.8440us 4.8440us 4.8440us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.7400us 1 4.7400us 4.7400us 4.7400us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.3120us 1 5.3120us 5.3120us 5.3120us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0110us 1 4.0110us 4.0110us 4.0110us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.7920us 1 4.7920us 4.7920us 4.7920us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.8440us 1 4.8440us 4.8440us 4.8440us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0630us 1 4.0630us 4.0630us 4.0630us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5310us 1 4.5310us 4.5310us 4.5310us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1140us 1 4.1140us 4.1140us 4.1140us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9070us 1 3.9070us 3.9070us 3.9070us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6880us 1 4.6880us 4.6880us 4.6880us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.7920us 1 4.7920us 4.7920us 4.7920us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5840us 1 4.5840us 4.5840us 4.5840us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.8440us 1 4.8440us 4.8440us 4.8440us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 6.5110us 1 6.5110us 6.5110us 6.5110us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5830us 1 4.5830us 4.5830us 4.5830us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9580us 1 3.9580us 3.9580us 3.9580us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5840us 1 4.5840us 4.5840us 4.5840us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9580us 1 3.9580us 3.9580us 3.9580us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2700us 1 4.2700us 4.2700us 4.2700us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9060us 1 3.9060us 3.9060us 3.9060us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5320us 1 4.5320us 4.5320us 4.5320us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6360us 1 4.6360us 4.6360us 4.6360us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1660us 1 4.1660us 4.1660us 4.1660us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.0520us 1 5.0520us 5.0520us 5.0520us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.7390us 1 4.7390us 4.7390us 4.7390us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6870us 1 4.6870us 4.6870us 4.6870us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5320us 1 4.5320us 4.5320us 4.5320us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.3130us 1 5.3130us 5.3130us 5.3130us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1140us 1 4.1140us 4.1140us 4.1140us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3760us 1 4.3760us 4.3760us 4.3760us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6360us 1 4.6360us 4.6360us 4.6360us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1670us 1 4.1670us 4.1670us 4.1670us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0620us 1 4.0620us 4.0620us 4.0620us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0630us 1 4.0630us 4.0630us 4.0630us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9060us 1 3.9060us 3.9060us 3.9060us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5830us 1 4.5830us 4.5830us 4.5830us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1660us 1 4.1660us 4.1660us 4.1660us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6870us 1 4.6870us 4.6870us 4.6870us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.7910us 1 4.7910us 4.7910us 4.7910us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6350us 1 4.6350us 4.6350us 4.6350us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4280us 1 4.4280us 4.4280us 4.4280us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4800us 1 4.4800us 4.4800us 4.4800us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5310us 1 4.5310us 4.5310us 4.5310us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5830us 1 4.5830us 4.5830us 4.5830us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3230us 1 4.3230us 4.3230us 4.3230us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.7910us 1 4.7910us 4.7910us 4.7910us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1140us 1 4.1140us 4.1140us 4.1140us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4800us 1 4.4800us 4.4800us 4.4800us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1140us 1 4.1140us 4.1140us 4.1140us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5320us 1 4.5320us 4.5320us 4.5320us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.7400us 1 4.7400us 4.7400us 4.7400us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.8340us 1 5.8340us 5.8340us 5.8340us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5320us 1 4.5320us 4.5320us 4.5320us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2190us 1 4.2190us 4.2190us 4.2190us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0110us 1 4.0110us 4.0110us 4.0110us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.0630us 1 4.0630us 4.0630us 4.0630us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.9590us 1 3.9590us 3.9590us 3.9590us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.2710us 1 4.2710us 4.2710us 4.2710us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.3640us 1 5.3640us 5.3640us 5.3640us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.4690us 1 5.4690us 5.4690us 5.4690us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4270us 1 4.4270us 4.4270us 4.4270us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1150us 1 4.1150us 4.1150us 4.1150us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5310us 1 4.5310us 4.5310us 4.5310us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5320us 1 4.5320us 4.5320us 4.5320us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3750us 1 4.3750us 4.3750us 4.3750us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3760us 1 4.3760us 4.3760us 4.3760us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul-1-ReshapeNHWCToNCHW-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/concat-6-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 7.2390us 1 7.2390us 7.2390us 7.2390us Const: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/concat-6-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/InceptionV3/Logits/GlobalPool-1-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.9480us 1 4.9480us 4.9480us 4.9480us Const: import/module_apply_default/InceptionV3/Logits/GlobalPool-1-LayoutOptimizer
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/hub_input/Mul/y"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.1040us 1 5.1040us 5.1040us 5.1040us Const: import/module_apply_default/hub_input/Mul/y
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: import/module_apply_default/hub_input/Sub/y"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.5320us 1 4.5320us 4.5320us 4.5320us Const: import/module_apply_default/hub_input/Sub/y
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: jpeg_reader/_0__cf__0"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.9890us 1 5.9890us 5.9890us 5.9890us Const: jpeg_reader/_0__cf__0
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 23.3417s 1 23.3417s 23.3417s 23.3417s Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D
GPU activities: 96.05% 720.34ms 100 7.2034ms 2.1128ms 25.076ms maxwell_gcgemm_64x64_nt
1.01% 7.5676ms 100 75.676us 23.647us 260.27us void fft2d_c2r_32x32<float, bool=1, bool=0, unsigned int=0, bool=0, bool=0>(float*, float2 const *, int, int, int, int, int, int, int, int, int, float, float, cudnn::reduced_divisor, bool, float*, float*, int2, int, int)
0.98% 7.3396ms 2 3.6698ms 1.0677ms 6.2719ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
0.75% 5.6555ms 1 5.6555ms 5.6555ms 5.6555ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
0.50% 3.7298ms 100 37.298us 17.137us 112.56us void fft2d_r2c_32x32<float, bool=0, unsigned int=0, bool=0>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
0.37% 2.8095ms 1 2.8095ms 2.8095ms 2.8095ms maxwell_scudnn_128x32_relu_interior_nn
0.22% 1.6247ms 2 812.34us 298.13us 1.3265ms void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.10% 733.51us 1 733.51us 733.51us 733.51us void fft2d_r2c_32x32<float, bool=0, unsigned int=1, bool=1>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
0.02% 136.20us 1 136.20us 136.20us 136.20us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
0.00% 19.063us 1 19.063us 19.063us 19.063us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.00% 12.605us 4 3.1510us 2.2390us 5.8340us [CUDA memset]
0.00% 3.1250us 1 3.1250us 3.1250us 3.1250us [CUDA memcpy HtoD]
API calls: 96.68% 464.83ms 309 1.5043ms 32.396us 265.07ms cudaLaunchKernel
3.25% 15.645ms 1 15.645ms 15.645ms 15.645ms cudaMemcpy
0.06% 308.71us 4 77.176us 27.188us 214.33us cudaMemsetAsync
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 683.19ms 1 683.19ms 683.19ms 683.19ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/Conv2D
GPU activities: 86.03% 92.779ms 25 3.7112ms 3.6845ms 3.7578ms maxwell_gcgemm_64x64_nt
4.89% 5.2695ms 2 2.6348ms 2.6222ms 2.6473ms maxwell_scudnn_128x32_relu_interior_nn
4.79% 5.1656ms 1 5.1656ms 5.1656ms 5.1656ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
1.79% 1.9352ms 1 1.9352ms 1.9352ms 1.9352ms maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
0.93% 1.0024ms 25 40.096us 26.979us 47.553us void fft2d_c2r_32x32<float, bool=0, bool=0, unsigned int=0, bool=0, bool=0>(float*, float2 const *, int, int, int, int, int, int, int, int, int, float, float, cudnn::reduced_divisor, bool, float*, float*, int2, int, int)
0.88% 945.75us 25 37.830us 31.720us 47.970us void fft2d_r2c_32x32<float, bool=0, unsigned int=0, bool=0>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
0.62% 665.85us 1 665.85us 665.85us 665.85us void fft2d_r2c_32x32<float, bool=0, unsigned int=1, bool=1>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
0.04% 47.137us 2 23.568us 23.491us 23.646us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
0.02% 17.136us 1 17.136us 17.136us 17.136us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.01% 14.272us 1 14.272us 14.272us 14.272us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
API calls: 100.00% 8.1069ms 84 96.510us 35.938us 313.76us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.886ms 1 87.886ms 87.886ms 87.886ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/Conv2D
GPU activities: 69.32% 60.677ms 25 2.4271ms 2.4070ms 2.4622ms maxwell_gcgemm_32x32_nt
10.89% 9.5360ms 1 9.5360ms 9.5360ms 9.5360ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
9.57% 8.3785ms 2 4.1892ms 4.1754ms 4.2031ms maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
5.36% 4.6935ms 1 4.6935ms 4.6935ms 4.6935ms maxwell_scudnn_128x64_relu_small_nn
1.93% 1.6871ms 25 67.485us 60.939us 74.221us void fft2d_c2r_32x32<float, bool=0, bool=0, unsigned int=0, bool=0, bool=0>(float*, float2 const *, int, int, int, int, int, int, int, int, int, float, float, cudnn::reduced_divisor, bool, float*, float*, int2, int, int)
1.65% 1.4427ms 1 1.4427ms 1.4427ms 1.4427ms void fft2d_r2c_32x32<float, bool=0, unsigned int=1, bool=1>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
1.17% 1.0216ms 25 40.865us 33.855us 53.751us void fft2d_r2c_32x32<float, bool=0, unsigned int=0, bool=0>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
0.05% 44.272us 2 22.136us 20.886us 23.386us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
0.03% 29.583us 1 29.583us 29.583us 29.583us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.03% 23.334us 1 23.334us 23.334us 23.334us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 4.1807ms 84 49.769us 39.376us 165.32us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 9.2120ms 1 9.2120ms 9.2120ms 9.2120ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/Conv2D
GPU activities: 32.82% 2.3707ms 3 790.23us 789.76us 791.01us maxwell_gcgemm_64x64_nt
24.29% 1.7544ms 2 877.19us 870.39us 883.98us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=6, int=7, int=3, int=3, int=5, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=6, int=7, int=3, int=3, int=5, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
13.32% 962.27us 1 962.27us 962.27us 962.27us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
9.39% 678.51us 3 226.17us 221.00us 230.47us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
9.24% 667.26us 1 667.26us 667.26us 667.26us maxwell_scudnn_128x32_relu_interior_nn
5.12% 369.91us 3 123.30us 93.076us 140.52us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
3.34% 241.36us 1 241.36us 241.36us 241.36us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
2.24% 162.04us 1 162.04us 162.04us 162.04us void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
0.13% 9.6350us 1 9.6350us 9.6350us 9.6350us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.10% 6.9270us 1 6.9270us 6.9270us 6.9270us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.8232ms 17 107.25us 62.501us 285.11us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 45.269ms 1 45.269ms 45.269ms 45.269ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/Conv2D
GPU activities: 31.71% 15.439ms 1 15.439ms 15.439ms 15.439ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=512, int=6, int=8, int=3, int=3, int=5, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=512, int=6, int=8, int=3, int=3, int=5, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
24.34% 11.849ms 1 11.849ms 11.849ms 11.849ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
24.33% 11.843ms 2 5.9214ms 5.8865ms 5.9564ms maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
15.69% 7.6406ms 1 7.6406ms 7.6406ms 7.6406ms maxwell_scudnn_128x64_relu_interior_nn
3.08% 1.5008ms 1 1.5008ms 1.5008ms 1.5008ms void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.43% 206.99us 1 206.99us 206.99us 206.99us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.41% 197.40us 2 98.699us 95.158us 102.24us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
0.01% 6.5620us 1 6.5620us 6.5620us 6.5620us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 863.72us 10 86.372us 66.408us 168.86us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 671.63us 1 671.63us 671.63us 671.63us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/Conv2D
GPU activities: 91.15% 217.77us 1 217.77us 217.77us 217.77us maxwell_scudnn_128x64_relu_interior_nn
7.94% 18.959us 1 18.959us 18.959us 18.959us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.92% 2.1880us 1 2.1880us 2.1880us 2.1880us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 337.41us 3 112.47us 105.26us 120.58us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.9128ms 1 4.9128ms 4.9128ms 4.9128ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/Conv2D
GPU activities: 32.62% 1.1350ms 2 567.52us 562.41us 572.62us maxwell_gcgemm_32x32_nt
17.41% 605.90us 2 302.95us 297.09us 308.81us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
12.66% 440.58us 2 220.29us 211.78us 228.81us maxwell_scudnn_128x64_relu_interior_nn
12.23% 425.53us 1 425.53us 425.53us 425.53us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
8.74% 304.02us 1 304.02us 304.02us 304.02us void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
8.24% 286.67us 1 286.67us 286.67us 286.67us void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
5.24% 182.45us 1 182.45us 182.45us 182.45us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
2.24% 77.815us 2 38.907us 31.668us 46.147us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.44% 15.365us 1 15.365us 15.365us 15.365us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.18% 6.3530us 2 3.1760us 3.0720us 3.2810us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.2925ms 15 86.165us 42.813us 235.27us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 23.501ms 1 23.501ms 23.501ms 23.501ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/Conv2D
GPU activities: 56.20% 12.454ms 4 3.1134ms 3.0809ms 3.1324ms maxwell_gcgemm_32x32_nt
10.59% 2.3476ms 1 2.3476ms 2.3476ms 2.3476ms void fft2d_r2c_32x32<float, bool=0, unsigned int=5, bool=1>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
10.48% 2.3221ms 2 1.1610ms 1.1577ms 1.1644ms maxwell_scudnn_128x64_relu_small_nn
10.17% 2.2529ms 1 2.2529ms 2.2529ms 2.2529ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
7.55% 1.6730ms 1 1.6730ms 1.6730ms 1.6730ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
2.56% 567.20us 1 567.20us 567.20us 567.20us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.05% 231.78us 4 57.944us 49.220us 67.242us void fft2d_c2r_32x32<float, bool=0, bool=0, unsigned int=0, bool=0, bool=0>(float*, float2 const *, int, int, int, int, int, int, int, int, int, float, float, cudnn::reduced_divisor, bool, float*, float*, int2, int, int)
0.85% 187.92us 4 46.980us 34.011us 66.721us void fft2d_r2c_32x32<float, bool=0, unsigned int=0, bool=0>(float2*, float const *, int, int, int, int, int, int, int, int, int, cudnn::reduced_divisor, bool, int2, int, int)
0.53% 116.83us 1 116.83us 116.83us 116.83us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.03% 6.1460us 2 3.0730us 3.0210us 3.1250us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.9908ms 21 94.801us 48.855us 260.16us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.2700ms 1 5.2700ms 5.2700ms 5.2700ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/Conv2D
GPU activities: 31.92% 1.1554ms 2 577.72us 574.18us 581.27us maxwell_gcgemm_32x32_nt
16.66% 602.99us 2 301.49us 287.30us 315.69us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
12.48% 451.78us 2 225.89us 222.19us 229.59us maxwell_scudnn_128x64_relu_interior_nn
11.02% 398.81us 1 398.81us 398.81us 398.81us void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
10.81% 391.21us 1 391.21us 391.21us 391.21us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
8.86% 320.58us 1 320.58us 320.58us 320.58us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
5.04% 182.45us 1 182.45us 182.45us 182.45us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
2.52% 91.200us 2 45.600us 37.293us 53.907us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.54% 19.688us 1 19.688us 19.688us 19.688us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.17% 5.9900us 2 2.9950us 2.9690us 3.0210us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.0653ms 15 71.022us 41.563us 152.14us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 6.8246ms 1 6.8246ms 6.8246ms 6.8246ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/Conv2D
GPU activities: 27.52% 1.6550ms 2 827.52us 819.97us 835.07us maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
27.14% 1.6322ms 1 1.6322ms 1.6322ms 1.6322ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
20.95% 1.2600ms 1 1.2600ms 1.2600ms 1.2600ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
16.04% 964.71us 1 964.71us 964.71us 964.71us maxwell_scudnn_128x32_relu_small_nn
5.39% 324.33us 1 324.33us 324.33us 324.33us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.47% 88.337us 1 88.337us 88.337us 88.337us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
1.43% 86.096us 2 43.048us 39.324us 46.772us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
0.05% 3.1250us 1 3.1250us 3.1250us 3.1250us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.0460ms 10 104.60us 73.491us 208.03us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 9.8977ms 1 9.8977ms 9.8977ms 9.8977ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/Conv2D
GPU activities: 28.13% 2.4148ms 1 2.4148ms 2.4148ms 2.4148ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
26.65% 2.2878ms 2 1.1439ms 1.1399ms 1.1479ms maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
20.31% 1.7434ms 1 1.7434ms 1.7434ms 1.7434ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
16.22% 1.3928ms 1 1.3928ms 1.3928ms 1.3928ms maxwell_scudnn_128x32_relu_small_nn
5.74% 492.93us 1 492.93us 492.93us 492.93us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.52% 130.42us 1 130.42us 130.42us 130.42us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
1.40% 120.11us 2 60.053us 56.200us 63.907us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
0.03% 2.8650us 1 2.8650us 2.8650us 2.8650us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.4621ms 10 146.21us 76.147us 381.83us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1417ms 1 4.1417ms 4.1417ms 4.1417ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/Conv2D
GPU activities: 25.85% 639.86us 2 319.93us 305.89us 333.97us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
25.78% 638.20us 2 319.10us 315.48us 322.72us maxwell_gcgemm_32x32_nt
11.09% 274.49us 2 137.24us 137.19us 137.30us maxwell_scudnn_128x32_relu_interior_nn
9.96% 246.52us 1 246.52us 246.52us 246.52us void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
9.60% 237.61us 1 237.61us 237.61us 237.61us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
7.46% 184.54us 1 184.54us 184.54us 184.54us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
7.20% 178.13us 1 178.13us 178.13us 178.13us void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
2.45% 60.625us 2 30.312us 20.885us 39.740us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.39% 9.5310us 1 9.5310us 9.5310us 9.5310us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.24% 5.8330us 2 2.9160us 2.9160us 2.9170us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.4001ms 15 93.339us 48.647us 217.92us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 386.31us 1 386.31us 386.31us 386.31us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/Conv2D
GPU activities: 91.85% 288.97us 1 288.97us 288.97us 288.97us maxwell_scudnn_128x64_relu_interior_nn
7.48% 23.541us 1 23.541us 23.541us 23.541us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.66% 2.0830us 1 2.0830us 2.0830us 2.0830us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 188.76us 3 62.918us 51.981us 73.023us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.5533ms 1 5.5533ms 5.5533ms 5.5533ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/Conv2D
GPU activities: 32.92% 1.4740ms 2 737.00us 734.92us 739.08us maxwell_gcgemm_32x32_nt
18.47% 827.11us 2 413.55us 396.42us 430.69us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
12.88% 576.94us 2 288.47us 285.01us 291.94us maxwell_scudnn_128x64_relu_interior_nn
10.92% 488.87us 1 488.87us 488.87us 488.87us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
8.59% 384.59us 1 384.59us 384.59us 384.59us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
8.53% 381.78us 1 381.78us 381.78us 381.78us void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
5.43% 242.92us 1 242.92us 242.92us 242.92us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.73% 77.399us 2 38.699us 30.001us 47.398us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.41% 18.333us 1 18.333us 18.333us 18.333us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.13% 5.7310us 2 2.8650us 2.7090us 3.0220us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.0755ms 15 71.699us 43.490us 179.22us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 346.52us 1 346.52us 346.52us 346.52us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/Conv2D
GPU activities: 90.74% 1.1594ms 1 1.1594ms 1.1594ms 1.1594ms maxwell_scudnn_128x64_relu_small_nn
9.04% 115.47us 1 115.47us 115.47us 115.47us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.22% 2.8650us 1 2.8650us 2.8650us 2.8650us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 183.55us 3 61.182us 56.720us 67.919us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 9.1060ms 1 9.1060ms 9.1060ms 9.1060ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/Conv2D
GPU activities: 31.66% 1.5129ms 2 756.43us 747.73us 765.13us maxwell_gcgemm_32x32_nt
17.23% 823.41us 2 411.70us 400.48us 422.93us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
12.39% 592.15us 2 296.08us 294.70us 297.46us maxwell_scudnn_128x64_relu_interior_nn
10.90% 521.00us 1 521.00us 521.00us 521.00us void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
10.79% 515.54us 1 515.54us 515.54us 515.54us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
9.01% 430.64us 1 430.64us 430.64us 430.64us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
5.31% 253.86us 1 253.86us 253.86us 253.86us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
2.04% 97.451us 2 48.725us 38.907us 58.544us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.54% 25.677us 1 25.677us 25.677us 25.677us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.13% 6.2520us 2 3.1260us 2.9180us 3.3340us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 2.0446ms 15 136.31us 65.783us 316.52us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 451.00us 1 451.00us 451.00us 451.00us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/Conv2D
GPU activities: 85.71% 803.09us 1 803.09us 803.09us 803.09us maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
9.34% 87.554us 1 87.554us 87.554us 87.554us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
4.95% 46.356us 1 46.356us 46.356us 46.356us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
API calls: 100.00% 190.47us 3 63.491us 61.668us 66.564us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 290.48us 1 290.48us 290.48us 290.48us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/Conv2D
GPU activities: 85.94% 1.1463ms 1 1.1463ms 1.1463ms 1.1463ms maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
9.36% 124.85us 1 124.85us 124.85us 124.85us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
4.70% 62.658us 1 62.658us 62.658us 62.658us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
API calls: 100.00% 164.07us 3 54.689us 47.501us 60.366us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 388.55us 1 388.55us 388.55us 388.55us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/Conv2D
GPU activities: 91.99% 280.63us 1 280.63us 280.63us 280.63us maxwell_scudnn_128x64_relu_interior_nn
7.32% 22.345us 1 22.345us 22.345us 22.345us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.68% 2.0840us 1 2.0840us 2.0840us 2.0840us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 188.70us 3 62.901us 58.595us 69.325us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 530.48us 1 530.48us 530.48us 530.48us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/Conv2D
GPU activities: 91.43% 322.09us 1 322.09us 322.09us 322.09us maxwell_scudnn_128x64_relu_interior_nn
7.75% 27.291us 1 27.291us 27.291us 27.291us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.83% 2.9180us 1 2.9180us 2.9180us 2.9180us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 211.31us 3 70.435us 55.991us 81.512us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 6.3434ms 1 6.3434ms 6.3434ms 6.3434ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/Conv2D
GPU activities: 31.24% 1.6471ms 2 823.56us 821.12us 826.01us maxwell_gcgemm_32x32_nt
22.39% 1.1808ms 2 590.38us 588.92us 591.84us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
17.55% 925.13us 2 462.56us 441.89us 483.24us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
8.05% 424.65us 1 424.65us 424.65us 424.65us void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
7.96% 419.44us 1 419.44us 419.44us 419.44us void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
5.98% 315.32us 1 315.32us 315.32us 315.32us maxwell_scudnn_128x64_relu_interior_nn
4.94% 260.42us 1 260.42us 260.42us 260.42us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.45% 76.409us 2 38.204us 31.251us 45.158us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.38% 20.208us 1 20.208us 20.208us 20.208us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.06% 3.1250us 1 3.1250us 3.1250us 3.1250us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 971.43us 14 69.387us 42.553us 147.66us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 322.25us 1 322.25us 322.25us 322.25us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/Conv2D
GPU activities: 90.68% 1.1483ms 1 1.1483ms 1.1483ms 1.1483ms maxwell_scudnn_128x64_relu_small_nn
9.07% 114.80us 1 114.80us 114.80us 114.80us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.25% 3.1770us 1 3.1770us 3.1770us 3.1770us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 168.55us 3 56.182us 51.824us 62.085us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 9.7977ms 1 9.7977ms 9.7977ms 9.7977ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/Conv2D
GPU activities: 31.90% 1.6897ms 2 844.84us 841.06us 848.62us maxwell_gcgemm_32x32_nt
17.21% 911.53us 2 455.77us 439.80us 471.73us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
12.15% 643.30us 2 321.65us 319.18us 324.12us maxwell_scudnn_128x64_relu_interior_nn
11.48% 607.78us 1 607.78us 607.78us 607.78us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
11.04% 584.96us 1 584.96us 584.96us 584.96us void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
8.69% 460.17us 1 460.17us 460.17us 460.17us void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
4.95% 262.14us 1 262.14us 262.14us 262.14us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.94% 102.92us 2 51.459us 37.657us 65.262us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.52% 27.657us 1 27.657us 27.657us 27.657us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.12% 6.2500us 2 3.1250us 3.0730us 3.1770us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.1310ms 15 75.397us 47.084us 168.03us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 353.50us 1 353.50us 353.50us 353.50us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/Conv2D
GPU activities: 86.38% 835.23us 1 835.23us 835.23us 835.23us maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
8.89% 85.991us 1 85.991us 85.991us 85.991us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
4.72% 45.679us 1 45.679us 45.679us 45.679us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
API calls: 100.00% 183.34us 3 61.113us 57.866us 64.846us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 312.09us 1 312.09us 312.09us 312.09us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/Conv2D
GPU activities: 85.60% 1.1104ms 1 1.1104ms 1.1104ms 1.1104ms maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
9.52% 123.49us 1 123.49us 123.49us 123.49us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
4.88% 63.336us 1 63.336us 63.336us 63.336us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
API calls: 100.00% 170.78us 3 56.928us 56.251us 58.282us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 359.33us 1 359.33us 359.33us 359.33us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/Conv2D
GPU activities: 92.11% 327.04us 1 327.04us 327.04us 327.04us maxwell_scudnn_128x64_relu_interior_nn
7.29% 25.887us 1 25.887us 25.887us 25.887us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.60% 2.1360us 1 2.1360us 2.1360us 2.1360us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 178.76us 3 59.585us 56.564us 64.325us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 64.902ms 1 64.902ms 64.902ms 64.902ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/Conv2D
GPU activities: 53.48% 14.790ms 2 7.3951ms 7.3335ms 7.4567ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
19.45% 5.3783ms 1 5.3783ms 5.3783ms 5.3783ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
14.01% 3.8745ms 1 3.8745ms 3.8745ms 3.8745ms maxwell_scudnn_128x128_relu_interior_nn
11.28% 3.1190ms 1 3.1190ms 3.1190ms 3.1190ms void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
1.77% 489.08us 1 489.08us 489.08us 489.08us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.01% 2.0830us 1 2.0830us 2.0830us 2.0830us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 995.91us 7 142.27us 85.575us 270.58us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 271.41us 1 271.41us 271.41us 271.41us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/Conv2D
GPU activities: 91.38% 328.45us 1 328.45us 328.45us 328.45us maxwell_scudnn_128x64_relu_interior_nn
7.81% 28.074us 1 28.074us 28.074us 28.074us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.81% 2.9170us 1 2.9170us 2.9170us 2.9170us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 163.23us 3 54.411us 53.231us 55.418us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 484.54us 1 484.54us 484.54us 484.54us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/Conv2D
GPU activities: 86.19% 833.88us 1 833.88us 833.88us 833.88us maxwell_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148m_nt_v1
9.01% 87.139us 1 87.139us 87.139us 87.139us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
4.80% 46.459us 1 46.459us 46.459us 46.459us void cudnn::winograd::generateWinogradTilesKernel<int=1, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>)
API calls: 100.00% 213.18us 3 71.060us 68.127us 76.876us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 11.033ms 1 11.033ms 11.033ms 11.033ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/Conv2D
GPU activities: 38.23% 890.54us 2 445.27us 443.87us 446.68us maxwell_scudnn_128x32_relu_interior_nn
29.32% 683.09us 1 683.09us 683.09us 683.09us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
19.68% 458.45us 1 458.45us 458.45us 458.45us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
7.25% 168.81us 1 168.81us 168.81us 168.81us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
5.33% 124.22us 1 124.22us 124.22us 124.22us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.20% 4.5840us 2 2.2920us 2.1880us 2.3960us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 800.12us 8 100.02us 63.543us 174.07us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.4108ms 1 5.4108ms 5.4108ms 5.4108ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/Conv2D
GPU activities: 38.93% 1.5469ms 2 773.43us 773.04us 773.82us maxwell_scudnn_128x64_relu_interior_nn
29.89% 1.1878ms 1 1.1878ms 1.1878ms 1.1878ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
20.24% 804.19us 1 804.19us 804.19us 804.19us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
5.79% 230.01us 1 230.01us 230.01us 230.01us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
5.03% 200.01us 1 200.01us 200.01us 200.01us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.12% 4.6880us 2 2.3440us 2.2920us 2.3960us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.1836ms 8 147.95us 68.803us 260.06us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 492.04us 1 492.04us 492.04us 492.04us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/Conv2D
GPU activities: 75.90% 429.44us 1 429.44us 429.44us 429.44us maxwell_scudnn_128x128_relu_interior_nn
23.81% 134.74us 1 134.74us 134.74us 134.74us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.29% 1.6160us 1 1.6160us 1.6160us 1.6160us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 241.31us 3 80.436us 68.804us 97.867us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 6.2226ms 1 6.2226ms 6.2226ms 6.2226ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/Conv2D
GPU activities: 22.82% 1.0328ms 2 516.39us 508.92us 523.87us maxwell_scudnn_128x128_relu_small_nn
21.21% 959.82us 1 959.82us 959.82us 959.82us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
16.07% 727.10us 1 727.10us 727.10us 727.10us maxwell_gcgemm_64x32_nt
15.00% 678.72us 1 678.72us 678.72us 678.72us void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
14.44% 653.72us 1 653.72us 653.72us 653.72us void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
3.70% 167.56us 1 167.56us 167.56us 167.56us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
3.16% 142.92us 1 142.92us 142.92us 142.92us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
2.60% 117.50us 1 117.50us 117.50us 117.50us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
0.90% 40.574us 1 40.574us 40.574us 40.574us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.11% 4.8950us 2 2.4470us 2.3950us 2.5000us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.4060ms 12 117.16us 54.012us 483.14us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 8.0509ms 1 8.0509ms 8.0509ms 8.0509ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/Conv2D
GPU activities: 26.74% 1.7940ms 2 897.00us 894.71us 899.29us maxwell_scudnn_128x64_relu_small_nn
19.24% 1.2909ms 1 1.2909ms 1.2909ms 1.2909ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
16.08% 1.0787ms 1 1.0787ms 1.0787ms 1.0787ms maxwell_gcgemm_64x32_nt
15.14% 1.0154ms 1 1.0154ms 1.0154ms 1.0154ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
14.41% 967.06us 1 967.06us 967.06us 967.06us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
3.68% 246.62us 1 246.62us 246.62us 246.62us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
2.68% 179.64us 1 179.64us 179.64us 179.64us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.17% 78.179us 1 78.179us 78.179us 78.179us void fft1d_r2c_32<float, float, float2, bool=0, bool=1>(float2*, float const *, int, int3, int3, int2, int2)
0.80% 53.386us 1 53.386us 53.386us 53.386us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=1>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.07% 5.0000us 2 2.5000us 2.4480us 2.5520us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.0471ms 12 87.259us 52.293us 153.86us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 31.074ms 1 31.074ms 31.074ms 31.074ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/Conv2D
GPU activities: 39.37% 3.8466ms 1 3.8466ms 3.8466ms 3.8466ms maxwell_gcgemm_64x32_nt
28.07% 2.7425ms 1 2.7425ms 2.7425ms 2.7425ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
9.06% 885.34us 2 442.67us 440.74us 444.60us maxwell_scudnn_128x128_relu_interior_nn
7.95% 777.05us 1 777.05us 777.05us 777.05us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
5.82% 569.13us 1 569.13us 569.13us 569.13us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
5.54% 541.37us 1 541.37us 541.37us 541.37us void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
2.35% 229.64us 1 229.64us 229.64us 229.64us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.39% 135.79us 1 135.79us 135.79us 135.79us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.40% 38.959us 1 38.959us 38.959us 38.959us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.05% 4.9990us 2 2.4990us 2.3950us 2.6040us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.2986ms 12 108.22us 58.751us 173.60us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.8295ms 1 5.8295ms 5.8295ms 5.8295ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/Conv2D
GPU activities: 23.50% 1.0403ms 2 520.14us 514.13us 526.16us maxwell_scudnn_128x128_relu_small_nn
19.62% 868.25us 1 868.25us 868.25us 868.25us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
16.44% 727.73us 1 727.73us 727.73us 727.73us maxwell_gcgemm_64x32_nt
15.48% 685.07us 1 685.07us 685.07us 685.07us void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
14.24% 630.28us 1 630.28us 630.28us 630.28us void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
3.97% 175.84us 1 175.84us 175.84us 175.84us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
3.94% 174.49us 1 174.49us 174.49us 174.49us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
1.78% 78.856us 1 78.856us 78.856us 78.856us void fft1d_r2c_32<float, float, float2, bool=0, bool=1>(float2*, float const *, int, int3, int3, int2, int2)
0.92% 40.731us 1 40.731us 40.731us 40.731us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=1>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.11% 4.8960us 2 2.4480us 2.3440us 2.5520us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.3362ms 12 111.35us 58.855us 292.25us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 395.22us 1 395.22us 395.22us 395.22us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/Conv2D
GPU activities: 74.71% 512.20us 1 512.20us 512.20us 512.20us maxwell_scudnn_128x128_relu_small_nn
24.96% 171.15us 1 171.15us 171.15us 171.15us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.33% 2.2390us 1 2.2390us 2.2390us 2.2390us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 210.63us 3 70.209us 57.501us 81.095us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 899.19us 1 899.19us 899.19us 899.19us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/Conv2D
GPU activities: 74.75% 522.20us 1 522.20us 522.20us 522.20us maxwell_scudnn_128x128_relu_small_nn
24.92% 174.07us 1 174.07us 174.07us 174.07us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.34% 2.3440us 1 2.3440us 2.3440us 2.3440us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 684.60us 3 228.20us 69.377us 540.22us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 7.4737ms 1 7.4737ms 7.4737ms 7.4737ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/Conv2D
GPU activities: 25.90% 1.7401ms 2 870.05us 865.60us 874.50us maxwell_scudnn_128x64_relu_small_nn
20.16% 1.3549ms 1 1.3549ms 1.3549ms 1.3549ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
15.93% 1.0704ms 1 1.0704ms 1.0704ms 1.0704ms maxwell_gcgemm_64x32_nt
14.93% 1.0031ms 1 1.0031ms 1.0031ms 1.0031ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
14.50% 974.35us 1 974.35us 974.35us 974.35us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
3.81% 256.31us 1 256.31us 256.31us 256.31us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
2.09% 140.68us 1 140.68us 140.68us 140.68us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.80% 120.68us 1 120.68us 120.68us 120.68us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
0.80% 53.804us 1 53.804us 53.804us 53.804us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.08% 5.1040us 2 2.5520us 2.5000us 2.6040us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 956.90us 12 79.741us 57.605us 109.38us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 398.92us 1 398.92us 398.92us 398.92us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/Conv2D
GPU activities: 79.00% 769.03us 1 769.03us 769.03us 769.03us maxwell_scudnn_128x64_relu_interior_nn
20.83% 202.77us 1 202.77us 202.77us 202.77us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.17% 1.6140us 1 1.6140us 1.6140us 1.6140us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 225.16us 3 75.053us 64.637us 93.023us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 395.74us 1 395.74us 395.74us 395.74us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/Conv2D
GPU activities: 78.83% 774.34us 1 774.34us 774.34us 774.34us maxwell_scudnn_128x64_relu_interior_nn
20.89% 205.16us 1 205.16us 205.16us 205.16us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.28% 2.7610us 1 2.7610us 2.7610us 2.7610us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 227.66us 3 75.887us 69.169us 85.731us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 472.41us 1 472.41us 472.41us 472.41us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/Conv2D
GPU activities: 85.49% 972.47us 1 972.47us 972.47us 972.47us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
14.51% 165.01us 1 165.01us 165.01us 165.01us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
API calls: 100.00% 236.67us 2 118.34us 95.836us 140.84us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 9.2379ms 1 9.2379ms 9.2379ms 9.2379ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/Conv2D
GPU activities: 28.87% 2.1006ms 2 1.0503ms 1.0401ms 1.0604ms maxwell_scudnn_128x64_relu_small_nn
18.66% 1.3577ms 1 1.3577ms 1.3577ms 1.3577ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
15.00% 1.0916ms 1 1.0916ms 1.0916ms 1.0916ms maxwell_gcgemm_64x32_nt
14.60% 1.0621ms 1 1.0621ms 1.0621ms 1.0621ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
13.99% 1.0178ms 1 1.0178ms 1.0178ms 1.0178ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
3.79% 275.42us 1 275.42us 275.42us 275.42us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
2.52% 183.08us 1 183.08us 183.08us 183.08us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.87% 136.15us 1 136.15us 136.15us 136.15us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
0.64% 46.460us 1 46.460us 46.460us 46.460us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.07% 4.8960us 2 2.4480us 2.3960us 2.5000us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.0249ms 12 85.405us 59.533us 118.34us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 10.141ms 1 10.141ms 10.141ms 10.141ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/Conv2D
GPU activities: 25.98% 2.1744ms 2 1.0872ms 1.0816ms 1.0927ms maxwell_scudnn_128x64_relu_small_nn
21.52% 1.8011ms 1 1.8011ms 1.8011ms 1.8011ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
15.58% 1.3041ms 1 1.3041ms 1.3041ms 1.3041ms maxwell_gcgemm_64x32_nt
14.92% 1.2491ms 1 1.2491ms 1.2491ms 1.2491ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
13.74% 1.1499ms 1 1.1499ms 1.1499ms 1.1499ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
3.76% 315.11us 1 315.11us 315.11us 315.11us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
2.72% 227.61us 1 227.61us 227.61us 227.61us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.10% 91.878us 1 91.878us 91.878us 91.878us void fft1d_r2c_32<float, float, float2, bool=0, bool=1>(float2*, float const *, int, int3, int3, int2, int2)
0.62% 51.824us 1 51.824us 51.824us 51.824us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=1>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.06% 4.7390us 2 2.3690us 2.3430us 2.3960us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 851.27us 12 70.939us 48.543us 133.23us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 14.644ms 1 14.644ms 14.644ms 14.644ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/Conv2D
GPU activities: 37.78% 4.8328ms 1 4.8328ms 4.8328ms 4.8328ms maxwell_gcgemm_64x32_nt
26.99% 3.4523ms 1 3.4523ms 3.4523ms 3.4523ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
15.54% 1.9873ms 2 993.65us 988.26us 999.04us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
5.97% 763.77us 1 763.77us 763.77us 763.77us maxwell_scudnn_128x64_relu_interior_nn
5.47% 699.39us 1 699.39us 699.39us 699.39us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
4.74% 605.75us 1 605.75us 605.75us 605.75us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
1.84% 235.48us 1 235.48us 235.48us 235.48us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.28% 164.33us 1 164.33us 164.33us 164.33us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.37% 47.242us 1 47.242us 47.242us 47.242us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.02% 2.4490us 1 2.4490us 2.4490us 2.4490us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.8579ms 11 168.90us 57.345us 577.88us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 10.454ms 1 10.454ms 10.454ms 10.454ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/Conv2D
GPU activities: 34.58% 2.6255ms 2 1.3128ms 1.3112ms 1.3144ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
14.42% 1.0949ms 1 1.0949ms 1.0949ms 1.0949ms maxwell_gcgemm_64x32_nt
14.23% 1.0800ms 1 1.0800ms 1.0800ms 1.0800ms maxwell_scudnn_128x64_relu_small_nn
14.09% 1.0699ms 1 1.0699ms 1.0699ms 1.0699ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
14.03% 1.0653ms 1 1.0653ms 1.0653ms 1.0653ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
3.63% 275.84us 1 275.84us 275.84us 275.84us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
3.09% 234.90us 1 234.90us 234.90us 234.90us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.28% 97.139us 1 97.139us 97.139us 97.139us void fft1d_r2c_32<float, float, float2, bool=0, bool=1>(float2*, float const *, int, int3, int3, int2, int2)
0.61% 46.511us 1 46.511us 46.511us 46.511us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=1>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.03% 2.3960us 1 2.3960us 2.3960us 2.3960us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.1668ms 11 106.07us 52.605us 203.76us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 315.68us 1 315.68us 315.68us 315.68us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/Conv2D
GPU activities: 78.96% 1.0444ms 1 1.0444ms 1.0444ms 1.0444ms maxwell_scudnn_128x64_relu_small_nn
20.88% 276.21us 1 276.21us 276.21us 276.21us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.16% 2.1350us 1 2.1350us 2.1350us 2.1350us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 179.64us 3 59.880us 54.949us 65.679us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 317.82us 1 317.82us 317.82us 317.82us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/Conv2D
GPU activities: 82.40% 1.3001ms 1 1.3001ms 1.3001ms 1.3001ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
17.60% 277.72us 1 277.72us 277.72us 277.72us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
API calls: 100.00% 184.85us 2 92.424us 61.564us 123.28us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 10.246ms 1 10.246ms 10.246ms 10.246ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/Conv2D
GPU activities: 25.23% 2.1182ms 2 1.0591ms 1.0520ms 1.0662ms maxwell_scudnn_128x64_relu_small_nn
22.32% 1.8732ms 1 1.8732ms 1.8732ms 1.8732ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
15.48% 1.2990ms 1 1.2990ms 1.2990ms 1.2990ms maxwell_gcgemm_64x32_nt
14.88% 1.2494ms 1 1.2494ms 1.2494ms 1.2494ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
13.72% 1.1514ms 1 1.1514ms 1.1514ms 1.1514ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
3.95% 331.78us 1 331.78us 331.78us 331.78us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
2.15% 180.11us 1 180.11us 180.11us 180.11us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.61% 135.11us 1 135.11us 135.11us 135.11us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
0.61% 51.095us 1 51.095us 51.095us 51.095us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.06% 4.8440us 2 2.4220us 2.3960us 2.4480us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 930.60us 12 77.549us 50.834us 134.95us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 343.60us 1 343.60us 343.60us 343.60us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/Conv2D
GPU activities: 78.88% 773.15us 1 773.15us 773.15us 773.15us maxwell_scudnn_128x64_relu_interior_nn
20.95% 205.32us 1 205.32us 205.32us 205.32us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.17% 1.6670us 1 1.6670us 1.6670us 1.6670us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 212.25us 3 70.748us 63.960us 74.481us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 284.49us 1 284.49us 284.49us 284.49us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/Conv2D
GPU activities: 79.28% 772.42us 1 772.42us 772.42us 772.42us maxwell_scudnn_128x64_relu_interior_nn
20.46% 199.38us 1 199.38us 199.38us 199.38us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.26% 2.5000us 1 2.5000us 2.5000us 2.5000us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 176.98us 3 58.994us 54.325us 62.137us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 287.77us 1 287.77us 287.77us 287.77us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/Conv2D
GPU activities: 85.03% 955.23us 1 955.23us 955.23us 955.23us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
14.97% 168.23us 1 168.23us 168.23us 168.23us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
API calls: 100.00% 167.66us 2 83.830us 68.856us 98.805us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 258.50us 1 258.50us 258.50us 258.50us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/Conv2D
GPU activities: 79.46% 1.0471ms 1 1.0471ms 1.0471ms 1.0471ms maxwell_scudnn_128x64_relu_small_nn
20.34% 268.08us 1 268.08us 268.08us 268.08us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.19% 2.5520us 1 2.5520us 2.5520us 2.5520us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 157.35us 3 52.449us 49.272us 55.106us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 272.35us 1 272.35us 272.35us 272.35us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/Conv2D
GPU activities: 77.81% 1.0937ms 1 1.0937ms 1.0937ms 1.0937ms maxwell_scudnn_128x64_relu_small_nn
22.07% 310.27us 1 310.27us 310.27us 310.27us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.11% 1.6150us 1 1.6150us 1.6150us 1.6150us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 162.09us 3 54.029us 48.595us 60.105us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 368.71us 1 368.71us 368.71us 368.71us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/Conv2D
GPU activities: 85.18% 981.38us 1 981.38us 981.38us 981.38us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
14.82% 170.73us 1 170.73us 170.73us 170.73us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
API calls: 100.00% 192.92us 2 96.460us 62.085us 130.84us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 276.21us 1 276.21us 276.21us 276.21us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/Conv2D
GPU activities: 82.90% 1.3068ms 1 1.3068ms 1.3068ms 1.3068ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
17.10% 269.64us 1 269.64us 269.64us 269.64us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
API calls: 100.00% 169.17us 2 84.585us 56.720us 112.45us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 745.75us 1 745.75us 745.75us 745.75us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/Conv2D
GPU activities: 79.74% 1.0423ms 1 1.0423ms 1.0423ms 1.0423ms maxwell_scudnn_128x64_relu_small_nn
20.10% 262.72us 1 262.72us 262.72us 262.72us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.16% 2.1350us 1 2.1350us 2.1350us 2.1350us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 627.67us 3 209.22us 55.991us 510.48us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 277.51us 1 277.51us 277.51us 277.51us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/Conv2D
GPU activities: 82.91% 1.3108ms 1 1.3108ms 1.3108ms 1.3108ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
17.09% 270.16us 1 270.16us 270.16us 270.16us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
API calls: 100.00% 168.08us 2 84.039us 58.804us 109.27us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 258.50us 1 258.50us 258.50us 258.50us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/Conv2D
GPU activities: 76.93% 1.0750ms 1 1.0750ms 1.0750ms 1.0750ms maxwell_scudnn_128x64_relu_small_nn
22.90% 320.01us 1 320.01us 320.01us 320.01us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.17% 2.3960us 1 2.3960us 2.3960us 2.3960us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 159.01us 3 53.004us 51.251us 54.168us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 259.80us 1 259.80us 259.80us 259.80us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/Conv2D
GPU activities: 79.80% 763.36us 1 763.36us 763.36us 763.36us maxwell_scudnn_128x64_relu_interior_nn
20.04% 191.67us 1 191.67us 191.67us 191.67us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.17% 1.6150us 1 1.6150us 1.6150us 1.6150us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 162.50us 3 54.168us 52.033us 55.366us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 290.58us 1 290.58us 290.58us 290.58us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/Conv2D
GPU activities: 79.71% 767.78us 1 767.78us 767.78us 767.78us maxwell_scudnn_128x64_relu_interior_nn
20.13% 193.86us 1 193.86us 193.86us 193.86us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.17% 1.6150us 1 1.6150us 1.6150us 1.6150us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 181.93us 3 60.643us 57.605us 64.220us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 324.12us 1 324.12us 324.12us 324.12us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/Conv2D
GPU activities: 79.71% 759.19us 1 759.19us 759.19us 759.19us maxwell_scudnn_128x64_relu_interior_nn
20.11% 191.57us 1 191.57us 191.57us 191.57us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.17% 1.6660us 1 1.6660us 1.6660us 1.6660us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 221.05us 3 73.682us 56.355us 104.01us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 11.163ms 1 11.163ms 11.163ms 11.163ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/Conv2D
GPU activities: 25.32% 2.5560ms 2 1.2780ms 1.2562ms 1.2997ms maxwell_scudnn_128x64_relu_small_nn
22.25% 2.2458ms 1 2.2458ms 2.2458ms 2.2458ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
15.20% 1.5346ms 1 1.5346ms 1.5346ms 1.5346ms maxwell_gcgemm_64x32_nt
14.72% 1.4854ms 1 1.4854ms 1.4854ms 1.4854ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
14.15% 1.4284ms 1 1.4284ms 1.4284ms 1.4284ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const *, int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
3.93% 396.89us 1 396.89us 396.89us 396.89us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
2.32% 234.23us 1 234.23us 234.23us 234.23us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.53% 154.59us 1 154.59us 154.59us 154.59us void fft1d_r2c_32<float, float, float2, bool=0, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
0.51% 51.617us 1 51.617us 51.617us 51.617us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=0>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.05% 5.2100us 2 2.6050us 2.5000us 2.7100us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 851.48us 12 70.956us 47.397us 123.65us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 253.34us 1 253.34us 253.34us 253.34us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/Conv2D
GPU activities: 76.65% 1.3238ms 1 1.3238ms 1.3238ms 1.3238ms maxwell_scudnn_128x64_relu_small_nn
23.25% 401.57us 1 401.57us 401.57us 401.57us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.10% 1.6660us 1 1.6660us 1.6660us 1.6660us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 159.69us 3 53.230us 47.136us 59.481us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 284.85us 1 284.85us 284.85us 284.85us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/Conv2D
GPU activities: 78.30% 777.68us 1 777.68us 777.68us 777.68us maxwell_scudnn_128x64_relu_interior_nn
21.49% 213.50us 1 213.50us 213.50us 213.50us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.21% 2.0840us 1 2.0840us 2.0840us 2.0840us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 175.42us 3 58.473us 51.720us 66.512us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 23.374ms 1 23.374ms 23.374ms 23.374ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/Conv2D
GPU activities: 26.46% 2.6411ms 2 1.3206ms 1.3155ms 1.3257ms maxwell_scudnn_128x64_relu_small_nn
20.86% 2.0820ms 1 2.0820ms 2.0820ms 2.0820ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
15.32% 1.5296ms 1 1.5296ms 1.5296ms 1.5296ms maxwell_gcgemm_64x32_nt
14.86% 1.4833ms 1 1.4833ms 1.4833ms 1.4833ms void fft1d_r2c_32<float, float, float2, bool=1, bool=0>(float2*, float const *, int, int3, int3, int2, int2)
13.77% 1.3745ms 1 1.3745ms 1.3745ms 1.3745ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const *, int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
4.05% 404.18us 1 404.18us 404.18us 404.18us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
3.07% 306.57us 1 306.57us 306.57us 306.57us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
1.03% 103.02us 1 103.02us 103.02us 103.02us void fft1d_r2c_32<float, float, float2, bool=0, bool=1>(float2*, float const *, int, int3, int3, int2, int2)
0.52% 51.981us 1 51.981us 51.981us 51.981us void fft1d_c2r_32<float2, float, float, bool=0, bool=1, bool=0, bool=1>(float*, float2 const *, int, int3, int3, int2, int, float, float, float*, float*)
0.05% 4.8960us 2 2.4480us 2.3960us 2.5000us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 1.7882ms 12 149.02us 48.387us 524.75us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 305.53us 1 305.53us 305.53us 305.53us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/Conv2D
GPU activities: 76.08% 1.2926ms 1 1.2926ms 1.2926ms 1.2926ms maxwell_scudnn_128x64_relu_small_nn
23.80% 404.33us 1 404.33us 404.33us 404.33us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.12% 2.0840us 1 2.0840us 2.0840us 2.0840us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 188.08us 3 62.692us 50.887us 82.710us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 275.01us 1 275.01us 275.01us 275.01us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/Conv2D
GPU activities: 76.57% 1.3237ms 1 1.3237ms 1.3237ms 1.3237ms maxwell_scudnn_128x64_relu_small_nn
23.29% 402.56us 1 402.56us 402.56us 402.56us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.14% 2.4480us 1 2.4480us 2.4480us 2.4480us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 173.08us 3 57.692us 55.262us 61.408us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 253.91us 1 253.91us 253.91us 253.91us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/Conv2D
GPU activities: 76.13% 1.2951ms 1 1.2951ms 1.2951ms 1.2951ms maxwell_scudnn_128x64_relu_small_nn
23.73% 403.76us 1 403.76us 403.76us 403.76us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.13% 2.2930us 1 2.2930us 2.2930us 2.2930us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 157.04us 3 52.345us 49.168us 55.210us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 274.96us 1 274.96us 274.96us 274.96us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/Conv2D
GPU activities: 78.60% 756.01us 1 756.01us 756.01us 756.01us maxwell_scudnn_128x64_relu_interior_nn
21.23% 204.22us 1 204.22us 204.22us 204.22us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.17% 1.6150us 1 1.6150us 1.6150us 1.6150us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 178.81us 3 59.602us 52.762us 71.721us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 315.16us 1 315.16us 315.16us 315.16us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/Conv2D
GPU activities: 80.30% 762.83us 1 762.83us 762.83us 762.83us maxwell_scudnn_128x64_relu_interior_nn
19.53% 185.53us 1 185.53us 185.53us 185.53us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.17% 1.6140us 1 1.6140us 1.6140us 1.6140us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 167.30us 3 55.765us 48.699us 61.512us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 13.241ms 1 13.241ms 13.241ms 13.241ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/Conv2D
GPU activities: 26.54% 1.3055ms 2 652.73us 639.08us 666.37us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const *, int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
21.67% 1.0658ms 1 1.0658ms 1.0658ms 1.0658ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
21.53% 1.0590ms 1 1.0590ms 1.0590ms 1.0590ms maxwell_scudnn_128x64_relu_interior_nn
21.15% 1.0404ms 1 1.0404ms 1.0404ms 1.0404ms void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
9.06% 445.64us 2 222.82us 222.51us 223.13us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.04% 2.1870us 1 2.1870us 2.1870us 2.1870us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 700.48us 8 87.560us 50.470us 154.85us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 305.89us 1 305.89us 305.89us 305.89us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/Conv2D
GPU activities: 79.04% 767.99us 1 767.99us 767.99us 767.99us maxwell_scudnn_128x64_relu_interior_nn
20.71% 201.26us 1 201.26us 201.26us 201.26us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.25% 2.3960us 1 2.3960us 2.3960us 2.3960us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 190.89us 3 63.630us 53.283us 74.481us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 251.73us 1 251.73us 251.73us 251.73us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/Conv2D
GPU activities: 76.13% 1.3013ms 1 1.3013ms 1.3013ms 1.3013ms maxwell_scudnn_128x64_relu_small_nn
23.75% 405.95us 1 405.95us 405.95us 405.95us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.12% 2.1350us 1 2.1350us 2.1350us 2.1350us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 159.28us 3 53.091us 47.189us 63.335us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 316.21us 1 316.21us 316.21us 316.21us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/Conv2D
GPU activities: 76.41% 1.3352ms 1 1.3352ms 1.3352ms 1.3352ms maxwell_scudnn_128x64_relu_small_nn
23.44% 409.54us 1 409.54us 409.54us 409.54us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
0.15% 2.6040us 1 2.6040us 2.6040us 2.6040us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 178.49us 3 59.498us 52.137us 64.012us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.3049ms 1 4.3049ms 4.3049ms 4.3049ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/Conv2D
GPU activities: 39.64% 1.1387ms 2 569.34us 563.30us 575.38us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
20.76% 596.42us 1 596.42us 596.42us 596.42us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
19.07% 547.67us 1 547.67us 547.67us 547.67us maxwell_scudnn_128x64_relu_interior_nn
12.63% 362.87us 1 362.87us 362.87us 362.87us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const *, int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
7.82% 224.54us 1 224.54us 224.54us 224.54us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.08% 2.1880us 1 2.1880us 2.1880us 2.1880us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 542.62us 7 77.516us 49.116us 136.51us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4348ms 1 4.4348ms 4.4348ms 4.4348ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/Conv2D
GPU activities: 40.11% 1.4692ms 2 734.58us 732.36us 736.79us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
21.49% 787.05us 1 787.05us 787.05us 787.05us maxwell_scudnn_128x64_relu_interior_nn
15.54% 569.34us 1 569.34us 569.34us 569.34us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
13.05% 477.83us 1 477.83us 477.83us 477.83us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const *, int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
9.74% 356.89us 1 356.89us 356.89us 356.89us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.07% 2.3960us 1 2.3960us 2.3960us 2.3960us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 551.58us 7 78.796us 51.304us 129.27us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.4747ms 1 5.4747ms 5.4747ms 5.4747ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/Conv2D
GPU activities: 41.49% 1.6469ms 2 823.43us 823.15us 823.72us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
17.80% 706.63us 1 706.63us 706.63us 706.63us maxwell_scudnn_128x128_relu_interior_nn
17.69% 702.21us 1 702.21us 702.21us 702.21us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
13.98% 555.07us 1 555.07us 555.07us 555.07us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const *, int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
8.96% 355.79us 1 355.79us 355.79us 355.79us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.06% 2.5000us 1 2.5000us 2.5000us 2.5000us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 581.11us 7 83.015us 53.855us 144.54us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.1102ms 1 4.1102ms 4.1102ms 4.1102ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/Conv2D
GPU activities: 28.17% 981.38us 2 490.69us 482.72us 498.66us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const *, int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
24.85% 865.91us 1 865.91us 865.91us 865.91us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
18.90% 658.46us 1 658.46us 658.46us 658.46us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
18.69% 651.27us 1 651.27us 651.27us 651.27us maxwell_scudnn_128x128_relu_small_nn
9.32% 324.59us 2 162.30us 161.52us 163.08us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.08% 2.7080us 1 2.7080us 2.7080us 2.7080us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 586.16us 8 73.269us 46.303us 116.41us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.4898ms 1 5.4898ms 5.4898ms 5.4898ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/Conv2D
GPU activities: 28.00% 984.09us 2 492.04us 484.18us 499.91us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const *, int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
24.92% 875.81us 1 875.81us 875.81us 875.81us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
18.43% 647.57us 1 647.57us 647.57us 647.57us maxwell_scudnn_128x128_relu_small_nn
18.33% 644.34us 1 644.34us 644.34us 644.34us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
10.25% 360.22us 2 180.11us 179.02us 181.20us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.07% 2.3960us 1 2.3960us 2.3960us 2.3960us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 701.94us 8 87.743us 50.887us 146.67us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.6683ms 1 4.6683ms 4.6683ms 4.6683ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/Conv2D
GPU activities: 41.67% 1.9140ms 2 957.01us 947.01us 967.01us void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
18.75% 861.27us 1 861.27us 861.27us 861.27us maxwell_scudnn_128x128_relu_interior_nn
18.25% 838.15us 1 838.15us 838.15us 838.15us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
13.53% 621.74us 1 621.74us 621.74us 621.74us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
7.74% 355.74us 1 355.74us 355.74us 355.74us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.06% 2.8120us 1 2.8120us 2.8120us 2.8120us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 566.11us 7 80.872us 53.074us 145.63us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 15.175ms 1 15.175ms 15.175ms 15.175ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/Conv2D
GPU activities: 41.71% 6.5893ms 1 6.5893ms 6.5893ms 6.5893ms void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
21.14% 3.3392ms 2 1.6696ms 1.6475ms 1.6917ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
19.22% 3.0354ms 1 3.0354ms 3.0354ms 3.0354ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
14.47% 2.2855ms 1 2.2855ms 2.2855ms 2.2855ms maxwell_scudnn_128x128_relu_small_nn
3.45% 545.17us 2 272.59us 270.53us 274.64us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.02% 2.4480us 1 2.4480us 2.4480us 2.4480us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 830.70us 8 103.84us 56.303us 158.44us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 378.92us 1 378.92us 378.92us 378.92us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/Conv2D
GPU activities: 51.32% 671.84us 1 671.84us 671.84us 671.84us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
36.60% 479.18us 1 479.18us 479.18us 479.18us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
12.08% 158.13us 1 158.13us 158.13us 158.13us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
API calls: 100.00% 228.81us 3 76.269us 54.897us 115.11us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 356.16us 1 356.16us 356.16us 356.16us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/Conv2D
GPU activities: 49.36% 645.85us 1 645.85us 645.85us 645.85us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
36.77% 481.16us 1 481.16us 481.16us 481.16us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
13.88% 181.57us 1 181.57us 181.57us 181.57us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
API calls: 100.00% 203.44us 3 67.813us 55.626us 87.762us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 3.1786ms 1 3.1786ms 3.1786ms 3.1786ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/Conv2D
GPU activities: 36.21% 793.20us 2 396.60us 392.93us 400.27us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
18.52% 405.74us 1 405.74us 405.74us 405.74us maxwell_scudnn_128x64_relu_interior_nn
16.33% 357.72us 1 357.72us 357.72us 357.72us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
15.86% 347.35us 1 347.35us 347.35us 347.35us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
12.96% 283.86us 1 283.86us 283.86us 283.86us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
0.11% 2.5000us 1 2.5000us 2.5000us 2.5000us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 536.58us 7 76.653us 48.543us 134.27us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 7.7980ms 1 7.7980ms 7.7980ms 7.7980ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/Conv2D
GPU activities: 30.55% 2.1323ms 1 2.1323ms 2.1323ms 2.1323ms void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
22.05% 1.5388ms 2 769.40us 748.25us 790.54us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
18.80% 1.3118ms 1 1.3118ms 1.3118ms 1.3118ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
18.15% 1.2663ms 1 1.2663ms 1.2663ms 1.2663ms maxwell_scudnn_128x64_relu_interior_nn
10.42% 727.31us 2 363.66us 362.51us 364.80us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.03% 2.3960us 1 2.3960us 2.3960us 2.3960us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 857.73us 8 107.22us 53.699us 359.70us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 9.0169ms 1 9.0169ms 9.0169ms 9.0169ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/Conv2D
GPU activities: 34.97% 2.6242ms 1 2.6242ms 2.6242ms 2.6242ms void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
22.76% 1.7077ms 2 853.83us 838.93us 868.72us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
17.50% 1.3134ms 1 1.3134ms 1.3134ms 1.3134ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
15.01% 1.1263ms 1 1.1263ms 1.1263ms 1.1263ms maxwell_scudnn_128x128_relu_interior_nn
9.72% 729.39us 2 364.70us 363.34us 366.05us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.03% 2.3960us 1 2.3960us 2.3960us 2.3960us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 721.37us 8 90.171us 61.356us 150.47us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 395.84us 1 395.84us 395.84us 395.84us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/Conv2D
GPU activities: 50.08% 640.75us 1 640.75us 640.75us 640.75us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
37.53% 480.17us 1 480.17us 480.17us 480.17us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
12.40% 158.60us 1 158.60us 158.60us 158.60us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
API calls: 100.00% 256.83us 3 85.609us 54.272us 120.89us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 419.28us 1 419.28us 419.28us 419.28us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/Conv2D
GPU activities: 49.56% 644.34us 1 644.34us 644.34us 644.34us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
36.86% 479.28us 1 479.28us 479.28us 479.28us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
13.58% 176.57us 1 176.57us 176.57us 176.57us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
API calls: 100.00% 277.04us 3 92.345us 63.074us 150.21us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 10.251ms 1 10.251ms 10.251ms 10.251ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/Conv2D
GPU activities: 35.85% 3.1308ms 1 3.1308ms 3.1308ms 3.1308ms void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
22.73% 1.9854ms 2 992.68us 974.19us 1.0112ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=1024, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
17.53% 1.5305ms 1 1.5305ms 1.5305ms 1.5305ms void cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=128, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
15.45% 1.3496ms 1 1.3496ms 1.3496ms 1.3496ms maxwell_scudnn_128x128_relu_interior_nn
8.41% 734.19us 2 367.09us 366.94us 367.25us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.03% 2.4480us 1 2.4480us 2.4480us 2.4480us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 704.23us 8 88.029us 58.178us 162.82us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 509.28us 1 509.28us 509.28us 509.28us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/Conv2D
GPU activities: 77.25% 6.5436ms 1 6.5436ms 6.5436ms 6.5436ms void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
19.49% 1.6511ms 1 1.6511ms 1.6511ms 1.6511ms void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
3.26% 276.26us 1 276.26us 276.26us 276.26us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
API calls: 100.00% 292.25us 3 97.415us 64.324us 161.25us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 548.92us 1 548.92us 548.92us 548.92us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/Conv2D
GPU activities: 50.88% 663.51us 1 663.51us 663.51us 663.51us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
36.81% 480.12us 1 480.12us 480.12us 480.12us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
12.31% 160.53us 1 160.53us 160.53us 160.53us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
API calls: 100.00% 408.08us 3 136.03us 57.241us 289.75us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 403.19us 1 403.19us 403.19us 403.19us Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/Conv2D
GPU activities: 49.37% 644.44us 1 644.44us 644.44us 644.44us void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
36.92% 481.94us 1 481.94us 481.94us 481.94us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
13.71% 179.02us 1 179.02us 179.02us 179.02us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
API calls: 100.00% 231.46us 3 77.154us 57.397us 115.78us cudaLaunchKernel
==21650== Range "Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/Conv2D"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.1774ms 1 5.1774ms 5.1774ms 5.1774ms Conv2D: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/Conv2D
GPU activities: 32.12% 1.2752ms 2 637.62us 632.93us 642.31us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const *, int, float*, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
31.50% 1.2510ms 1 1.2510ms 1.2510ms 1.2510ms void tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>(int, float const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::ShuffleInTensor3Simple<float, int=2, int=1, int=0, bool=0>*)
16.58% 658.56us 1 658.56us 658.56us 658.56us maxwell_scudnn_128x64_relu_interior_nn
10.72% 425.48us 1 425.48us 425.48us 425.48us void cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>(int, int, int, float const *, int, float const , int, cudnn::detail::explicit_convolve_sgemm<float, int, int=128, int=5, int=5, int=3, int=3, int=3, int=0, bool=1>*, kernel_conv_params, int, int, float, float, int, float const *, float const *)
9.02% 357.98us 1 357.98us 357.98us 357.98us void im2col4d_kernel<float, int>(im2col4d_params, cudnnConvolutionStruct, cudnnTensor4dStruct, float const *, float*, int)
0.07% 2.6040us 1 2.6040us 2.6040us 2.6040us cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::maxwell::gemm::ComputeOffsetsParams)
API calls: 100.00% 579.44us 7 82.777us 52.657us 141.31us cudaLaunchKernel
==21650== Range "ExpandDims: ArithmeticOptimizer/ReorderCastLikeAndValuePreserving_uint8_ExpandDims"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 27.136us 1 27.136us 27.136us 27.136us ExpandDims: ArithmeticOptimizer/ReorderCastLikeAndValuePreserving_uint8_ExpandDims
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "MatMul: import/final_retrain_ops/Wx_plus_b/MatMul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 285.14ms 1 285.14ms 285.14ms 285.14ms MatMul: import/final_retrain_ops/Wx_plus_b/MatMul
GPU activities: 100.00% 57.397us 1 57.397us 57.397us 57.397us void gemv2N_kernel_val<int, int, float, float, float, int=128, int=32, int=4, int=4, int=1, cublasGemvParams<cublasGemvTensor<float const >, cublasGemvTensor<float>, float>>(float, float, float const )
API calls: 100.00% 390.32us 1 390.32us 390.32us 390.32us cudaLaunchKernel
==21650== Range "MaxPool: import/module_apply_default/InceptionV3/InceptionV3/MaxPool_3a_3x3/MaxPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 76.576ms 1 76.576ms 76.576ms 76.576ms MaxPool: import/module_apply_default/InceptionV3/InceptionV3/MaxPool_3a_3x3/MaxPool
GPU activities: 100.00% 1.0956ms 1 1.0956ms 1.0956ms 1.0956ms void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, cudnnNanPropagation_t=0>, int=0, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, cudnnNanPropagation_t=0>, int=0, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 116.46us 1 116.46us 116.46us 116.46us cudaLaunchKernel
==21650== Range "MaxPool: import/module_apply_default/InceptionV3/InceptionV3/MaxPool_5a_3x3/MaxPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 240.74us 1 240.74us 240.74us 240.74us MaxPool: import/module_apply_default/InceptionV3/InceptionV3/MaxPool_5a_3x3/MaxPool
GPU activities: 100.00% 762.26us 1 762.26us 762.26us 762.26us void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, cudnnNanPropagation_t=0>, int=0, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, cudnnNanPropagation_t=0>, int=0, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 93.492us 1 93.492us 93.492us 93.492us cudaLaunchKernel
==21650== Range "MaxPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_2/MaxPool_1a_3x3/MaxPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 129.17us 1 129.17us 129.17us 129.17us MaxPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_2/MaxPool_1a_3x3/MaxPool
GPU activities: 100.00% 292.56us 1 292.56us 292.56us 292.56us void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, cudnnNanPropagation_t=0>, int=0, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, cudnnNanPropagation_t=0>, int=0, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 63.387us 1 63.387us 63.387us 63.387us cudaLaunchKernel
==21650== Range "MaxPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_2/MaxPool_1a_3x3/MaxPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 129.53us 1 129.53us 129.53us 129.53us MaxPool: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_2/MaxPool_1a_3x3/MaxPool
GPU activities: 100.00% 169.54us 1 169.54us 169.54us 169.54us void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, cudnnNanPropagation_t=0>, int=0, bool=0>(cudnnTensorStruct, float const *, cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, cudnnNanPropagation_t=0>, int=0, bool=0>, cudnnTensorStruct*, cudnnPoolingStruct, float, cudnnPoolingStruct, int, cudnn::reduced_divisor, float)
API calls: 100.00% 63.334us 1 63.334us 63.334us 63.334us cudaLaunchKernel
==21650== Range "Mean: import/module_apply_default/InceptionV3/Logits/GlobalPool"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 138.40ms 1 138.40ms 138.40ms 138.40ms Mean: import/module_apply_default/InceptionV3/Logits/GlobalPool
GPU activities: 100.00% 83.336us 1 83.336us 83.336us 83.336us void tensorflow::functor::RowReduceKernel<float*, tensorflow::TransformOutputIterator<float, float, tensorflow::functor::DividesBy<float, float>, long>, tensorflow::functor::Sum<float>>(float*, float, int, int, float, std::iterator_traits<tensorflow::functor::RowReduceKernel<float*, tensorflow::TransformOutputIterator<float, float, tensorflow::functor::DividesBy<float, float>, long>, tensorflow::functor::Sum<float>>>::value_type)
API calls: 100.00% 422.41us 1 422.41us 422.41us 422.41us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 135.93ms 1 135.93ms 135.93ms 135.93ms Mul: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 353.55us 1 353.55us 353.55us 353.55us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 452.25us 1 452.25us 452.25us 452.25us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 229.02us 1 229.02us 229.02us 229.02us Mul: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 321.00us 1 321.00us 321.00us 321.00us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 121.15us 1 121.15us 121.15us 121.15us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 232.71us 1 232.71us 232.71us 232.71us Mul: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 620.43us 1 620.43us 620.43us 620.43us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 108.86us 1 108.86us 108.86us 108.86us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 147.66us 1 147.66us 147.66us 147.66us Mul: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 198.55us 1 198.55us 198.55us 198.55us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 98.336us 1 98.336us 98.336us 98.336us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 167.14us 1 167.14us 167.14us 167.14us Mul: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 428.66us 1 428.66us 428.66us 428.66us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 98.128us 1 98.128us 98.128us 98.128us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 129.12us 1 129.12us 129.12us 129.12us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 39.324us 1 39.324us 39.324us 39.324us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 103.96us 1 103.96us 103.96us 103.96us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 147.45us 1 147.45us 147.45us 147.45us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 28.491us 1 28.491us 28.491us 28.491us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 88.127us 1 88.127us 88.127us 88.127us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 145.73us 1 145.73us 145.73us 145.73us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 41.564us 1 41.564us 41.564us 41.564us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 67.658us 1 67.658us 67.658us 67.658us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 153.23us 1 153.23us 153.23us 153.23us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 38.647us 1 38.647us 38.647us 38.647us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 101.98us 1 101.98us 101.98us 101.98us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 161.15us 1 161.15us 161.15us 161.15us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 58.699us 1 58.699us 58.699us 58.699us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 100.58us 1 100.58us 100.58us 100.58us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 222.71us 1 222.71us 222.71us 222.71us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 54.480us 1 54.480us 54.480us 54.480us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 96.513us 1 96.513us 96.513us 96.513us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 120.16us 1 120.16us 120.16us 120.16us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 20.000us 1 20.000us 20.000us 20.000us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 93.388us 1 93.388us 93.388us 93.388us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 119.64us 1 119.64us 119.64us 119.64us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 39.429us 1 39.429us 39.429us 39.429us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.439us 1 58.439us 58.439us 58.439us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 115.21us 1 115.21us 115.21us 115.21us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 29.793us 1 29.793us 29.793us 29.793us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 74.481us 1 74.481us 74.481us 74.481us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.960us 1 78.960us 78.960us 78.960us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 37.813us 1 37.813us 37.813us 37.813us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 57.033us 1 57.033us 57.033us 57.033us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 963.25us 1 963.25us 963.25us 963.25us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 41.512us 1 41.512us 41.512us 41.512us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 894.03us 1 894.03us 894.03us 894.03us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 109.12us 1 109.12us 109.12us 109.12us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 53.230us 1 53.230us 53.230us 53.230us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 57.918us 1 57.918us 57.918us 57.918us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 76.304us 1 76.304us 76.304us 76.304us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 57.085us 1 57.085us 57.085us 57.085us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 53.699us 1 53.699us 53.699us 53.699us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 82.033us 1 82.033us 82.033us 82.033us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 37.605us 1 37.605us 37.605us 37.605us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 57.501us 1 57.501us 57.501us 57.501us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 117.14us 1 117.14us 117.14us 117.14us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 38.178us 1 38.178us 38.178us 38.178us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.679us 1 55.679us 55.679us 55.679us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 82.450us 1 82.450us 82.450us 82.450us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 29.011us 1 29.011us 29.011us 29.011us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.053us 1 55.053us 55.053us 55.053us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 77.762us 1 77.762us 77.762us 77.762us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 40.105us 1 40.105us 40.105us 40.105us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 54.064us 1 54.064us 54.064us 54.064us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 102.71us 1 102.71us 102.71us 102.71us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 38.803us 1 38.803us 38.803us 38.803us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 67.137us 1 67.137us 67.137us 67.137us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 85.731us 1 85.731us 85.731us 85.731us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 57.709us 1 57.709us 57.709us 57.709us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 59.585us 1 59.585us 59.585us 59.585us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 79.638us 1 79.638us 79.638us 79.638us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 51.512us 1 51.512us 51.512us 51.512us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 54.168us 1 54.168us 54.168us 54.168us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 86.148us 1 86.148us 86.148us 86.148us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 39.844us 1 39.844us 39.844us 39.844us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 62.033us 1 62.033us 62.033us 62.033us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 127.03us 1 127.03us 127.03us 127.03us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 58.856us 1 58.856us 58.856us 58.856us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 94.325us 1 94.325us 94.325us 94.325us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 198.60us 1 198.60us 198.60us 198.60us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 38.022us 1 38.022us 38.022us 38.022us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 111.36us 1 111.36us 111.36us 111.36us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 114.43us 1 114.43us 114.43us 114.43us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 55.575us 1 55.575us 55.575us 55.575us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 79.169us 1 79.169us 79.169us 79.169us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 138.34us 1 138.34us 138.34us 138.34us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 13.021us 1 13.021us 13.021us 13.021us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 80.471us 1 80.471us 80.471us 80.471us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 113.86us 1 113.86us 113.86us 113.86us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 30.626us 1 30.626us 30.626us 30.626us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 73.648us 1 73.648us 73.648us 73.648us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 251.88us 1 251.88us 251.88us 251.88us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 21.876us 1 21.876us 21.876us 21.876us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 85.836us 1 85.836us 85.836us 85.836us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 131.77us 1 131.77us 131.77us 131.77us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 21.980us 1 21.980us 21.980us 21.980us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 75.106us 1 75.106us 75.106us 75.106us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 99.065us 1 99.065us 99.065us 99.065us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 31.719us 1 31.719us 31.719us 31.719us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 63.387us 1 63.387us 63.387us 63.387us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 124.64us 1 124.64us 124.64us 124.64us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 16.823us 1 16.823us 16.823us 16.823us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 75.888us 1 75.888us 75.888us 75.888us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 144.01us 1 144.01us 144.01us 144.01us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 17.395us 1 17.395us 17.395us 17.395us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 83.700us 1 83.700us 83.700us 83.700us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 147.30us 1 147.30us 147.30us 147.30us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 17.032us 1 17.032us 17.032us 17.032us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 88.179us 1 88.179us 88.179us 88.179us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 146.83us 1 146.83us 146.83us 146.83us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 15.573us 1 15.573us 15.573us 15.573us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 95.679us 1 95.679us 95.679us 95.679us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 156.83us 1 156.83us 156.83us 156.83us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 25.313us 1 25.313us 25.313us 25.313us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 91.304us 1 91.304us 91.304us 91.304us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 95.732us 1 95.732us 95.732us 95.732us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 28.908us 1 28.908us 28.908us 28.908us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 64.897us 1 64.897us 64.897us 64.897us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 93.440us 1 93.440us 93.440us 93.440us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 28.231us 1 28.231us 28.231us 28.231us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 61.095us 1 61.095us 61.095us 61.095us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 114.12us 1 114.12us 114.12us 114.12us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 25.782us 1 25.782us 25.782us 25.782us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 88.023us 1 88.023us 88.023us 88.023us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 76.877us 1 76.877us 76.877us 76.877us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 25.053us 1 25.053us 25.053us 25.053us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 51.876us 1 51.876us 51.876us 51.876us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 120.06us 1 120.06us 120.06us 120.06us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 28.750us 1 28.750us 28.750us 28.750us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 64.220us 1 64.220us 64.220us 64.220us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 116.88us 1 116.88us 116.88us 116.88us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 22.135us 1 22.135us 22.135us 22.135us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 71.460us 1 71.460us 71.460us 71.460us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 133.65us 1 133.65us 133.65us 133.65us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 22.449us 1 22.449us 22.449us 22.449us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 79.950us 1 79.950us 79.950us 79.950us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 148.81us 1 148.81us 148.81us 148.81us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 19.741us 1 19.741us 19.741us 19.741us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 71.460us 1 71.460us 71.460us 71.460us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 88.544us 1 88.544us 88.544us 88.544us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 19.583us 1 19.583us 19.583us 19.583us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 60.002us 1 60.002us 60.002us 60.002us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 117.76us 1 117.76us 117.76us 117.76us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 25.990us 1 25.990us 25.990us 25.990us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 72.397us 1 72.397us 72.397us 72.397us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 86.356us 1 86.356us 86.356us 86.356us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 28.282us 1 28.282us 28.282us 28.282us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 62.762us 1 62.762us 62.762us 62.762us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 80.835us 1 80.835us 80.835us 80.835us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 28.699us 1 28.699us 28.699us 28.699us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 56.876us 1 56.876us 56.876us 56.876us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.179us 1 78.179us 78.179us 78.179us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 23.335us 1 23.335us 23.335us 23.335us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 56.147us 1 56.147us 56.147us 56.147us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 112.56us 1 112.56us 112.56us 112.56us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 24.324us 1 24.324us 24.324us 24.324us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 57.032us 1 57.032us 57.032us 57.032us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 75.575us 1 75.575us 75.575us 75.575us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 31.042us 1 31.042us 31.042us 31.042us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 53.386us 1 53.386us 53.386us 53.386us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 84.637us 1 84.637us 84.637us 84.637us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 19.532us 1 19.532us 19.532us 19.532us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.002us 1 55.002us 55.002us 55.002us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 89.690us 1 89.690us 89.690us 89.690us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 21.563us 1 21.563us 21.563us 21.563us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 63.075us 1 63.075us 63.075us 63.075us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.607us 1 87.607us 87.607us 87.607us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 19.741us 1 19.741us 19.741us 19.741us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 60.106us 1 60.106us 60.106us 60.106us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.543us 1 78.543us 78.543us 78.543us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 20.208us 1 20.208us 20.208us 20.208us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 56.251us 1 56.251us 56.251us 56.251us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 81.721us 1 81.721us 81.721us 81.721us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 24.740us 1 24.740us 24.740us 24.740us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 57.398us 1 57.398us 57.398us 57.398us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 71.304us 1 71.304us 71.304us 71.304us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 27.292us 1 27.292us 27.292us 27.292us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 52.189us 1 52.189us 52.189us 52.189us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 80.002us 1 80.002us 80.002us 80.002us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 29.480us 1 29.480us 29.480us 29.480us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 56.772us 1 56.772us 56.772us 56.772us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 75.887us 1 75.887us 75.887us 75.887us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 27.241us 1 27.241us 27.241us 27.241us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 53.386us 1 53.386us 53.386us 53.386us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.440us 1 78.440us 78.440us 78.440us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 30.210us 1 30.210us 30.210us 30.210us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 54.480us 1 54.480us 54.480us 54.480us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 72.658us 1 72.658us 72.658us 72.658us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 28.178us 1 28.178us 28.178us 28.178us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 52.032us 1 52.032us 52.032us 52.032us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.034us 1 87.034us 87.034us 87.034us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 24.116us 1 24.116us 24.116us 24.116us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.074us 1 58.074us 58.074us 58.074us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 111.20us 1 111.20us 111.20us 111.20us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 24.687us 1 24.687us 24.687us 24.687us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 70.262us 1 70.262us 70.262us 70.262us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.231us 1 78.231us 78.231us 78.231us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 24.428us 1 24.428us 24.428us 24.428us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.575us 1 55.575us 55.575us 55.575us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 81.773us 1 81.773us 81.773us 81.773us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 23.543us 1 23.543us 23.543us 23.543us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 56.356us 1 56.356us 56.356us 56.356us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 79.585us 1 79.585us 79.585us 79.585us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 22.449us 1 22.449us 22.449us 22.449us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.856us 1 58.856us 58.856us 58.856us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 73.647us 1 73.647us 73.647us 73.647us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 29.272us 1 29.272us 29.272us 29.272us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 53.543us 1 53.543us 53.543us 53.543us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 77.294us 1 77.294us 77.294us 77.294us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 29.637us 1 29.637us 29.637us 29.637us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 51.512us 1 51.512us 51.512us 51.512us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 86.981us 1 86.981us 86.981us 86.981us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 11.824us 1 11.824us 11.824us 11.824us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 60.106us 1 60.106us 60.106us 60.106us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.867us 1 87.867us 87.867us 87.867us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 24.480us 1 24.480us 24.480us 24.480us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 60.418us 1 60.418us 60.418us 60.418us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 115.42us 1 115.42us 115.42us 115.42us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 26.041us 1 26.041us 26.041us 26.041us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 72.033us 1 72.033us 72.033us 72.033us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 84.220us 1 84.220us 84.220us 84.220us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 24.324us 1 24.324us 24.324us 24.324us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.908us 1 58.908us 58.908us 58.908us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 263.60us 1 263.60us 263.60us 263.60us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 7.2920us 1 7.2920us 7.2920us 7.2920us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 87.189us 1 87.189us 87.189us 87.189us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 79.168us 1 79.168us 79.168us 79.168us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 12.501us 1 12.501us 12.501us 12.501us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 54.324us 1 54.324us 54.324us 54.324us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 73.283us 1 73.283us 73.283us 73.283us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 13.437us 1 13.437us 13.437us 13.437us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 51.772us 1 51.772us 51.772us 51.772us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 89.013us 1 89.013us 89.013us 89.013us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 12.709us 1 12.709us 12.709us 12.709us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 56.512us 1 56.512us 56.512us 56.512us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.034us 1 87.034us 87.034us 87.034us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 12.136us 1 12.136us 12.136us 12.136us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 62.761us 1 62.761us 62.761us 62.761us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 102.61us 1 102.61us 102.61us 102.61us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 13.593us 1 13.593us 13.593us 13.593us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 66.772us 1 66.772us 66.772us 66.772us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 127.03us 1 127.03us 127.03us 127.03us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 11.198us 1 11.198us 11.198us 11.198us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 87.919us 1 87.919us 87.919us 87.919us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 119.53us 1 119.53us 119.53us 119.53us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 10.625us 1 10.625us 10.625us 10.625us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.282us 1 58.282us 58.282us 58.282us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 82.763us 1 82.763us 82.763us 82.763us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 11.459us 1 11.459us 11.459us 11.459us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 56.511us 1 56.511us 56.511us 56.511us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 82.294us 1 82.294us 82.294us 82.294us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 7.2920us 1 7.2920us 7.2920us 7.2920us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 59.064us 1 59.064us 59.064us 59.064us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 70.574us 1 70.574us 70.574us 70.574us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 11.563us 1 11.563us 11.563us 11.563us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 48.855us 1 48.855us 48.855us 48.855us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 82.293us 1 82.293us 82.293us 82.293us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 13.073us 1 13.073us 13.073us 13.073us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 58.647us 1 58.647us 58.647us 58.647us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.700us 1 78.700us 78.700us 78.700us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 13.229us 1 13.229us 13.229us 13.229us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 55.783us 1 55.783us 55.783us 55.783us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 86.409us 1 86.409us 86.409us 86.409us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 11.198us 1 11.198us 11.198us 11.198us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 61.564us 1 61.564us 61.564us 61.564us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 110.32us 1 110.32us 110.32us 110.32us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 14.376us 1 14.376us 14.376us 14.376us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 69.272us 1 69.272us 69.272us 69.272us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 93.231us 1 93.231us 93.231us 93.231us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 11.199us 1 11.199us 11.199us 11.199us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 62.971us 1 62.971us 62.971us 62.971us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 1.0482ms 1 1.0482ms 1.0482ms 1.0482ms Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 10.521us 1 10.521us 10.521us 10.521us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 1.0183ms 1 1.0183ms 1.0183ms 1.0183ms cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 98.440us 1 98.440us 98.440us 98.440us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 11.147us 1 11.147us 11.147us 11.147us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 64.064us 1 64.064us 64.064us 64.064us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 76.721us 1 76.721us 76.721us 76.721us Mul: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNorm/Mul
GPU activities: 100.00% 7.3450us 1 7.3450us 7.3450us 7.3450us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=2, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const , Eigen::TensorBroadcastingOp<Eigen::array<long, unsigned long=2> const , Eigen::TensorMap<Eigen::Tensor<float const , int=2, int=1, int>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, int>(float, int=2)
API calls: 100.00% 53.908us 1 53.908us 53.908us 53.908us cudaLaunchKernel
==21650== Range "Mul: import/module_apply_default/hub_input/Mul"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 348.19us 1 348.19us 348.19us 348.19us Mul: import/module_apply_default/hub_input/Mul
GPU activities: 100.00% 331.47us 1 331.47us 331.47us 331.47us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_right<float, float, Eigen::internal::scalar_product_op<float, float>>, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, int>, int=16, Eigen::MakePointer> const > const > const , Eigen::GpuDevice>, int>(float, int=1)
API calls: 100.00% 159.12us 1 159.12us 159.12us 159.12us cudaLaunchKernel
==21650== Range "Mul: truediv"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 176.05us 1 176.05us 176.05us 176.05us Mul: truediv
GPU activities: 100.00% 329.80us 1 329.80us 329.80us 329.80us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_right<float, float, Eigen::internal::scalar_product_op<float, float>>, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, int>, int=16, Eigen::MakePointer> const > const > const , Eigen::GpuDevice>, int>(float, int=1)
API calls: 100.00% 64.741us 1 64.741us 64.741us 64.741us cudaLaunchKernel
==21650== Range "NoOp: _SOURCE"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 774.71us 2 387.35us 23.855us 750.85us NoOp: _SOURCE
GPU activities: 100.00% 6.0430us 1 6.0430us 6.0430us 6.0430us [CUDA memset]
API calls: 100.00% 86.512us 1 86.512us 86.512us 86.512us cuMemsetD32
==21650== Range "PlaceholderWithDefault: import/input/BottleneckInputPlaceholder"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 23.212ms 1 23.212ms 23.212ms 23.212ms PlaceholderWithDefault: import/input/BottleneckInputPlaceholder
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 66.420ms 1 66.420ms 66.420ms 66.420ms Relu: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/Relu
GPU activities: 100.00% 328.19us 1 328.19us 328.19us 328.19us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 367.14us 1 367.14us 367.14us 367.14us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 102.14us 1 102.14us 102.14us 102.14us Relu: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2a_3x3/Relu
GPU activities: 100.00% 299.90us 1 299.90us 299.90us 299.90us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 67.762us 1 67.762us 67.762us 67.762us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 85.211us 1 85.211us 85.211us 85.211us Relu: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_2b_3x3/Relu
GPU activities: 100.00% 598.25us 1 598.25us 598.25us 598.25us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 64.846us 1 64.846us 64.846us 64.846us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 86.356us 1 86.356us 86.356us 86.356us Relu: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_3b_1x1/Relu
GPU activities: 100.00% 189.69us 1 189.69us 189.69us 189.69us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 64.064us 1 64.064us 64.064us 64.064us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 93.648us 1 93.648us 93.648us 93.648us Relu: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_4a_3x3/Relu
GPU activities: 100.00% 420.43us 1 420.43us 420.43us 420.43us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 69.377us 1 69.377us 69.377us 69.377us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 251.05us 1 251.05us 251.05us 251.05us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 140.37us 1 140.37us 140.37us 140.37us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 155.00us 1 155.00us 155.00us 155.00us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 94.690us 1 94.690us 94.690us 94.690us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 26.458us 1 26.458us 26.458us 26.458us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 75.783us 1 75.783us 75.783us 75.783us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 113.28us 1 113.28us 113.28us 113.28us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 35.834us 1 35.834us 35.834us 35.834us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 89.794us 1 89.794us 89.794us 89.794us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 145.16us 1 145.16us 145.16us 145.16us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/Relu
GPU activities: 100.00% 54.898us 1 54.898us 54.898us 54.898us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 112.97us 1 112.97us 112.97us 112.97us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 80.315us 1 80.315us 80.315us 80.315us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 157.82us 1 157.82us 157.82us 157.82us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 56.355us 1 56.355us 56.355us 56.355us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 79.116us 1 79.116us 79.116us 79.116us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/Relu
GPU activities: 100.00% 27.084us 1 27.084us 27.084us 27.084us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 56.355us 1 56.355us 56.355us 56.355us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 96.565us 1 96.565us 96.565us 96.565us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 36.980us 1 36.980us 36.980us 36.980us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 68.543us 1 68.543us 68.543us 68.543us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 75.835us 1 75.835us 75.835us 75.835us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/Relu
GPU activities: 100.00% 53.961us 1 53.961us 53.961us 53.961us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 56.824us 1 56.824us 56.824us 56.824us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 79.845us 1 79.845us 79.845us 79.845us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 152.14us 1 152.14us 152.14us 152.14us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 59.585us 1 59.585us 59.585us 59.585us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 77.554us 1 77.554us 77.554us 77.554us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 26.718us 1 26.718us 26.718us 26.718us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 54.741us 1 54.741us 54.741us 54.741us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 81.982us 1 81.982us 81.982us 81.982us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 35.574us 1 35.574us 35.574us 35.574us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 56.668us 1 56.668us 56.668us 56.668us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 95.679us 1 95.679us 95.679us 95.679us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/Relu
GPU activities: 100.00% 53.907us 1 53.907us 53.907us 53.907us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 74.898us 1 74.898us 74.898us 74.898us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 81.617us 1 81.617us 81.617us 81.617us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/Relu
GPU activities: 100.00% 50.939us 1 50.939us 50.939us 50.939us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 62.918us 1 62.918us 62.918us 62.918us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 95.367us 1 95.367us 95.367us 95.367us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 36.615us 1 36.615us 36.615us 36.615us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 62.814us 1 62.814us 62.814us 62.814us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 79.586us 1 79.586us 79.586us 79.586us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/Relu
GPU activities: 100.00% 53.127us 1 53.127us 53.127us 53.127us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 59.429us 1 59.429us 59.429us 59.429us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 117.29us 1 117.29us 117.29us 117.29us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/Relu
GPU activities: 100.00% 7.3960us 1 7.3960us 7.3960us 7.3960us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 69.169us 1 69.169us 69.169us 69.169us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 110.06us 1 110.06us 110.06us 110.06us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 107.09us 1 107.09us 107.09us 107.09us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 72.658us 1 72.658us 72.658us 72.658us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 91.201us 1 91.201us 91.201us 91.201us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 17.344us 1 17.344us 17.344us 17.344us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 65.626us 1 65.626us 65.626us 65.626us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 80.731us 1 80.731us 80.731us 80.731us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/Relu
GPU activities: 100.00% 16.719us 1 16.719us 16.719us 16.719us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 59.324us 1 59.324us 59.324us 59.324us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 199.90us 1 199.90us 199.90us 199.90us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 18.750us 1 18.750us 18.750us 18.750us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 161.83us 1 161.83us 161.83us 161.83us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.554us 1 87.554us 87.554us 87.554us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/Relu
GPU activities: 100.00% 18.907us 1 18.907us 18.907us 18.907us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 62.554us 1 62.554us 62.554us 62.554us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 94.899us 1 94.899us 94.899us 94.899us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/Relu
GPU activities: 100.00% 18.751us 1 18.751us 18.751us 18.751us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 67.241us 1 67.241us 67.241us 67.241us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 102.56us 1 102.56us 102.56us 102.56us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/Relu
GPU activities: 100.00% 8.7500us 1 8.7500us 8.7500us 8.7500us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 75.106us 1 75.106us 75.106us 75.106us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 88.596us 1 88.596us 88.596us 88.596us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 99.170us 1 99.170us 99.170us 99.170us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 61.044us 1 61.044us 61.044us 61.044us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 77.710us 1 77.710us 77.710us 77.710us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 19.949us 1 19.949us 19.949us 19.949us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 58.022us 1 58.022us 58.022us 58.022us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 72.553us 1 72.553us 72.553us 72.553us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/Relu
GPU activities: 100.00% 19.583us 1 19.583us 19.583us 19.583us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 52.970us 1 52.970us 52.970us 52.970us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 95.471us 1 95.471us 95.471us 95.471us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 22.552us 1 22.552us 22.552us 22.552us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 71.199us 1 71.199us 71.199us 71.199us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 82.554us 1 82.554us 82.554us 82.554us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/Relu
GPU activities: 100.00% 21.459us 1 21.459us 21.459us 21.459us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 62.710us 1 62.710us 62.710us 62.710us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 79.429us 1 79.429us 79.429us 79.429us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/Relu
GPU activities: 100.00% 22.813us 1 22.813us 22.813us 22.813us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 57.137us 1 57.137us 57.137us 57.137us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 79.168us 1 79.168us 79.168us 79.168us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/Relu
GPU activities: 100.00% 10.625us 1 10.625us 10.625us 10.625us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 57.970us 1 57.970us 57.970us 57.970us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 94.586us 1 94.586us 94.586us 94.586us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 98.283us 1 98.283us 98.283us 98.283us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 66.669us 1 66.669us 66.669us 66.669us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 73.283us 1 73.283us 73.283us 73.283us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 20.782us 1 20.782us 20.782us 20.782us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 53.908us 1 53.908us 53.908us 53.908us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 70.366us 1 70.366us 70.366us 70.366us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/Relu
GPU activities: 100.00% 20.365us 1 20.365us 20.365us 20.365us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 53.960us 1 53.960us 53.960us 53.960us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 68.387us 1 68.387us 68.387us 68.387us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 22.500us 1 22.500us 22.500us 22.500us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 50.938us 1 50.938us 50.938us 50.938us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 77.242us 1 77.242us 77.242us 77.242us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/Relu
GPU activities: 100.00% 22.085us 1 22.085us 22.085us 22.085us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 57.241us 1 57.241us 57.241us 57.241us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 74.637us 1 74.637us 74.637us 74.637us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/Relu
GPU activities: 100.00% 22.866us 1 22.866us 22.866us 22.866us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 55.991us 1 55.991us 55.991us 55.991us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 70.523us 1 70.523us 70.523us 70.523us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/Relu
GPU activities: 100.00% 10.364us 1 10.364us 10.364us 10.364us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 51.147us 1 51.147us 51.147us 51.147us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 74.897us 1 74.897us 74.897us 74.897us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 96.618us 1 96.618us 96.618us 96.618us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 51.928us 1 51.928us 51.928us 51.928us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 72.033us 1 72.033us 72.033us 72.033us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 25.052us 1 25.052us 25.052us 25.052us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 55.158us 1 55.158us 55.158us 55.158us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 65.262us 1 65.262us 65.262us 65.262us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/Relu
GPU activities: 100.00% 25.104us 1 25.104us 25.104us 25.104us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 47.814us 1 47.814us 47.814us 47.814us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 64.533us 1 64.533us 64.533us 64.533us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 24.793us 1 24.793us 24.793us 24.793us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 47.450us 1 47.450us 47.450us 47.450us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 74.742us 1 74.742us 74.742us 74.742us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/Relu
GPU activities: 100.00% 26.407us 1 26.407us 26.407us 26.407us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 54.585us 1 54.585us 54.585us 54.585us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.710us 1 87.710us 87.710us 87.710us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/Relu
GPU activities: 100.00% 26.146us 1 26.146us 26.146us 26.146us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 65.366us 1 65.366us 65.366us 65.366us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.909us 1 78.909us 78.909us 78.909us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/Relu
GPU activities: 100.00% 12.918us 1 12.918us 12.918us 12.918us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 58.595us 1 58.595us 58.595us 58.595us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 54.949us 1 54.949us 54.949us 54.949us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 24.168us 1 24.168us 24.168us 24.168us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 41.251us 1 41.251us 41.251us 41.251us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 68.596us 1 68.596us 68.596us 68.596us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/Relu
GPU activities: 100.00% 9.8970us 1 9.8970us 9.8970us 9.8970us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 50.626us 1 50.626us 50.626us 50.626us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 68.335us 1 68.335us 68.335us 68.335us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 25.834us 1 25.834us 25.834us 25.834us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 51.460us 1 51.460us 51.460us 51.460us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 78.596us 1 78.596us 78.596us 78.596us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/Relu
GPU activities: 100.00% 23.386us 1 23.386us 23.386us 23.386us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 57.241us 1 57.241us 57.241us 57.241us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 87.815us 1 87.815us 87.815us 87.815us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/Relu
GPU activities: 100.00% 13.333us 1 13.333us 13.333us 13.333us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 48.908us 1 48.908us 48.908us 48.908us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 85.419us 1 85.419us 85.419us 85.419us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/Relu
GPU activities: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 63.491us 1 63.491us 63.491us 63.491us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 74.690us 1 74.690us 74.690us 74.690us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 58.127us 1 58.127us 58.127us 58.127us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 52.866us 1 52.866us 52.866us 52.866us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 181.88us 1 181.88us 181.88us 181.88us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 11.718us 1 11.718us 11.718us 11.718us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 153.49us 1 153.49us 153.49us 153.49us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 102.19us 1 102.19us 102.19us 102.19us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 16.146us 1 16.146us 16.146us 16.146us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 78.179us 1 78.179us 78.179us 78.179us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 83.805us 1 83.805us 83.805us 83.805us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/Relu
GPU activities: 100.00% 12.658us 1 12.658us 12.658us 12.658us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 63.283us 1 63.283us 63.283us 63.283us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 84.221us 1 84.221us 84.221us 84.221us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 56.720us 1 56.720us 56.720us 56.720us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 60.835us 1 60.835us 60.835us 60.835us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 75.627us 1 75.627us 75.627us 75.627us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 11.615us 1 11.615us 11.615us 11.615us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 56.460us 1 56.460us 56.460us 56.460us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 84.325us 1 84.325us 84.325us 84.325us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/Relu
GPU activities: 100.00% 14.793us 1 14.793us 14.793us 14.793us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 58.855us 1 58.855us 58.855us 58.855us cudaLaunchKernel
==21650== Range "Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/Relu"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 70.210us 1 70.210us 70.210us 70.210us Relu: import/module_apply_default/InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/Relu
GPU activities: 100.00% 12.343us 1 12.343us 12.343us 12.343us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, long>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const , float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const , Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const >, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, long>, int=16, Eigen::MakePointer> const > const > const > const , Eigen::GpuDevice>, long>(float, int=1)
API calls: 100.00% 51.980us 1 51.980us 51.980us 51.980us cudaLaunchKernel
==21650== Range "ResizeBilinear: ResizeBilinear"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 194.95us 1 194.95us 194.95us 194.95us ResizeBilinear: ResizeBilinear
GPU activities: 100.00% 2.9790ms 1 2.9790ms 2.9790ms 2.9790ms void tensorflow::_GLOBAL__N__71_tmpxft_0000409a_00000000_8_resize_bilinear_op_gpu_cu_compute_72_cpp1_ii_f402459c::ResizeBilinearKernel<float>(int, float const *, float, float, int, int, int, int, int, int, float*)
API calls: 100.00% 72.241us 1 72.241us 72.241us 72.241us cudaLaunchKernel
==21650== Range "Softmax: import/final_result"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 27.149ms 1 27.149ms 27.149ms 27.149ms Softmax: import/final_result
GPU activities: 38.46% 3.9060us 1 3.9060us 3.9060us 3.9060us void tensorflow::functor::RowReduceKernel<cub::TransformInputIterator<float, tensorflow::_GLOBAL__N__63_tmpxft_00002a79_00000000_8_softmax_op_gpu_cu_compute_72_cpp1_ii_c381a214::SubtractAndExpFunctor<float, float>, cub::CountingInputIterator<int, long>, long>, float*, cub::Sum>(float, float, int, int, float, std::iterator_traits<tensorflow::functor::RowReduceKernel<cub::TransformInputIterator<float, tensorflow::_GLOBAL__N__63_tmpxft_00002a79_00000000_8_softmax_op_gpu_cu_compute_72_cpp1_ii_c381a214::SubtractAndExpFunctor<float, float>, cub::CountingInputIterator<int, long>, long>, float*, cub::Sum>>::value_type)
34.36% 3.4900us 1 3.4900us 3.4900us 3.4900us void tensorflow::_GLOBAL__N__63_tmpxft_00002a79_00000000_8_softmax_op_gpu_cu_compute_72_cpp1_ii_c381a214::GenerateNormalizedProb<float, float>(float const *, float const *, float const , tensorflow::_GLOBAL__N__63_tmpxft_00002a79_00000000_8_softmax_op_gpu_cu_compute_72_cpp1_ii_c381a214::GenerateNormalizedProb<float, float>*, int, int, bool)
27.18% 2.7610us 1 2.7610us 2.7610us 2.7610us void tensorflow::functor::RowReduceKernel<float const *, float*, cub::Max>(float const *, float*, int, int, cub::Max, std::iterator_traits<tensorflow::functor::RowReduceKernel<float const *, float*, cub::Max>>::value_type)
API calls: 100.00% 1.2818ms 3 427.27us 245.58us 604.18us cudaLaunchKernel
==21650== Range "Squeeze: import/module_apply_default/hub_output/feature_vector/SpatialSqueeze"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 137.00ms 1 137.00ms 137.00ms 137.00ms Squeeze: import/module_apply_default/hub_output/feature_vector/SpatialSqueeze
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Sub: import/module_apply_default/hub_input/Sub"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 20.693ms 1 20.693ms 20.693ms 20.693ms Sub: import/module_apply_default/hub_input/Sub
GPU activities: 100.00% 335.32us 1 335.32us 335.32us 335.32us void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, int=1, int=1, int>, int=16, Eigen::MakePointer>, Eigen::TensorCwiseUnaryOp<Eigen::internal::scalar_right<float, float, Eigen::internal::scalar_difference_op<float, float>>, Eigen::TensorMap<Eigen::Tensor<float const , int=1, int=1, int>, int=16, Eigen::MakePointer> const > const > const , Eigen::GpuDevice>, int>(float, int=1)
API calls: 100.00% 120.47us 1 120.47us 120.47us 120.47us cudaLaunchKernel
==21650== Range "Transpose: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 837.36us 1 837.36us 837.36us 837.36us Transpose: import/module_apply_default/InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer
GPU activities: 100.00% 2.7086ms 1 2.7086ms 2.7086ms 2.7086ms void tensorflow::functor::SwapDimension1And2InTensor3UsingTiles<unsigned int, int=1024, int=1024, int=2, bool=0>(unsigned int const *, tensorflow::functor::Dimension<int=3>, tensorflow::functor::SwapDimension1And2InTensor3UsingTiles<unsigned int, int=1024, int=1024, int=2, bool=0>*)
API calls: 100.00% 82.711us 1 82.711us 82.711us 82.711us cudaLaunchKernel
==21650== Range "_Recv: jpeg_reader/_1"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 1.1283ms 1 1.1283ms 1.1283ms 1.1283ms _Recv: jpeg_reader/_1
GPU activities: 100.00% 433.35us 1 433.35us 433.35us 433.35us [CUDA memcpy HtoD]
API calls: 100.00% 424.33us 1 424.33us 424.33us 424.33us cuMemcpyHtoDAsync
==21650== Range "_Send: import/final_result/_2"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 205.25ms 1 205.25ms 205.25ms 205.25ms _Send: import/final_result/_2
GPU activities: 100.00% 2.0310us 1 2.0310us 2.0310us 2.0310us [CUDA memcpy DtoH]
API calls: 100.00% 43.055ms 1 43.055ms 43.055ms 43.055ms cuMemcpyDtoHAsync
==21650== Range "_Send: jpeg_reader/_0"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 45.730us 1 45.730us 45.730us 45.730us _Send: jpeg_reader/_0
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "_Send: truediv/_2"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 1.9536ms 1 1.9536ms 1.9536ms 1.9536ms _Send: truediv/_2
GPU activities: 100.00% 505.95us 1 505.95us 505.95us 505.95us [CUDA memcpy DtoH]
API calls: 100.00% 66.512us 1 66.512us 66.512us 66.512us cuMemcpyDtoHAsync
==21650== Thread "<unnamed>" (id = 4080181744)
==21650== Domain "<unnamed>"
==21650== Range "_Recv: _arg_import/Placeholder_0_0/_1"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 851.27us 1 851.27us 851.27us 851.27us _Recv: _arg_import/Placeholder_0_0/_1
GPU activities: 100.00% 502.83us 1 502.83us 502.83us 502.83us [CUDA memcpy HtoD]
API calls: 100.00% 97.086us 1 97.086us 97.086us 97.086us cuMemcpyHtoDAsync
==21650== Range "_Recv: truediv/_3"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 6.5718ms 1 6.5718ms 6.5718ms 6.5718ms _Recv: truediv/_3
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "_Retval: _retval_import/final_result_0_0"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 94.433ms 1 94.433ms 94.433ms 94.433ms _Retval: _retval_import/final_result_0_0
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Thread "<unnamed>" (id = 4126069232)
==21650== Domain "<unnamed>"
==21650== Range "Const: ConstantFolding/truediv_recip"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 11.406us 1 11.406us 11.406us 11.406us Const: ConstantFolding/truediv_recip
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: ExpandDims/dim"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 4.4790us 1 4.4790us 4.4790us 4.4790us Const: ExpandDims/dim
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "Const: ResizeBilinear/size"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 5.3650us 1 5.3650us 5.3650us 5.3650us Const: ResizeBilinear/size
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "NoOp: _SOURCE"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 355.63us 2 177.82us 34.011us 321.62us NoOp: _SOURCE
GPU activities: 100.00% 2.9170us 1 2.9170us 2.9170us 2.9170us [CUDA memset]
API calls: 100.00% 91.304us 1 91.304us 91.304us 91.304us cuMemsetD32
==21650== Range "_Arg: _arg_import/Placeholder_0_0"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 12.240us 1 12.240us 12.240us 12.240us _Arg: _arg_import/Placeholder_0_0
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "_Recv: import/final_result/_3"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 26.2725s 1 26.2725s 26.2725s 26.2725s _Recv: import/final_result/_3
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "_Retval: _retval_truediv_0_0"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 11.198us 1 11.198us 11.198us 11.198us _Retval: _retval_truediv_0_0
No kernels were profiled in this range.
No API activities were profiled in this range.
==21650== Range "_Send: _arg_import/Placeholder_0_0/_0"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 15.105us 1 15.105us 15.105us 15.105us _Send: _arg_import/Placeholder_0_0/_0
No kernels were profiled in this range.
No API activities were profiled in this range.