Skip to content

Instantly share code, notes, and snippets.

@gokul-uf
Last active March 24, 2016 02:59
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save gokul-uf/52c121b7f0b38ffd2b44 to your computer and use it in GitHub Desktop.
Save gokul-uf/52c121b7f0b38ffd2b44 to your computer and use it in GitHub Desktop.
Profiler output for theano GSoC proposal
/home/gokul/Theano/theano/compile/pfunc.py:479: UserWarning: config.profile_optimizer requires config.profile to be set to True as well
output_keys=output_keys)
Using gpu device 0: GeForce GT 750M (CNMeM is disabled, CuDNN 4004)
/home/gokul/Theano/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
Function profiling
==================
Message: lasagne_googlenet.py:132
Time in 0 calls to Function.__call__: 0.000000e+00s
Total compile time: 3.520045e+02s
Number of Apply nodes: 1788
Theano Optimizer time: 4.472619e+01s
Theano validate time: 4.449729e+00s
Theano Linker time (includes C, CUDA code generation/compiling): 3.054581e+02s
Import time 1.817276e-01s
Time in all call to theano.grad() 7.898371e-01s
Time since theano import 370.526s
Optimizer Profile
-----------------
SeqOptimizer OPT_FAST_RUN time 44.726s for 6324/1788 nodes before/after optimization
13.111s for callback
4.450s for fgraph.validate()
time - (name, class, index) - validate time
15.334199s - ('canonicalize', 'EquilibriumOptimizer', 4) - 0.226s
EquilibriumOptimizer canonicalize
time 15.334s for 4 passes
nb nodes (start, end, max) 4784 3333 4784
time io_toposort 0.155s
time in local optimizers 7.065s
time in global optimizers 0.000s
time in final optimizers 4.223s
time in cleanup optimizers 3.596s
0 - 12.757s 4318 (4.007s in global opts, 0.047s io_toposort) - 4784 nodes - ('MergeOptimizer', 1835) ('local_upcast_elemwise_constant_inputs', 904) ('local_add_canonizer', 661) ('local_subtensor_merge', 194) ('local_cut_gpu_host_gpu', 163) ...
1 - 1.435s 1005 (0.144s in global opts, 0.043s io_toposort) - 4271 nodes - ('local_subtensor_make_vector', 427) ('MergeOptimizer', 191) ('local_mul_canonizer', 137) ('local_intdiv_by_one', 130) ('local_add_canonizer', 80) ...
2 - 0.783s 276 (0.037s in global opts, 0.036s io_toposort) - 3437 nodes - ('local_add_canonizer', 138) ('MergeOptimizer', 129) ('local_cut_gpu_host_gpu', 9)
3 - 0.359s 0 (0.038s in global opts, 0.030s io_toposort) - 3333 nodes -
times - times applied - nb node created - name:
4.223s - 2 - 0 - topo_constant_folding
3.596s - 2155 - 5 - MergeOptimizer
1.777s - 194 - 5337 - local_subtensor_merge
1.442s - 879 - 1146 - local_add_canonizer
1.077s - 904 - 2712 - local_upcast_elemwise_constant_inputs
0.688s - 254 - 499 - local_mul_canonizer
0.285s - 39 - 0 - local_useless_switch
0.200s - 97 - 194 - local_mul_switch_sink
0.185s - 137 - 812 - local_shape_to_shape_i
0.185s - 172 - 0 - local_cut_gpu_host_gpu
0.161s - 113 - 172 - local_dimshuffle_lift
0.140s - 433 - 0 - local_subtensor_make_vector
0.112s - 1 - 6 - local_greedy_distributor
0.098s - 5 - 7 - local_fill_sink
0.090s - 58 - 0 - local_useless_elemwise
0.071s - 12 - 12 - local_useless_slice
0.061s - 130 - 0 - local_intdiv_by_one
0.004s - 2 - 4 - local_subtensor_lift
0.004s - 8 - 0 - local_useless_fill
0.003s - 4 - 6 - local_neg_to_mul
0.482s - in 62 optimization that where not used (display only those with a runtime > 0)
0.112s - local_mul_zero
0.080s - local_func_inv
0.066s - local_one_minus_erf2
0.047s - local_one_minus_erf
0.037s - local_track_shape_i
0.027s - local_useless_elemwise_comparison
0.022s - local_fill_cut
0.020s - local_expm1
0.019s - local_IncSubtensor_serialize
0.018s - local_cast_cast
0.016s - local_useless_subtensor
0.003s - local_div_switch_sink
0.002s - local_abs_lift
0.001s - local_sum_prod_all_to_none
0.001s - local_subtensor_of_alloc
0.001s - local_sum_prod_div_dimshuffle
0.001s - local_subtensor_of_dot
0.001s - local_incsubtensor_of_zeros
0.001s - local_useless_inc_subtensor
0.001s - local_lift_transpose_through_dot
0.001s - local_scalar_tensor_scalar
0.001s - local_cut_useless_reduce
0.001s - local_op_of_op
0.000s - local_reduce_join
0.000s - local_join_empty
0.000s - local_dimshuffle_no_inplace_at_canonicalize
0.000s - local_pow_canonicalize
0.000s - local_0_dot_x
0.000s - local_setsubtensor_of_constants
0.000s - local_join_make_vector
0.000s - local_useless_inc_subtensor_alloc
0.000s - local_useless_split
0.000s - f
0.000s - local_join_1
0.000s - local_useless_alloc
0.000s - local_useless_reshape
0.000s - local_merge_alloc
0.000s - local_reshape_lift
0.000s - local_tensor_scalar_tensor
Global, final and clean up optimizers
Iter 0
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (4575, 4461, 114)
init io_toposort 0.0480871200562
loop time 3.95878100395
callback_time 0.319423437119
MergeOptimizer
nb fail= 0 merged=13675 constant= 6450
time replace=3.32 validate=0.17 callback=2.43
callbacks_time
Iter 1
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3442, 3437, 5)
init io_toposort 0.0371699333191
loop time 0.103798151016
callback_time 0.0362775325775
MergeOptimizer
nb fail= 0 merged= 286 constant= 170
time replace=0.17 validate=0.00 callback=0.13
Iter 2
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3333, 3333, 0)
init io_toposort 0.0322849750519
loop time 0.00407814979553
callback_time 0.0
MergeOptimizer
nb fail= 0 merged= 248 constant= 126
time replace=0.09 validate=0.00 callback=0.07
Iter 3
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3333, 3333, 0)
init io_toposort 0.0343580245972
loop time 0.00389385223389
callback_time 0.0
MergeOptimizer
nb fail= 0 merged= 0 constant= 0
time replace=0.00 validate=0.00 callback=0.00
9.217939s - ('gpu_opt', 'SeqOptimizer', 14) - 0.046s
SeqOptimizer gpu_opt time 9.218s for 3329/2574 nodes before/after optimization
2.273s for callback
0.046s for fgraph.validate()
8.734742s - ('gpu_local_optimizations', 'EquilibriumOptimizer', 1) - 0.043s
EquilibriumOptimizer gpu_local_optimizations
time 8.734s for 7 passes
nb nodes (start, end, max) 3331 2950 3331
time io_toposort 0.212s
time in local optimizers 8.050s
time in global optimizers 0.000s
time in final optimizers 0.298s
time in cleanup optimizers 0.000s
0 - 5.999s 1322 (0.193s in global opts, 0.032s io_toposort) - 3331 nodes - ('constant_folding', 462) ('local_gpu_elemwise_1', 444) ('local_gpu_elemwise_0', 279) ('local_gpu_dimshuffle_0', 59) ('local_dnn_convw_alpha_merge', 57) ...
1 - 0.554s 270 (0.008s in global opts, 0.033s io_toposort) - 3163 nodes - ('local_gpu_elemwise_1', 156) ('local_gpu_elemwise_0', 97) ('local_gpu_split', 6) ('constant_folding', 3) ('local_pool_dnn_grad_stride', 3) ...
2 - 1.373s 408 (0.080s in global opts, 0.030s io_toposort) - 2976 nodes - ('constant_folding', 163) ('local_gpu_elemwise_0', 104) ('local_gpu_elemwise_1', 63) ('local_gpu_careduce', 60) ('local_gpu_incsubtensor', 5) ...
3 - 0.431s 108 (0.014s in global opts, 0.031s io_toposort) - 3067 nodes - ('local_gpu_elemwise_1', 63) ('constant_folding', 21) ('local_gpu_join', 9) ('local_gpu_subtensor', 5) ('local_gpu_incsubtensor', 4) ...
4 - 0.131s 19 (0.002s in global opts, 0.031s io_toposort) - 2941 nodes - ('local_gpu_incsubtensor', 6) ('constant_folding', 3) ('local_pool_dnn_alternative', 3) ('local_gpu_contiguous_gpu_contiguous', 3) ('local_gpualloc', 2) ...
5 - 0.149s 21 (0.002s in global opts, 0.027s io_toposort) - 2950 nodes - ('constant_folding', 6) ('local_gpu_subtensor', 6) ('local_gpu_elemwise_1', 6) ('local_gpualloc_memset_0', 2) ('MergeOptimizer', 1)
6 - 0.096s 0 (0.000s in global opts, 0.027s io_toposort) - 2950 nodes -
times - times applied - nb node created - name:
5.146s - 658 - 0 - constant_folding
1.107s - 733 - 1698 - local_gpu_elemwise_1
0.795s - 484 - 1428 - local_gpu_elemwise_0
0.298s - 6 - 0 - MergeOptimizer
0.154s - 57 - 456 - local_dnn_convw_alpha_merge
0.139s - 60 - 160 - local_gpu_careduce
0.123s - 9 - 46 - local_gpu_split
0.070s - 64 - 70 - local_gpu_dimshuffle_0
0.059s - 11 - 18 - local_gpu_subtensor
0.040s - 9 - 18 - local_gpu_join
0.037s - 18 - 36 - local_dnn_convi_output_merge
0.022s - 15 - 37 - local_gpu_incsubtensor
0.019s - 2 - 4 - local_gpu_reshape
0.012s - 3 - 12 - local_pool_dnn_alternative
0.008s - 3 - 21 - local_pool_dnn_grad_stride
0.006s - 2 - 6 - local_gpu_dot22scalar
0.006s - 2 - 6 - local_gpu_dot22
0.005s - 4 - 12 - local_gpualloc
0.003s - 3 - 0 - local_gpu_contiguous_gpu_contiguous
0.002s - 3 - 3 - local_gpualloc_memset_0
0.002s - 1 - 5 - local_gpu_crossentorpy_softmax_argmax_1hot_with_bias
0.001s - 1 - 5 - local_gpu_crossentorpy_softmax_1hot_with_bias_dx
0.292s - in 51 optimization that where not used (display only those with a runtime > 0)
0.039s - local_elemwise_alloc
0.037s - local_track_shape_i
0.037s - local_log_softmax_dnn
0.027s - local_dnn_conv_output_merge
0.024s - local_dnn_conv_alpha_merge
0.020s - local_useless_elemwise
0.017s - local_dnn_convw_output_merge
0.016s - local_dnn_convi_alpha_merge
0.008s - gpu_sparse_block_gemv_opt
0.005s - gpu_sparse_block_outer_opt
0.005s - local_gpu_dot_to_dot22
0.005s - local_gpu_ger
0.005s - local_gpu_gemv
0.004s - local_gpu_batched_dot
0.004s - local_gpu_conv
0.004s - local_gpu_lazy_ifelse
0.004s - local_gpu_gemm
0.004s - gpuScanOptimization
0.004s - local_gpu_solve
0.004s - local_gpu_specifyShape_0
0.003s - local_gpu_eye
0.003s - local_conv2d_gpu_conv
0.003s - local_gpu_flatten
0.003s - local_gpu_advanced_incsubtensor1
0.003s - local_gpu_advanced_subtensor1
0.002s - local_gpu_allocempty
0.001s - local_gpu_elemwise_careduce
0.000s - local_subtensor_make_vector
0.000s - f
0.000s - local_gpujoin_1
0.000s - local_gpu_downsample_factor_max
0.000s - local_gpu_downsample_factor_max_grad
Global, final and clean up optimizers
Iter 0
MergeOptimizer
nb fail= 0 merged= 854 constant= 473
time replace=0.19 validate=0.01 callback=0.15
Iter 1
MergeOptimizer
nb fail= 0 merged= 80 constant= 9
time replace=0.01 validate=0.00 callback=0.00
Iter 2
MergeOptimizer
nb fail= 0 merged= 202 constant= 158
time replace=0.08 validate=0.00 callback=0.07
Iter 3
MergeOptimizer
nb fail= 0 merged= 42 constant= 28
time replace=0.01 validate=0.00 callback=0.01
Iter 4
MergeOptimizer
nb fail= 0 merged= 12 constant= 11
time replace=0.00 validate=0.00 callback=0.00
Iter 5
MergeOptimizer
nb fail= 0 merged= 6 constant= 6
time replace=0.00 validate=0.00 callback=0.00
Iter 6
MergeOptimizer
nb fail= 0 merged= 0 constant= 0
time replace=0.00 validate=0.00 callback=0.00
0.482284s - ('gpu_cut_transfers', 'EquilibriumOptimizer', 2) - 0.003s
EquilibriumOptimizer gpu_cut_transfers
time 0.482s for 2 passes
nb nodes (start, end, max) 2950 2574 2950
time io_toposort 0.298s
time in local optimizers 0.167s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.199s 189 (0.000s in global opts, 0.027s io_toposort) - 2950 nodes - ('local_cut_gpu_host_gpu', 189)
1 - 0.283s 0 (0.000s in global opts, 0.271s io_toposort) - 2574 nodes -
times - times applied - nb node created - name:
0.158s - 189 - 0 - local_cut_gpu_host_gpu
0.009s - in 1 optimization that where not used (display only those with a runtime > 0)
0.009s - constant_folding
0.000894s - ('InputToGpuOptimizer', 'InputToGpuOptimizer', 0) - 0.000s
9.179657s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 38) - 1.380s
3.009087s - ('local_dnn_conv_inplace', 'TopoOptimizer', 33) - 2.658s
TopoOptimizer local_dnn_conv_inplace
nb_node (start, end, changed) (1760, 1788, 170)
init io_toposort 0.0167989730835
loop time 2.99210095406
callback_time 2.77951002121
1.587725s - ('gpu_elemwise_fusion', 'FusionOptimizer', 18) - 0.004s
FusionOptimizer
nb_iter 3
nb_replacement 352
nb_inconsistency_replace 0
validate_time 0.00416564941406
callback_time 0.226927757263
time_toposort 0.349379062653
1.342049s - ('specialize', 'EquilibriumOptimizer', 11) - 0.000s
EquilibriumOptimizer specialize
time 1.342s for 3 passes
nb nodes (start, end, max) 3336 3329 3336
time io_toposort 0.302s
time in local optimizers 0.389s
time in global optimizers 0.142s
time in final optimizers 0.441s
time in cleanup optimizers 0.000s
0 - 0.590s 7 (0.402s in global opts, 0.033s io_toposort) - 3336 nodes - ('local_mul_to_sqr', 2) ('local_div_to_inv', 1) ('local_softmax_grad_to_crossentropy_with_softmax_grad', 1) ('local_softmax_with_bias', 1) ('topo_constant_folding', 1) ...
1 - 0.500s 2 (0.112s in global opts, 0.236s io_toposort) - 3330 nodes - ('crossentropy_to_crossentropy_with_softmax_with_bias', 1) ('local_useless_crossentropy_softmax_1hot_with_bias_dx_alloc', 1)
2 - 0.252s 0 (0.069s in global opts, 0.033s io_toposort) - 3329 nodes -
times - times applied - nb node created - name:
0.441s - 1 - 0 - topo_constant_folding
0.142s - 1 - 1 - crossentropy_to_crossentropy_with_softmax_with_bias
0.007s - 2 - 2 - local_mul_to_sqr
0.001s - 1 - 1 - local_div_to_inv
0.001s - 1 - 1 - local_softmax_grad_to_crossentropy_with_softmax_grad
0.001s - 1 - 1 - local_useless_crossentropy_softmax_1hot_with_bias_dx_alloc
0.001s - 1 - 1 - local_softmax_with_bias
0.000s - 1 - 0 - local_subtensor_make_vector
0.378s - in 63 optimization that where not used (display only those with a runtime > 0)
0.093s - local_add_specialize
0.045s - local_mul_specialize
0.041s - local_one_minus_erf2
0.029s - local_elemwise_alloc
0.027s - local_useless_elemwise
0.025s - local_func_inv
0.016s - local_track_shape_i
0.015s - local_one_minus_erf
0.015s - local_abs_merge
0.011s - local_mul_switch_sink
0.010s - local_useless_elemwise_comparison
0.008s - local_expm1
0.007s - local_elemwise_sub_zeros
0.007s - local_logsoftmax
0.007s - local_cast_cast
0.007s - local_useless_switch
0.006s - local_alloc_unary
0.001s - local_sum_prod_mul_by_scalar
0.001s - local_useless_subtensor
0.001s - local_pow_specialize
0.001s - local_reduce_broadcastable
0.001s - local_sum_prod_div_dimshuffle
0.001s - local_useless_inc_subtensor
0.001s - local_dimshuffle_lift
0.000s - local_useless_slice
0.000s - local_opt_alloc
0.000s - local_join_empty
0.000s - local_grad_log_erfc_neg
0.000s - local_useless_alloc
0.000s - local_join_make_vector
0.000s - local_useless_inc_subtensor_alloc
0.000s - local_scalar_tensor_scalar
0.000s - local_subtensor_merge
0.000s - local_subtensor_of_alloc
0.000s - local_subtensor_of_dot
0.000s - local_join_1
0.000s - local_useless_split
0.000s - local_merge_alloc
0.000s - local_logsoftmax_grad
Global, final and clean up optimizers
Iter 0
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3332, 3331, 1)
init io_toposort 0.0324211120605
loop time 0.334736824036
callback_time 0.000298023223877
Iter 1
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3329, 3329, 0)
init io_toposort 0.0332930088043
loop time 0.00426387786865
callback_time 0.0
Iter 2
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3329, 3329, 0)
init io_toposort 0.0321750640869
loop time 0.00389409065247
callback_time 0.0
0.627962s - ('merge1', 'MergeOptimizer', 0) - 0.030s
MergeOptimizer
nb fail= 0 merged= 3298 constant= 1703
time replace=0.39 validate=0.03 callback=0.19
0.546419s - ('stabilize', 'EquilibriumOptimizer', 6) - 0.000s
EquilibriumOptimizer stabilize
time 0.546s for 2 passes
nb nodes (start, end, max) 3333 3336 3336
time io_toposort 0.065s
time in local optimizers 0.115s
time in global optimizers 0.269s
time in final optimizers 0.075s
time in cleanup optimizers 0.000s
0 - 0.373s 4 (0.271s in global opts, 0.032s io_toposort) - 3333 nodes - ('local_fill_to_alloc', 4)
1 - 0.173s 0 (0.074s in global opts, 0.033s io_toposort) - 3336 nodes -
times - times applied - nb node created - name:
0.003s - 4 - 7 - local_fill_to_alloc
0.457s - in 39 optimization that where not used (display only those with a runtime > 0)
0.269s - crossentropy_to_crossentropy_with_softmax_with_bias
0.075s - topo_constant_folding
0.039s - local_greedy_distributor
0.025s - local_one_minus_erf2
0.024s - local_sigm_times_exp
0.010s - local_one_minus_erf
0.007s - local_useless_elemwise_comparison
0.007s - local_expm1
0.000s - local_incsubtensor_of_zeros
0.000s - local_exp_over_1_plus_exp
0.000s - local_grad_log_erfc_neg
0.000s - local_0_dot_x
0.000s - local_setsubtensor_of_constants
0.000s - local_subtensor_of_dot
0.000s - local_useless_inc_subtensor_alloc
0.000s - local_useless_alloc
0.000s - local_merge_alloc
0.000s - local_useless_reshape
0.000s - local_reshape_lift
Global, final and clean up optimizers
Iter 0
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.0341119766235
loop time 0.00382590293884
callback_time 0.0
Iter 1
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.0333940982819
loop time 0.00389885902405
callback_time 0.0
0.511182s - ('elemwise_fusion', 'SeqOptimizer', 17) - 0.001s
SeqOptimizer elemwise_fusion time 0.511s for 2574/2420 nodes before/after optimization
0.126s for callback
0.001s for fgraph.validate()
0.398408s - ('composite_elemwise_fusion', 'FusionOptimizer', 1) - 0.001s
FusionOptimizer
nb_iter 2
nb_replacement 95
nb_inconsistency_replace 0
validate_time 0.00113773345947
callback_time 0.100536823273
time_toposort 0.04758477211
0.112761s - ('local_add_mul_fusion', 'FusionOptimizer', 0) - 0.000s
FusionOptimizer
nb_iter 3
nb_replacement 18
nb_inconsistency_replace 0
validate_time 0.000210523605347
callback_time 0.0254974365234
time_toposort 0.0722908973694
0.492368s - ('add_destroy_handler', 'AddDestroyHandler', 21) - 0.000s
0.481988s - ('scan_eqopt2', 'EquilibriumOptimizer', 9) - 0.000s
EquilibriumOptimizer scan_eqopt2
time 0.482s for 1 passes
nb nodes (start, end, max) 3336 3336 3336
time io_toposort 0.035s
time in local optimizers 0.000s
time in global optimizers 0.441s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.482s 0 (0.441s in global opts, 0.035s io_toposort) - 3336 nodes -
Global, final and clean up optimizers
Iter 0
TopoOptimizer constant_folding_for_scan2
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.0319931507111
loop time 0.00378704071045
callback_time 0.0
TopoOptimizer scanOp_remove_constants_and_unused_inputs1
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.0346179008484
loop time 0.0035719871521
callback_time 0.0
TopoOptimizer scanop_remove_constants_and_unused_inputs2
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.0306630134583
loop time 0.00346493721008
callback_time 0.0
TopoOptimizer scanOp_merge_inouts
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.0324099063873
loop time 0.0039119720459
callback_time 0.0
TopoOptimizer scanOp_remove_constants_and_unused_inputs3
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.0334548950195
loop time 0.00352096557617
callback_time 0.0
0.445209s - ('scan_eqopt1', 'EquilibriumOptimizer', 2) - 0.000s
EquilibriumOptimizer scan_eqopt1
time 0.445s for 1 passes
nb nodes (start, end, max) 4784 4784 4784
time io_toposort 0.047s
time in local optimizers 0.000s
time in global optimizers 0.391s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.445s 0 (0.391s in global opts, 0.047s io_toposort) - 4784 nodes -
Global, final and clean up optimizers
Iter 0
SeqOptimizer all_pushout_opt time 0.391s for 4784/4784 nodes before/after optimization
0.000s for callback
0.000s for fgraph.validate()
0.191856s - ('remove_constants_and_unused_inputs_scan', 'TopoOptimizer', 0) - 0.000s
TopoOptimizer scanOp_remove_constants_and_unused_inputs0
nb_node (start, end, changed) (4784, 4784, 0)
init io_toposort 0.18671298027
loop time 0.00507307052612
callback_time 0.0
0.051415s - ('scan_pushout_dot1', 'PushOutDot1', 3) - 0.000s
0.049522s - ('scanOp_pushout_seqs_ops', 'PushOutSeqScan', 2) - 0.000s
0.049322s - ('scanOp_pushout_output', 'PushOutScanOutput', 4) - 0.000s
0.048480s - ('scanOp_pushout_nonseqs_ops', 'PushOutNonSeqScan', 1) - 0.000s
0.397820s - ('ShapeOpt', 'ShapeOptimizer', 1) - 0.000s
0.288505s - ('BlasOpt', 'SeqOptimizer', 10) - 0.000s
SeqOptimizer BlasOpt time 0.288s for 3336/3336 nodes before/after optimization
0.001s for callback
0.000s for fgraph.validate()
0.083254s - ('gemm_optimizer', 'GemmOptimizer', 1) - 0.000s
GemmOptimizer
nb_iter 1
nb_replacement 0
nb_replacement_didn_t_remove 0
nb_inconsistency_make 0
nb_inconsistency_replace 0
time_canonicalize 0.0285625457764
time_factor_can 0
time_factor_list 0
time_toposort 0.0311839580536
validate_time 0.0
callback_time 0.0
0.045156s - ('use_c_blas', 'TopoOptimizer', 4) - 0.000s
TopoOptimizer use_c_blas
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.0328800678253
loop time 0.0122079849243
callback_time 0.0
0.042374s - ('local_dot22_to_dot22scalar', 'TopoOptimizer', 2) - 0.000s
TopoOptimizer local_dot22_to_dot22scalar
nb_node (start, end, changed) (3336, 3336, 1)
init io_toposort 0.0323820114136
loop time 0.00990796089172
callback_time 0.000487804412842
0.041280s - ('local_gemm_to_gemv', 'EquilibriumOptimizer', 3) - 0.000s
EquilibriumOptimizer local_gemm_to_gemv
time 0.041s for 1 passes
nb nodes (start, end, max) 3336 3336 3336
time io_toposort 0.035s
time in local optimizers 0.000s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.041s 0 (0.000s in global opts, 0.035s io_toposort) - 3336 nodes -
0.039106s - ('use_scipy_ger', 'TopoOptimizer', 5) - 0.000s
TopoOptimizer scipy_blas
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.0348558425903
loop time 0.00417590141296
callback_time 0.0
0.037307s - ('local_dot_to_dot22', 'TopoOptimizer', 0) - 0.000s
TopoOptimizer local_dot_to_dot22
nb_node (start, end, changed) (3336, 3336, 3)
init io_toposort 0.0324900150299
loop time 0.00476217269897
callback_time 0.000386953353882
0.207870s - ('local_IncSubtensor_serialize', 'TopoOptimizer', 3) - 0.000s
TopoOptimizer pre_local_IncSubtensor_serialize
nb_node (start, end, changed) (4784, 4784, 20)
init io_toposort 0.176945924759
loop time 0.0308661460876
callback_time 0.00447106361389
0.167285s - ('gpu_after_fusion', 'SeqOptimizer', 20) - 0.000s
SeqOptimizer gpu_after_fusion time 0.167s for 1760/1760 nodes before/after optimization
0.001s for callback
0.000s for fgraph.validate()
0.083829s - ('gpu_local_optimizations', 'EquilibriumOptimizer', 1) - 0.000s
EquilibriumOptimizer gpu_local_optimizations
time 0.083s for 1 passes
nb nodes (start, end, max) 1762 1762 1762
time io_toposort 0.028s
time in local optimizers 0.040s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.083s 0 (0.000s in global opts, 0.028s io_toposort) - 1762 nodes -
Global, final and clean up optimizers
Iter 0
MergeOptimizer
nb fail= 0 merged= 0 constant= 0
time replace=0.00 validate=0.00 callback=0.00
0.082261s - ('gpu_cut_transfers', 'EquilibriumOptimizer', 2) - 0.000s
EquilibriumOptimizer gpu_cut_transfers
time 0.082s for 2 passes
nb nodes (start, end, max) 1762 1760 1762
time io_toposort 0.055s
time in local optimizers 0.010s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.042s 2 (0.000s in global opts, 0.028s io_toposort) - 1762 nodes - ('local_cut_gpu_host_gpu', 2)
1 - 0.040s 0 (0.000s in global opts, 0.027s io_toposort) - 1760 nodes -
times - times applied - nb node created - name:
0.001s - 2 - 0 - local_cut_gpu_host_gpu
0.008s - in 1 optimization that where not used (display only those with a runtime > 0)
0.008s - constant_folding
0.001180s - ('InputToGpuOptimizer', 'InputToGpuOptimizer', 0) - 0.000s
0.134150s - ('local_inplace_setsubtensor', 'TopoOptimizer', 26) - 0.104s
TopoOptimizer local_inplace_setsubtensor
nb_node (start, end, changed) (1760, 1760, 13)
init io_toposort 0.0167520046234
loop time 0.117291927338
callback_time 0.108938455582
0.062133s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 41) - 0.000s
0.058811s - ('scanOp_make_inplace', 'ScanInplaceOptimizer', 42) - 0.000s
0.042339s - ('local_elemwise_alloc', 'TopoOptimizer', 8) - 0.000s
TopoOptimizer local_elemwise_alloc
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.032870054245
loop time 0.00939798355103
callback_time 0.0
0.040328s - ('gpu_scanOp_make_inplace', 'ScanInplaceOptimizer', 39) - 0.000s
0.038766s - ('uncanonicalize', 'EquilibriumOptimizer', 13) - 0.000s
EquilibriumOptimizer uncanonicalize
time 0.039s for 1 passes
nb nodes (start, end, max) 3329 3329 3329
time io_toposort 0.033s
time in local optimizers 0.000s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.039s 0 (0.000s in global opts, 0.033s io_toposort) - 3329 nodes -
0.038547s - ('local_fill_to_alloc', 'TopoOptimizer', 7) - 0.000s
TopoOptimizer local_fill_to_alloc
nb_node (start, end, changed) (3336, 3336, 0)
init io_toposort 0.0312049388885
loop time 0.0072660446167
callback_time 0.0
0.037259s - ('specialize_device', 'EquilibriumOptimizer', 15) - 0.000s
EquilibriumOptimizer specialize_device
time 0.037s for 1 passes
nb nodes (start, end, max) 2574 2574 2574
time io_toposort 0.026s
time in local optimizers 0.006s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.037s 0 (0.000s in global opts, 0.026s io_toposort) - 2574 nodes -
0.032412s - ('crossentropy_to_crossentropy_with_softmax', 'FromFunctionOptimizer', 12) - 0.000s
0.029169s - ('AbstractConvCheck', 'TopoOptimizer', 16) - 0.000s
TopoOptimizer AbstractConvCheck
nb_node (start, end, changed) (2574, 2574, 0)
init io_toposort 0.0260169506073
loop time 0.00309085845947
callback_time 0.0
0.027419s - ('gpua_elemwise_fusion', 'FusionOptimizer', 37) - 0.000s
FusionOptimizer
nb_iter 1
nb_replacement 0
nb_inconsistency_replace 0
validate_time 0.0
callback_time 0.0
time_toposort 0.0257840156555
0.021753s - ('InplaceGpuBlasOpt', 'TopoOptimizer', 30) - 0.000s
TopoOptimizer InplaceGpuBlasOpt
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.016930103302
loop time 0.00467014312744
callback_time 0.0
0.020725s - ('local_dnna_conv_inplace', 'TopoOptimizer', 34) - 0.000s
TopoOptimizer local_dnna_conv_inplace
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0175051689148
loop time 0.00305318832397
callback_time 0.0
0.020469s - ('blas_opt_inplace', 'TopoOptimizer', 29) - 0.000s
TopoOptimizer InplaceBlasOpt
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0158088207245
loop time 0.00451397895813
callback_time 0.0
0.020260s - ('local_inplace_gpu_sparse_block_gemv', 'TopoOptimizer', 23) - 0.000s
TopoOptimizer local_inplace_gpu_sparse_block_gemv
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0183529853821
loop time 0.00179409980774
callback_time 0.0
0.020023s - ('random_make_inplace', 'TopoOptimizer', 45) - 0.000s
TopoOptimizer random_make_inplace
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0180921554565
loop time 0.00187206268311
callback_time 0.0
0.019779s - ('make_ger_destructive', 'TopoOptimizer', 36) - 0.000s
TopoOptimizer make_scipy_blas_destructive
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0174670219421
loop time 0.00225496292114
callback_time 0.0
0.019312s - ('local_inplace_incsubtensor1', 'TopoOptimizer', 25) - 0.000s
TopoOptimizer local_inplace_incsubtensor1
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0174140930176
loop time 0.00179100036621
callback_time 0.0
0.019285s - ('dimshuffle_as_view', 'TopoOptimizer', 22) - 0.000s
TopoOptimizer dimshuffle_as_view
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0174798965454
loop time 0.00175309181213
callback_time 0.0
0.019202s - ('gpuablas_opt_inplace', 'TopoOptimizer', 31) - 0.000s
TopoOptimizer InplaceGpuaBlasOpt
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0163221359253
loop time 0.00271797180176
callback_time 0.0
0.019028s - ('local_inplace_sparse_block_gemv', 'TopoOptimizer', 27) - 0.000s
TopoOptimizer local_inplace_sparse_block_gemv
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0169730186462
loop time 0.00194787979126
callback_time 0.0
0.018978s - ('local_gemm16_inplace', 'TopoOptimizer', 35) - 0.000s
TopoOptimizer local_gemm16_inplace
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0172350406647
loop time 0.00164389610291
callback_time 0.0
0.018874s - ('c_blas_destructive', 'TopoOptimizer', 32) - 0.000s
TopoOptimizer c_blas_destructive
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0154929161072
loop time 0.00330901145935
callback_time 0.0
0.018569s - ('local_inplace_gpu_sparse_block_outer', 'TopoOptimizer', 24) - 0.000s
TopoOptimizer local_inplace_gpu_sparse_block_outer
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0166771411896
loop time 0.00178098678589
callback_time 0.0
0.018206s - ('cond_make_inplace', 'TopoOptimizer', 43) - 0.000s
TopoOptimizer cond_make_inplace
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0165610313416
loop time 0.00158500671387
callback_time 0.0
0.018179s - ('local_inplace_sparse_block_outer', 'TopoOptimizer', 28) - 0.000s
TopoOptimizer local_inplace_sparse_block_outer
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0165140628815
loop time 0.00155401229858
callback_time 0.0
0.018010s - ('mrg_random_make_inplace', 'TopoOptimizer', 46) - 0.000s
TopoOptimizer random_make_inplace_mrg
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0160892009735
loop time 0.00187706947327
callback_time 0.0
0.017756s - ('local_destructive', 'TopoOptimizer', 44) - 0.000s
TopoOptimizer CURAND_destructive
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0161859989166
loop time 0.00152897834778
callback_time 0.0
0.017310s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 40) - 0.000s
0.000910s - ('merge2', 'MergeOptimizer', 19) - 0.000s
MergeOptimizer
nb fail= 0 merged= 3 constant= 0
time replace=0.00 validate=0.00 callback=0.00
0.000725s - ('merge3', 'MergeOptimizer', 47) - 0.000s
MergeOptimizer
nb fail= 0 merged= 0 constant= 0
time replace=0.00 validate=0.00 callback=0.00
0.000049s - ('merge1.2', 'MergeOptimizer', 5) - 0.000s
MergeOptimizer
nb fail= 0 merged= 0 constant= 0
time replace=0.00 validate=0.00 callback=0.00
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
Sorry, no tip for today.
start compile
(<theano.gof.toolbox.PreserveVariableAttributes object at 0x7f71e492a1d0>, 0.027862071990966797)
(<theano.compile.function_module.Supervisor instance at 0x7f71e45cef38>, 0.0343480110168457)
(<theano.gof.opt.ChangeTracker instance at 0x7f71e4dadfc8>, 0.03467535972595215)
(<theano.gof.toolbox.ReplaceValidate object at 0x7f71e54b9ad0>, 0.045465946197509766)
(<theano.tensor.opt.ShapeFeature object at 0x7f71e4af1050>, 0.1717238426208496)
(<theano.gof.opt.MergeFeature object at 0x7f71e45ded50>, 0.5850560665130615)
(<theano.gof.opt.Updater instance at 0x7f71e4858200>, 1.2420501708984375)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment