Profiler Output without local_subtensor_merge on GoogleNet using Lasagne
/home/gokul/Theano/theano/compile/pfunc.py:479: UserWarning: config.profile_optimizer requires config.profile to be set to True as well
output_keys=output_keys)
Using gpu device 0: GeForce GT 750M (CNMeM is disabled, CuDNN 4004)
/home/gokul/Theano/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
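The first warning above says that config.profile_optimizer only takes effect when config.profile is also enabled. As a minimal sketch, and assuming the flags are passed through the THEANO_FLAGS environment variable (the original script's configuration is not shown, so this is a guess at one possible setup), both flags could be set together with the exclusion named in the title:

```python
import os

# Assumption: flags are supplied via THEANO_FLAGS, which Theano reads once,
# at import time. The actual configuration used for this log is not shown.
flags = [
    ("profile", "True"),
    ("profile_optimizer", "True"),
    # Hypothetical reconstruction of "without local_subtensor_merge".
    ("optimizer_excluding", "local_subtensor_merge"),
]
os.environ["THEANO_FLAGS"] = ",".join("{}={}".format(k, v) for k, v in flags)

# `import theano` has to come after this point for the flags to apply.
print(os.environ["THEANO_FLAGS"])
```

Setting profile and profile_optimizer together should avoid the UserWarning from pfunc.py seen at the top of the log.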
Function profiling
==================
Message: lasagne_googlenet.py:135
Time in 0 calls to Function.__call__: 0.000000e+00s
Total compile time: 3.531861e+02s
Number of Apply nodes: 1788
Theano Optimizer time: 4.029457e+01s
Theano validate time: 4.504317e+00s
Theano Linker time (includes C, CUDA code generation/compiling): 3.111209e+02s
Import time 2.162204e-01s
Time in all call to theano.grad() 7.573941e-01s
Time since theano import 371.493s
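The summary above can be turned into shares of the total compile time. The sketch below only redoes the arithmetic on the figures printed in the log; the variable names are illustrative. The validate time is left out of the sum because the Optimizer Profile reports it as part of the optimizer time.

```python
# Figures copied from the "Function profiling" summary (seconds).
total_compile = 3.531861e+02
parts = {
    "optimizer": 4.029457e+01,
    "linker": 3.111209e+02,
}


def share(seconds, total=total_compile):
    """Return `seconds` as a fraction of the total compile time."""
    return seconds / total


for name, seconds in sorted(parts.items()):
    print("{:<10s} {:8.2f}s  {:5.1f}%".format(name, seconds, 100 * share(seconds)))

# Whatever the two reported components do not cover.
unaccounted = total_compile - sum(parts.values())
print("{:<10s} {:8.2f}s".format("other", unaccounted))
```

On these numbers the linker (C and CUDA code generation and compilation) accounts for roughly 88% of the compile time and the graph optimizer for roughly 11%.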
Optimizer Profile
-----------------
SeqOptimizer time 40.294s for 6324/1788 nodes before/after optimization
10.777s for callback
4.504s for fgraph.validate()
time - (name, class, index) - validate time
10.399793s - ('canonicalize', 'EquilibriumOptimizer', 4) - 0.066s
EquilibriumOptimizer canonicalize
time 10.399s for 5 passes
nb nodes (start, end, max) 4784 3345 4784
time io_toposort 0.381s
time in local optimizers 4.540s
time in global optimizers 0.000s
time in final optimizers 4.288s
time in cleanup optimizers 0.921s
0 - 6.933s 2291 (4.081s in global opts, 0.044s io_toposort) - 4784 nodes - ('MergeOptimizer', 822) ('local_dimshuffle_lift', 364) ('local_upcast_elemwise_constant_inputs', 319) ('local_add_canonizer', 272) ('local_cut_gpu_host_gpu', 163) ...
1 - 1.621s 1214 (0.091s in global opts, 0.045s io_toposort) - 4558 nodes - ('local_subtensor_make_vector', 520) ('MergeOptimizer', 249) ('local_add_canonizer', 138) ('local_mul_canonizer', 137) ('local_intdiv_by_one', 130) ...
2 - 0.590s 296 (0.041s in global opts, 0.056s io_toposort) - 3728 nodes - ('local_subtensor_make_vector', 194) ('local_add_canonizer', 48) ('MergeOptimizer', 45) ('local_cut_gpu_host_gpu', 9)
3 - 0.732s 241 (0.040s in global opts, 0.037s io_toposort) - 3385 nodes - ('local_add_canonizer', 124) ('MergeOptimizer', 117)
4 - 0.524s 0 (0.037s in global opts, 0.199s io_toposort) - 3345 nodes -
times - times applied - nb node created - name:
4.288s - 2 - 0 - topo_constant_folding
1.157s - 582 - 882 - local_add_canonizer
0.921s - 1233 - 5 - MergeOptimizer
0.733s - 254 - 499 - local_mul_canonizer
0.431s - 319 - 957 - local_upcast_elemwise_constant_inputs
0.418s - 364 - 674 - local_dimshuffle_lift
0.261s - 39 - 0 - local_useless_switch
0.242s - 720 - 97 - local_subtensor_make_vector
0.190s - 172 - 0 - local_cut_gpu_host_gpu
0.176s - 137 - 812 - local_shape_to_shape_i
0.120s - 1 - 6 - local_greedy_distributor
0.093s - 130 - 0 - local_intdiv_by_one
0.082s - 5 - 7 - local_fill_sink
0.079s - 58 - 0 - local_useless_elemwise
0.075s - 12 - 12 - local_useless_slice
0.004s - 2 - 4 - local_subtensor_lift
0.003s - 4 - 6 - local_neg_to_mul
0.003s - 8 - 0 - local_useless_fill
0.473s - in 63 optimization that where not used (display only those with a runtime > 0)
0.092s - local_mul_zero
0.092s - local_one_minus_erf2
0.061s - local_func_inv
0.036s - local_track_shape_i
0.035s - local_one_minus_erf
0.024s - local_useless_elemwise_comparison
0.021s - local_mul_switch_sink
0.020s - local_IncSubtensor_serialize
0.019s - local_expm1
0.019s - local_fill_cut
0.017s - local_useless_subtensor
0.016s - local_cast_cast
0.004s - local_div_switch_sink
0.002s - local_abs_lift
0.002s - local_subtensor_of_alloc
0.002s - local_subtensor_of_dot
0.002s - local_sum_prod_all_to_none
0.002s - local_sum_prod_div_dimshuffle
0.001s - local_incsubtensor_of_zeros
0.001s - local_useless_inc_subtensor
0.001s - local_lift_transpose_through_dot
0.001s - local_cut_useless_reduce
0.001s - local_op_of_op
0.001s - local_reduce_join
0.000s - local_join_empty
0.000s - local_dimshuffle_no_inplace_at_canonicalize
0.000s - local_pow_canonicalize
0.000s - local_0_dot_x
0.000s - local_setsubtensor_of_constants
0.000s - local_useless_inc_subtensor_alloc
0.000s - local_join_make_vector
0.000s - local_scalar_tensor_scalar
0.000s - local_join_1
0.000s - local_useless_split
0.000s - f
0.000s - local_useless_alloc
0.000s - local_merge_alloc
0.000s - local_useless_reshape
0.000s - local_reshape_lift
0.000s - local_tensor_scalar_tensor
Global, final and clean up optimizers
Iter 0
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (4607, 4558, 49)
init io_toposort 0.19864988327
loop time 3.88232302666
callback_time 0.321128368378
MergeOptimizer
nb fail= 0 merged= 1610 constant= 726
time replace=0.60 validate=0.02 callback=0.44
Iter 1
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3733, 3728, 5)
init io_toposort 0.0340428352356
loop time 0.0554389953613
callback_time 0.0345137119293
MergeOptimizer
nb fail= 0 merged= 344 constant= 228
time replace=0.22 validate=0.00 callback=0.17
Iter 2
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3385, 3385, 0)
init io_toposort 0.0363101959229
loop time 0.00421690940857
callback_time 0.0
MergeOptimizer
nb fail= 0 merged= 45 constant= 45
time replace=0.02 validate=0.00 callback=0.01
Iter 3
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3345, 3345, 0)
init io_toposort 0.0359070301056
loop time 0.00407600402832
callback_time 0.0
MergeOptimizer
nb fail= 0 merged= 224 constant= 114
time replace=0.08 validate=0.00 callback=0.06
Iter 4
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3345, 3345, 0)
init io_toposort 0.0328199863434
loop time 0.00390100479126
callback_time 0.0
MergeOptimizer
nb fail= 0 merged= 0 constant= 0
time replace=0.00 validate=0.00 callback=0.00
9.525620s - ('gpu_opt', 'SeqOptimizer', 14) - 0.048s
SeqOptimizer gpu_opt time 9.526s for 3341/2586 nodes before/after optimization
2.406s for callback
0.048s for fgraph.validate()
9.083087s - ('gpu_local_optimizations', 'EquilibriumOptimizer', 1) - 0.046s
EquilibriumOptimizer gpu_local_optimizations
time 9.083s for 7 passes
nb nodes (start, end, max) 3343 2962 3343
time io_toposort 0.411s
time in local optimizers 8.191s
time in global optimizers 0.000s
time in final optimizers 0.303s
time in cleanup optimizers 0.000s
0 - 6.006s 1322 (0.192s in global opts, 0.034s io_toposort) - 3343 nodes - ('constant_folding', 462) ('local_gpu_elemwise_1', 444) ('local_gpu_elemwise_0', 279) ('local_gpu_dimshuffle_0', 59) ('local_dnn_convw_alpha_merge', 57) ...
1 - 0.762s 270 (0.008s in global opts, 0.221s io_toposort) - 3175 nodes - ('local_gpu_elemwise_1', 156) ('local_gpu_elemwise_0', 97) ('local_gpu_split', 6) ('constant_folding', 3) ('local_pool_dnn_grad_stride', 3) ...
2 - 1.468s 408 (0.086s in global opts, 0.029s io_toposort) - 2988 nodes - ('constant_folding', 163) ('local_gpu_elemwise_0', 104) ('local_gpu_elemwise_1', 63) ('local_gpu_careduce', 60) ('local_gpu_incsubtensor', 5) ...
3 - 0.477s 108 (0.013s in global opts, 0.039s io_toposort) - 3079 nodes - ('local_gpu_elemwise_1', 63) ('constant_folding', 21) ('local_gpu_join', 9) ('local_gpu_subtensor', 5) ('local_gpu_incsubtensor', 4) ...
4 - 0.127s 19 (0.002s in global opts, 0.027s io_toposort) - 2953 nodes - ('local_gpu_incsubtensor', 6) ('constant_folding', 3) ('local_pool_dnn_alternative', 3) ('local_gpu_contiguous_gpu_contiguous', 3) ('local_gpualloc', 2) ...
5 - 0.143s 21 (0.002s in global opts, 0.028s io_toposort) - 2962 nodes - ('constant_folding', 6) ('local_gpu_subtensor', 6) ('local_gpu_elemwise_1', 6) ('local_gpualloc_memset_0', 2) ('MergeOptimizer', 1)
6 - 0.099s 0 (0.000s in global opts, 0.032s io_toposort) - 2962 nodes -
times - times applied - nb node created - name:
5.108s - 658 - 0 - constant_folding
1.199s - 733 - 1698 - local_gpu_elemwise_1
0.842s - 484 - 1428 - local_gpu_elemwise_0
0.303s - 6 - 0 - MergeOptimizer
0.170s - 57 - 456 - local_dnn_convw_alpha_merge
0.146s - 60 - 160 - local_gpu_careduce
0.126s - 9 - 46 - local_gpu_split
0.072s - 64 - 70 - local_gpu_dimshuffle_0
0.060s - 11 - 18 - local_gpu_subtensor
0.042s - 18 - 36 - local_dnn_convi_output_merge
0.041s - 9 - 18 - local_gpu_join
0.023s - 15 - 37 - local_gpu_incsubtensor
0.020s - 2 - 4 - local_gpu_reshape
0.011s - 3 - 12 - local_pool_dnn_alternative
0.007s - 3 - 21 - local_pool_dnn_grad_stride
0.007s - 2 - 6 - local_gpu_dot22scalar
0.006s - 2 - 6 - local_gpu_dot22
0.005s - 4 - 12 - local_gpualloc
0.003s - 3 - 0 - local_gpu_contiguous_gpu_contiguous
0.002s - 1 - 5 - local_gpu_crossentorpy_softmax_argmax_1hot_with_bias
0.002s - 3 - 3 - local_gpualloc_memset_0
0.002s - 1 - 5 - local_gpu_crossentorpy_softmax_1hot_with_bias_dx
0.297s - in 51 optimization that where not used (display only those with a runtime > 0)
0.039s - local_elemwise_alloc
0.037s - local_log_softmax_dnn
0.036s - local_track_shape_i
0.027s - local_dnn_conv_output_merge
0.026s - local_dnn_conv_alpha_merge
0.020s - local_useless_elemwise
0.017s - local_dnn_convw_output_merge
0.017s - local_dnn_convi_alpha_merge
0.010s - gpu_sparse_block_gemv_opt
0.006s - gpu_sparse_block_outer_opt
0.005s - local_gpu_dot_to_dot22
0.005s - local_gpu_ger
0.005s - local_gpu_gemv
0.004s - local_gpu_batched_dot
0.004s - local_gpu_conv
0.004s - local_gpu_lazy_ifelse
0.004s - local_gpu_gemm
0.004s - gpuScanOptimization
0.004s - local_gpu_specifyShape_0
0.004s - local_gpu_solve
0.004s - local_gpu_eye
0.003s - local_conv2d_gpu_conv
0.003s - local_gpu_flatten
0.003s - local_gpu_advanced_incsubtensor1
0.003s - local_gpu_advanced_subtensor1
0.003s - local_gpu_allocempty
0.001s - local_gpu_elemwise_careduce
0.000s - local_subtensor_make_vector
0.000s - f
0.000s - local_gpujoin_1
0.000s - local_gpu_downsample_factor_max
0.000s - local_gpu_downsample_factor_max_grad
Global, final and clean up optimizers
Iter 0
MergeOptimizer
nb fail= 0 merged= 854 constant= 473
time replace=0.19 validate=0.01 callback=0.15
Iter 1
MergeOptimizer
nb fail= 0 merged= 80 constant= 9
time replace=0.01 validate=0.00 callback=0.00
Iter 2
MergeOptimizer
nb fail= 0 merged= 202 constant= 158
time replace=0.09 validate=0.00 callback=0.07
Iter 3
MergeOptimizer
nb fail= 0 merged= 42 constant= 28
time replace=0.01 validate=0.00 callback=0.01
Iter 4
MergeOptimizer
nb fail= 0 merged= 12 constant= 11
time replace=0.00 validate=0.00 callback=0.00
Iter 5
MergeOptimizer
nb fail= 0 merged= 6 constant= 6
time replace=0.00 validate=0.00 callback=0.00
Iter 6
MergeOptimizer
nb fail= 0 merged= 0 constant= 0
time replace=0.00 validate=0.00 callback=0.00
0.441609s - ('gpu_cut_transfers', 'EquilibriumOptimizer', 2) - 0.003s
EquilibriumOptimizer gpu_cut_transfers
time 0.442s for 2 passes
nb nodes (start, end, max) 2962 2586 2962
time io_toposort 0.254s
time in local optimizers 0.170s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.406s 189 (0.000s in global opts, 0.230s io_toposort) - 2962 nodes - ('local_cut_gpu_host_gpu', 189)
1 - 0.035s 0 (0.000s in global opts, 0.024s io_toposort) - 2586 nodes -
times - times applied - nb node created - name:
0.161s - 189 - 0 - local_cut_gpu_host_gpu
0.009s - in 1 optimization that where not used (display only those with a runtime > 0)
0.009s - constant_folding
0.000908s - ('InputToGpuOptimizer', 'InputToGpuOptimizer', 0) - 0.000s
9.196867s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 38) - 1.416s
3.167935s - ('local_dnn_conv_inplace', 'TopoOptimizer', 33) - 2.829s
TopoOptimizer local_dnn_conv_inplace
nb_node (start, end, changed) (1760, 1788, 170)
init io_toposort 0.0148651599884
loop time 3.15292692184
callback_time 2.94574427605
1.487668s - ('gpu_elemwise_fusion', 'FusionOptimizer', 18) - 0.004s
FusionOptimizer
nb_iter 3
nb_replacement 352
nb_inconsistency_replace 0
validate_time 0.00405836105347
callback_time 0.229281663895
time_toposort 0.0550458431244
1.233950s - ('specialize', 'EquilibriumOptimizer', 11) - 0.000s
EquilibriumOptimizer specialize
time 1.234s for 3 passes
nb nodes (start, end, max) 3349 3341 3349
time io_toposort 0.109s
time in local optimizers 0.413s
time in global optimizers 0.165s
time in final optimizers 0.476s
time in cleanup optimizers 0.000s
0 - 0.615s 7 (0.424s in global opts, 0.031s io_toposort) - 3349 nodes - ('local_mul_to_sqr', 2) ('local_div_to_inv', 1) ('local_softmax_grad_to_crossentropy_with_softmax_grad', 1) ('local_softmax_with_bias', 1) ('topo_constant_folding', 1) ...
1 - 0.335s 2 (0.134s in global opts, 0.039s io_toposort) - 3342 nodes - ('crossentropy_to_crossentropy_with_softmax_with_bias', 1) ('local_useless_crossentropy_softmax_1hot_with_bias_dx_alloc', 1)
2 - 0.284s 0 (0.083s in global opts, 0.039s io_toposort) - 3341 nodes -
times - times applied - nb node created - name:
0.476s - 1 - 0 - topo_constant_folding
0.165s - 1 - 1 - crossentropy_to_crossentropy_with_softmax_with_bias
0.007s - 2 - 2 - local_mul_to_sqr
0.001s - 1 - 1 - local_div_to_inv
0.001s - 1 - 1 - local_useless_crossentropy_softmax_1hot_with_bias_dx_alloc
0.001s - 1 - 1 - local_softmax_grad_to_crossentropy_with_softmax_grad
0.001s - 1 - 1 - local_softmax_with_bias
0.001s - 1 - 0 - local_subtensor_make_vector
0.401s - in 62 optimization that where not used (display only those with a runtime > 0)
0.102s - local_add_specialize
0.047s - local_mul_specialize
0.045s - local_one_minus_erf2
0.031s - local_elemwise_alloc
0.028s - local_useless_elemwise
0.027s - local_func_inv
0.017s - local_one_minus_erf
0.015s - local_track_shape_i
0.015s - local_abs_merge
0.012s - local_mul_switch_sink
0.011s - local_useless_elemwise_comparison
0.008s - local_expm1
0.008s - local_elemwise_sub_zeros
0.007s - local_logsoftmax
0.007s - local_cast_cast
0.007s - local_useless_switch
0.006s - local_alloc_unary
0.001s - local_useless_subtensor
0.001s - local_sum_prod_mul_by_scalar
0.001s - local_pow_specialize
0.001s - local_reduce_broadcastable
0.001s - local_sum_prod_div_dimshuffle
0.001s - local_useless_inc_subtensor
0.001s - local_useless_slice
0.001s - local_dimshuffle_lift
0.000s - local_opt_alloc
0.000s - local_join_empty
0.000s - local_grad_log_erfc_neg
0.000s - local_useless_alloc
0.000s - local_join_make_vector
0.000s - local_useless_inc_subtensor_alloc
0.000s - local_subtensor_of_alloc
0.000s - local_scalar_tensor_scalar
0.000s - local_subtensor_of_dot
0.000s - local_useless_split
0.000s - local_join_1
0.000s - local_merge_alloc
0.000s - local_logsoftmax_grad
Global, final and clean up optimizers
Iter 0
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3344, 3343, 1)
init io_toposort 0.0301430225372
loop time 0.355483055115
callback_time 0.000345945358276
Iter 1
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3341, 3341, 0)
init io_toposort 0.0428228378296
loop time 0.00455713272095
callback_time 0.0
Iter 2
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3341, 3341, 0)
init io_toposort 0.0389051437378
loop time 0.00414204597473
callback_time 0.0
0.615536s - ('merge1', 'MergeOptimizer', 0) - 0.029s
MergeOptimizer
nb fail= 0 merged= 3298 constant= 1703
time replace=0.38 validate=0.03 callback=0.19
0.532832s - ('elemwise_fusion', 'SeqOptimizer', 17) - 0.001s
SeqOptimizer elemwise_fusion time 0.533s for 2586/2420 nodes before/after optimization
0.137s for callback
0.001s for fgraph.validate()
0.417839s - ('composite_elemwise_fusion', 'FusionOptimizer', 1) - 0.001s
FusionOptimizer
nb_iter 2
nb_replacement 107
nb_inconsistency_replace 0
validate_time 0.00120878219604
callback_time 0.110847473145
time_toposort 0.0463998317719
0.114981s - ('local_add_mul_fusion', 'FusionOptimizer', 0) - 0.000s
FusionOptimizer
nb_iter 3
nb_replacement 18
nb_inconsistency_replace 0
validate_time 0.000219821929932
callback_time 0.0261061191559
time_toposort 0.0736041069031
0.475244s - ('scan_eqopt2', 'EquilibriumOptimizer', 9) - 0.000s
EquilibriumOptimizer scan_eqopt2
time 0.475s for 1 passes
nb nodes (start, end, max) 3349 3349 3349
time io_toposort 0.037s
time in local optimizers 0.000s
time in global optimizers 0.433s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.475s 0 (0.433s in global opts, 0.037s io_toposort) - 3349 nodes -
Global, final and clean up optimizers
Iter 0
TopoOptimizer constant_folding_for_scan2
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.0356850624084
loop time 0.00393390655518
callback_time 0.0
TopoOptimizer scanOp_remove_constants_and_unused_inputs1
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.188410043716
loop time 0.00373697280884
callback_time 0.0
TopoOptimizer scanop_remove_constants_and_unused_inputs2
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.0367538928986
loop time 0.00363206863403
callback_time 0.0
TopoOptimizer scanOp_merge_inouts
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.0370938777924
loop time 0.00395393371582
callback_time 0.0
TopoOptimizer scanOp_remove_constants_and_unused_inputs3
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.0391750335693
loop time 0.00397610664368
callback_time 0.0
0.466038s - ('BlasOpt', 'SeqOptimizer', 10) - 0.000s
SeqOptimizer BlasOpt time 0.466s for 3349/3349 nodes before/after optimization
0.001s for callback
0.000s for fgraph.validate()
0.205265s - ('use_c_blas', 'TopoOptimizer', 4) - 0.000s
TopoOptimizer use_c_blas
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.192625999451
loop time 0.0125570297241
callback_time 0.0
0.093847s - ('gemm_optimizer', 'GemmOptimizer', 1) - 0.000s
GemmOptimizer
nb_iter 1
nb_replacement 0
nb_replacement_didn_t_remove 0
nb_inconsistency_make 0
nb_inconsistency_replace 0
time_canonicalize 0.0312497615814
time_factor_can 0
time_factor_list 0
time_toposort 0.0369219779968
validate_time 0.0
callback_time 0.0
0.043994s - ('local_dot_to_dot22', 'TopoOptimizer', 0) - 0.000s
TopoOptimizer local_dot_to_dot22
nb_node (start, end, changed) (3349, 3349, 3)
init io_toposort 0.0389251708984
loop time 0.0050060749054
callback_time 0.000382900238037
0.043500s - ('use_scipy_ger', 'TopoOptimizer', 5) - 0.000s
TopoOptimizer scipy_blas
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.0391268730164
loop time 0.00428605079651
callback_time 0.0
0.040933s - ('local_dot22_to_dot22scalar', 'TopoOptimizer', 2) - 0.000s
TopoOptimizer local_dot22_to_dot22scalar
nb_node (start, end, changed) (3349, 3349, 1)
init io_toposort 0.0308468341827
loop time 0.010027885437
callback_time 0.000476837158203
0.038460s - ('local_gemm_to_gemv', 'EquilibriumOptimizer', 3) - 0.000s
EquilibriumOptimizer local_gemm_to_gemv
time 0.038s for 1 passes
nb nodes (start, end, max) 3349 3349 3349
time io_toposort 0.032s
time in local optimizers 0.000s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.038s 0 (0.000s in global opts, 0.032s io_toposort) - 3349 nodes -
0.418858s - ('scan_eqopt1', 'EquilibriumOptimizer', 2) - 0.000s
EquilibriumOptimizer scan_eqopt1
time 0.419s for 1 passes
nb nodes (start, end, max) 4784 4784 4784
time io_toposort 0.043s
time in local optimizers 0.000s
time in global optimizers 0.368s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.419s 0 (0.368s in global opts, 0.043s io_toposort) - 4784 nodes -
Global, final and clean up optimizers
Iter 0
SeqOptimizer all_pushout_opt time 0.368s for 4784/4784 nodes before/after optimization
0.000s for callback
0.000s for fgraph.validate()
0.179664s - ('remove_constants_and_unused_inputs_scan', 'TopoOptimizer', 0) - 0.000s
TopoOptimizer scanOp_remove_constants_and_unused_inputs0
nb_node (start, end, changed) (4784, 4784, 0)
init io_toposort 0.174652814865
loop time 0.00494480133057
callback_time 0.0
0.050260s - ('scanOp_pushout_nonseqs_ops', 'PushOutNonSeqScan', 1) - 0.000s
0.047774s - ('scanOp_pushout_seqs_ops', 'PushOutSeqScan', 2) - 0.000s
0.045382s - ('scanOp_pushout_output', 'PushOutScanOutput', 4) - 0.000s
0.045186s - ('scan_pushout_dot1', 'PushOutDot1', 3) - 0.000s
0.387094s - ('ShapeOpt', 'ShapeOptimizer', 1) - 0.000s
0.368549s - ('stabilize', 'EquilibriumOptimizer', 6) - 0.000s
EquilibriumOptimizer stabilize
time 0.368s for 2 passes
nb nodes (start, end, max) 3345 3349 3349
time io_toposort 0.065s
time in local optimizers 0.120s
time in global optimizers 0.079s
time in final optimizers 0.083s
time in cleanup optimizers 0.000s
0 - 0.190s 4 (0.078s in global opts, 0.035s io_toposort) - 3345 nodes - ('local_fill_to_alloc', 4)
1 - 0.178s 0 (0.083s in global opts, 0.030s io_toposort) - 3349 nodes -
times - times applied - nb node created - name:
0.004s - 4 - 8 - local_fill_to_alloc
0.278s - in 39 optimization that where not used (display only those with a runtime > 0)
0.083s - topo_constant_folding
0.079s - crossentropy_to_crossentropy_with_softmax_with_bias
0.040s - local_greedy_distributor
0.027s - local_one_minus_erf2
0.025s - local_sigm_times_exp
0.010s - local_one_minus_erf
0.007s - local_useless_elemwise_comparison
0.007s - local_expm1
0.001s - local_incsubtensor_of_zeros
0.000s - local_exp_over_1_plus_exp
0.000s - local_grad_log_erfc_neg
0.000s - local_setsubtensor_of_constants
0.000s - local_0_dot_x
0.000s - local_subtensor_of_dot
0.000s - local_useless_inc_subtensor_alloc
0.000s - local_useless_alloc
0.000s - local_merge_alloc
0.000s - local_useless_reshape
0.000s - local_reshape_lift
Global, final and clean up optimizers
Iter 0
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.0384531021118
loop time 0.00705313682556
callback_time 0.0
Iter 1
TopoOptimizer topo_constant_folding
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.0329899787903
loop time 0.00385785102844
callback_time 0.0
0.342618s - ('scanOp_make_inplace', 'ScanInplaceOptimizer', 42) - 0.000s
0.341554s - ('add_destroy_handler', 'AddDestroyHandler', 21) - 0.000s
0.206516s - ('crossentropy_to_crossentropy_with_softmax', 'FromFunctionOptimizer', 12) - 0.000s
0.198069s - ('local_IncSubtensor_serialize', 'TopoOptimizer', 3) - 0.000s
TopoOptimizer pre_local_IncSubtensor_serialize
nb_node (start, end, changed) (4784, 4784, 20)
init io_toposort 0.167459011078
loop time 0.0305531024933
callback_time 0.00435876846313
0.143042s - ('local_inplace_setsubtensor', 'TopoOptimizer', 26) - 0.110s
TopoOptimizer local_inplace_setsubtensor
nb_node (start, end, changed) (1760, 1760, 13)
init io_toposort 0.0189619064331
loop time 0.123964071274
callback_time 0.114995002747
0.107311s - ('gpu_after_fusion', 'SeqOptimizer', 20) - 0.000s
SeqOptimizer gpu_after_fusion time 0.107s for 1760/1760 nodes before/after optimization
0.001s for callback
0.000s for fgraph.validate()
0.056971s - ('gpu_local_optimizations', 'EquilibriumOptimizer', 1) - 0.000s
EquilibriumOptimizer gpu_local_optimizations
time 0.057s for 1 passes
nb nodes (start, end, max) 1762 1762 1762
time io_toposort 0.018s
time in local optimizers 0.028s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.057s 0 (0.000s in global opts, 0.018s io_toposort) - 1762 nodes -
Global, final and clean up optimizers
Iter 0
MergeOptimizer
nb fail= 0 merged= 0 constant= 0
time replace=0.00 validate=0.00 callback=0.00
0.049493s - ('gpu_cut_transfers', 'EquilibriumOptimizer', 2) - 0.000s
EquilibriumOptimizer gpu_cut_transfers
time 0.049s for 2 passes
nb nodes (start, end, max) 1762 1760 1762
time io_toposort 0.032s
time in local optimizers 0.006s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.026s 2 (0.000s in global opts, 0.017s io_toposort) - 1762 nodes - ('local_cut_gpu_host_gpu', 2)
1 - 0.023s 0 (0.000s in global opts, 0.015s io_toposort) - 1760 nodes -
times - times applied - nb node created - name:
0.001s - 2 - 0 - local_cut_gpu_host_gpu
0.005s - in 1 optimization that where not used (display only those with a runtime > 0)
0.005s - constant_folding
0.000835s - ('InputToGpuOptimizer', 'InputToGpuOptimizer', 0) - 0.000s
0.062530s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 41) - 0.000s
0.047725s - ('local_elemwise_alloc', 'TopoOptimizer', 8) - 0.000s
TopoOptimizer local_elemwise_alloc
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.0376000404358
loop time 0.010046005249
callback_time 0.0
0.044432s - ('gpu_scanOp_make_inplace', 'ScanInplaceOptimizer', 39) - 0.000s
0.041611s - ('uncanonicalize', 'EquilibriumOptimizer', 13) - 0.000s
EquilibriumOptimizer uncanonicalize
time 0.041s for 1 passes
nb nodes (start, end, max) 3341 3341 3341
time io_toposort 0.036s
time in local optimizers 0.000s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.041s 0 (0.000s in global opts, 0.036s io_toposort) - 3341 nodes -
0.038787s - ('local_fill_to_alloc', 'TopoOptimizer', 7) - 0.000s
TopoOptimizer local_fill_to_alloc
nb_node (start, end, changed) (3349, 3349, 0)
init io_toposort 0.0316600799561
loop time 0.00705099105835
callback_time 0.0
0.034755s - ('specialize_device', 'EquilibriumOptimizer', 15) - 0.000s
EquilibriumOptimizer specialize_device
time 0.035s for 1 passes
nb nodes (start, end, max) 2586 2586 2586
time io_toposort 0.023s
time in local optimizers 0.006s
time in global optimizers 0.000s
time in final optimizers 0.000s
time in cleanup optimizers 0.000s
0 - 0.035s 0 (0.000s in global opts, 0.023s io_toposort) - 2586 nodes -
0.028404s - ('gpua_elemwise_fusion', 'FusionOptimizer', 37) - 0.000s
FusionOptimizer
nb_iter 1
nb_replacement 0
nb_inconsistency_replace 0
validate_time 0.0
callback_time 0.0
time_toposort 0.0268249511719
0.026123s - ('AbstractConvCheck', 'TopoOptimizer', 16) - 0.000s
TopoOptimizer AbstractConvCheck
nb_node (start, end, changed) (2586, 2586, 0)
init io_toposort 0.0230870246887
loop time 0.00297999382019
callback_time 0.0
0.023765s - ('local_dnna_conv_inplace', 'TopoOptimizer', 34) - 0.000s
TopoOptimizer local_dnna_conv_inplace
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0204739570618
loop time 0.00311684608459
callback_time 0.0
0.023228s - ('InplaceGpuBlasOpt', 'TopoOptimizer', 30) - 0.000s
TopoOptimizer InplaceGpuBlasOpt
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0176610946655
loop time 0.00538897514343
callback_time 0.0
0.022926s - ('blas_opt_inplace', 'TopoOptimizer', 29) - 0.000s
TopoOptimizer InplaceBlasOpt
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0177640914917
loop time 0.00498390197754
callback_time 0.0
0.020625s - ('local_inplace_sparse_block_outer', 'TopoOptimizer', 28) - 0.000s
TopoOptimizer local_inplace_sparse_block_outer
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0186450481415
loop time 0.00185012817383
callback_time 0.0
0.020140s - ('local_gemm16_inplace', 'TopoOptimizer', 35) - 0.000s
TopoOptimizer local_gemm16_inplace
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0182089805603
loop time 0.00174903869629
callback_time 0.0
0.019994s - ('cond_make_inplace', 'TopoOptimizer', 43) - 0.000s
TopoOptimizer cond_make_inplace
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0180940628052
loop time 0.00182604789734
callback_time 0.0
0.019595s - ('dimshuffle_as_view', 'TopoOptimizer', 22) - 0.000s
TopoOptimizer dimshuffle_as_view
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0177409648895
loop time 0.00179815292358
callback_time 0.0
0.019544s - ('local_destructive', 'TopoOptimizer', 44) - 0.000s
TopoOptimizer CURAND_destructive
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0177500247955
loop time 0.00173497200012
callback_time 0.0
0.019496s - ('local_inplace_gpu_sparse_block_outer', 'TopoOptimizer', 24) - 0.000s
TopoOptimizer local_inplace_gpu_sparse_block_outer
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0176432132721
loop time 0.00175499916077
callback_time 0.0
0.019217s - ('make_ger_destructive', 'TopoOptimizer', 36) - 0.000s
TopoOptimizer make_scipy_blas_destructive
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0170040130615
loop time 0.00216507911682
callback_time 0.0
0.019198s - ('gpuablas_opt_inplace', 'TopoOptimizer', 31) - 0.000s
TopoOptimizer InplaceGpuaBlasOpt
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0162858963013
loop time 0.00274205207825
callback_time 0.0
0.019189s - ('random_make_inplace', 'TopoOptimizer', 45) - 0.000s
TopoOptimizer random_make_inplace
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.017333984375
loop time 0.00179004669189
callback_time 0.0
0.018404s - ('local_inplace_sparse_block_gemv', 'TopoOptimizer', 27) - 0.000s
TopoOptimizer local_inplace_sparse_block_gemv
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0163660049438
loop time 0.00191783905029
callback_time 0.0
0.018152s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 40) - 0.000s
0.017685s - ('mrg_random_make_inplace', 'TopoOptimizer', 46) - 0.000s
TopoOptimizer random_make_inplace_mrg
nb_node (start, end, changed) (1788, 1788, 0)
init io_toposort 0.0156719684601
loop time 0.00196599960327
callback_time 0.0
0.017610s - ('local_inplace_gpu_sparse_block_gemv', 'TopoOptimizer', 23) - 0.000s
TopoOptimizer local_inplace_gpu_sparse_block_gemv
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0160031318665
loop time 0.00151109695435
callback_time 0.0
0.017429s - ('c_blas_destructive', 'TopoOptimizer', 32) - 0.000s
TopoOptimizer c_blas_destructive
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0148959159851
loop time 0.00249600410461
callback_time 0.0
0.017334s - ('local_inplace_incsubtensor1', 'TopoOptimizer', 25) - 0.000s
TopoOptimizer local_inplace_incsubtensor1
nb_node (start, end, changed) (1760, 1760, 0)
init io_toposort 0.0155780315399
loop time 0.00165390968323
callback_time 0.0
0.000752s - ('merge3', 'MergeOptimizer', 47) - 0.000s
MergeOptimizer
nb fail= 0 merged= 0 constant= 0
time replace=0.00 validate=0.00 callback=0.00
0.000578s - ('merge2', 'MergeOptimizer', 19) - 0.000s
MergeOptimizer
nb fail= 0 merged= 3 constant= 0
time replace=0.00 validate=0.00 callback=0.00
0.000054s - ('merge1.2', 'MergeOptimizer', 5) - 0.000s
MergeOptimizer
nb fail= 0 merged= 0 constant= 0
time replace=0.00 validate=0.00 callback=0.00
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
Sorry, no tip for today.
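To rank the optimizers by time, the entry lines of the form "time - (name, class, index) - validate time" can be pulled out of the log with a small parser. This is a sketch, not part of Theano; the regular expression and function name are illustrative, and the sample lines are copied from the profile above. Because indentation is lost in this copy of the log, nested entries (for example the sub-optimizers of gpu_opt) cannot be told apart from top-level ones, so summing all matches would double-count.

```python
import ast
import re

# Matches entry lines such as:
#   10.399793s - ('canonicalize', 'EquilibriumOptimizer', 4) - 0.066s
ENTRY = re.compile(r"^\s*([0-9.]+)s - (\(.*\)) - ([0-9.]+)s\s*$")


def parse_entries(lines):
    """Yield (time_s, name, class_name, index, validate_s) per entry line."""
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # detail lines, headers, per-pass lines, etc.
        name, class_name, index = ast.literal_eval(match.group(2))
        yield (float(match.group(1)), name, class_name, index,
               float(match.group(3)))


sample = [
    "10.399793s - ('canonicalize', 'EquilibriumOptimizer', 4) - 0.066s",
    "EquilibriumOptimizer canonicalize",
    "9.525620s - ('gpu_opt', 'SeqOptimizer', 14) - 0.048s",
    "9.196867s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 38) - 1.416s",
    "3.167935s - ('local_dnn_conv_inplace', 'TopoOptimizer', 33) - 2.829s",
]

# Slowest first.
entries = sorted(parse_entries(sample), reverse=True)
for time_s, name, class_name, index, validate_s in entries:
    print("{:10.3f}s  {:<30s} {}".format(time_s, name, class_name))
```

To process the whole log, pass an open file object to parse_entries instead of the sample list.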
compiling