@osiewicz
Last active December 23, 2019 15:17
CTRE issue 78 results
Command line:
time g++ -std=c++17 -O3 min_repro.cpp
No forced inlining:
real 0m2,210s
user 0m2,094s
sys 0m0,104s
Binary size: 280064
Forced inlining (current master):
real 1m1,159s
user 0m57,839s
sys 0m1,849s
Binary size: 5370904
== WITH -O3 ==
Command line:
time g++ -std=c++17 -O3 min_repro.cpp
No forced inlining:
real 0m4,076s
user 0m3,986s
sys 0m0,076s
Binary size: 47288
Forced inlining (current master):
real 0m9,929s
user 0m9,458s
sys 0m0,173s
Binary size: 16624
Bonus: optimized debug build (only for modified version):
real 0m5,381s
user 0m5,244s
sys 0m0,112s
Binary size: 1103872
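For readers unfamiliar with the term, "forced inlining" is assumed here to mean marking functions with a compiler attribute such as GCC/Clang `always_inline`, so the optimizer's own size heuristics are bypassed. Below is a minimal sketch of how the two variants compared above could be toggled from one source file. The names `DEMO_FORCE_INLINE`, `DEMO_NO_FORCE_INLINE` and `count_steps` are hypothetical and are not taken from CTRE or from min_repro.cpp.

```cpp
#include <cstddef>

// Hypothetical toggle: build with -DDEMO_NO_FORCE_INLINE to get the
// "no forced inlining" variant, otherwise the attribute is applied.
#if defined(DEMO_NO_FORCE_INLINE)
#  define DEMO_FORCE_INLINE inline
#elif defined(__GNUC__) || defined(__clang__)
#  define DEMO_FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define DEMO_FORCE_INLINE __forceinline
#else
#  define DEMO_FORCE_INLINE inline
#endif

// Each value of N is a distinct function. With the attribute enabled, the
// compiler has to flatten the whole chain into the caller instead of
// deciding per call site whether inlining is worth the code growth.
template <std::size_t N>
constexpr DEMO_FORCE_INLINE std::size_t count_steps(std::size_t acc) {
    if constexpr (N == 0) {
        return acc;
    } else {
        return count_steps<N - 1>(acc + 1);
    }
}

// Compile-time sanity check: eight nested calls add eight to the start value.
static_assert(count_steps<8>(0) == 8, "chain depth mismatch");
```

Compiling the same file once with and once without `-DDEMO_NO_FORCE_INLINE` gives two builds that differ only in the inlining attribute, which is the kind of comparison the timings above report.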
[hiro@hiro-pc ctre_issue_78]$ time g++ -g -ftime-report -O3 -std=c++17 ctre.cpp
Time variable usr sys wall GGC
phase setup : 0.02 ( 0%) 0.02 ( 0%) 0.08 ( 0%) 1366 kB ( 0%)
phase parsing : 5.81 ( 0%) 2.44 ( 2%) 8.38 ( 1%) 136206 kB ( 4%)
phase lang. deferred : 0.08 ( 0%) 0.04 ( 0%) 0.11 ( 0%) 4805 kB ( 0%)
phase opt and generate : 1262.80 (100%) 117.71 ( 98%) 1404.25 ( 99%) 3678493 kB ( 96%)
phase last asm : 0.06 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 1208 kB ( 0%)
|name lookup : 0.28 ( 0%) 0.10 ( 0%) 0.35 ( 0%) 2686 kB ( 0%)
|overload resolution : 5.07 ( 0%) 1.91 ( 2%) 7.07 ( 1%) 96445 kB ( 3%)
garbage collection : 7.98 ( 1%) 0.03 ( 0%) 8.09 ( 1%) 0 kB ( 0%)
dump files : 0.03 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%)
callgraph construction : 0.69 ( 0%) 0.01 ( 0%) 0.76 ( 0%) 8127 kB ( 0%)
callgraph optimization : 0.98 ( 0%) 0.01 ( 0%) 1.11 ( 0%) 11 kB ( 0%)
ipa function summary : 0.30 ( 0%) 0.00 ( 0%) 0.31 ( 0%) 22 kB ( 0%)
ipa cp : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 4 kB ( 0%)
ipa inlining heuristics : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 0 kB ( 0%)
ipa function splitting : 0.23 ( 0%) 0.00 ( 0%) 0.23 ( 0%) 263 kB ( 0%)
ipa pure const : 0.46 ( 0%) 0.00 ( 0%) 0.49 ( 0%) 1 kB ( 0%)
ipa icf : 0.06 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 kB ( 0%)
cfg construction : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 5 kB ( 0%)
cfg cleanup : 1.29 ( 0%) 0.00 ( 0%) 1.29 ( 0%) 29 kB ( 0%)
trivially dead code : 1095.40 ( 86%) 0.25 ( 0%) 1115.72 ( 79%) 0 kB ( 0%)
df scan insns : 0.27 ( 0%) 0.00 ( 0%) 0.27 ( 0%) 0 kB ( 0%)
df multiple defs : 0.12 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 0 kB ( 0%)
df reaching defs : 0.11 ( 0%) 0.00 ( 0%) 0.12 ( 0%) 0 kB ( 0%)
df live regs : 1.08 ( 0%) 0.00 ( 0%) 1.12 ( 0%) 0 kB ( 0%)
df live&initialized regs : 0.14 ( 0%) 0.00 ( 0%) 0.13 ( 0%) 0 kB ( 0%)
df must-initialized regs : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
df use-def / def-use chains : 0.08 ( 0%) 0.00 ( 0%) 0.10 ( 0%) 0 kB ( 0%)
df reg dead/unused notes : 0.43 ( 0%) 0.00 ( 0%) 0.45 ( 0%) 11 kB ( 0%)
register information : 0.04 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%)
alias analysis : 0.31 ( 0%) 0.00 ( 0%) 0.32 ( 0%) 49 kB ( 0%)
alias stmt walking : 5.38 ( 0%) 1.02 ( 1%) 6.41 ( 0%) 18792 kB ( 0%)
register scan : 0.20 ( 0%) 0.00 ( 0%) 0.21 ( 0%) 4 kB ( 0%)
rebuild jump labels : 0.15 ( 0%) 0.00 ( 0%) 0.16 ( 0%) 0 kB ( 0%)
preprocessing : 0.08 ( 0%) 0.17 ( 0%) 0.23 ( 0%) 1338 kB ( 0%)
parser (global) : 0.22 ( 0%) 0.18 ( 0%) 0.40 ( 0%) 14497 kB ( 0%)
parser struct body : 0.17 ( 0%) 0.06 ( 0%) 0.21 ( 0%) 9373 kB ( 0%)
parser function body : 0.00 ( 0%) 0.01 ( 0%) 0.06 ( 0%) 305 kB ( 0%)
parser inl. func. body : 0.01 ( 0%) 0.03 ( 0%) 0.13 ( 0%) 1264 kB ( 0%)
parser inl. meth. body : 0.11 ( 0%) 0.04 ( 0%) 0.12 ( 0%) 3505 kB ( 0%)
template instantiation : 3.21 ( 0%) 1.01 ( 1%) 4.16 ( 0%) 99016 kB ( 3%)
constant expression evaluation : 1.98 ( 0%) 0.94 ( 1%) 3.07 ( 0%) 2819 kB ( 0%)
early inlining heuristics : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 224 kB ( 0%)
inline parameters : 4.23 ( 0%) 0.00 ( 0%) 4.21 ( 0%) 1803 kB ( 0%)
integration : 36.96 ( 3%) 51.67 ( 43%) 91.90 ( 7%) 2311970 kB ( 60%)
tree gimplify : 0.01 ( 0%) 0.01 ( 0%) 0.04 ( 0%) 1262 kB ( 0%)
tree eh : 0.37 ( 0%) 0.00 ( 0%) 0.40 ( 0%) 63 kB ( 0%)
tree CFG cleanup : 3.37 ( 0%) 3.52 ( 3%) 7.22 ( 1%) 1765 kB ( 0%)
tree tail merge : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 17 kB ( 0%)
tree VRP : 1.50 ( 0%) 0.57 ( 0%) 2.04 ( 0%) 35085 kB ( 1%)
tree Early VRP : 7.06 ( 1%) 0.04 ( 0%) 7.26 ( 1%) 6803 kB ( 0%)
tree copy propagation : 0.93 ( 0%) 0.01 ( 0%) 0.92 ( 0%) 0 kB ( 0%)
tree PTA : 2.55 ( 0%) 0.06 ( 0%) 2.58 ( 0%) 6741 kB ( 0%)
tree SSA rewrite : 2.67 ( 0%) 1.46 ( 1%) 4.45 ( 0%) 231036 kB ( 6%)
tree SSA other : 0.00 ( 0%) 0.01 ( 0%) 0.02 ( 0%) 2332 kB ( 0%)
tree SSA incremental : 3.21 ( 0%) 0.09 ( 0%) 3.38 ( 0%) 1887 kB ( 0%)
tree operand scan : 29.94 ( 2%) 48.74 ( 41%) 76.02 ( 5%) 382866 kB ( 10%)
dominator optimization : 1.44 ( 0%) 1.19 ( 1%) 2.79 ( 0%) 89406 kB ( 2%)
backwards jump threading : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
tree SRA : 4.92 ( 0%) 1.82 ( 2%) 6.82 ( 0%) 294070 kB ( 8%)
isolate eroneous paths : 0.11 ( 0%) 0.00 ( 0%) 0.12 ( 0%) 0 kB ( 0%)
tree CCP : 6.50 ( 1%) 3.37 ( 3%) 10.76 ( 1%) 806 kB ( 0%)
tree reassociation : 0.37 ( 0%) 0.00 ( 0%) 0.39 ( 0%) 26 kB ( 0%)
tree PRE : 0.26 ( 0%) 0.00 ( 0%) 0.29 ( 0%) 25 kB ( 0%)
tree FRE : 5.05 ( 0%) 1.49 ( 1%) 6.59 ( 0%) 19300 kB ( 1%)
tree code sinking : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 16 kB ( 0%)
tree linearize phis : 0.08 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 4 kB ( 0%)
tree backward propagate : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
tree forward propagate : 3.82 ( 0%) 0.17 ( 0%) 4.11 ( 0%) 243 kB ( 0%)
tree conservative DCE : 0.91 ( 0%) 0.20 ( 0%) 1.22 ( 0%) 102 kB ( 0%)
tree aggressive DCE : 2.42 ( 0%) 0.04 ( 0%) 2.51 ( 0%) 10499 kB ( 0%)
tree buildin call DCE : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
tree DSE : 2.66 ( 0%) 0.01 ( 0%) 2.79 ( 0%) 4221 kB ( 0%)
PHI merge : 0.21 ( 0%) 0.73 ( 1%) 0.82 ( 0%) 121 kB ( 0%)
tree loop invariant motion : 0.08 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 0 kB ( 0%)
tree canonical iv : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1 kB ( 0%)
complete unrolling : 0.44 ( 0%) 0.65 ( 1%) 1.19 ( 0%) 42025 kB ( 1%)
tree vectorization : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
tree slp vectorization : 0.35 ( 0%) 0.02 ( 0%) 0.37 ( 0%) 25 kB ( 0%)
tree loop distribution : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
tree iv optimization : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 18 kB ( 0%)
tree SSA uncprop : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
tree switch conversion : 0.10 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 0 kB ( 0%)
gimple CSE sin/cos : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%)
gimple widening/fma detection : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%)
tree strlen optimization : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
dominance frontiers : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
dominance computation : 0.02 ( 0%) 0.01 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
out of ssa : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 kB ( 0%)
expand vars : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 14 kB ( 0%)
expand : 0.52 ( 0%) 0.10 ( 0%) 0.65 ( 0%) 151144 kB ( 4%)
post expand cleanups : 0.17 ( 0%) 0.00 ( 0%) 0.18 ( 0%) 17 kB ( 0%)
varconst : 0.02 ( 0%) 0.01 ( 0%) 0.00 ( 0%) 4 kB ( 0%)
forward prop : 0.18 ( 0%) 0.00 ( 0%) 0.19 ( 0%) 7 kB ( 0%)
CSE : 0.68 ( 0%) 0.01 ( 0%) 0.71 ( 0%) 17 kB ( 0%)
dead code elimination : 0.10 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 0 kB ( 0%)
dead store elim1 : 0.18 ( 0%) 0.02 ( 0%) 0.20 ( 0%) 7 kB ( 0%)
dead store elim2 : 0.16 ( 0%) 0.00 ( 0%) 0.16 ( 0%) 7 kB ( 0%)
loop init : 0.07 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 413 kB ( 0%)
loop invariant motion : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
loop fini : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
CPROP : 1.14 ( 0%) 0.00 ( 0%) 1.14 ( 0%) 22 kB ( 0%)
PRE : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%)
CSE 2 : 0.43 ( 0%) 0.00 ( 0%) 0.44 ( 0%) 5 kB ( 0%)
branch prediction : 3.08 ( 0%) 0.01 ( 0%) 3.18 ( 0%) 311 kB ( 0%)
combiner : 0.08 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 27 kB ( 0%)
if-conversion : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 7 kB ( 0%)
integrated RA : 0.42 ( 0%) 0.00 ( 0%) 0.43 ( 0%) 137 kB ( 0%)
LRA non-specific : 0.59 ( 0%) 0.07 ( 0%) 0.70 ( 0%) 8 kB ( 0%)
LRA virtuals elimination : 0.08 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 kB ( 0%)
LRA reload inheritance : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 1 kB ( 0%)
LRA create live ranges : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 1 kB ( 0%)
reload CSE regs : 0.18 ( 0%) 0.00 ( 0%) 0.18 ( 0%) 15 kB ( 0%)
load CSE after reload : 0.10 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 0 kB ( 0%)
ree : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
thread pro- & epilogue : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 2 kB ( 0%)
combine stack adjustments : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
peephole 2 : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
hard reg cprop : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
scheduling 2 : 1.71 ( 0%) 0.25 ( 0%) 1.99 ( 0%) 36570 kB ( 1%)
machine dep reorg : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%)
reorder blocks : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 5 kB ( 0%)
shorten branches : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
final : 0.13 ( 0%) 0.00 ( 0%) 0.13 ( 0%) 4547 kB ( 0%)
symout : 0.15 ( 0%) 0.03 ( 0%) 0.19 ( 0%) 11130 kB ( 0%)
variable tracking : 0.61 ( 0%) 0.00 ( 0%) 0.61 ( 0%) 727 kB ( 0%)
var-tracking dataflow : 0.15 ( 0%) 0.00 ( 0%) 0.16 ( 0%) 0 kB ( 0%)
var-tracking emit : 0.23 ( 0%) 0.00 ( 0%) 0.23 ( 0%) 10650 kB ( 0%)
tree if-combine : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 2 kB ( 0%)
straight-line strength reduction : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
store merging : 0.03 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
initialize rtl : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 12 kB ( 0%)
address lowering : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%)
rest of compilation : 0.90 ( 0%) 0.00 ( 0%) 0.87 ( 0%) 24 kB ( 0%)
remove unused locals : 10.20 ( 1%) 0.01 ( 0%) 10.24 ( 1%) 0 kB ( 0%)
address taken : 1.33 ( 0%) 0.03 ( 0%) 1.26 ( 0%) 0 kB ( 0%)
rebuild frequencies : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
repair loop structures : 0.01 ( 0%) 0.01 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
TOTAL : 1268.77 120.21 1412.90 3822089 kB
real 23m33,527s
user 21m9,014s
sys 2m0,373s
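The `time` fields in this gist use a comma as the decimal separator (for example `1m1,159s`), which makes them awkward to compare by eye. The helper below is a small sketch for converting such a field to seconds and computing a slowdown ratio. The function names `parse_time_field` and `slowdown` are made up for this illustration, and the sketch assumes the program runs in the default "C" locale, where `std::stod` expects a dot.

```cpp
#include <stdexcept>
#include <string>

// Converts a bash `time` field of the form <minutes>m<seconds>s to seconds.
// Accepts ',' or '.' as the decimal separator in the seconds part.
double parse_time_field(std::string field) {
    const auto m = field.find('m');
    if (m == std::string::npos || field.back() != 's') {
        throw std::invalid_argument("expected <min>m<sec>s, got: " + field);
    }
    // Normalise the locale-specific comma so std::stod can parse it.
    for (char& c : field) {
        if (c == ',') c = '.';
    }
    const double minutes = std::stod(field.substr(0, m));
    // Seconds sit between the 'm' and the trailing 's'.
    const double seconds = std::stod(field.substr(m + 1, field.size() - m - 2));
    return minutes * 60.0 + seconds;
}

// Ratio of two timings: how many times slower `other` is than `baseline`.
double slowdown(const std::string& baseline, const std::string& other) {
    return parse_time_field(other) / parse_time_field(baseline);
}
```

For instance, `slowdown("0m2,210s", "1m1,159s")` compares the two `real` values from the first block of results and comes out at roughly 27.7.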