register promotion
git log | head -n 32 > /tmp/results.txt
for f in examples/results_032218/QuadroGP100/logs/2477270-*/*INFO; do
  echo "$f"
  cat "$(dirname "$f")/COMMAND"
  grep -i generation "$f" | tail -n 4
  tail -n 64 "$f" | grep -A 64 "best option so far"
  echo "********************************"
done >> /tmp/results.txt
commit 86c9345d2b651a38270097651700500977491117
Author: nicolasvasilache <nicolas.vasilache@gmail.com>
Date:   Thu Mar 22 12:58:46 2018 -0600

    Changes for autotuning

commit 815b402064688ec25fe5c67384fb57bab7e14461
Author: Oleksandr Zinenko <git@ozinenko.com>
Date:   Wed Mar 21 18:01:20 2018 +0100

    Allow copying from global to registers if cannot copy from shared

    In cases when the approximate footprint of the reference group being
    promoted to registers is not a subset of any of the approximate
    footprints of the reference groups promoted to shared, it is still
    possible to promote by copying directly from global memory as long as
    all overlapping reference groups have only read the data. It will just
    create multiple copies of the data in different storages without
    compromising correctness.

commit 188bc31992437cb2d940e28f490677063bf7eb2c
Author: Oleksandr Zinenko <git@ozinenko.com>
Date:   Wed Mar 21 17:09:26 2018 +0100

    Scop: extract promotionsAtIndexes() from activePromotions()

    This creates a private convenience function to obtain a copy of active
    promotions specified by a list of their indexes in the storage.

    Use this function in Scop::promoteGroup to avoid retraversing the list
    of all promotions twice in a row.
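The rule stated in the message of commit 815b402 can be sketched as a small predicate. This is only an illustration under assumptions: footprints are modelled as plain Python sets of array elements rather than the approximate polyhedral footprints the compiler works with, and the names `RefGroup` and `register_promotion_source` are hypothetical, not taken from the code base. Requiring the promoted group itself to be read-only is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefGroup:
    """Hypothetical stand-in for a reference group: a footprint and a read-only flag."""
    footprint: frozenset
    read_only: bool


def register_promotion_source(reg_group, shared_groups):
    """Return 'shared', 'global' or None for a group being promoted to registers."""
    # Preferred case: the footprint is a subset of one shared-memory footprint.
    for group in shared_groups:
        if reg_group.footprint <= group.footprint:
            return "shared"
    # Fallback: copy directly from global memory, allowed only if every
    # overlapping group has only read the data (assumption: the promoted
    # group itself must be read-only as well).
    overlapping = [g for g in shared_groups if reg_group.footprint & g.footprint]
    if reg_group.read_only and all(g.read_only for g in overlapping):
        return "global"
    return None
```

In the fallback case the same data ends up in both shared memory and registers, which matches the "multiple copies of the data in different storages" remark in the commit message.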
examples/results_032218/QuadroGP100/logs/2477270-10/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.2LUT --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/2LUT_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.2LUT --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/2LUT_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
I0322 13:08:10.718683 70897 printer.cc:63] 
Generation 0 Jobs(Compiled, GPU)/total (36, 16)/100 (best/median/worst)us: 323/1406/4397
I0322 13:08:11.718801 70897 printer.cc:63] 
Generation 0 Jobs(Compiled, GPU)/total (40, 20)/100 (best/median/worst)us: 323/1299/4397
I0322 13:08:12.718910 70897 printer.cc:63] 
Generation 0 Jobs(Compiled, GPU)/total (42, 22)/100 (best/median/worst)us: 323/1299/4397
I0322 13:08:13.719033 70897 printer.cc:63] 
Generation 0 Jobs(Compiled, GPU)/total (46, 25)/100 (best/median/worst)us: 323/1299/4397
********************************
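The `Generation N Jobs(Compiled, GPU)/total ...` progress lines are regular enough to parse mechanically. The following is a minimal sketch, assuming the line layout seen in this dump; `parse_generation` and `is_complete` are hypothetical helper names, and reading `(Compiled, GPU)/total` as "jobs compiled, jobs run on the GPU, out of the total" is an interpretation of the label, not something the log states.

```python
import re

# Matches the text printed by printer.cc in this dump.
GENERATION_RE = re.compile(
    r"Generation (\d+) Jobs\(Compiled, GPU\)/total \((\d+), (\d+)\)/(\d+) "
    r"\(best/median/worst\)us: (\d+)/(\d+)/(\d+)"
)

FIELDS = ("generation", "compiled", "gpu", "total", "best_us", "median_us", "worst_us")


def parse_generation(line):
    """Return the integer fields of a progress line as a dict, or None if it does not match."""
    match = GENERATION_RE.search(line)
    if match is None:
        return None
    return dict(zip(FIELDS, map(int, match.groups())))


def is_complete(record):
    """True once both job counters have reached the total (assumed meaning of the counters)."""
    return record["compiled"] == record["total"] and record["gpu"] == record["total"]
```

Applied to the last line of the 2477270-10 log above, this gives generation 0 with counters (46, 25) out of 100, so that log ends before the first generation has finished.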
examples/results_032218/QuadroGP100/logs/2477270-11/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.C3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/C3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.C3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/C3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 329 341 343 345 347 349 351 351 352 355 356 359 368 381 393 395 397 400 406 414 426 430 433 443 456 461 463 475 480 493 495 510 517 534 552 555 577 597 600 616 617 640 640 700 713 722 741 763 772 773 805 813 855 882 889 890 894 901 907 910 917 932 932 939 960 1000 1026 1082 1088 1116 1117 1118 1120 1130 1153 1179 1216 1220 1319 1376 1438 1449 1468
I0322 13:36:53.824342 9734 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 329/640/1468
I0322 13:36:53.824352 9734 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 329/640/1468
I0322 13:36:53.824393 81495 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:36:53.824393 81495 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:36:53.824594 81495 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:36:53.824596 81495 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:36:53.824599 81495 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:36:53.824599 81495 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:36:53.824600 81495 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:36:53.824601 81495 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:36:53.824602 81495 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:36:53.824604 81495 genetic_tuning_harness.cc:542] .tile(1, 16)
I0322 13:36:53.824604 81495 genetic_tuning_harness.cc:542] .mapToThreads(16, 4)
I0322 13:36:53.824605 81495 genetic_tuning_harness.cc:542] .mapToBlocks(1000, 1024)
I0322 13:36:53.824606 81495 genetic_tuning_harness.cc:542] .unroll(64)
I0322 13:36:53.824607 81495 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:36:53.824609 81495 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:36:53.824609 81495 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:36:53.824610 81495 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:36:53.824611 81495 genetic_tuning_harness.cc:542] .matchLibraryCalls(true);
********************************
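To reuse a "best option so far" dump as C++ source, the glog prefix has to be stripped from each line. A minimal sketch follows, assuming the prefix layout visible in this dump (severity letter and date, time, thread id, `file:line]`); `extract_mapping_options` is a hypothetical helper name.

```python
import re

# Prefix such as: "I0322 13:36:53.824594 81495 genetic_tuning_harness.cc:542] "
GLOG_PREFIX_RE = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ \S+:\d+\] ?")


def extract_mapping_options(log_lines):
    """Return the first complete MappingOptions builder chain found in log_lines."""
    chain = []
    for line in log_lines:
        text = GLOG_PREFIX_RE.sub("", line.rstrip("\n"))
        if text.startswith("tc::MappingOptions::"):
            chain = [text]  # start of a new dump, discard any partial one
        elif chain and text.startswith("."):
            chain.append("    " + text)  # indent the chained calls
            if text.endswith(";"):
                break  # the trailing semicolon closes the chain
    return "\n".join(chain)
```

The result is the chain starting at `tc::MappingOptions::makeNaiveMappingOptions()` with one indented call per line, ending at the `.matchLibraryCalls(...);` line.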
examples/results_032218/QuadroGP100/logs/2477270-12/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP1 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP1_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP1 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP1_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 90 90 90 90 91 91 91 91 91 91 91 91 91 91 92 92 92 92 92 92 92 93 93 93 93 93 93 93 93 94 94 94 94 94 94 94 94 94 94 94 94 94 94 95 95 95 95 98 98 117 124 125 126 128 128 128 130 131 131 131 131 132 132 135 137 158 162 163 165 170 211 212 212 214 220 223 225 225 229 238 242 246 262 286 286 321 360
I0322 13:18:21.392019 81363 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 90/95/360
I0322 13:18:21.392035 81363 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 90/95/360
I0322 13:18:21.392189 76890 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:18:21.392189 76890 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:18:21.392482 76890 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:18:21.392487 76890 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:18:21.392488 76890 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:18:21.392488 76890 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:18:21.392490 76890 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:18:21.392491 76890 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:18:21.392494 76890 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:18:21.392495 76890 genetic_tuning_harness.cc:542] .tile(1)
I0322 13:18:21.392496 76890 genetic_tuning_harness.cc:542] .mapToThreads(250)
I0322 13:18:21.392498 76890 genetic_tuning_harness.cc:542] .mapToBlocks(512)
I0322 13:18:21.392498 76890 genetic_tuning_harness.cc:542] .unroll(32)
I0322 13:18:21.392500 76890 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:18:21.392501 76890 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:18:21.392503 76890 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:18:21.392503 76890 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:18:21.392504 76890 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************
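The `median times of each candidate (in us)` lines can be turned into a list of integers for further analysis. This is a small sketch assuming the wording shown in this dump; `parse_candidate_times` and `summarize` are hypothetical helper names.

```python
import re

CANDIDATES_RE = re.compile(r"median times of each candidate \(in us\)((?:\s+\d+)+)")


def parse_candidate_times(line):
    """Return the per-candidate median times in microseconds, or [] if the line does not match."""
    match = CANDIDATES_RE.search(line)
    if match is None:
        return []
    return [int(token) for token in match.group(1).split()]


def summarize(times):
    """Return the number of candidates with the fastest and slowest median time."""
    return {"count": len(times), "best_us": min(times), "worst_us": max(times)}
```

For a complete generation, the smallest and largest values in the list correspond to the best and worst figures on the matching `Generation` line, for example 90 and 360 in the MLP1 run above.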
examples/results_032218/QuadroGP100/logs/2477270-13/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 20 21 21 22 26 26 27 28 28 28 29 30 30 30 31 31 32 32 32 32 33 33 33 33 33 35 35 35 36 36 36 36 37 37 38 39 39 39 40 41 41 42 42 42 42 42 42 42 43 43 43 44 45 45 47 47 49 49 49 50 50 50 51 52 53 53 54 54 55 55 57 57 57 57 58 59 59 60 61 61 63 64 64 68 70 73 75 75
I0322 13:56:05.702594 66488 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 20/42/75
I0322 13:56:05.702610 66488 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 20/42/75
I0322 13:56:05.702731 48104 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:56:05.702731 48104 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:56:05.703217 48104 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:56:05.703224 48104 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:56:05.703228 48104 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:56:05.703232 48104 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:56:05.703234 48104 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:56:05.703238 48104 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:56:05.703240 48104 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:56:05.703244 48104 genetic_tuning_harness.cc:542] .tile(4, 16, 4)
I0322 13:56:05.703248 48104 genetic_tuning_harness.cc:542] .mapToThreads(32, 8)
I0322 13:56:05.703250 48104 genetic_tuning_harness.cc:542] .mapToBlocks(256, 8, 16)
I0322 13:56:05.703253 48104 genetic_tuning_harness.cc:542] .unroll(32)
I0322 13:56:05.703256 48104 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:56:05.703259 48104 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:56:05.703263 48104 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:56:05.703265 48104 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:56:05.703269 48104 genetic_tuning_harness.cc:542] .matchLibraryCalls(true);
********************************
examples/results_032218/QuadroGP100/logs/2477270-14/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 19 20 22 23 23 23 24 25 26 26 26 26 26 27 27 27 27 27 28 28 28 28 28 29 29 29 29 29 29 30 31 31 32 32 32 32 32 33 33 34 34 34 35 35 35 35 35 36 36 36 36 37 37 38 38 39 40 41 41 41 42 42 43 46 47 48 49 52 53 53 54 60 61 62 62 62 63 64 65 72 73 74 77 79
I0322 13:55:46.409674 60456 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 19/35/79
I0322 13:55:46.409690 60456 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 19/35/79
I0322 13:55:46.409806 42344 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:55:46.409806 42344 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:55:46.410284 42344 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:55:46.410292 42344 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:55:46.410296 42344 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:55:46.410300 42344 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:55:46.410302 42344 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:55:46.410305 42344 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:55:46.410308 42344 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:55:46.410312 42344 genetic_tuning_harness.cc:542] .tile(4, 8)
I0322 13:55:46.410315 42344 genetic_tuning_harness.cc:542] .mapToThreads(64, 4, 2)
I0322 13:55:46.410318 42344 genetic_tuning_harness.cc:542] .mapToBlocks(64, 256)
I0322 13:55:46.410321 42344 genetic_tuning_harness.cc:542] .unroll(32)
I0322 13:55:46.410324 42344 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:55:46.410327 42344 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:55:46.410331 42344 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:55:46.410333 42344 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:55:46.410336 42344 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************
examples/results_032218/QuadroGP100/logs/2477270-15/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 20 21 22 22 24 25 25 26 26 27 27 31 31 31 31 31 32 32 32 32 32 34 34 34 34 34 34 35 35 35 35 36 36 37 37 37 37 37 37 38 38 38 38 38 39 39 39 40 40 41 41 41 42 45 45 46 47 48 49 49 50 52 53 58 59 62 63 64 65 65 68 69 69 71 73 74 76 77 77 78 79 86 87 87 92
I0322 14:10:51.267225 75033 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 20/38/92
I0322 14:10:51.267241 75033 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 20/38/92
I0322 14:10:51.267319 54913 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 14:10:51.267319 54913 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 14:10:51.267585 54913 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 14:10:51.267592 54913 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 14:10:51.267596 54913 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 14:10:51.267599 54913 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 14:10:51.267602 54913 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 14:10:51.267606 54913 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 14:10:51.267608 54913 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 14:10:51.267611 54913 genetic_tuning_harness.cc:542] .tile(4, 128)
I0322 14:10:51.267614 54913 genetic_tuning_harness.cc:542] .mapToThreads(128, 4)
I0322 14:10:51.267617 54913 genetic_tuning_harness.cc:542] .mapToBlocks(128)
I0322 14:10:51.267619 54913 genetic_tuning_harness.cc:542] .unroll(32)
I0322 14:10:51.267622 54913 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 14:10:51.267626 54913 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 14:10:51.267629 54913 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 14:10:51.267632 54913 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 14:10:51.267634 54913 genetic_tuning_harness.cc:542] .matchLibraryCalls(true);
********************************
examples/results_032218/QuadroGP100/logs/2477270-16/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
I0322 13:41:14.944701 62088 printer.cc:63] 
Generation 14 Jobs(Compiled, GPU)/total (100, 99)/100 (best/median/worst)us: 24/42/104
I0322 13:41:15.944895 62088 printer.cc:63] 
Generation 14 Jobs(Compiled, GPU)/total (100, 99)/100 (best/median/worst)us: 24/42/104
I0322 13:41:16.945083 62088 printer.cc:63] 
Generation 14 Jobs(Compiled, GPU)/total (100, 99)/100 (best/median/worst)us: 24/42/104
I0322 13:41:17.945266 62088 printer.cc:63] 
Generation 14 Jobs(Compiled, GPU)/total (100, 99)/100 (best/median/worst)us: 24/42/104
********************************
examples/results_032218/QuadroGP100/logs/2477270-17/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 28 31 32 34 34 34 35 36 36 36 37 37 38 38 38 38 38 39 39 39 39 39 40 40 40 41 41 41 41 42 42 43 43 43 43 44 45 45 47 48 48 48 48 51 52 52 52 56 56 56 57 58 59 60 60 65 66 66 68 74 75 76 77 79 80 80 82 82 82 84 85 91 102 105 105 108 108 111 119 119 123 124 129 130 131 133
I0322 13:55:44.639704 73565 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 28/51/133
I0322 13:55:44.639727 73565 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 28/51/133
I0322 13:55:44.639823 41319 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:55:44.639823 41319 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:55:44.640282 41319 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:55:44.640293 41319 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:55:44.640296 41319 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:55:44.640300 41319 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:55:44.640302 41319 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:55:44.640305 41319 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:55:44.640310 41319 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:55:44.640312 41319 genetic_tuning_harness.cc:542] .tile(4, 128, 4)
I0322 13:55:44.640316 41319 genetic_tuning_harness.cc:542] .mapToThreads(256)
I0322 13:55:44.640318 41319 genetic_tuning_harness.cc:542] .mapToBlocks(32, 8)
I0322 13:55:44.640321 41319 genetic_tuning_harness.cc:542] .unroll(4)
I0322 13:55:44.640324 41319 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:55:44.640327 41319 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:55:44.640337 41319 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:55:44.640341 41319 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:55:44.640343 41319 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************
examples/results_032218/QuadroGP100/logs/2477270-18/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 21 22 22 23 24 24 24 25 25 27 28 28 28 29 30 30 30 31 32 33 34 34 34 35 36 36 36 39 39 40 40 41 42 42 43 43 43 44 44 45 45 47 48 48 51 51 52 53 54 54 56 60 60 61 62 62 63 63 64 64 64 67 68 70 70 71 73 74 76 77 77 78 78 79 80 80 89 95 101 105
I0322 14:03:16.618028 57483 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 21/45/105
I0322 14:03:16.618050 57483 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 21/45/105
I0322 14:03:16.618175 39836 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 14:03:16.618175 39836 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 14:03:16.618588 39836 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 14:03:16.618597 39836 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 14:03:16.618599 39836 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 14:03:16.618602 39836 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 14:03:16.618605 39836 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Min)
I0322 14:03:16.618608 39836 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 14:03:16.618610 39836 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 14:03:16.618613 39836 genetic_tuning_harness.cc:542] .tile(4, 4)
I0322 14:03:16.618616 39836 genetic_tuning_harness.cc:542] .mapToThreads(64, 4)
I0322 14:03:16.618619 39836 genetic_tuning_harness.cc:542] .mapToBlocks(256)
I0322 14:03:16.618623 39836 genetic_tuning_harness.cc:542] .unroll(64)
I0322 14:03:16.618624 39836 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 14:03:16.618628 39836 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 14:03:16.618630 39836 genetic_tuning_harness.cc:542] .usePrivateMemory(false)
I0322 14:03:16.618633 39836 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 14:03:16.618636 39836 genetic_tuning_harness.cc:542] .matchLibraryCalls(true);
********************************
examples/results_032218/QuadroGP100/logs/2477270-19/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 23 23 23 24 24 25 26 28 28 28 29 29 30 30 30 31 31 31 32 33 33 33 33 33 34 34 34 35 35 36 36 37 37 37 38 39 39 39 39 39 40 40 40 40 42 43 43 43 44 44 44 45 47 47 48 49 50 50 55 56 59 59 60 61 61 61 61 61 62 62 62 65 69 74 74 75 75 76 76 78 88 93
I0322 14:20:57.084686 60434 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 23/40/93
I0322 14:20:57.084702 60434 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 23/40/93
I0322 14:20:57.084834 37020 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 14:20:57.085300 37020 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 14:20:57.085309 37020 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 14:20:57.085312 37020 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 14:20:57.085315 37020 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 14:20:57.085319 37020 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 14:20:57.085321 37020 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 14:20:57.085325 37020 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 14:20:57.085326 37020 genetic_tuning_harness.cc:542] .tile(4, 32)
I0322 14:20:57.085330 37020 genetic_tuning_harness.cc:542] .mapToThreads(64, 8)
I0322 14:20:57.085332 37020 genetic_tuning_harness.cc:542] .mapToBlocks(128, 8)
I0322 14:20:57.085335 37020 genetic_tuning_harness.cc:542] .unroll(128)
I0322 14:20:57.085337 37020 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 14:20:57.085340 37020 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 14:20:57.085343 37020 genetic_tuning_harness.cc:542] .usePrivateMemory(false)
I0322 14:20:57.085346 37020 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 14:20:57.085350 37020 genetic_tuning_harness.cc:542] .matchLibraryCalls(true);
********************************
examples/results_032218/QuadroGP100/logs/2477270-1/example_batchmatmul.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_batchmatmul --gtest_filter=*.TransposedBatchMatMul --B=500 --K=26 --M=72 --N=26 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/TransposedBatchMatMul_B_500_K_26_M_72_N_26.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_batchmatmul --gtest_filter=*.TransposedBatchMatMul --B=500 --K=26 --M=72 --N=26 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/TransposedBatchMatMul_B_500_K_26_M_72_N_26.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 65 65 65 65 65 66 66 66 68 68 69 69 69 69 69 69 70 70 70 70 70 70 70 71 72 73 73 76 76 76 76 76 77 77 81 82 83 84 84 84 91 91 91 92 92 92 92 93 95 95 96 96 97 104 106 109 110 115 118 119 122 125 126 129 146 146 146 155 161 162 167 173 175 184 194 194 196 197 198 199 201 203 208 219 228 251 273 274 280 290
I0322 13:22:15.669152 15134 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 65/92/290
I0322 13:22:15.669193 15134 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 65/92/290
I0322 13:22:15.669320 6113 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:22:15.669612 6113 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:22:15.669618 6113 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:22:15.669621 6113 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:22:15.669625 6113 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:22:15.669627 6113 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:22:15.669629 6113 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:22:15.669632 6113 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:22:15.669636 6113 genetic_tuning_harness.cc:542] .tile(1)
I0322 13:22:15.669638 6113 genetic_tuning_harness.cc:542] .mapToThreads(128)
I0322 13:22:15.669641 6113 genetic_tuning_harness.cc:542] .mapToBlocks(500)
I0322 13:22:15.669643 6113 genetic_tuning_harness.cc:542] .unroll(256)
I0322 13:22:15.669646 6113 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:22:15.669649 6113 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:22:15.669652 6113 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:22:15.669656 6113 genetic_tuning_harness.cc:542] .unrollCopyShared(false)
I0322 13:22:15.669657 6113 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************
examples/results_032218/QuadroGP100/logs/2477270-20/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 22 22 25 26 26 27 27 28 28 29 29 29 29 29 29 30 30 31 31 32 32 32 33 33 33 33 33 34 34 34 35 35 35 35 35 35 35 35 35 35 35 35 36 36 36 37 37 37 38 38 38 38 39 39 40 40 42 43 44 46 47 47 47 47 47 47 48 49 50 52 57 58 62 63 64 66 66 67 69 70 70 72 72 72 74 74 74 76 80 82 84 84
I0322 14:06:17.744632 77240 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 22/37/84
I0322 14:06:17.744639 77240 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 22/37/84
I0322 14:06:17.744685 45347 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 14:06:17.745069 45347 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 14:06:17.745076 45347 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 14:06:17.745079 45347 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 14:06:17.745082 45347 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 14:06:17.745085 45347 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 14:06:17.745088 45347 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 14:06:17.745091 45347 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 14:06:17.745095 45347 genetic_tuning_harness.cc:542] .tile(4)
I0322 14:06:17.745097 45347 genetic_tuning_harness.cc:542] .mapToThreads(128, 4)
I0322 14:06:17.745100 45347 genetic_tuning_harness.cc:542] .mapToBlocks(256)
I0322 14:06:17.745103 45347 genetic_tuning_harness.cc:542] .unroll(1)
I0322 14:06:17.745106 45347 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 14:06:17.745110 45347 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 14:06:17.745112 45347 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 14:06:17.745115 45347 genetic_tuning_harness.cc:542] .unrollCopyShared(false)
I0322 14:06:17.745118 45347 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************
examples/results_032218/QuadroGP100/logs/2477270-21/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 22 26 27 27 28 28 28 30 30 30 31 32 33 33 34 34 35 35 35 35 35 36 36 36 37 37 37 38 38 39 39 39 39 39 39 39 40 40 41 41 41 42 42 42 42 43 43 43 44 45 46 47 48 48 49 50 50 52 56 57 59 59 60 64 64 65 66 68 68 69 71 74 74 74 78 78 80 84 84 87 91 91
I0322 14:07:53.804105 65151 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 22/42/91
I0322 14:07:53.804119 65151 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 22/42/91
I0322 14:07:53.804234 45779 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 14:07:53.804664 45779 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 14:07:53.804672 45779 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 14:07:53.804674 45779 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 14:07:53.804677 45779 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 14:07:53.804680 45779 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 14:07:53.804687 45779 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 14:07:53.804689 45779 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 14:07:53.804692 45779 genetic_tuning_harness.cc:542] .tile(4)
I0322 14:07:53.804695 45779 genetic_tuning_harness.cc:542] .mapToThreads(128, 4)
I0322 14:07:53.804698 45779 genetic_tuning_harness.cc:542] .mapToBlocks(64, 16, 16)
I0322 14:07:53.804702 45779 genetic_tuning_harness.cc:542] .unroll(128)
I0322 14:07:53.804704 45779 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 14:07:53.804708 45779 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 14:07:53.804711 45779 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 14:07:53.804714 45779 genetic_tuning_harness.cc:542] .unrollCopyShared(false)
I0322 14:07:53.804718 45779 genetic_tuning_harness.cc:542] .matchLibraryCalls(true);
********************************
examples/results_032218/QuadroGP100/logs/2477270-22/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.MLP3 --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/MLP3_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 20 22 22 27 27 28 28 28 29 29 29 29 31 31 31 31 32 32 32 32 33 33 34 34 34 35 35 35 36 36 36 36 37 37 37 37 37 38 38 38 39 40 40 40 41 41 42 42 43 44 45 48 50 50 50 51 52 52 52 53 55 55 55 55 57 59 59 66 68 71 72 72 73 74 74 77 79 81 81 81 83 85 85
I0322 13:55:06.768587 55151 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 20/40/85
I0322 13:55:06.768592 55151 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 20/40/85
I0322 13:55:06.768676 35951 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:55:06.769037 35951 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:55:06.769043 35951 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:55:06.769045 35951 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:55:06.769048 35951 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:55:06.769050 35951 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:55:06.769053 35951 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:55:06.769055 35951 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:55:06.769057 35951 genetic_tuning_harness.cc:542] .tile(4)
I0322 13:55:06.769060 35951 genetic_tuning_harness.cc:542] .mapToThreads(64, 4)
I0322 13:55:06.769062 35951 genetic_tuning_harness.cc:542] .mapToBlocks(128, 2)
I0322 13:55:06.769064 35951 genetic_tuning_harness.cc:542] .unroll(4)
I0322 13:55:06.769068 35951 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:55:06.769069 35951 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:55:06.769084 35951 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:55:06.769088 35951 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:55:06.769090 35951 genetic_tuning_harness.cc:542] .matchLibraryCalls(true);
********************************
examples/results_032218/QuadroGP100/logs/2477270-2/example_group_convolution.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_group_convolution --gtest_filter=*.GroupConvolution --N=32 --G=32 --C=4 --F=4 --W=56 --H=56 --KW=3 --KH=3 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/GroupConvolution_N_32_G_32_C_4_F_4_W_56_H_56_KW_3_KH_3.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_group_convolution --gtest_filter=*.GroupConvolution --N=32 --G=32 --C=4 --F=4 --W=56 --H=56 --KW=3 --KH=3 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/GroupConvolution_N_32_G_32_C_4_F_4_W_56_H_56_KW_3_KH_3.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 1364 1392 1404 1405 1414 1569 1589 1609 1659 1659 1672 1690 1712 1723 1747 1804 1818 1833 1839 1856 1862 1867 1891 1904 1921 2037 2044 2089 2113 2113 2128 2131 2158 2230 2329 2338 2339 2423 2455 2467 2494 2513 2550 2552 2572 2605 2620 2626 2644 2674 2741 2749 2946 3027 3216 3250 3256 3365 3389 3439 3534 3699 3891 3893 3907 3912 3912 3934 3984 4080 4145 4193 4211 4639 4837 5283 5406 5737 6093
I0322 13:27:38.594058 52835 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 1364/2467/6093
I0322 13:27:38.594074 52835 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 1364/2467/6093
I0322 13:27:38.594195 46213 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:27:38.594810 46213 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:27:38.594818 46213 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:27:38.594822 46213 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:27:38.594825 46213 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:27:38.594828 46213 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:27:38.594831 46213 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:27:38.594835 46213 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:27:38.594837 46213 genetic_tuning_harness.cc:542] .tile(1, 1, 64)
I0322 13:27:38.594841 46213 genetic_tuning_harness.cc:542] .mapToThreads(128)
I0322 13:27:38.594843 46213 genetic_tuning_harness.cc:542] .mapToBlocks(64, 32, 128)
I0322 13:27:38.594846 46213 genetic_tuning_harness.cc:542] .unroll(64)
I0322 13:27:38.594848 46213 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:27:38.594851 46213 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:27:38.594856 46213 genetic_tuning_harness.cc:542] .usePrivateMemory(false)
I0322 13:27:38.594857 46213 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:27:38.594861 46213 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************
examples/results_032218/QuadroGP100/logs/2477270-3/example_group_convolution.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_group_convolution --gtest_filter=*.GroupConvolution --N=32 --G=32 --C=8 --F=8 --W=28 --H=28 --KW=3 --KH=3 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/GroupConvolution_N_32_G_32_C_8_F_8_W_28_H_28_KW_3_KH_3.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_group_convolution --gtest_filter=*.GroupConvolution --N=32 --G=32 --C=8 --F=8 --W=28 --H=28 --KW=3 --KH=3 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/GroupConvolution_N_32_G_32_C_8_F_8_W_28_H_28_KW_3_KH_3.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 495 496 497 497 497 501 541 548 549 551 552 583 584 592 594 595 612 635 640 645 648 650 652 659 686 692 696 697 701 701 705 709 720 721 733 745 793 794 795 801 804 813 813 859 879 901 902 913 952 968 968 969 969 1033 1042 1064 1071 1077 1087 1090 1119 1181 1218 1320 1380 1434 1495 1508 1511 1590 1793 1866 1935 1997 2246 2274
I0322 13:26:49.345052 23861 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 495/795/2274
I0322 13:26:49.345068 23861 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 495/795/2274
I0322 13:26:49.345227 16033 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:26:49.345863 16033 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:26:49.345875 16033 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:26:49.345877 16033 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:26:49.345881 16033 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:26:49.345885 16033 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:26:49.345887 16033 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:26:49.345890 16033 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:26:49.345893 16033 genetic_tuning_harness.cc:542] .tile(1, 1, 28)
I0322 13:26:49.345896 16033 genetic_tuning_harness.cc:542] .mapToThreads(14, 32)
I0322 13:26:49.345899 16033 genetic_tuning_harness.cc:542] .mapToBlocks(32, 256)
I0322 13:26:49.345901 16033 genetic_tuning_harness.cc:542] .unroll(8)
I0322 13:26:49.345904 16033 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:26:49.345907 16033 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:26:49.345911 16033 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:26:49.345914 16033 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:26:49.345916 16033 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************
examples/results_032218/QuadroGP100/logs/2477270-4/example_group_convolution.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_group_convolution --gtest_filter=*.GroupConvolution --N=32 --G=32 --C=16 --F=16 --W=14 --H=14 --KW=3 --KH=3 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/GroupConvolution_N_32_G_32_C_16_F_16_W_14_H_14_KW_3_KH_3.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_group_convolution --gtest_filter=*.GroupConvolution --N=32 --G=32 --C=16 --F=16 --W=14 --H=14 --KW=3 --KH=3 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/GroupConvolution_N_32_G_32_C_16_F_16_W_14_H_14_KW_3_KH_3.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 324 353 354 358 363 364 370 372 372 381 382 382 383 383 384 384 385 386 386 386 387 387 390 392 398 401 401 402 412 413 417 423 436 437 466 469 472 477 527 532 537 545 558 558 561 566 581 624 626 630 656 661 673 698 721 751 760 770 775 784 795 821 833 853 864 890 932 1077 1113 1154 1228 1280 1286 1322 1353 1440
I0322 13:28:51.364197 38661 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 324/527/1440
I0322 13:28:51.364222 38661 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 324/527/1440
I0322 13:28:51.364362 29818 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:28:51.364888 29818 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:28:51.364897 29818 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:28:51.364902 29818 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:28:51.364904 29818 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:28:51.364907 29818 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:28:51.364910 29818 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:28:51.364912 29818 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:28:51.364915 29818 genetic_tuning_harness.cc:542] .tile(1, 1)
I0322 13:28:51.364918 29818 genetic_tuning_harness.cc:542] .mapToThreads(2, 14, 16)
I0322 13:28:51.364922 29818 genetic_tuning_harness.cc:542] .mapToBlocks(32, 128)
I0322 13:28:51.364924 29818 genetic_tuning_harness.cc:542] .unroll(256)
I0322 13:28:51.364928 29818 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:28:51.364930 29818 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:28:51.364935 29818 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:28:51.364938 29818 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:28:51.364941 29818 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************
examples/results_032218/QuadroGP100/logs/2477270-5/example_group_convolution.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_group_convolution --gtest_filter=*.GroupConvolution --N=32 --G=32 --C=32 --F=32 --W=7 --H=7 --KW=3 --KH=3 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/GroupConvolution_N_32_G_32_C_32_F_32_W_7_H_7_KW_3_KH_3.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_group_convolution --gtest_filter=*.GroupConvolution --N=32 --G=32 --C=32 --F=32 --W=7 --H=7 --KW=3 --KH=3 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/GroupConvolution_N_32_G_32_C_32_F_32_W_7_H_7_KW_3_KH_3.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 466 475 476 493 494 499 502 530 537 538 543 545 547 548 550 553 569 608 624 642 646 646 651 675 678 683 684 688 695 700 701 749 813 846 850 855 861 876 915 920 932 945 946 957 958 1007 1073 1097 1102 1107 1133 1155 1157 1174 1192 1209 1262 1308 1320 1321 1329 1331 1355 1390 1467 1515 1535 1547 1576 1732 1752 1799 1850 1893 1897 1898 1901 1921 1943 1952 1962 2067 2117 2141
I0322 13:30:08.420614 55428 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 466/946/2141
I0322 13:30:08.420629 55428 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 466/946/2141
I0322 13:30:08.420706 44683 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:30:08.421010 44683 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:30:08.421016 44683 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:30:08.421020 44683 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:30:08.421023 44683 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:30:08.421026 44683 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:30:08.421030 44683 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:30:08.421032 44683 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:30:08.421036 44683 genetic_tuning_harness.cc:542] .tile(1, 1, 8)
I0322 13:30:08.421038 44683 genetic_tuning_harness.cc:542] .mapToThreads(4, 8, 16)
I0322 13:30:08.421041 44683 genetic_tuning_harness.cc:542] .mapToBlocks(7, 256, 128)
I0322 13:30:08.421044 44683 genetic_tuning_harness.cc:542] .unroll(64)
I0322 13:30:08.421046 44683 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:30:08.421051 44683 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:30:08.421053 44683 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:30:08.421056 44683 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:30:08.421058 44683 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************
examples/results_032218/QuadroGP100/logs/2477270-6/example_tmm.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_tmm --gtest_filter=*.TransposedMatMul --M=128 --K=32 --N=256 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/TransposedMatMul_M_128_K_32_N_256.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_tmm --gtest_filter=*.TransposedMatMul --M=128 --K=32 --N=256 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/TransposedMatMul_M_128_K_32_N_256.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 17 17 18 18 19 20 20 20 20 21 22 22 22 23 23 23 24 24 24 24 24 25 25 25 25 25 26 26 26 26 26 26 26 26 27 27 27 27 27 28 28 29 29 29 29 30 31 31 32 32 35 35 35 35 36 36 36 36 37 39 40 40 40 41 42 43 43 44 45 46 46 46 47 47 48 48 49 50 50 51 52 52 53 53 54 58 63 64 70 71 75
I0322 13:27:18.133028 62029 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 17/30/75
I0322 13:27:18.133044 62029 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 17/30/75
I0322 13:27:18.133183 55314 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:27:18.133565 55314 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:27:18.133571 55314 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:27:18.133572 55314 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:27:18.133574 55314 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:27:18.133575 55314 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:27:18.133576 55314 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:27:18.133577 55314 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:27:18.133579 55314 genetic_tuning_harness.cc:542] .tile(32, 32, 32)
I0322 13:27:18.133579 55314 genetic_tuning_harness.cc:542] .mapToThreads(32, 32)
I0322 13:27:18.133580 55314 genetic_tuning_harness.cc:542] .mapToBlocks(4, 8)
I0322 13:27:18.133581 55314 genetic_tuning_harness.cc:542] .unroll(256)
I0322 13:27:18.133582 55314 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:27:18.133584 55314 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:27:18.133584 55314 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:27:18.133585 55314 genetic_tuning_harness.cc:542] .unrollCopyShared(false)
I0322 13:27:18.133586 55314 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************
examples/results_032218/QuadroGP100/logs/2477270-7/example_tmm.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_tmm --gtest_filter=*.TransposedMatMul --M=128 --K=1024 --N=1024 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/TransposedMatMul_M_128_K_1024_N_1024.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_tmm --gtest_filter=*.TransposedMatMul --M=128 --K=1024 --N=1024 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/TransposedMatMul_M_128_K_1024_N_1024.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 313 319 323 326 326 334 336 338 341 351 351 352 352 353 356 356 356 358 360 361 363 364 373 374 377 389 391 395 401 407 419 420 444 457 479 480 484 485 494 496 497 499 508 522 535 537 563 577 593 610 611 628 641 646 675 677 678 691 696 698 717 722 723 731 746 747 748 749 760 775 803 837 897 953 991 1004 1030 1180 1285 1499
I0322 13:30:37.930837 13369 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 313/497/1499
I0322 13:30:37.930865 13369 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 313/497/1499
I0322 13:30:37.930925 4007 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:30:37.931147 4007 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:30:37.931151 4007 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:30:37.931152 4007 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:30:37.931154 4007 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:30:37.931154 4007 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:30:37.931155 4007 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:30:37.931156 4007 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:30:37.931159 4007 genetic_tuning_harness.cc:542] .tile(1, 32)
I0322 13:30:37.931159 4007 genetic_tuning_harness.cc:542] .mapToThreads(32, 4)
I0322 13:30:37.931160 4007 genetic_tuning_harness.cc:542] .mapToBlocks(256, 256, 2)
I0322 13:30:37.931161 4007 genetic_tuning_harness.cc:542] .unroll(32)
I0322 13:30:37.931162 4007 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:30:37.931164 4007 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:30:37.931164 4007 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:30:37.931165 4007 genetic_tuning_harness.cc:542] .unrollCopyShared(false)
I0322 13:30:37.931166 4007 genetic_tuning_harness.cc:542] .matchLibraryCalls(true);
********************************
examples/results_032218/QuadroGP100/logs/2477270-8/example_tmm.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_tmm --gtest_filter=*.TransposedMatMul --M=128 --K=4096 --N=16384 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/TransposedMatMul_M_128_K_4096_N_16384.log 2>&1 && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_tmm --gtest_filter=*.TransposedMatMul --M=128 --K=4096 --N=16384 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/TransposedMatMul_M_128_K_4096_N_16384.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 11256 11481 11614 11632 18108 18334 18372 18557 18566 18567 18583 18596 18614 18935 19264 19327 19788 20102 20107 20288 20315 20336 20372 20375 20423 20424 20432 20435 20585 21276 22736 22851 23009 23061 23403 23478 23873 24182 25640 26630 29528 29915 30859 32167 33047 33054 34531 35263 35443 35796 35819 36943 39261 39313 39397 40910 40976 41955 42509 43413 45146 45296 45724 47090 48117 51233 52324 52523 54375 54786
I0322 13:50:30.538462 52520 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 11256/23478/54786
I0322 13:50:30.538487 52520 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 11256/23478/54786
I0322 13:50:30.538579 38403 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:50:30.538923 38403 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:50:30.538928 38403 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:50:30.538929 38403 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:50:30.538931 38403 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:50:30.538933 38403 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Preserve3Coincident)
I0322 13:50:30.538933 38403 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:50:30.538935 38403 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:50:30.538936 38403 genetic_tuning_harness.cc:542] .tile(256, 16, 2)
I0322 13:50:30.538938 38403 genetic_tuning_harness.cc:542] .mapToThreads(4, 8)
I0322 13:50:30.538939 38403 genetic_tuning_harness.cc:542] .mapToBlocks(16, 16384)
I0322 13:50:30.538940 38403 genetic_tuning_harness.cc:542] .unroll(256)
I0322 13:50:30.538941 38403 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:50:30.538944 38403 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:50:30.538944 38403 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:50:30.538945 38403 genetic_tuning_harness.cc:542] .unrollCopyShared(false)
I0322 13:50:30.538947 38403 genetic_tuning_harness.cc:542] .matchLibraryCalls(true);
********************************
examples/results_032218/QuadroGP100/logs/2477270-9/example_MLP_model.INFO
echo CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.1LUT --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ > ${LOG_DIR}/autotuner/1LUT_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log && CUDA_LAUNCH_BLOCKING=1 ./build/examples/example_MLP_model --gtest_filter=*.1LUT --B=128 --D=64 --L1=50 --E1=10000000 --L2=50 --E2=10000000 --WX=1000 --WY=1024 --M=2000 --N=128 --O=64 --P=32 --Q=2 --debug_tuner=true --dump_cuda=true --disable_version_checks=true --log_dir=${LOG_DIR} --autotune=true --tuner_gen_log_generations=true --tuner_threads=${TUNER_THREADS} --tuner_gpus=${TUNER_GPUS} --save_tuner_proto_prefix=${LOG_DIR}/autotuner/ --tuner_gen_restore_from_proto=0 >> ${LOG_DIR}/autotuner/1LUT_B_128_D_64_L1_50_E1_10000000_L2_50_E2_10000000_WX_1000_WY_1024_M_2000_N_128_O_64_P_32_Q_2.log 2>&1
[TUNER][GENERATION LOG] median times of each candidate (in us) 15 15 15 16 16 17 17 17 17 17 17 17 17 18 18 18 18 19 19 19 20 20 20 20 21 21 21 21 21 21 21 21 22 22 22 22 22 22 22 22 23 23 23 23 24 24 25 25 25 25 25 25 26 26 26 26 26 28 28 28 28 28 29 29 30 31 32 32 32 32 33 33 36 37 38 39 39 40 48 48 48 49 52 52 52 54 60 66 66
I0322 13:21:11.146277 9427 printer.cc:63] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 15/24/66
I0322 13:21:11.146296 9427 printer.cc:70] 
Generation 24 Jobs(Compiled, GPU)/total (100, 100)/100 (best/median/worst)us: 15/24/66
I0322 13:21:11.146435 3756 genetic_tuning_harness.cc:538] [TUNER][GENERATION LOG] best option so far:
I0322 13:21:11.146914 3756 genetic_tuning_harness.cc:542] tc::MappingOptions::makeNaiveMappingOptions()
I0322 13:21:11.146919 3756 genetic_tuning_harness.cc:542] .outerScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:21:11.146920 3756 genetic_tuning_harness.cc:542] .outerScheduleAllowSkewing(false)
I0322 13:21:11.146921 3756 genetic_tuning_harness.cc:542] .outerSchedulePositiveOrthant(true)
I0322 13:21:11.146922 3756 genetic_tuning_harness.cc:542] .intraTileScheduleFusionStrategy(tc::FusionStrategy::Max)
I0322 13:21:11.146924 3756 genetic_tuning_harness.cc:542] .intraTileScheduleAllowSkewing(false)
I0322 13:21:11.146924 3756 genetic_tuning_harness.cc:542] .intraTileSchedulePositiveOrthant(true)
I0322 13:21:11.146925 3756 genetic_tuning_harness.cc:542] .tile(1, 64, 16777216)
I0322 13:21:11.146926 3756 genetic_tuning_harness.cc:542] .mapToThreads(128)
I0322 13:21:11.146927 3756 genetic_tuning_harness.cc:542] .mapToBlocks(611, 16384)
I0322 13:21:11.146929 3756 genetic_tuning_harness.cc:542] .unroll(32)
I0322 13:21:11.146929 3756 genetic_tuning_harness.cc:542] .tileImperfectlyNested(false)
I0322 13:21:11.146930 3756 genetic_tuning_harness.cc:542] .useSharedMemory(true)
I0322 13:21:11.146931 3756 genetic_tuning_harness.cc:542] .usePrivateMemory(true)
I0322 13:21:11.146932 3756 genetic_tuning_harness.cc:542] .unrollCopyShared(true)
I0322 13:21:11.146934 3756 genetic_tuning_harness.cc:542] .matchLibraryCalls(false);
********************************