inv_th with only the call to coe_th; coe_th has no calls to EXP (EXP replaced by nop); OpenACC
==14651== Generated result file: /scratch/snx1600/siddhart/playground/standalone/run/standalone-nvprof-output.prof | |
+ + + + + + + + + + + + + + + + | |
+ RUNNING IN DOUBLE PRECISION + | |
+ + + + + + + + + + + + + + + + | |
STANDALONE PHYSICS | |
RUNNING ON GPU | |
RADIATION TEST | |
Iteration 1 | |
Initialize test | |
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<< | |
>>> WARNING: SERIALIZATION IS ON <<< | |
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<< | |
ALL INPUTS READ | |
***************************************************** | |
* Radiative transfer calculations employ data * | |
* provided in routine rad_aibi * | |
***************************************************** | |
Run test | |
Finalize test | |
Iteration 2 | |
Initialize test | |
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<< | |
>>> WARNING: SERIALIZATION IS ON <<< | |
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<< | |
ALL INPUTS READ | |
***************************************************** | |
* Radiative transfer calculations employ data * | |
* provided in routine rad_aibi * | |
***************************************************** | |
Run test | |
Finalize test | |
Iteration 3 | |
Initialize test | |
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<< | |
>>> WARNING: SERIALIZATION IS ON <<< | |
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<< | |
ALL INPUTS READ | |
***************************************************** | |
* Radiative transfer calculations employ data * | |
* provided in routine rad_aibi * | |
***************************************************** | |
Run test | |
Finalize test | |
Iteration 4 | |
Initialize test | |
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<< | |
>>> WARNING: SERIALIZATION IS ON <<< | |
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<< | |
ALL INPUTS READ | |
***************************************************** | |
* Radiative transfer calculations employ data * | |
* provided in routine rad_aibi * | |
***************************************************** | |
Run test | |
Finalize test | |
Domain size, ie,je,ke : 80 60 60 | |
nproma : 4800 | |
data_set type :full | |
-------------------------------------------------------------------------- | |
Local timers: | |
NCOMP_PE= 1 | |
-------------------------------------------------------------------------- | |
Id Tag Ncalls min[s] max[s] mean[s] | |
1 Total Phys 4 0.3008 0.3008 0.3008 | |
2 Copy block 8 0.0073 0.0073 0.0073 | |
3 Radiation 4 0.2470 0.2470 0.2470 | |
-------------------------------------------------------------------------- | |
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<< | |
>>> WARNING: SERIALIZATION IS ON <<< | |
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<< | |
ALL OUTPUTS WRITTEN | |
+ nvprof -i standalone-nvprof-output.prof | |
======== Profiling result: | |
Time(%) Time Calls Avg Min Max Name | |
24.14% 52.895ms 104 508.61us 470.15us 545.42us fesft_dp$radiation_rg_$ck_L2596_78 | |
12.99% 28.464ms 176 161.73us 140.79us 236.90us inv_th$radiation_rg_$ck_L4241_263 | |
9.36% 20.516ms 459 44.696us 831ns 1.4919ms [CUDA memcpy HtoD] | |
8.27% 18.127ms 104 174.30us 168.79us 180.30us fesft_dp$radiation_rg_$ck_L2596_80 | |
5.37% 11.761ms 132 89.099us 85.705us 93.255us fesft_dp$radiation_rg_$ck_L2638_82 | |
3.70% 8.1080ms 216 37.537us 35.351us 41.237us fesft_dp$radiation_rg_$ck_L2161_40 | |
3.20% 7.0126ms 1200 5.8430us 5.3420us 9.2450us fesft_dp$radiation_rg_$ck_L1914_28 | |
2.23% 4.8822ms 161 30.323us 639ns 191.41us [CUDA memcpy DtoH] | |
2.14% 4.6912ms 12 390.93us 374.78us 407.06us fesft_dp$radiation_rg_$ck_L2497_69 | |
2.11% 4.6163ms 720 6.4110us 5.9180us 10.174us fesft_dp$radiation_rg_$ck_L2383_51 | |
2.09% 4.5713ms 1200 3.8090us 3.3280us 7.2940us fesft_dp$radiation_rg_$ck_L1914_29 | |
1.91% 4.1963ms 1200 3.4960us 3.1990us 5.3430us fesft_dp$radiation_rg_$ck_L1914_31 | |
1.90% 4.1648ms 72 57.845us 9.9170us 66.894us copytoblock3d$src_block_fields_$ck_L1401_1 | |
1.46% 3.1970ms 1200 2.6640us 2.4310us 4.8630us fesft_dp$radiation_rg_$ck_L1914_30 | |
1.37% 3.0095ms 720 4.1790us 3.7750us 7.4220us fesft_dp$radiation_rg_$ck_L2383_52 | |
1.36% 2.9735ms 720 4.1290us 3.7750us 6.5260us fesft_dp$radiation_rg_$ck_L2383_55 | |
1.19% 2.6062ms 20 130.31us 125.47us 135.48us fesft_dp$radiation_rg_$ck_L1925_32 | |
1.11% 2.4279ms 720 3.3720us 3.1030us 5.2790us fesft_dp$radiation_rg_$ck_L2383_54 | |
1.00% 2.1956ms 720 3.0490us 2.7510us 7.5180us fesft_dp$radiation_rg_$ck_L2383_53 | |
0.96% 2.0932ms 12 174.43us 169.30us 179.63us fesft_dp$radiation_rg_$ck_L2497_71 | |
0.87% 1.9047ms 16 119.05us 115.78us 121.50us fesft_dp$radiation_rg_$ck_L2051_36 | |
0.76% 1.6659ms 104 16.017us 14.780us 17.436us fesft_dp$radiation_rg_$ck_L2596_75 | |
0.68% 1.5008ms 28 53.598us 52.658us 55.314us fesft_dp$radiation_rg_$ck_L2617_81 | |
0.63% 1.3880ms 12 115.66us 111.84us 119.30us fesft_dp$radiation_rg_$ck_L2515_72 | |
0.59% 1.2919ms 40 32.298us 30.424us 34.103us fesft_dp$radiation_rg_$ck_L2144_39 | |
0.56% 1.2211ms 4 305.27us 295.38us 317.58us radiation_rg_organize$radiation_rg_org_$ck_L2210_14 | |
0.56% 1.2184ms 28 43.515us 42.676us 44.660us fesft_dp$radiation_rg_$ck_L2675_83 | |
0.55% 1.2120ms 44 27.546us 26.041us 28.729us fesft_dp$radiation_rg_$ck_L2190_41 | |
0.55% 1.1985ms 4 299.62us 289.46us 310.19us fesft_dp$radiation_rg_$ck_L1813_24 | |
0.48% 1.0490ms 4 262.26us 251.55us 272.63us radiation_rg_organize$radiation_rg_org_$ck_L2602_30 | |
0.44% 973.28us 24 40.553us 38.710us 42.900us copyfromblock3d$src_block_fields_$ck_L1610_6 | |
0.36% 796.10us 104 7.6540us 7.1020us 8.2540us fesft_dp$radiation_rg_$ck_L2596_77 | |
0.33% 732.86us 16 45.803us 44.117us 47.508us fesft_dp$radiation_rg_$ck_L2067_37 | |
0.32% 708.23us 104 6.8090us 6.1420us 7.3580us fesft_dp$radiation_rg_$ck_L2596_79 | |
0.32% 707.65us 12 58.971us 57.168us 60.400us fesft_dp$radiation_rg_$ck_L2786_87 | |
0.29% 645.11us 28 23.039us 21.499us 24.506us fesft_dp$radiation_rg_$ck_L2568_73 | |
0.27% 580.90us 20 29.045us 27.992us 30.680us fesft_dp$radiation_rg_$ck_L2244_44 | |
0.26% 575.79us 20 28.789us 27.961us 29.816us fesft_dp$radiation_rg_$ck_L1891_26 | |
0.26% 565.15us 104 5.4340us 5.0540us 5.8220us fesft_dp$radiation_rg_$ck_L2596_76 | |
0.23% 499.23us 20 24.961us 23.769us 25.785us fesft_dp$radiation_rg_$ck_L2223_43 | |
0.20% 448.68us 12 37.390us 35.703us 38.965us fesft_dp$radiation_rg_$ck_L2709_85 | |
0.20% 430.63us 200 2.1530us 1.6630us 3.8710us copytoblock2d$src_block_fields_$ck_L1439_4 | |
0.19% 416.24us 4 104.06us 100.81us 106.82us radiation_rg_organize$radiation_rg_org_$ck_L2344_15 | |
0.17% 369.92us 12 30.826us 30.264us 31.224us fesft_dp$radiation_rg_$ck_L2689_84 | |
0.16% 359.01us 4 89.752us 86.665us 93.192us radiation_rg_organize$radiation_rg_org_$ck_L3130_34 | |
0.16% 357.44us 104 3.4360us 3.1030us 3.8710us fesft_dp$radiation_rg_$ck_L2596_74 | |
0.15% 325.80us 4 81.450us 78.091us 84.585us fesft_dp$radiation_rg_$ck_L1751_20 | |
0.13% 276.47us 12 23.039us 21.626us 24.345us fesft_dp$radiation_rg_$ck_L2353_48 | |
0.12% 263.87us 4 65.966us 62.064us 70.349us radiation_rg_organize$radiation_rg_org_$ck_L2041_9 | |
0.12% 258.17us 44 5.8670us 5.3100us 6.9420us fesft_dp$radiation_rg_$ck_L2099_38 | |
0.12% 254.01us 4 63.503us 61.104us 65.422us radiation_rg_organize$radiation_rg_org_$ck_L3467_36 | |
0.12% 253.69us 16 15.855us 15.068us 16.667us fesft_dp$radiation_rg_$ck_L2203_42 | |
0.09% 191.34us 96 1.9930us 1.7590us 3.3270us copyfromblock2d$src_block_fields_$ck_L1652_9 | |
0.08% 170.87us 4 42.716us 42.453us 43.285us fesft_dp$radiation_rg_$ck_L1783_22 | |
0.07% 146.36us 12 12.196us 11.517us 13.148us fesft_dp$radiation_rg_$ck_L2497_66 | |
0.06% 140.64us 4 35.158us 33.719us 36.567us radiation_rg_organize$radiation_rg_org_$ck_L2512_26 | |
0.06% 130.94us 4 32.735us 31.447us 34.071us radiation_rg_organize$radiation_rg_org_$ck_L2380_17 | |
0.05% 119.68us 4 29.919us 28.888us 31.063us radiation_rg_organize$radiation_rg_org_$ck_L2419_20 | |
0.04% 92.360us 12 7.6960us 7.3900us 8.1580us fesft_dp$radiation_rg_$ck_L2497_68 | |
0.04% 85.801us 8 10.725us 10.205us 11.357us compute_sunshine_conditions$radiation_interface_$ck_L1704_13 | |
0.04% 83.530us 12 6.9600us 6.5580us 7.2940us fesft_dp$radiation_rg_$ck_L2746_86 | |
0.04% 78.027us 12 6.5020us 5.9820us 6.8140us fesft_dp$radiation_rg_$ck_L2497_70 | |
0.03% 75.981us 4 18.995us 17.948us 19.995us fesft_dp$radiation_rg_$ck_L1669_17 | |
0.03% 66.767us 12 5.5630us 5.2470us 6.0470us fesft_dp$radiation_rg_$ck_L2497_67 | |
0.03% 64.399us 12 5.3660us 5.0540us 5.8230us fesft_dp$radiation_rg_$ck_L2806_88 | |
0.03% 59.536us 4 14.884us 14.652us 15.164us radiation_rg_organize$radiation_rg_org_$ck_L2395_18 | |
0.02% 53.172us 4 13.293us 10.782us 14.876us calc_rad_corrections$radiation_interface_$ck_L2218_32 | |
0.02% 50.067us 4 12.516us 11.932us 13.117us radiation_rg_organize$radiation_rg_org_$ck_L2469_24 | |
0.02% 44.596us 20 2.2290us 2.0470us 2.4960us fesft_dp$radiation_rg_$ck_L1914_27 | |
0.02% 40.661us 4 10.165us 10.013us 10.365us radiation_rg_organize$radiation_rg_org_$ck_L3094_33 | |
0.02% 39.002us 12 3.2500us 3.0400us 3.4240us fesft_dp$radiation_rg_$ck_L2497_65 | |
0.02% 38.168us 12 3.1800us 2.8160us 3.5510us fesft_dp$radiation_rg_$ck_L2367_49 | |
0.02% 35.256us 4 8.8140us 8.2860us 9.3100us radiation_rg_organize$radiation_rg_org_$ck_L2444_22 | |
0.01% 26.136us 4 6.5340us 6.3340us 6.9740us fesft_dp$radiation_rg_$ck_L1735_19 | |
0.01% 23.865us 12 1.9880us 1.8870us 2.1750us fesft_dp$radiation_rg_$ck_L2383_50 | |
0.01% 23.546us 4 5.8860us 5.5980us 6.2700us radiation_rg_organize$radiation_rg_org_$ck_L2565_28 | |
0.01% 22.843us 4 5.7100us 5.5990us 5.9190us fesft_dp$radiation_rg_$ck_L2032_35 | |
0.01% 22.809us 4 5.7020us 5.4380us 5.9180us fesft_dp$radiation_rg_$ck_L1771_21 | |
0.01% 22.426us 8 2.8030us 2.2070us 3.5510us compute_sunshine_conditions$radiation_interface_$ck_L1759_14 | |
0.01% 20.186us 4 5.0460us 4.8620us 5.2140us radiation_rg_organize$radiation_rg_org_$ck_L1824_4 | |
0.01% 19.483us 4 4.8700us 4.5430us 5.1500us fesft_dp$radiation_rg_$ck_L2837_90 | |
0.01% 17.659us 4 4.4140us 4.2230us 4.5750us radiation_rg_organize$radiation_rg_org_$ck_L1948_7 | |
0.01% 16.955us 8 2.1190us 1.6630us 2.7520us compute_sunshine_conditions$radiation_interface_$ck_L1597_10 | |
0.01% 16.124us 4 4.0310us 3.7750us 4.4150us radiation_rg_organize$radiation_rg_org_$ck_L3443_35 | |
0.01% 14.908us 4 3.7270us 3.4230us 3.9350us radiation_rg_organize$radiation_rg_org_$ck_L2530_27 | |
0.01% 14.076us 4 3.5190us 3.2950us 3.8070us fesft_dp$radiation_rg_$ck_L2335_47 | |
0.01% 13.693us 4 3.4230us 3.2310us 3.5200us fesft_dp$radiation_rg_$ck_L2265_45 | |
0.01% 13.149us 4 3.2870us 2.8790us 3.5830us radiation_prepare$radiation_interface_$ck_L1349_1 | |
0.01% 12.350us 4 3.0870us 2.9760us 3.2000us fesft_dp$radiation_rg_$ck_L1701_18 | |
0.00% 10.524us 4 2.6310us 2.4950us 2.7830us radiation_organize$radiation_interface_$ck_L2047_30 | |
0.00% 10.396us 4 2.5990us 2.5910us 2.6230us radiation_rg_organize$radiation_rg_org_$ck_L2434_21 | |
0.00% 10.367us 4 2.5910us 2.3680us 2.8470us radiation_rg_organize$radiation_rg_org_$ck_L2459_23 | |
0.00% 9.9810us 4 2.4950us 2.3030us 2.8150us radiation_rg_organize$radiation_rg_org_$ck_L2370_16 | |
0.00% 9.6950us 4 2.4230us 2.2390us 2.7200us fesft_dp$radiation_rg_$ck_L1800_23 | |
0.00% 9.6620us 4 2.4150us 2.3030us 2.6550us fesft_dp$radiation_rg_$ck_L2298_46 | |
0.00% 9.2770us 4 2.3190us 2.1750us 2.4320us radiation_rg_organize$radiation_rg_org_$ck_L2409_19 | |
0.00% 8.4460us 4 2.1110us 2.0790us 2.1760us radiation_rg_organize$radiation_rg_org_$ck_L2855_101 | |
0.00% 8.2870us 4 2.0710us 1.9190us 2.3360us radiation_rg_organize$radiation_rg_org_$ck_L2482_25 | |
0.00% 8.2850us 4 2.0710us 1.8870us 2.2710us fesft_dp$radiation_rg_$ck_L1853_25 | |
0.00% 2.0790us 1 2.0790us 2.0790us 2.0790us test_physics$src_physics_$ck_L346_2 | |
======== API calls: | |
Time(%) Time Calls Avg Min Max Name | |
51.29% 327.21ms 1 327.21ms 327.21ms 327.21ms cuCtxCreate | |
16.71% 106.57ms 459 232.17us 6.5600us 9.9472ms cuMemcpyHtoD | |
12.85% 81.965ms 993 82.543us 1.2060us 7.2075ms cuStreamSynchronize | |
7.01% 44.739ms 10785 4.1480us 3.4370us 720.14us cuLaunchKernel | |
6.23% 39.750ms 5 7.9499ms 268.61us 24.745ms cuModuleLoadData | |
3.55% 22.670ms 332 68.284us 2.1550us 321.73us cuMemAlloc | |
1.42% 9.0468ms 161 56.191us 14.694us 288.37us cuMemcpyDtoH | |
0.75% 4.8097ms 32 150.30us 1.7810us 993.12us cuCtxSynchronize | |
0.13% 824.58us 1 824.58us 824.58us 824.58us cuMemHostAlloc | |
0.02% 138.44us 1 138.44us 138.44us 138.44us cuStreamCreate | |
0.02% 109.81us 98 1.1200us 287ns 4.8770us cuModuleGetFunction | |
0.01% 37.700us 196 192ns 138ns 937ns cuFuncGetAttribute | |
0.00% 22.018us 34 647ns 268ns 2.7370us cuEventCreate | |
0.00% 17.955us 98 183ns 137ns 335ns cuFuncSetCacheConfig | |
0.00% 5.6310us 5 1.1260us 845ns 1.5610us cuMemHostGetDevicePointer | |
0.00% 4.9130us 5 982ns 403ns 1.2740us cuModuleGetGlobal | |
0.00% 2.4700us 3 823ns 351ns 1.5940us cuDeviceGetCount | |
0.00% 1.7010us 7 243ns 168ns 382ns cuDeviceGetAttribute | |
0.00% 1.0700us 3 356ns 213ns 583ns cuDeviceGet | |
0.00% 579ns 1 579ns 579ns 579ns cuCtxSetCurrent | |
0.00% 353ns 1 353ns 353ns 353ns cuCtxGetCurrent | |
+ /project/c01/install_old/daint/serialbox/gnu/bin/compare Field_rank0.json radiation-standalone_rank0.json |