@bollu
Created November 22, 2017 21:48
inv_th, only call to coe_th, coe_th does not have calls to EXP, EXP replaced by nop, openacc
==14651== Generated result file: /scratch/snx1600/siddhart/playground/standalone/run/standalone-nvprof-output.prof
+ + + + + + + + + + + + + + + +
+ RUNNING IN DOUBLE PRECISION +
+ + + + + + + + + + + + + + + +
STANDALONE PHYSICS
RUNNING ON GPU
RADIATION TEST
Iteration 1
Initialize test
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
>>> WARNING: SERIALIZATION IS ON <<<
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
ALL INPUTS READ
*****************************************************
* Radiative transfer calculations employ data *
* provided in routine rad_aibi *
*****************************************************
Run test
Finalize test
Iteration 2
Initialize test
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
>>> WARNING: SERIALIZATION IS ON <<<
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
ALL INPUTS READ
*****************************************************
* Radiative transfer calculations employ data *
* provided in routine rad_aibi *
*****************************************************
Run test
Finalize test
Iteration 3
Initialize test
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
>>> WARNING: SERIALIZATION IS ON <<<
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
ALL INPUTS READ
*****************************************************
* Radiative transfer calculations employ data *
* provided in routine rad_aibi *
*****************************************************
Run test
Finalize test
Iteration 4
Initialize test
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
>>> WARNING: SERIALIZATION IS ON <<<
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
ALL INPUTS READ
*****************************************************
* Radiative transfer calculations employ data *
* provided in routine rad_aibi *
*****************************************************
Run test
Finalize test
Domain size, ie,je,ke : 80 60 60
nproma : 4800
data_set type :full
--------------------------------------------------------------------------
Local timers:
NCOMP_PE= 1
--------------------------------------------------------------------------
Id Tag Ncalls min[s] max[s] mean[s]
1 Total Phys 4 0.3008 0.3008 0.3008
2 Copy block 8 0.0073 0.0073 0.0073
3 Radiation 4 0.2470 0.2470 0.2470
--------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
>>> WARNING: SERIALIZATION IS ON <<<
>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
ALL OUTPUTS WRITTEN
+ nvprof -i standalone-nvprof-output.prof
======== Profiling result:
Time(%) Time Calls Avg Min Max Name
24.14% 52.895ms 104 508.61us 470.15us 545.42us fesft_dp$radiation_rg_$ck_L2596_78
12.99% 28.464ms 176 161.73us 140.79us 236.90us inv_th$radiation_rg_$ck_L4241_263
9.36% 20.516ms 459 44.696us 831ns 1.4919ms [CUDA memcpy HtoD]
8.27% 18.127ms 104 174.30us 168.79us 180.30us fesft_dp$radiation_rg_$ck_L2596_80
5.37% 11.761ms 132 89.099us 85.705us 93.255us fesft_dp$radiation_rg_$ck_L2638_82
3.70% 8.1080ms 216 37.537us 35.351us 41.237us fesft_dp$radiation_rg_$ck_L2161_40
3.20% 7.0126ms 1200 5.8430us 5.3420us 9.2450us fesft_dp$radiation_rg_$ck_L1914_28
2.23% 4.8822ms 161 30.323us 639ns 191.41us [CUDA memcpy DtoH]
2.14% 4.6912ms 12 390.93us 374.78us 407.06us fesft_dp$radiation_rg_$ck_L2497_69
2.11% 4.6163ms 720 6.4110us 5.9180us 10.174us fesft_dp$radiation_rg_$ck_L2383_51
2.09% 4.5713ms 1200 3.8090us 3.3280us 7.2940us fesft_dp$radiation_rg_$ck_L1914_29
1.91% 4.1963ms 1200 3.4960us 3.1990us 5.3430us fesft_dp$radiation_rg_$ck_L1914_31
1.90% 4.1648ms 72 57.845us 9.9170us 66.894us copytoblock3d$src_block_fields_$ck_L1401_1
1.46% 3.1970ms 1200 2.6640us 2.4310us 4.8630us fesft_dp$radiation_rg_$ck_L1914_30
1.37% 3.0095ms 720 4.1790us 3.7750us 7.4220us fesft_dp$radiation_rg_$ck_L2383_52
1.36% 2.9735ms 720 4.1290us 3.7750us 6.5260us fesft_dp$radiation_rg_$ck_L2383_55
1.19% 2.6062ms 20 130.31us 125.47us 135.48us fesft_dp$radiation_rg_$ck_L1925_32
1.11% 2.4279ms 720 3.3720us 3.1030us 5.2790us fesft_dp$radiation_rg_$ck_L2383_54
1.00% 2.1956ms 720 3.0490us 2.7510us 7.5180us fesft_dp$radiation_rg_$ck_L2383_53
0.96% 2.0932ms 12 174.43us 169.30us 179.63us fesft_dp$radiation_rg_$ck_L2497_71
0.87% 1.9047ms 16 119.05us 115.78us 121.50us fesft_dp$radiation_rg_$ck_L2051_36
0.76% 1.6659ms 104 16.017us 14.780us 17.436us fesft_dp$radiation_rg_$ck_L2596_75
0.68% 1.5008ms 28 53.598us 52.658us 55.314us fesft_dp$radiation_rg_$ck_L2617_81
0.63% 1.3880ms 12 115.66us 111.84us 119.30us fesft_dp$radiation_rg_$ck_L2515_72
0.59% 1.2919ms 40 32.298us 30.424us 34.103us fesft_dp$radiation_rg_$ck_L2144_39
0.56% 1.2211ms 4 305.27us 295.38us 317.58us radiation_rg_organize$radiation_rg_org_$ck_L2210_14
0.56% 1.2184ms 28 43.515us 42.676us 44.660us fesft_dp$radiation_rg_$ck_L2675_83
0.55% 1.2120ms 44 27.546us 26.041us 28.729us fesft_dp$radiation_rg_$ck_L2190_41
0.55% 1.1985ms 4 299.62us 289.46us 310.19us fesft_dp$radiation_rg_$ck_L1813_24
0.48% 1.0490ms 4 262.26us 251.55us 272.63us radiation_rg_organize$radiation_rg_org_$ck_L2602_30
0.44% 973.28us 24 40.553us 38.710us 42.900us copyfromblock3d$src_block_fields_$ck_L1610_6
0.36% 796.10us 104 7.6540us 7.1020us 8.2540us fesft_dp$radiation_rg_$ck_L2596_77
0.33% 732.86us 16 45.803us 44.117us 47.508us fesft_dp$radiation_rg_$ck_L2067_37
0.32% 708.23us 104 6.8090us 6.1420us 7.3580us fesft_dp$radiation_rg_$ck_L2596_79
0.32% 707.65us 12 58.971us 57.168us 60.400us fesft_dp$radiation_rg_$ck_L2786_87
0.29% 645.11us 28 23.039us 21.499us 24.506us fesft_dp$radiation_rg_$ck_L2568_73
0.27% 580.90us 20 29.045us 27.992us 30.680us fesft_dp$radiation_rg_$ck_L2244_44
0.26% 575.79us 20 28.789us 27.961us 29.816us fesft_dp$radiation_rg_$ck_L1891_26
0.26% 565.15us 104 5.4340us 5.0540us 5.8220us fesft_dp$radiation_rg_$ck_L2596_76
0.23% 499.23us 20 24.961us 23.769us 25.785us fesft_dp$radiation_rg_$ck_L2223_43
0.20% 448.68us 12 37.390us 35.703us 38.965us fesft_dp$radiation_rg_$ck_L2709_85
0.20% 430.63us 200 2.1530us 1.6630us 3.8710us copytoblock2d$src_block_fields_$ck_L1439_4
0.19% 416.24us 4 104.06us 100.81us 106.82us radiation_rg_organize$radiation_rg_org_$ck_L2344_15
0.17% 369.92us 12 30.826us 30.264us 31.224us fesft_dp$radiation_rg_$ck_L2689_84
0.16% 359.01us 4 89.752us 86.665us 93.192us radiation_rg_organize$radiation_rg_org_$ck_L3130_34
0.16% 357.44us 104 3.4360us 3.1030us 3.8710us fesft_dp$radiation_rg_$ck_L2596_74
0.15% 325.80us 4 81.450us 78.091us 84.585us fesft_dp$radiation_rg_$ck_L1751_20
0.13% 276.47us 12 23.039us 21.626us 24.345us fesft_dp$radiation_rg_$ck_L2353_48
0.12% 263.87us 4 65.966us 62.064us 70.349us radiation_rg_organize$radiation_rg_org_$ck_L2041_9
0.12% 258.17us 44 5.8670us 5.3100us 6.9420us fesft_dp$radiation_rg_$ck_L2099_38
0.12% 254.01us 4 63.503us 61.104us 65.422us radiation_rg_organize$radiation_rg_org_$ck_L3467_36
0.12% 253.69us 16 15.855us 15.068us 16.667us fesft_dp$radiation_rg_$ck_L2203_42
0.09% 191.34us 96 1.9930us 1.7590us 3.3270us copyfromblock2d$src_block_fields_$ck_L1652_9
0.08% 170.87us 4 42.716us 42.453us 43.285us fesft_dp$radiation_rg_$ck_L1783_22
0.07% 146.36us 12 12.196us 11.517us 13.148us fesft_dp$radiation_rg_$ck_L2497_66
0.06% 140.64us 4 35.158us 33.719us 36.567us radiation_rg_organize$radiation_rg_org_$ck_L2512_26
0.06% 130.94us 4 32.735us 31.447us 34.071us radiation_rg_organize$radiation_rg_org_$ck_L2380_17
0.05% 119.68us 4 29.919us 28.888us 31.063us radiation_rg_organize$radiation_rg_org_$ck_L2419_20
0.04% 92.360us 12 7.6960us 7.3900us 8.1580us fesft_dp$radiation_rg_$ck_L2497_68
0.04% 85.801us 8 10.725us 10.205us 11.357us compute_sunshine_conditions$radiation_interface_$ck_L1704_13
0.04% 83.530us 12 6.9600us 6.5580us 7.2940us fesft_dp$radiation_rg_$ck_L2746_86
0.04% 78.027us 12 6.5020us 5.9820us 6.8140us fesft_dp$radiation_rg_$ck_L2497_70
0.03% 75.981us 4 18.995us 17.948us 19.995us fesft_dp$radiation_rg_$ck_L1669_17
0.03% 66.767us 12 5.5630us 5.2470us 6.0470us fesft_dp$radiation_rg_$ck_L2497_67
0.03% 64.399us 12 5.3660us 5.0540us 5.8230us fesft_dp$radiation_rg_$ck_L2806_88
0.03% 59.536us 4 14.884us 14.652us 15.164us radiation_rg_organize$radiation_rg_org_$ck_L2395_18
0.02% 53.172us 4 13.293us 10.782us 14.876us calc_rad_corrections$radiation_interface_$ck_L2218_32
0.02% 50.067us 4 12.516us 11.932us 13.117us radiation_rg_organize$radiation_rg_org_$ck_L2469_24
0.02% 44.596us 20 2.2290us 2.0470us 2.4960us fesft_dp$radiation_rg_$ck_L1914_27
0.02% 40.661us 4 10.165us 10.013us 10.365us radiation_rg_organize$radiation_rg_org_$ck_L3094_33
0.02% 39.002us 12 3.2500us 3.0400us 3.4240us fesft_dp$radiation_rg_$ck_L2497_65
0.02% 38.168us 12 3.1800us 2.8160us 3.5510us fesft_dp$radiation_rg_$ck_L2367_49
0.02% 35.256us 4 8.8140us 8.2860us 9.3100us radiation_rg_organize$radiation_rg_org_$ck_L2444_22
0.01% 26.136us 4 6.5340us 6.3340us 6.9740us fesft_dp$radiation_rg_$ck_L1735_19
0.01% 23.865us 12 1.9880us 1.8870us 2.1750us fesft_dp$radiation_rg_$ck_L2383_50
0.01% 23.546us 4 5.8860us 5.5980us 6.2700us radiation_rg_organize$radiation_rg_org_$ck_L2565_28
0.01% 22.843us 4 5.7100us 5.5990us 5.9190us fesft_dp$radiation_rg_$ck_L2032_35
0.01% 22.809us 4 5.7020us 5.4380us 5.9180us fesft_dp$radiation_rg_$ck_L1771_21
0.01% 22.426us 8 2.8030us 2.2070us 3.5510us compute_sunshine_conditions$radiation_interface_$ck_L1759_14
0.01% 20.186us 4 5.0460us 4.8620us 5.2140us radiation_rg_organize$radiation_rg_org_$ck_L1824_4
0.01% 19.483us 4 4.8700us 4.5430us 5.1500us fesft_dp$radiation_rg_$ck_L2837_90
0.01% 17.659us 4 4.4140us 4.2230us 4.5750us radiation_rg_organize$radiation_rg_org_$ck_L1948_7
0.01% 16.955us 8 2.1190us 1.6630us 2.7520us compute_sunshine_conditions$radiation_interface_$ck_L1597_10
0.01% 16.124us 4 4.0310us 3.7750us 4.4150us radiation_rg_organize$radiation_rg_org_$ck_L3443_35
0.01% 14.908us 4 3.7270us 3.4230us 3.9350us radiation_rg_organize$radiation_rg_org_$ck_L2530_27
0.01% 14.076us 4 3.5190us 3.2950us 3.8070us fesft_dp$radiation_rg_$ck_L2335_47
0.01% 13.693us 4 3.4230us 3.2310us 3.5200us fesft_dp$radiation_rg_$ck_L2265_45
0.01% 13.149us 4 3.2870us 2.8790us 3.5830us radiation_prepare$radiation_interface_$ck_L1349_1
0.01% 12.350us 4 3.0870us 2.9760us 3.2000us fesft_dp$radiation_rg_$ck_L1701_18
0.00% 10.524us 4 2.6310us 2.4950us 2.7830us radiation_organize$radiation_interface_$ck_L2047_30
0.00% 10.396us 4 2.5990us 2.5910us 2.6230us radiation_rg_organize$radiation_rg_org_$ck_L2434_21
0.00% 10.367us 4 2.5910us 2.3680us 2.8470us radiation_rg_organize$radiation_rg_org_$ck_L2459_23
0.00% 9.9810us 4 2.4950us 2.3030us 2.8150us radiation_rg_organize$radiation_rg_org_$ck_L2370_16
0.00% 9.6950us 4 2.4230us 2.2390us 2.7200us fesft_dp$radiation_rg_$ck_L1800_23
0.00% 9.6620us 4 2.4150us 2.3030us 2.6550us fesft_dp$radiation_rg_$ck_L2298_46
0.00% 9.2770us 4 2.3190us 2.1750us 2.4320us radiation_rg_organize$radiation_rg_org_$ck_L2409_19
0.00% 8.4460us 4 2.1110us 2.0790us 2.1760us radiation_rg_organize$radiation_rg_org_$ck_L2855_101
0.00% 8.2870us 4 2.0710us 1.9190us 2.3360us radiation_rg_organize$radiation_rg_org_$ck_L2482_25
0.00% 8.2850us 4 2.0710us 1.8870us 2.2710us fesft_dp$radiation_rg_$ck_L1853_25
0.00% 2.0790us 1 2.0790us 2.0790us 2.0790us test_physics$src_physics_$ck_L346_2
======== API calls:
Time(%) Time Calls Avg Min Max Name
51.29% 327.21ms 1 327.21ms 327.21ms 327.21ms cuCtxCreate
16.71% 106.57ms 459 232.17us 6.5600us 9.9472ms cuMemcpyHtoD
12.85% 81.965ms 993 82.543us 1.2060us 7.2075ms cuStreamSynchronize
7.01% 44.739ms 10785 4.1480us 3.4370us 720.14us cuLaunchKernel
6.23% 39.750ms 5 7.9499ms 268.61us 24.745ms cuModuleLoadData
3.55% 22.670ms 332 68.284us 2.1550us 321.73us cuMemAlloc
1.42% 9.0468ms 161 56.191us 14.694us 288.37us cuMemcpyDtoH
0.75% 4.8097ms 32 150.30us 1.7810us 993.12us cuCtxSynchronize
0.13% 824.58us 1 824.58us 824.58us 824.58us cuMemHostAlloc
0.02% 138.44us 1 138.44us 138.44us 138.44us cuStreamCreate
0.02% 109.81us 98 1.1200us 287ns 4.8770us cuModuleGetFunction
0.01% 37.700us 196 192ns 138ns 937ns cuFuncGetAttribute
0.00% 22.018us 34 647ns 268ns 2.7370us cuEventCreate
0.00% 17.955us 98 183ns 137ns 335ns cuFuncSetCacheConfig
0.00% 5.6310us 5 1.1260us 845ns 1.5610us cuMemHostGetDevicePointer
0.00% 4.9130us 5 982ns 403ns 1.2740us cuModuleGetGlobal
0.00% 2.4700us 3 823ns 351ns 1.5940us cuDeviceGetCount
0.00% 1.7010us 7 243ns 168ns 382ns cuDeviceGetAttribute
0.00% 1.0700us 3 356ns 213ns 583ns cuDeviceGet
0.00% 579ns 1 579ns 579ns 579ns cuCtxSetCurrent
0.00% 353ns 1 353ns 353ns 353ns cuCtxGetCurrent
+ /project/c01/install_old/daint/serialbox/gnu/bin/compare Field_rank0.json radiation-standalone_rank0.json
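
The kernel table in the profiling result above lists one row per generated OpenACC kernel, so the share of a whole routine (for example all `fesft_dp` kernels versus the single `inv_th` kernel) has to be summed by hand. The following is a minimal sketch of how that aggregation could be done in Python. It is not part of the original run. The helper names `to_ms` and `total_ms_by_routine` are hypothetical, and the sketch assumes that the text before the first `$` in the Name column identifies the routine and that the columns are whitespace-separated as in the summary printed by `nvprof -i`.

```python
import re
from collections import defaultdict

# Conversion factors from nvprof's time suffixes to milliseconds.
_UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}


def to_ms(text):
    """Convert an nvprof time string such as '508.61us' to milliseconds."""
    match = re.fullmatch(r"([0-9.]+)(ns|us|ms|s)", text)
    if match is None:
        raise ValueError("unrecognised time value: %r" % text)
    return float(match.group(1)) * _UNIT_TO_MS[match.group(2)]


def total_ms_by_routine(lines):
    """Sum the 'Time' column per routine.

    Assumption: the routine is whatever precedes the first '$' in the
    Name column; names without '$' (e.g. '[CUDA memcpy HtoD]') are kept whole.
    """
    totals = defaultdict(float)
    for line in lines:
        # Columns: Time(%) Time Calls Avg Min Max Name (Name may contain spaces).
        fields = line.split(None, 6)
        if len(fields) != 7 or not fields[0].endswith("%"):
            continue  # skip headers, banners and other log lines
        routine = fields[6].strip().split("$", 1)[0]
        totals[routine] += to_ms(fields[1])
    return dict(totals)


# A few rows copied from the profiling result above.
sample = [
    "Time(%) Time Calls Avg Min Max Name",
    "24.14% 52.895ms 104 508.61us 470.15us 545.42us fesft_dp$radiation_rg_$ck_L2596_78",
    "12.99% 28.464ms 176 161.73us 140.79us 236.90us inv_th$radiation_rg_$ck_L4241_263",
    "9.36% 20.516ms 459 44.696us 831ns 1.4919ms [CUDA memcpy HtoD]",
    "8.27% 18.127ms 104 174.30us 168.79us 180.30us fesft_dp$radiation_rg_$ck_L2596_80",
]

for routine, ms in sorted(total_ms_by_routine(sample).items(), key=lambda kv: -kv[1]):
    print("%-24s %10.3f ms" % (routine, ms))
```

Feeding the full kernel table to `total_ms_by_routine` instead of `sample` gives per-routine totals. Rows from the API calls table also match the row pattern and would be summed under their own names, so the two tables should be passed in separately.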