@yueyericardo
Last active March 9, 2023 17:29

CUAEV benchmark: https://github.com/roitberg-group/torchani_sandbox/blob/ed90fa65a7f07e59a95e75c962371a37ffb69ba0/tools/aev-benchmark-size.py

To turn intrinsics on, build with: python setup.py develop --ext --cuaev-opt
use_fast_math needs the nvcc argument: -use_fast_math

A100 Result

Turn on use_fast_math, Turn on intrinsics

File: small.pdb, Molecule size: 264 / 264, Species: [1, 6, 7, 8]

Original TorchANI:
   GPU Memory Cached (pytorch) :   134.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):   982.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.37 s
  Speed: 1.87 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   128.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):   976.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.07 s
  Speed: 0.36 ms/it
  aev_error: 3.10e-06
  Speed up: 5.14 X

----------------------------------------------------------------------

File: 1hz5.pdb, Molecule size: 973 / 973, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   186.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1034.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.36 s
  Speed: 1.81 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   128.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):   976.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.07 s
  Speed: 0.35 ms/it
  aev_error: 4.77e-06
  Speed up: 5.15 X

----------------------------------------------------------------------

File: 6W8H.pdb, Molecule size: 3410 / 3410, Species: [6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   496.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1344.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.50 s
  Speed: 2.52 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   188.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1036.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.08 s
  Speed: 0.38 ms/it
  aev_error: 3.10e-06
  Speed up: 6.59 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 6000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  1166.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  2014.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 1.03 s
  Speed: 5.17 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   244.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1092.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.15 s
  Speed: 0.76 ms/it
  aev_error: 5.48e-06
  Speed up: 6.84 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 10000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  3028.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  3876.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 2.22 s
  Speed: 11.08 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   334.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1182.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.23 s
  Speed: 1.15 ms/it
  aev_error: 5.72e-06
  Speed up: 9.61 X

----------------------------------------------------------------------

Add Backward

File: small.pdb, Molecule size: 264 / 264, Species: [1, 6, 7, 8]

Original TorchANI:
   GPU Memory Cached (pytorch) :   190.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1038.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.69 s
  Speed: 3.44 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   190.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1038.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.10 s
  Speed: 0.49 ms/it
  aev_error: 2.86e-06
  force_error: 3.43e-05
  Speed up: 7.00 X

----------------------------------------------------------------------

File: 1hz5.pdb, Molecule size: 973 / 973, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   418.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1266.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.70 s
  Speed: 3.51 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   392.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1240.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.11 s
  Speed: 0.53 ms/it
  aev_error: 5.01e-06
  force_error: 3.81e-05
  Speed up: 6.57 X

----------------------------------------------------------------------

File: 6W8H.pdb, Molecule size: 3410 / 3410, Species: [6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   782.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1630.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.85 s
  Speed: 4.26 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   486.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1334.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.11 s
  Speed: 0.57 ms/it
  aev_error: 3.10e-06
  force_error: 9.06e-05
  Speed up: 7.52 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 6000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  3242.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  4090.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 1.81 s
  Speed: 9.03 ms/it

CUaev:
   GPU Memory Cached (pytorch) :  2390.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  3238.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.26 s
  Speed: 1.32 ms/it
  aev_error: 5.48e-06
  force_error: 4.58e-05
  Speed up: 6.86 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 10000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  6674.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  7522.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 3.44 s
  Speed: 17.19 ms/it

CUaev:
   GPU Memory Cached (pytorch) :  3994.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  4842.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.40 s
  Speed: 1.99 ms/it
  aev_error: 5.48e-06
  force_error: 5.34e-05
  Speed up: 8.63 X

----------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------------------------------------------
RUN                 PDB          Size         forward      backward     Others       Total        Total(200)   Speedup      GPU
-----------------------------------------------------------------------------------------------------------------------------------------
01 py aev fd        small.pdb    264          1.8 ms       0.0 ms       0.1 ms       1.9 ms       373.5 ms     -            982.0MB
02 cu aev fd        small.pdb    264          0.4 ms       0.0 ms       0.0 ms       0.4 ms       72.7 ms      5.14         976.0MB
03 py aev fd        1hz5.pdb     973          1.8 ms       0.0 ms       0.0 ms       1.8 ms       362.6 ms     -            1034.0MB
04 cu aev fd        1hz5.pdb     973          0.3 ms       0.0 ms       0.0 ms       0.4 ms       70.4 ms      5.15         976.0MB
05 py aev fd        6W8H.pdb     3410         2.5 ms       0.0 ms       0.0 ms       2.5 ms       503.5 ms     -            1344.0MB
06 cu aev fd        6W8H.pdb     3410         0.4 ms       0.0 ms       0.0 ms       0.4 ms       76.4 ms      6.59         1036.0MB
07 py aev fd        1C17.pdb     6000         5.2 ms       0.0 ms       0.0 ms       5.2 ms       1.034 sec    -            2014.0MB
08 cu aev fd        1C17.pdb     6000         0.7 ms       0.0 ms       0.0 ms       0.8 ms       151.1 ms     6.84         1092.0MB
09 py aev fd        1C17.pdb     10000        11.1 ms      0.0 ms       0.0 ms       11.1 ms      2.216 sec    -            3876.0MB
10 cu aev fd        1C17.pdb     10000        1.1 ms       0.0 ms       0.0 ms       1.2 ms       230.7 ms     9.61         1182.0MB
-----------------------------------------------------------------------------------------------------------------------------------------
11 py aev fd+bd     small.pdb    264          1.8 ms       1.6 ms       0.0 ms       3.4 ms       687.1 ms     -            1038.0MB
12 cu aev fd+bd     small.pdb    264          0.3 ms       0.1 ms       0.0 ms       0.5 ms       98.2 ms      7.00         1038.0MB
13 py aev fd+bd     1hz5.pdb     973          1.8 ms       1.6 ms       0.0 ms       3.5 ms       701.4 ms     -            1266.0MB
14 cu aev fd+bd     1hz5.pdb     973          0.3 ms       0.2 ms       0.0 ms       0.5 ms       106.8 ms     6.57         1240.0MB
15 py aev fd+bd     6W8H.pdb     3410         2.6 ms       1.6 ms       0.0 ms       4.3 ms       851.4 ms     -            1630.0MB
16 cu aev fd+bd     6W8H.pdb     3410         0.4 ms       0.2 ms       0.0 ms       0.6 ms       113.3 ms     7.52         1334.0MB
17 py aev fd+bd     1C17.pdb     6000         5.2 ms       3.8 ms       0.0 ms       9.0 ms       1.807 sec    -            4090.0MB
18 cu aev fd+bd     1C17.pdb     6000         0.7 ms       0.6 ms       0.0 ms       1.3 ms       263.4 ms     6.86         3238.0MB
19 py aev fd+bd     1C17.pdb     10000        11.1 ms      6.1 ms       0.0 ms       17.2 ms      3.437 sec    -            7522.0MB
20 cu aev fd+bd     1C17.pdb     10000        1.1 ms       0.8 ms       0.0 ms       2.0 ms       398.5 ms     8.63         4842.0MB
-----------------------------------------------------------------------------------------------------------------------------------------
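
The per-iteration speed and the Speedup column can be reproduced from the Total(200) column. The sketch below assumes that Total(200) is the wall time of 200 iterations and that the speedup is the plain ratio of the two totals; the helper names are illustrative and are not taken from aev-benchmark-size.py.

```python
N_ITERATIONS = 200  # assumed meaning of the "Total(200)" column header


def ms_per_iteration(total_ms, n=N_ITERATIONS):
    """Average time of one iteration, in milliseconds."""
    return total_ms / n


def speedup(baseline_total_ms, cuaev_total_ms):
    """How many times faster CUaev is than the original TorchANI run."""
    return baseline_total_ms / cuaev_total_ms


# Rows 01 and 02 of the table above (small.pdb, forward only).
py_total_ms, cu_total_ms = 373.5, 72.7
print(f"py aev: {ms_per_iteration(py_total_ms):.3f} ms/it")
print(f"speedup: {speedup(py_total_ms, cu_total_ms):.2f} X")  # prints "speedup: 5.14 X"
```

The printed ratio agrees with the 5.14 reported for run 02. The same division applies to every py/cu pair of rows.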

Turn off use_fast_math, Turn off intrinsics

File: small.pdb, Molecule size: 264 / 264, Species: [1, 6, 7, 8]

Original TorchANI:
   GPU Memory Cached (pytorch) :   134.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):   982.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.37 s
  Speed: 1.85 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   128.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):   976.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.08 s
  Speed: 0.40 ms/it
  aev_error: 1.19e-06
  Speed up: 4.64 X

----------------------------------------------------------------------

File: 1hz5.pdb, Molecule size: 973 / 973, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   186.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1034.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.36 s
  Speed: 1.79 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   128.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):   976.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.08 s
  Speed: 0.39 ms/it
  aev_error: 1.67e-06
  Speed up: 4.63 X

----------------------------------------------------------------------

File: 6W8H.pdb, Molecule size: 3410 / 3410, Species: [6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   496.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1344.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.50 s
  Speed: 2.50 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   188.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1036.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.09 s
  Speed: 0.46 ms/it
  aev_error: 1.19e-06
  Speed up: 5.48 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 6000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  1166.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  2014.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 1.03 s
  Speed: 5.15 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   244.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1092.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.19 s
  Speed: 0.96 ms/it
  aev_error: 2.15e-06
  Speed up: 5.39 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 10000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  3028.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  3876.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 2.22 s
  Speed: 11.09 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   334.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1182.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.30 s
  Speed: 1.49 ms/it
  aev_error: 2.38e-06
  Speed up: 7.42 X

----------------------------------------------------------------------

Add Backward

File: small.pdb, Molecule size: 264 / 264, Species: [1, 6, 7, 8]

Original TorchANI:
   GPU Memory Cached (pytorch) :   190.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1038.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.69 s
  Speed: 3.45 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   190.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1038.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.12 s
  Speed: 0.58 ms/it
  aev_error: 1.19e-06
  force_error: 1.53e-05
  Speed up: 5.93 X

----------------------------------------------------------------------

File: 1hz5.pdb, Molecule size: 973 / 973, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   418.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1266.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.70 s
  Speed: 3.48 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   392.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1240.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.13 s
  Speed: 0.66 ms/it
  aev_error: 1.91e-06
  force_error: 2.29e-05
  Speed up: 5.30 X

----------------------------------------------------------------------

File: 6W8H.pdb, Molecule size: 3410 / 3410, Species: [6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   782.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1630.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.85 s
  Speed: 4.24 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   486.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  1334.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.14 s
  Speed: 0.68 ms/it
  aev_error: 1.19e-06
  force_error: 1.81e-05
  Speed up: 6.26 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 6000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  3242.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  4090.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 1.81 s
  Speed: 9.03 ms/it

CUaev:
   GPU Memory Cached (pytorch) :  2412.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  3260.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.39 s
  Speed: 1.97 ms/it
  aev_error: 2.15e-06
  force_error: 2.48e-05
  Speed up: 4.59 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 10000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  6674.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  7522.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 3.43 s
  Speed: 17.15 ms/it

CUaev:
   GPU Memory Cached (pytorch) :  3994.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
   GPU Memory Used (nvidia-smi):  4842.0MB / 81251.2MB (NVIDIA A100-SXM4-80GB)
  Duration: 0.61 s
  Speed: 3.03 ms/it
  aev_error: 2.62e-06
  force_error: 2.67e-05
  Speed up: 5.65 X

----------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------------------------------------------
RUN                 PDB          Size         forward      backward     Others       Total        Total(200)   Speedup      GPU
-----------------------------------------------------------------------------------------------------------------------------------------
01 py aev fd        small.pdb    264          1.8 ms       0.0 ms       0.1 ms       1.9 ms       370.8 ms     -            982.0MB
02 cu aev fd        small.pdb    264          0.4 ms       0.0 ms       0.0 ms       0.4 ms       79.8 ms      4.64         976.0MB
03 py aev fd        1hz5.pdb     973          1.8 ms       0.0 ms       0.0 ms       1.8 ms       358.3 ms     -            1034.0MB
04 cu aev fd        1hz5.pdb     973          0.4 ms       0.0 ms       0.0 ms       0.4 ms       77.4 ms      4.63         976.0MB
05 py aev fd        6W8H.pdb     3410         2.5 ms       0.0 ms       0.0 ms       2.5 ms       500.1 ms     -            1344.0MB
06 cu aev fd        6W8H.pdb     3410         0.4 ms       0.0 ms       0.0 ms       0.5 ms       91.2 ms      5.48         1036.0MB
07 py aev fd        1C17.pdb     6000         5.1 ms       0.0 ms       0.0 ms       5.2 ms       1.031 sec    -            2014.0MB
08 cu aev fd        1C17.pdb     6000         0.9 ms       0.0 ms       0.0 ms       1.0 ms       191.1 ms     5.39         1092.0MB
09 py aev fd        1C17.pdb     10000        11.1 ms      0.0 ms       0.0 ms       11.1 ms      2.218 sec    -            3876.0MB
10 cu aev fd        1C17.pdb     10000        1.5 ms       0.0 ms       0.0 ms       1.5 ms       298.7 ms     7.42         1182.0MB
-----------------------------------------------------------------------------------------------------------------------------------------
11 py aev fd+bd     small.pdb    264          1.8 ms       1.6 ms       0.0 ms       3.5 ms       690.6 ms     -            1038.0MB
12 cu aev fd+bd     small.pdb    264          0.4 ms       0.2 ms       0.0 ms       0.6 ms       116.5 ms     5.93         1038.0MB
13 py aev fd+bd     1hz5.pdb     973          1.8 ms       1.6 ms       0.0 ms       3.5 ms       697.0 ms     -            1266.0MB
14 cu aev fd+bd     1hz5.pdb     973          0.4 ms       0.3 ms       0.0 ms       0.7 ms       131.6 ms     5.30         1240.0MB
15 py aev fd+bd     6W8H.pdb     3410         2.6 ms       1.6 ms       0.0 ms       4.2 ms       847.3 ms     -            1630.0MB
16 cu aev fd+bd     6W8H.pdb     3410         0.4 ms       0.3 ms       0.0 ms       0.7 ms       135.4 ms     6.26         1334.0MB
17 py aev fd+bd     1C17.pdb     6000         5.2 ms       3.8 ms       0.0 ms       9.0 ms       1.806 sec    -            4090.0MB
18 cu aev fd+bd     1C17.pdb     6000         0.9 ms       1.0 ms       0.0 ms       2.0 ms       393.8 ms     4.59         3260.0MB
19 py aev fd+bd     1C17.pdb     10000        11.1 ms      6.1 ms       0.0 ms       17.2 ms      3.431 sec    -            7522.0MB
20 cu aev fd+bd     1C17.pdb     10000        1.5 ms       1.6 ms       0.0 ms       3.0 ms       606.9 ms     5.65         4842.0MB
-----------------------------------------------------------------------------------------------------------------------------------------
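
To see what use_fast_math and intrinsics buy on the A100, the CUaev Total(200) values from the two A100 tables can be compared directly. This is only a sketch over a few rows copied from those tables; the dictionary layout and variable names are illustrative.

```python
# (mode, atoms) -> (Total(200) in ms with fast math + intrinsics ON, OFF),
# copied from the "cu aev" rows of the two A100 tables.
cuaev_totals_ms = {
    ("fd", 264): (72.7, 79.8),
    ("fd", 10000): (230.7, 298.7),
    ("fd+bd", 264): (98.2, 116.5),
    ("fd+bd", 10000): (398.5, 606.9),
}

# Ratio > 1 means the fast-math build is faster.
gains = {key: off_ms / on_ms for key, (on_ms, off_ms) in cuaev_totals_ms.items()}

for (mode, atoms), gain in gains.items():
    print(f"{mode:6s} {atoms:6d} atoms: {gain:.2f}x")
```

On these rows the gain grows with system size and is largest for forward plus backward on 10000 atoms (about 1.5x). In the logs above it comes with a larger aev_error, for example 3.10e-06 versus 1.19e-06 for small.pdb forward.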

2080 Ti Result

Turn on use_fast_math, Turn on intrinsics

 python aev-benchmark-size.py
 Check args: Namespace(N=200, backward=0, infer_model=0, mnp=0, nsight=False, plot=0, run_energy=0, single_nn=0, use_cell_list=False, use_cuaev_interface=False)
/blue/roitberg/apps/lammps-ani/external/torchani_sandbox/torchani/models.py:99: UserWarning: The default is now to accept atomic numbers as indexes, do not set periodic_table_index=True. if you need to accept raw indices set periodic_table_index=False
  warnings.warn("The default is now to accept atomic numbers as indexes,"
aev-benchmark-size.py:204: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  /opt/conda/conda-bld/pytorch_1659484683044/work/torch/csrc/utils/tensor_new.cpp:201.)
  species = torch.tensor([mol.get_atomic_numbers()], device=device)
File: small.pdb, Molecule size: 264 / 264, Species: [1, 6, 7, 8]

Original TorchANI:
   GPU Memory Cached (pytorch) :   134.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   666.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.38 s
  Speed: 1.88 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   128.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   660.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.07 s
  Speed: 0.33 ms/it
  aev_error: 3.10e-06
  Speed up: 5.62 X

----------------------------------------------------------------------

File: 1hz5.pdb, Molecule size: 973 / 973, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   186.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   718.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.40 s
  Speed: 1.98 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   128.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   660.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.07 s
  Speed: 0.37 ms/it
  aev_error: 4.77e-06
  Speed up: 5.32 X

----------------------------------------------------------------------

File: 6W8H.pdb, Molecule size: 3410 / 3410, Species: [6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   496.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1028.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.76 s
  Speed: 3.82 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   188.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   720.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.09 s
  Speed: 0.44 ms/it
  aev_error: 3.10e-06
  Speed up: 8.76 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 6000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  1166.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1698.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 2.21 s
  Speed: 11.05 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   244.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   776.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.24 s
  Speed: 1.19 ms/it
  aev_error: 5.72e-06
  Speed up: 9.28 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 10000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  3028.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  3560.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 5.12 s
  Speed: 25.59 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   334.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   866.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.39 s
  Speed: 1.97 ms/it
  aev_error: 6.20e-06
  Speed up: 12.96 X

----------------------------------------------------------------------

Add Backward

File: small.pdb, Molecule size: 264 / 264, Species: [1, 6, 7, 8]

Original TorchANI:
   GPU Memory Cached (pytorch) :   190.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   722.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.84 s
  Speed: 4.18 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   190.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   722.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.09 s
  Speed: 0.47 ms/it
  aev_error: 2.98e-06
  force_error: 3.43e-05
  Speed up: 8.91 X

----------------------------------------------------------------------

File: 1hz5.pdb, Molecule size: 973 / 973, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   418.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   950.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.89 s
  Speed: 4.45 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   392.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   924.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.12 s
  Speed: 0.62 ms/it
  aev_error: 4.77e-06
  force_error: 3.81e-05
  Speed up: 7.19 X

----------------------------------------------------------------------

File: 6W8H.pdb, Molecule size: 3410 / 3410, Species: [6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   782.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1314.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 1.27 s
  Speed: 6.34 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   486.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1018.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.15 s
  Speed: 0.75 ms/it
  aev_error: 3.10e-06
  force_error: 8.96e-05
  Speed up: 8.45 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 6000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  3242.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  3774.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 4.02 s
  Speed: 20.12 ms/it

CUaev:
   GPU Memory Cached (pytorch) :  2390.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  2922.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.44 s
  Speed: 2.18 ms/it
  aev_error: 5.48e-06
  force_error: 4.96e-05
  Speed up: 9.22 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 10000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  6674.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  7206.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 8.06 s
  Speed: 40.30 ms/it

CUaev:
   GPU Memory Cached (pytorch) :  3994.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  4526.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.71 s
  Speed: 3.57 ms/it
  aev_error: 5.72e-06
  force_error: 4.58e-05
  Speed up: 11.30 X

----------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------------------------------------------
RUN                 PDB          Size         forward      backward     Others       Total        Total(200)   Speedup      GPU
-----------------------------------------------------------------------------------------------------------------------------------------
01 py aev fd        small.pdb    264          1.8 ms       0.0 ms       0.1 ms       1.9 ms       376.1 ms     -            666.9MB
02 cu aev fd        small.pdb    264          0.3 ms       0.0 ms       0.0 ms       0.3 ms       66.9 ms      5.62         660.9MB
03 py aev fd        1hz5.pdb     973          2.0 ms       0.0 ms       0.0 ms       2.0 ms       396.6 ms     -            718.9MB
04 cu aev fd        1hz5.pdb     973          0.4 ms       0.0 ms       0.0 ms       0.4 ms       74.5 ms      5.32         660.9MB
05 py aev fd        6W8H.pdb     3410         3.8 ms       0.0 ms       0.0 ms       3.8 ms       763.5 ms     -            1028.9MB
06 cu aev fd        6W8H.pdb     3410         0.4 ms       0.0 ms       0.0 ms       0.4 ms       87.2 ms      8.76         720.9MB
07 py aev fd        1C17.pdb     6000         11.0 ms      0.0 ms       0.0 ms       11.0 ms      2.210 sec    -            1698.9MB
08 cu aev fd        1C17.pdb     6000         1.2 ms       0.0 ms       0.0 ms       1.2 ms       238.1 ms     9.28         776.9MB
09 py aev fd        1C17.pdb     10000        25.6 ms      0.0 ms       0.0 ms       25.6 ms      5.117 sec    -            3560.9MB
10 cu aev fd        1C17.pdb     10000        2.0 ms       0.0 ms       0.0 ms       2.0 ms       394.9 ms     12.96        866.9MB
-----------------------------------------------------------------------------------------------------------------------------------------
11 py aev fd+bd     small.pdb    264          2.1 ms       2.1 ms       0.0 ms       4.2 ms       836.7 ms     -            722.9MB
12 cu aev fd+bd     small.pdb    264          0.3 ms       0.1 ms       0.0 ms       0.5 ms       93.9 ms      8.91         722.9MB
13 py aev fd+bd     1hz5.pdb     973          2.2 ms       2.3 ms       0.0 ms       4.5 ms       890.8 ms     -            950.9MB
14 cu aev fd+bd     1hz5.pdb     973          0.4 ms       0.2 ms       0.0 ms       0.6 ms       123.9 ms     7.19         924.9MB
15 py aev fd+bd     6W8H.pdb     3410         4.1 ms       2.2 ms       0.0 ms       6.3 ms       1.269 sec    -            1314.9MB
16 cu aev fd+bd     6W8H.pdb     3410         0.4 ms       0.3 ms       0.0 ms       0.8 ms       150.1 ms     8.45         1018.9MB
17 py aev fd+bd     1C17.pdb     6000         11.2 ms      8.9 ms       0.0 ms       20.1 ms      4.024 sec    -            3774.9MB
18 cu aev fd+bd     1C17.pdb     6000         1.2 ms       0.9 ms       0.0 ms       2.2 ms       436.6 ms     9.22         2922.9MB
19 py aev fd+bd     1C17.pdb     10000        25.7 ms      14.6 ms      0.0 ms       40.3 ms      8.060 sec    -            7206.9MB
20 cu aev fd+bd     1C17.pdb     10000        2.1 ms       1.5 ms       0.0 ms       3.6 ms       713.1 ms     11.30        4526.9MB
-----------------------------------------------------------------------------------------------------------------------------------------
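
The two memory lines in each log entry differ by a constant per GPU. The sketch below subtracts the PyTorch cached value from the nvidia-smi value for a few entries copied from the logs above. Reading the constant as fixed per-process overhead (such as the CUDA context) is an assumption, not something the benchmark reports.

```python
# GPU -> list of (pytorch cached MB, nvidia-smi used MB) pairs from the logs.
readings_mb = {
    "A100": [(134.0, 982.0), (1166.0, 2014.0), (3994.0, 4842.0)],
    "2080 Ti": [(134.0, 666.9), (1166.0, 1698.9), (3994.0, 4526.9)],
}

# Distinct differences per GPU; one value means the offset is constant.
overhead_mb = {
    gpu: sorted({round(smi - cached, 1) for cached, smi in pairs})
    for gpu, pairs in readings_mb.items()
}

for gpu, values in overhead_mb.items():
    print(gpu, values)
```

Because the offset is constant, the GPU column of the summary tables (which matches the nvidia-smi figure) tracks the PyTorch cached figure one-to-one, so memory comparisons between runs on the same GPU are unaffected by which of the two is used.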

Turn OFF use_fast_math, Turn on intrinsics

python aev-benchmark-size.py
Check args: Namespace(N=200, backward=0, infer_model=0, mnp=0, nsight=False, plot=0, run_energy=0, single_nn=0, use_cell_list=False, use_cuaev_interface=False)
/blue/roitberg/apps/lammps-ani/external/torchani_sandbox/torchani/models.py:99: UserWarning: The default is now to accept atomic numbers as indexes, do not set periodic_table_index=True. if you need to accept raw indices set periodic_table_index=False
  warnings.warn("The default is now to accept atomic numbers as indexes,"
aev-benchmark-size.py:204: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  /opt/conda/conda-bld/pytorch_1659484683044/work/torch/csrc/utils/tensor_new.cpp:201.)
  species = torch.tensor([mol.get_atomic_numbers()], device=device)
File: small.pdb, Molecule size: 264 / 264, Species: [1, 6, 7, 8]

Original TorchANI:
   GPU Memory Cached (pytorch) :   134.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   666.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.38 s
  Speed: 1.88 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   128.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   660.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.07 s
  Speed: 0.34 ms/it
  aev_error: 2.86e-06
  Speed up: 5.55 X

----------------------------------------------------------------------

File: 1hz5.pdb, Molecule size: 973 / 973, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   186.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   718.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.40 s
  Speed: 1.98 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   128.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   660.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.08 s
  Speed: 0.38 ms/it
  aev_error: 4.77e-06
  Speed up: 5.24 X

----------------------------------------------------------------------

File: 6W8H.pdb, Molecule size: 3410 / 3410, Species: [6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   496.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1028.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.77 s
  Speed: 3.84 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   188.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   720.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.09 s
  Speed: 0.45 ms/it
  aev_error: 3.34e-06
  Speed up: 8.54 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 6000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  1166.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1698.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 2.21 s
  Speed: 11.06 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   244.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   776.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.24 s
  Speed: 1.22 ms/it
  aev_error: 6.20e-06
  Speed up: 9.04 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 10000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  3028.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  3560.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 5.12 s
  Speed: 25.59 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   334.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   866.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.41 s
  Speed: 2.07 ms/it
  aev_error: 5.72e-06
  Speed up: 12.37 X

----------------------------------------------------------------------

Add Backward

File: small.pdb, Molecule size: 264 / 264, Species: [1, 6, 7, 8]

Original TorchANI:
   GPU Memory Cached (pytorch) :   190.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   722.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.85 s
  Speed: 4.23 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   190.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   722.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.10 s
  Speed: 0.49 ms/it
  aev_error: 2.86e-06
  force_error: 2.57e-05
  Speed up: 8.73 X

----------------------------------------------------------------------

File: 1hz5.pdb, Molecule size: 973 / 973, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   418.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   950.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.89 s
  Speed: 4.46 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   392.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   924.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.13 s
  Speed: 0.64 ms/it
  aev_error: 5.01e-06
  force_error: 3.43e-05
  Speed up: 6.97 X

----------------------------------------------------------------------

File: 6W8H.pdb, Molecule size: 3410 / 3410, Species: [6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   782.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1314.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 1.27 s
  Speed: 6.36 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   486.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1018.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.15 s
  Speed: 0.77 ms/it
  aev_error: 2.86e-06
  force_error: 8.20e-05
  Speed up: 8.23 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 6000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  3242.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  3774.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 4.02 s
  Speed: 20.11 ms/it

CUaev:
   GPU Memory Cached (pytorch) :  2390.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  2922.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.46 s
  Speed: 2.31 ms/it
  aev_error: 5.25e-06
  force_error: 4.20e-05
  Speed up: 8.71 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 10000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  6674.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  7206.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 8.06 s
  Speed: 40.29 ms/it

CUaev:
   GPU Memory Cached (pytorch) :  3994.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  4526.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.76 s
  Speed: 3.80 ms/it
  aev_error: 5.25e-06
  force_error: 3.43e-05
  Speed up: 10.59 X

----------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------------------------------------------
RUN                 PDB          Size         forward      backward     Others       Total        Total(200)   Speedup      GPU
-----------------------------------------------------------------------------------------------------------------------------------------
01 py aev fd        small.pdb    264          1.8 ms       0.0 ms       0.1 ms       1.9 ms       376.9 ms     -            666.9MB
02 cu aev fd        small.pdb    264          0.3 ms       0.0 ms       0.0 ms       0.3 ms       68.0 ms      5.55         660.9MB
03 py aev fd        1hz5.pdb     973          2.0 ms       0.0 ms       0.0 ms       2.0 ms       395.5 ms     -            718.9MB
04 cu aev fd        1hz5.pdb     973          0.4 ms       0.0 ms       0.0 ms       0.4 ms       75.5 ms      5.24         660.9MB
05 py aev fd        6W8H.pdb     3410         3.8 ms       0.0 ms       0.0 ms       3.8 ms       767.5 ms     -            1028.9MB
06 cu aev fd        6W8H.pdb     3410         0.4 ms       0.0 ms       0.0 ms       0.4 ms       89.9 ms      8.54         720.9MB
07 py aev fd        1C17.pdb     6000         11.0 ms      0.0 ms       0.0 ms       11.1 ms      2.211 sec    -            1698.9MB
08 cu aev fd        1C17.pdb     6000         1.2 ms       0.0 ms       0.0 ms       1.2 ms       244.7 ms     9.04         776.9MB
09 py aev fd        1C17.pdb     10000        25.6 ms      0.0 ms       0.0 ms       25.6 ms      5.119 sec    -            3560.9MB
10 cu aev fd        1C17.pdb     10000        2.1 ms       0.0 ms       0.0 ms       2.1 ms       413.7 ms     12.37        866.9MB
-----------------------------------------------------------------------------------------------------------------------------------------
11 py aev fd+bd     small.pdb    264          2.1 ms       2.1 ms       0.0 ms       4.2 ms       846.9 ms     -            722.9MB
12 cu aev fd+bd     small.pdb    264          0.3 ms       0.2 ms       0.0 ms       0.5 ms       97.0 ms      8.73         722.9MB
13 py aev fd+bd     1hz5.pdb     973          2.2 ms       2.3 ms       0.0 ms       4.5 ms       892.6 ms     -            950.9MB
14 cu aev fd+bd     1hz5.pdb     973          0.4 ms       0.2 ms       0.0 ms       0.6 ms       128.1 ms     6.97         924.9MB
15 py aev fd+bd     6W8H.pdb     3410         4.1 ms       2.3 ms       0.0 ms       6.4 ms       1.271 sec    -            1314.9MB
16 cu aev fd+bd     6W8H.pdb     3410         0.5 ms       0.3 ms       0.0 ms       0.8 ms       154.5 ms     8.23         1018.9MB
17 py aev fd+bd     1C17.pdb     6000         11.2 ms      8.9 ms       0.0 ms       20.1 ms      4.022 sec    -            3774.9MB
18 cu aev fd+bd     1C17.pdb     6000         1.3 ms       1.0 ms       0.0 ms       2.3 ms       461.5 ms     8.71         2922.9MB
19 py aev fd+bd     1C17.pdb     10000        25.7 ms      14.6 ms      0.0 ms       40.3 ms      8.058 sec    -            7206.9MB
20 cu aev fd+bd     1C17.pdb     10000        2.2 ms       1.6 ms       0.0 ms       3.8 ms       760.8 ms     10.59        4526.9MB
-----------------------------------------------------------------------------------------------------------------------------------------
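The Speedup column appears to be the ratio of the TorchANI (`py`) total to the CUaev (`cu`) total for the same molecule. The following is a minimal sketch of that arithmetic, assuming this interpretation; `to_ms` and `speedup` are hypothetical helper names, not part of the benchmark script. Because the table prints rounded totals, the recomputed ratios can differ from the printed ones in the last digit.

```python
def to_ms(text: str) -> float:
    """Convert a duration such as '413.7 ms' or '5.119 sec' to milliseconds."""
    value, unit = text.split()
    scale = {"ms": 1.0, "sec": 1000.0}[unit]
    return float(value) * scale


def speedup(py_total: str, cu_total: str) -> float:
    """Ratio of the TorchANI (py) total time to the CUaev (cu) total time."""
    return to_ms(py_total) / to_ms(cu_total)


# Total(200) values copied from rows 09/10 and 19/20 of the table above.
forward_only = speedup("5.119 sec", "413.7 ms")
forward_backward = speedup("8.058 sec", "760.8 ms")
print(f"fd: {forward_only:.2f}x, fd+bd: {forward_backward:.2f}x")
```

The recomputed values should land close to the 12.37 and 10.59 reported in the Speedup column for the 10000-atom runs.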

Turn off use_fast_math, Turn off intrinsics

python aev-benchmark-size.py
Check args: Namespace(N=200, backward=0, infer_model=0, mnp=0, nsight=False, plot=0, run_energy=0, single_nn=0, use_cell_list=False, use_cuaev_interface=False)
/blue/roitberg/apps/lammps-ani/external/torchani_sandbox/torchani/models.py:99: UserWarning: The default is now to accept atomic numbers as indexes, do not set periodic_table_index=True. if you need to accept raw indices set periodic_table_index=False
  warnings.warn("The default is now to accept atomic numbers as indexes,"
aev-benchmark-size.py:204: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  /opt/conda/conda-bld/pytorch_1659484683044/work/torch/csrc/utils/tensor_new.cpp:201.)
  species = torch.tensor([mol.get_atomic_numbers()], device=device)
File: small.pdb, Molecule size: 264 / 264, Species: [1, 6, 7, 8]

Original TorchANI:
   GPU Memory Cached (pytorch) :   134.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   666.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.37 s
  Speed: 1.86 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   128.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   660.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.07 s
  Speed: 0.37 ms/it
  aev_error: 1.19e-06
  Speed up: 5.09 X

----------------------------------------------------------------------

File: 1hz5.pdb, Molecule size: 973 / 973, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   186.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   718.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.39 s
  Speed: 1.95 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   128.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   660.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.08 s
  Speed: 0.39 ms/it
  aev_error: 1.91e-06
  Speed up: 5.02 X

----------------------------------------------------------------------

File: 6W8H.pdb, Molecule size: 3410 / 3410, Species: [6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   496.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1028.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.76 s
  Speed: 3.82 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   188.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   720.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.09 s
  Speed: 0.46 ms/it
  aev_error: 9.54e-07
  Speed up: 8.23 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 6000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  1166.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1698.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 2.21 s
  Speed: 11.05 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   244.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   776.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.26 s
  Speed: 1.29 ms/it
  aev_error: 2.62e-06
  Speed up: 8.58 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 10000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  3028.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  3560.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 5.12 s
  Speed: 25.60 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   334.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   866.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.44 s
  Speed: 2.20 ms/it
  aev_error: 2.38e-06
  Speed up: 11.62 X

----------------------------------------------------------------------

Add Backward

File: small.pdb, Molecule size: 264 / 264, Species: [1, 6, 7, 8]

Original TorchANI:
   GPU Memory Cached (pytorch) :   190.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   722.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.84 s
  Speed: 4.19 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   190.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   722.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.11 s
  Speed: 0.54 ms/it
  aev_error: 1.01e-06
  force_error: 1.62e-05
  Speed up: 7.74 X

----------------------------------------------------------------------

File: 1hz5.pdb, Molecule size: 973 / 973, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   418.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   950.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.89 s
  Speed: 4.46 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   392.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):   924.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.15 s
  Speed: 0.73 ms/it
  aev_error: 1.91e-06
  force_error: 2.29e-05
  Speed up: 6.12 X

----------------------------------------------------------------------

File: 6W8H.pdb, Molecule size: 3410 / 3410, Species: [6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :   782.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1314.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 1.28 s
  Speed: 6.38 ms/it

CUaev:
   GPU Memory Cached (pytorch) :   486.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  1018.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.17 s
  Speed: 0.85 ms/it
  aev_error: 1.19e-06
  force_error: 1.72e-05
  Speed up: 7.52 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 6000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  3242.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  3774.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 4.03 s
  Speed: 20.14 ms/it

CUaev:
   GPU Memory Cached (pytorch) :  2390.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  2922.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.57 s
  Speed: 2.86 ms/it
  aev_error: 2.38e-06
  force_error: 2.67e-05
  Speed up: 7.04 X

----------------------------------------------------------------------

File: 1C17.pdb, Molecule size: 10000 / 16649, Species: [1, 6, 7, 8, 16]

Original TorchANI:
   GPU Memory Cached (pytorch) :  6674.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  7206.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 8.07 s
  Speed: 40.35 ms/it

CUaev:
   GPU Memory Cached (pytorch) :  3994.0MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
   GPU Memory Used (nvidia-smi):  4526.9MB / 11019.4MB (NVIDIA GeForce RTX 2080 Ti)
  Duration: 0.94 s
  Speed: 4.71 ms/it
  aev_error: 2.38e-06
  force_error: 2.91e-05
  Speed up: 8.56 X

----------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------------------------------------------
RUN                 PDB          Size         forward      backward     Others       Total        Total(200)   Speedup      GPU
-----------------------------------------------------------------------------------------------------------------------------------------
01 py aev fd        small.pdb    264          1.8 ms       0.0 ms       0.1 ms       1.9 ms       372.6 ms     -            666.9MB
02 cu aev fd        small.pdb    264          0.4 ms       0.0 ms       0.0 ms       0.4 ms       73.2 ms      5.09         660.9MB
03 py aev fd        1hz5.pdb     973          1.9 ms       0.0 ms       0.0 ms       2.0 ms       390.7 ms     -            718.9MB
04 cu aev fd        1hz5.pdb     973          0.4 ms       0.0 ms       0.0 ms       0.4 ms       77.8 ms      5.02         660.9MB
05 py aev fd        6W8H.pdb     3410         3.8 ms       0.0 ms       0.0 ms       3.8 ms       763.7 ms     -            1028.9MB
06 cu aev fd        6W8H.pdb     3410         0.5 ms       0.0 ms       0.0 ms       0.5 ms       92.8 ms      8.23         720.9MB
07 py aev fd        1C17.pdb     6000         11.0 ms      0.0 ms       0.0 ms       11.0 ms      2.210 sec    -            1698.9MB
08 cu aev fd        1C17.pdb     6000         1.3 ms       0.0 ms       0.0 ms       1.3 ms       257.6 ms     8.58         776.9MB
09 py aev fd        1C17.pdb     10000        25.6 ms      0.0 ms       0.0 ms       25.6 ms      5.120 sec    -            3560.9MB
10 cu aev fd        1C17.pdb     10000        2.2 ms       0.0 ms       0.0 ms       2.2 ms       440.8 ms     11.62        866.9MB
-----------------------------------------------------------------------------------------------------------------------------------------
11 py aev fd+bd     small.pdb    264          2.1 ms       2.1 ms       0.0 ms       4.2 ms       837.1 ms     -            722.9MB
12 cu aev fd+bd     small.pdb    264          0.3 ms       0.2 ms       0.0 ms       0.5 ms       108.2 ms     7.74         722.9MB
13 py aev fd+bd     1hz5.pdb     973          2.2 ms       2.3 ms       0.0 ms       4.5 ms       893.0 ms     -            950.9MB
14 cu aev fd+bd     1hz5.pdb     973          0.4 ms       0.3 ms       0.0 ms       0.7 ms       146.0 ms     6.12         924.9MB
15 py aev fd+bd     6W8H.pdb     3410         4.1 ms       2.3 ms       0.0 ms       6.4 ms       1.276 sec    -            1314.9MB
16 cu aev fd+bd     6W8H.pdb     3410         0.5 ms       0.4 ms       0.0 ms       0.8 ms       169.8 ms     7.52         1018.9MB
17 py aev fd+bd     1C17.pdb     6000         11.2 ms      8.9 ms       0.0 ms       20.1 ms      4.027 sec    -            3774.9MB
18 cu aev fd+bd     1C17.pdb     6000         1.3 ms       1.5 ms       0.0 ms       2.9 ms       572.1 ms     7.04         2922.9MB
19 py aev fd+bd     1C17.pdb     10000        25.8 ms      14.6 ms      0.0 ms       40.4 ms      8.070 sec    -            7206.9MB
20 cu aev fd+bd     1C17.pdb     10000        2.3 ms       2.4 ms       0.0 ms       4.7 ms       943.0 ms     8.56         4526.9MB
-----------------------------------------------------------------------------------------------------------------------------------------
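To gauge what the intrinsics contribute on the RTX 2080 Ti, the CUaev totals from the two use_fast_math-off tables above can be compared directly. This is a rough sketch using only the 10000-atom rows (10 and 20); the dictionary layout and the `intrinsics_gain` name are illustrative assumptions, and a single run per configuration says nothing about run-to-run variance.

```python
# Total(200) in milliseconds for the CUaev rows 10 (fd) and 20 (fd+bd),
# copied from the two use_fast_math-off tables above.
cu_total_ms = {
    "intrinsics on": {"fd": 413.7, "fd+bd": 760.8},
    "intrinsics off": {"fd": 440.8, "fd+bd": 943.0},
}


def intrinsics_gain(mode: str) -> float:
    """How many times faster CUaev runs with intrinsics on than with them off."""
    return cu_total_ms["intrinsics off"][mode] / cu_total_ms["intrinsics on"][mode]


for mode in ("fd", "fd+bd"):
    print(f"{mode}: {intrinsics_gain(mode):.2f}x faster with intrinsics")
```

On these numbers the gain is modest for the forward pass and noticeably larger once the backward pass is included.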