Gromacs 2019 adh_cubic_vsites benchmark with OpenCL on a haswell
:-) GROMACS - gmx mdrun, 2019 (-: | |
GROMACS is written by: | |
Emile Apol Rossen Apostolov Paul Bauer Herman J.C. Berendsen | |
Par Bjelkmar Christian Blau Viacheslav Bolnykh Kevin Boyd | |
Aldert van Buuren Rudi van Drunen Anton Feenstra Alan Gray | |
Gerrit Groenhof Anca Hamuraru Vincent Hindriksen M. Eric Irrgang | |
Aleksei Iupinov Christoph Junghans Joe Jordan Dimitrios Karkoulis | |
Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson | |
Justin A. Lemkul Viveca Lindahl Magnus Lundborg Erik Marklund | |
Pascal Merz Pieter Meulenhoff Teemu Murtola Szilard Pall | |
Sander Pronk Roland Schulz Michael Shirts Alexey Shvetsov | |
Alfons Sijbers Peter Tieleman Jon Vincent Teemu Virolainen | |
Christian Wennberg Maarten Wolf | |
and the project leaders: | |
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel | |
Copyright (c) 1991-2000, University of Groningen, The Netherlands. | |
Copyright (c) 2001-2018, The GROMACS development team at | |
Uppsala University, Stockholm University and | |
the Royal Institute of Technology, Sweden. | |
check out http://www.gromacs.org for more information. | |
GROMACS is free software; you can redistribute it and/or modify it | |
under the terms of the GNU Lesser General Public License | |
as published by the Free Software Foundation; either version 2.1 | |
of the License, or (at your option) any later version. | |
GROMACS: gmx mdrun, version 2019 | |
Executable: /home/eltonfc/.local//bin/gmx | |
Data prefix: /home/eltonfc/.local/ | |
Working dir: /home/eltonfc/trab/software/gromacs/bench/adh_cubic_vsites | |
Process ID: 25079 | |
Command line: | |
gmx mdrun -v -maxh .5 -notunepme | |
GROMACS version: 2019 | |
Precision: single | |
Memory model: 64 bit | |
MPI library: thread_mpi | |
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64) | |
GPU support: OpenCL | |
SIMD instructions: AVX2_256 | |
FFT library: fftw-3.3.6-pl2-fma-sse2-avx-avx2-avx2_128 | |
RDTSCP usage: enabled | |
TNG support: enabled | |
Hwloc support: hwloc-1.11.2 | |
Tracing support: disabled | |
C compiler: /usr/bin/cc GNU 7.3.0 | |
C compiler flags: -mavx2 -mfma -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast | |
C++ compiler: /usr/bin/c++ GNU 7.3.0 | |
C++ compiler flags: -mavx2 -mfma -std=c++11 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast | |
OpenCL include dir: /usr/include | |
OpenCL library: /usr/lib/libOpenCL.so | |
OpenCL version: 2.0 | |
Running on 1 node with total 4 cores, 8 logical cores, 1 compatible GPU | |
Hardware detected: | |
CPU info: | |
Vendor: Intel | |
Brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz | |
Family: 6 Model: 60 Stepping: 3 | |
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic | |
Hardware topology: Full, with devices | |
Sockets, cores, and logical processors: | |
Socket 0: [ 0 4] [ 1 5] [ 2 6] [ 3 7] | |
Numa nodes: | |
Node 0 (16704245760 bytes mem): 0 1 2 3 4 5 6 7 | |
Latency: | |
0 | |
0 1.00 | |
Caches: | |
L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 2 ways | |
L2: 262144 bytes, linesize 64 bytes, assoc. 8, shared 2 ways | |
L3: 8388608 bytes, linesize 64 bytes, assoc. 16, shared 8 ways | |
PCI devices: | |
0000:00:02.0 Id: 8086:0412 Class: 0x0300 Numa: 0 | |
0000:00:19.0 Id: 8086:153a Class: 0x0200 Numa: 0 | |
0000:00:1f.2 Id: 8086:8c02 Class: 0x0106 Numa: 0 | |
GPU info: | |
Number of GPUs detected: 1 | |
#0: name: Intel(R) HD Graphics Haswell GT2 Desktop, vendor: Intel, device version: OpenCL 1.2 beignet 1.3, stat: compatible | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E. | |
Lindahl | |
GROMACS: High performance molecular simulations through multi-level | |
parallelism from laptops to supercomputers | |
SoftwareX 1 (2015) pp. 19-25 | |
-------- -------- --- Thank You --- -------- -------- | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl | |
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with | |
GROMACS | |
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27 | |
-------- -------- --- Thank You --- -------- -------- | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. | |
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl | |
GROMACS 4.5: a high-throughput and highly parallel open source molecular | |
simulation toolkit | |
Bioinformatics 29 (2013) pp. 845-54 | |
-------- -------- --- Thank You --- -------- -------- | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl | |
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable | |
molecular simulation | |
J. Chem. Theory Comput. 4 (2008) pp. 435-447 | |
-------- -------- --- Thank You --- -------- -------- | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C. | |
Berendsen | |
GROMACS: Fast, Flexible and Free | |
J. Comp. Chem. 26 (2005) pp. 1701-1719 | |
-------- -------- --- Thank You --- -------- -------- | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
E. Lindahl and B. Hess and D. van der Spoel | |
GROMACS 3.0: A package for molecular simulation and trajectory analysis | |
J. Mol. Mod. 7 (2001) pp. 306-317 | |
-------- -------- --- Thank You --- -------- -------- | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
H. J. C. Berendsen, D. van der Spoel and R. van Drunen | |
GROMACS: A message-passing parallel molecular dynamics implementation | |
Comp. Phys. Comm. 91 (1995) pp. 43-56 | |
-------- -------- --- Thank You --- -------- -------- | |
++++ PLEASE CITE THE DOI FOR THIS VERSION OF GROMACS ++++ | |
https://doi.org/10.5281/zenodo.2424363 | |
-------- -------- --- Thank You --- -------- -------- | |
Input Parameters: | |
integrator = md | |
tinit = 0 | |
dt = 0.005 | |
nsteps = 10000 | |
init-step = 0 | |
simulation-part = 1 | |
comm-mode = Linear | |
nstcomm = 100 | |
bd-fric = 0 | |
ld-seed = 232683026 | |
emtol = 10 | |
emstep = 0.01 | |
niter = 20 | |
fcstep = 0 | |
nstcgsteep = 1000 | |
nbfgscorr = 10 | |
rtpi = 0.05 | |
nstxout = 0 | |
nstvout = 0 | |
nstfout = 0 | |
nstlog = 0 | |
nstcalcenergy = 100 | |
nstenergy = 500 | |
nstxout-compressed = 0 | |
compressed-x-precision = 1000 | |
cutoff-scheme = Verlet | |
nstlist = 10 | |
ns-type = Grid | |
pbc = xyz | |
periodic-molecules = false | |
verlet-buffer-tolerance = 0.005 | |
rlist = 0.935 | |
coulombtype = PME | |
coulomb-modifier = Potential-shift | |
rcoulomb-switch = 0 | |
rcoulomb = 0.9 | |
epsilon-r = 1 | |
epsilon-rf = inf | |
vdw-type = Cut-off | |
vdw-modifier = Potential-shift | |
rvdw-switch = 0 | |
rvdw = 0.9 | |
DispCorr = No | |
table-extension = 1 | |
fourierspacing = 0.1125 | |
fourier-nx = 100 | |
fourier-ny = 100 | |
fourier-nz = 100 | |
pme-order = 4 | |
ewald-rtol = 1e-05 | |
ewald-rtol-lj = 0.001 | |
lj-pme-comb-rule = Geometric | |
ewald-geometry = 0 | |
epsilon-surface = 0 | |
tcoupl = V-rescale | |
nsttcouple = 10 | |
nh-chain-length = 0 | |
print-nose-hoover-chain-variables = false | |
pcoupl = No | |
pcoupltype = Isotropic | |
nstpcouple = -1 | |
tau-p = 1 | |
compressibility (3x3): | |
compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} | |
compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} | |
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} | |
ref-p (3x3): | |
ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} | |
ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} | |
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} | |
refcoord-scaling = No | |
posres-com (3): | |
posres-com[0]= 0.00000e+00 | |
posres-com[1]= 0.00000e+00 | |
posres-com[2]= 0.00000e+00 | |
posres-comB (3): | |
posres-comB[0]= 0.00000e+00 | |
posres-comB[1]= 0.00000e+00 | |
posres-comB[2]= 0.00000e+00 | |
QMMM = false | |
QMconstraints = 0 | |
QMMMscheme = 0 | |
MMChargeScaleFactor = 1 | |
qm-opts: | |
ngQM = 0 | |
constraint-algorithm = Lincs | |
continuation = false | |
Shake-SOR = false | |
shake-tol = 0.0001 | |
lincs-order = 6 | |
lincs-iter = 1 | |
lincs-warnangle = 30 | |
nwall = 0 | |
wall-type = 9-3 | |
wall-r-linpot = -1 | |
wall-atomtype[0] = -1 | |
wall-atomtype[1] = -1 | |
wall-density[0] = 0 | |
wall-density[1] = 0 | |
wall-ewald-zfac = 3 | |
pull = false | |
awh = false | |
rotation = false | |
interactiveMD = false | |
disre = No | |
disre-weighting = Conservative | |
disre-mixed = false | |
dr-fc = 1000 | |
dr-tau = 0 | |
nstdisreout = 100 | |
orire-fc = 0 | |
orire-tau = 0 | |
nstorireout = 100 | |
free-energy = no | |
cos-acceleration = 0 | |
deform (3x3): | |
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} | |
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} | |
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00} | |
simulated-tempering = false | |
swapcoords = no | |
userint1 = 0 | |
userint2 = 0 | |
userint3 = 0 | |
userint4 = 0 | |
userreal1 = 0 | |
userreal2 = 0 | |
userreal3 = 0 | |
userreal4 = 0 | |
applied-forces: | |
electric-field: | |
x: | |
E0 = 0 | |
omega = 0 | |
t0 = 0 | |
sigma = 0 | |
y: | |
E0 = 0 | |
omega = 0 | |
t0 = 0 | |
sigma = 0 | |
z: | |
E0 = 0 | |
omega = 0 | |
t0 = 0 | |
sigma = 0 | |
grpopts: | |
nrdf: 247713 | |
ref-t: 300 | |
tau-t: 0.1 | |
annealing: No | |
annealing-npoints: 0 | |
acc: 0 0 0 | |
nfreeze: N N N | |
energygrp-flags[ 0]: 0 | |
Changing rlist from 0.935 to 0.956 for non-bonded 4x2 atom kernels | |
Changing nstlist from 10 to 40, rlist from 0.956 to 1.094 | |
Using 1 MPI thread | |
Using 8 OpenMP threads | |
1 GPU auto-selected for this run. | |
Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node: | |
PP:0 | |
PP tasks will do (non-perturbed) short-ranged interactions on the GPU | |
Pinning threads with an auto-selected logical core stride of 1 | |
System total charge: 0.000 | |
Will do PME sum in reciprocal space for electrostatic interactions. | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen | |
A smooth particle mesh Ewald method | |
J. Chem. Phys. 103 (1995) pp. 8577-8592 | |
-------- -------- --- Thank You --- -------- -------- | |
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald | |
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.111e-05 | |
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018 | |
Generated table with 1046 data points for 1-4 COUL. | |
Tabscale = 500 points/nm | |
Generated table with 1046 data points for 1-4 LJ6. | |
Tabscale = 500 points/nm | |
Generated table with 1046 data points for 1-4 LJ12. | |
Tabscale = 500 points/nm | |
Using GPU 4x4 nonbonded short-range kernels | |
Using a dual 4x2 pair-list setup updated with dynamic, rolling pruning: | |
outer list: updated every 40 steps, buffer 0.194 nm, rlist 1.094 nm | |
inner list: updated every 4 steps, buffer 0.011 nm, rlist 0.911 nm | |
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be: | |
outer list: updated every 40 steps, buffer 0.304 nm, rlist 1.204 nm | |
inner list: updated every 4 steps, buffer 0.040 nm, rlist 0.940 nm | |
Using Lorentz-Berthelot Lennard-Jones combination rule | |
Removing pbc first time | |
Initializing LINear Constraint Solver | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
B. Hess | |
P-LINCS: A Parallel Linear Constraint Solver for molecular simulation | |
J. Chem. Theory Comput. 4 (2008) pp. 116-122 | |
-------- -------- --- Thank You --- -------- -------- | |
The number of constraints is 13140 | |
3504 constraints are involved in constraint triangles, | |
will apply an additional matrix expansion of order 6 for couplings | |
between constraints inside triangles | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
S. Miyamoto and P. A. Kollman | |
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid | |
Water Models | |
J. Comp. Chem. 13 (1992) pp. 952-962 | |
-------- -------- --- Thank You --- -------- -------- | |
Center of mass motion removal mode is Linear | |
We have the following groups for center of mass motion removal: | |
0: rest | |
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++ | |
G. Bussi, D. Donadio and M. Parrinello | |
Canonical sampling through velocity rescaling | |
J. Chem. Phys. 126 (2007) pp. 014101 | |
-------- -------- --- Thank You --- -------- -------- | |
There are: 124552 Atoms | |
There are: 11672 VSites | |
Constraining the starting coordinates (step 0) | |
Constraining the coordinates at t0-dt (step 0) | |
RMS relative constraint deviation after constraining: 3.07e-05 | |
Initial temperature: 299.88 K | |
Started mdrun on rank 0 Wed Jan 23 22:17:07 2019 | |
Step Time | |
0 0.00000 | |
Energies (kJ/mol) | |
Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14 | |
2.52212e+04 5.17017e+04 2.06788e+03 2.32931e+04 2.87705e+05 | |
LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En. | |
2.00473e+05 -2.26452e+06 1.95926e+04 -1.65446e+06 3.22293e+05 | |
Total Energy Conserved En. Temperature Pressure (bar) Constr. rmsd | |
-1.33217e+06 -1.33217e+06 3.12967e+02 4.47346e+02 6.07222e-05 | |
Writing checkpoint, step 6000 at Wed Jan 23 22:32:12 2019 | |
Step Time | |
10000 50.00000 | |
Writing checkpoint, step 10000 at Wed Jan 23 22:42:16 2019 | |
Energies (kJ/mol) | |
Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14 | |
2.39171e+04 5.14713e+04 1.90487e+03 2.33074e+04 2.88157e+05 | |
LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En. | |
1.97225e+05 -2.25992e+06 1.95731e+04 -1.65436e+06 3.08572e+05 | |
Total Energy Conserved En. Temperature Pressure (bar) Constr. rmsd | |
-1.34579e+06 -1.33699e+06 2.99642e+02 4.23996e+02 5.20327e-05 | |
<====== ############### ==> | |
<==== A V E R A G E S ====> | |
<== ############### ======> | |
Statistics over 10001 steps using 101 frames | |
Energies (kJ/mol) | |
Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14 | |
2.40135e+04 5.16104e+04 1.97281e+03 2.32118e+04 2.88337e+05 | |
LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En. | |
1.97950e+05 -2.26159e+06 1.94700e+04 -1.65502e+06 3.09169e+05 | |
Total Energy Conserved En. Temperature Pressure (bar) Constr. rmsd | |
-1.34585e+06 -1.33423e+06 3.00222e+02 3.95776e+02 0.00000e+00 | |
Total Virial (kJ/mol) | |
8.72490e+04 2.18022e+02 -2.04698e+01 | |
2.17895e+02 8.71912e+04 -3.73210e+01 | |
-2.03466e+01 -3.83831e+01 8.75039e+04 | |
Pressure (bar) | |
3.96783e+02 -5.85625e+00 8.10214e-01 | |
-5.85304e+00 4.01535e+02 -6.28920e-01 | |
8.07118e-01 -6.02217e-01 3.89011e+02 | |
M E G A - F L O P S A C C O U N T I N G | |
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels | |
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table | |
W3=SPC/TIP3p W4=TIP4p (single or pairs) | |
V&F=Potential and force V=Potential only F=Force only | |
Computing: M-Number M-Flops % Flops | |
----------------------------------------------------------------------------- | |
Pair Search distance check 4723.039760 42507.358 0.1 | |
NxN Ewald Elec. + LJ [F] 793216.982880 52352320.870 90.7 | |
NxN Ewald Elec. + LJ [V&F] 8092.108144 865855.571 1.5 | |
1,4 nonbonded interactions 562.216216 50599.459 0.1 | |
Calc Weights 4087.128672 147136.632 0.3 | |
Spread Q Bspline 87192.078336 174384.157 0.3 | |
Gather F Bspline 87192.078336 523152.470 0.9 | |
3D-FFT 398671.243138 3189369.945 5.5 | |
Solve PME 100.010000 6400.640 0.0 | |
Shift-X 34.192224 205.153 0.0 | |
Angles 325.472544 54679.387 0.1 | |
Propers 548.454840 125596.158 0.2 | |
Impropers 40.564056 8437.324 0.0 | |
Virial 13.763169 247.737 0.0 | |
Stop-CM 13.894848 138.948 0.0 | |
Calc-Ekin 272.720448 7363.452 0.0 | |
Lincs 131.439420 7886.365 0.0 | |
Lincs-Mat 3692.627456 14770.510 0.0 | |
Constraint-V 1391.078160 11128.625 0.0 | |
Constraint-Vir 12.719940 305.279 0.0 | |
Settle 376.112800 121484.434 0.2 | |
Virtual Site 3 21.012160 777.450 0.0 | |
Virtual Site 3fd 19.880736 1888.670 0.0 | |
Virtual Site 3fad 3.313456 583.168 0.0 | |
Virtual Site 3out 57.217728 4977.942 0.0 | |
Virtual Site 4fdn 16.486464 4187.562 0.0 | |
----------------------------------------------------------------------------- | |
Total 57716385.269 100.0 | |
----------------------------------------------------------------------------- | |
R E A L C Y C L E A N D T I M E A C C O U N T I N G | |
On 1 MPI rank, each using 8 OpenMP threads | |
Computing: Num Num Call Wall time Giga-Cycles | |
Ranks Threads Count (s) total sum % | |
----------------------------------------------------------------------------- | |
Vsite constr. 1 8 10001 1.000 28.733 0.1 | |
Neighbor search 1 8 251 5.410 155.445 0.4 | |
Launch GPU ops. 1 8 10001 1353.401 38888.866 89.6 | |
Force 1 8 10001 6.941 199.443 0.5 | |
PME mesh 1 8 10001 121.314 3485.853 8.0 | |
Wait GPU NB local 1 8 10001 0.425 12.220 0.0 | |
NB X/F buffer ops. 1 8 19751 6.577 188.988 0.4 | |
Vsite spread 1 8 10102 1.572 45.169 0.1 | |
Write traj. 1 8 2 0.505 14.506 0.0 | |
Update 1 8 10001 5.537 159.111 0.4 | |
Constraints 1 8 10003 6.719 193.054 0.4 | |
Rest 0.691 19.863 0.0 | |
----------------------------------------------------------------------------- | |
Total 1510.092 43391.251 100.0 | |
----------------------------------------------------------------------------- | |
Breakdown of PME mesh computation | |
----------------------------------------------------------------------------- | |
PME spread 1 8 10001 40.884 1174.769 2.7 | |
PME gather 1 8 10001 29.190 838.744 1.9 | |
PME 3D-FFT 1 8 20002 48.085 1381.693 3.2 | |
PME solve Elec 1 8 10001 3.060 87.920 0.2 | |
----------------------------------------------------------------------------- | |
GPU timings | |
----------------------------------------------------------------------------- | |
Computing: Count Wall t (s) ms/step % | |
----------------------------------------------------------------------------- | |
Pair list H2D 251 0.377 1.500 0.0 | |
X / q H2D 10001 18.341 1.834 0.0 | |
Nonbonded F kernel 9900 55340232448.731 5589922469 50.0 | |
Nonbonded F+ene k. 101 62.366 617.484 0.0 | |
Pruning kernel 251 6.736 26.836 0.0 | |
F D2H 10001 55340232519.377 5533469904 50.0 | |
----------------------------------------------------------------------------- | |
Total 110680465055.927 1106693981 100.0 | |
----------------------------------------------------------------------------- | |
*Dynamic pruning 4750 25.872 5.447 0.0 | |
----------------------------------------------------------------------------- | |
Average per-step force GPU/CPU evaluation time ratio: 11066939811.612 ms/12.824 ms = 862973673.837 | |
For optimal resource utilization this ratio should be close to 1 | |
NOTE: The GPU has >25% more load than the CPU. This imbalance wastes | |
CPU resources. | |
Core t (s) Wall t (s) (%) | |
Time: 12080.732 1510.092 800.0 | |
(ns/day) (hour/ns) | |
Performance: 2.861 8.389 | |
Finished mdrun on rank 0 Wed Jan 23 22:42:17 2019 | |
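
Note on reading the figures above: the short Python sketch below recomputes the reported performance from values printed in this log (nsteps, dt, and the wall time) and checks whether the "GPU timings" total can fit inside the run's wall time. The function names are hypothetical helpers, not part of GROMACS, and the reading that an over-large GPU total points to unreliable OpenCL timers on this driver is an assumption, not something the log states.

```python
# Values copied from the log above.
NSTEPS = 10000                   # nsteps
DT_PS = 0.005                    # dt, in picoseconds
WALL_S = 1510.092                # "Wall t (s)" on the "Time:" line
GPU_TOTAL_S = 110680465055.927   # "Total" wall time under "GPU timings"


def simulated_ns(nsteps, dt_ps):
    """Simulated time in nanoseconds (1 ns = 1000 ps)."""
    return nsteps * dt_ps / 1000.0


def ns_per_day(nsteps, dt_ps, wall_s):
    """Throughput: simulated nanoseconds per 24 h of wall time."""
    return simulated_ns(nsteps, dt_ps) * 86400.0 / wall_s


def hours_per_ns(nsteps, dt_ps, wall_s):
    """Cost: wall-clock hours needed to simulate one nanosecond."""
    return (wall_s / 3600.0) / simulated_ns(nsteps, dt_ps)


def gpu_timers_plausible(gpu_total_s, wall_s):
    """A single GPU cannot accumulate more timed seconds than the run's
    wall time; a larger total suggests the timer values are not usable
    (assumption: this is a timer artifact, not real GPU work)."""
    return gpu_total_s <= wall_s


if __name__ == "__main__":
    print("ns/day:  %.3f" % ns_per_day(NSTEPS, DT_PS, WALL_S))
    print("hour/ns: %.3f" % hours_per_ns(NSTEPS, DT_PS, WALL_S))
    print("GPU timers plausible:", gpu_timers_plausible(GPU_TOTAL_S, WALL_S))
```

The first two values agree with the "Performance:" line. The GPU total is many orders of magnitude larger than the 1510 s wall time, so the GPU/CPU time ratio and the load-imbalance NOTE derived from it should probably be read with caution for this run.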