GROMACS 2019 adh_cubic_vsites benchmark with OpenCL on a Haswell integrated GPU
:-) GROMACS - gmx mdrun, 2019 (-:
GROMACS is written by:
Emile Apol Rossen Apostolov Paul Bauer Herman J.C. Berendsen
Par Bjelkmar Christian Blau Viacheslav Bolnykh Kevin Boyd
Aldert van Buuren Rudi van Drunen Anton Feenstra Alan Gray
Gerrit Groenhof Anca Hamuraru Vincent Hindriksen M. Eric Irrgang
Aleksei Iupinov Christoph Junghans Joe Jordan Dimitrios Karkoulis
Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson
Justin A. Lemkul Viveca Lindahl Magnus Lundborg Erik Marklund
Pascal Merz Pieter Meulenhoff Teemu Murtola Szilard Pall
Sander Pronk Roland Schulz Michael Shirts Alexey Shvetsov
Alfons Sijbers Peter Tieleman Jon Vincent Teemu Virolainen
Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2018, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx mdrun, version 2019
Executable: /home/eltonfc/.local//bin/gmx
Data prefix: /home/eltonfc/.local/
Working dir: /home/eltonfc/trab/software/gromacs/bench/adh_cubic_vsites
Process ID: 25079
Command line:
gmx mdrun -v -maxh .5 -notunepme
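A note on the flags: -v prints verbose step output, -maxh .5 makes mdrun stop cleanly just before 0.5 hours of wall time, and -notunepme disables automatic PME load balancing so timings stay comparable between runs. A minimal sketch of scripting the same invocation from Python (assuming gmx is on PATH, as in the header above):

    import subprocess
    # Run the benchmark exactly as in this log: verbose output, stop
    # before 0.5 h of wall time, keep PME tuning off for stable timings.
    subprocess.run(
        ["gmx", "mdrun", "-v", "-maxh", ".5", "-notunepme"],
        check=True,
    )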
GROMACS version: 2019
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: OpenCL
SIMD instructions: AVX2_256
FFT library: fftw-3.3.6-pl2-fma-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.2
Tracing support: disabled
C compiler: /usr/bin/cc GNU 7.3.0
C compiler flags: -mavx2 -mfma -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 7.3.0
C++ compiler flags: -mavx2 -mfma -std=c++11 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
OpenCL include dir: /usr/include
OpenCL library: /usr/lib/libOpenCL.so
OpenCL version: 2.0
Running on 1 node with total 4 cores, 8 logical cores, 1 compatible GPU
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Family: 6 Model: 60 Stepping: 3
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Hardware topology: Full, with devices
Sockets, cores, and logical processors:
Socket 0: [ 0 4] [ 1 5] [ 2 6] [ 3 7]
Numa nodes:
Node 0 (16704245760 bytes mem): 0 1 2 3 4 5 6 7
Latency:
0
0 1.00
Caches:
L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 2 ways
L2: 262144 bytes, linesize 64 bytes, assoc. 8, shared 2 ways
L3: 8388608 bytes, linesize 64 bytes, assoc. 16, shared 8 ways
PCI devices:
0000:00:02.0 Id: 8086:0412 Class: 0x0300 Numa: 0
0000:00:19.0 Id: 8086:153a Class: 0x0200 Numa: 0
0000:00:1f.2 Id: 8086:8c02 Class: 0x0106 Numa: 0
GPU info:
Number of GPUs detected: 1
#0: name: Intel(R) HD Graphics Haswell GT2 Desktop, vendor: Intel, device version: OpenCL 1.2 beignet 1.3, stat: compatible
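The one compatible device is the integrated Haswell GT2 GPU, exposed through the beignet 1.3 OpenCL runtime. For a quick cross-check of what that OpenCL stack reports, a sketch using the pyopencl package (an assumption; it is not part of GROMACS):

    import pyopencl as cl
    # Enumerate OpenCL platforms and devices, mirroring GROMACS's detection.
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            print(platform.name, device.name, device.version)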
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
++++ PLEASE CITE THE DOI FOR THIS VERSION OF GROMACS ++++
https://doi.org/10.5281/zenodo.2424363
-------- -------- --- Thank You --- -------- --------
Input Parameters:
integrator = md
tinit = 0
dt = 0.005
nsteps = 10000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 232683026
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 0
nstcalcenergy = 100
nstenergy = 500
nstxout-compressed = 0
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 10
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 0.935
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 0.9
DispCorr = No
table-extension = 1
fourierspacing = 0.1125
fourier-nx = 100
fourier-ny = 100
fourier-nz = 100
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
tcoupl = V-rescale
nsttcouple = 10
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p (3x3):
ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = false
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 6
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
grpopts:
nrdf: 247713
ref-t: 300
tau-t: 0.1
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
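Two parameters above set the scale of this benchmark: dt = 0.005 ps (a 5 fs step, made practical by the virtual-site hydrogen treatment this adh_cubic_vsites system uses; 11672 vsites are reported below) and nsteps = 10000. A quick check of the simulated span, which matches the "Step 10000 / Time 50.00000" lines further down:

    dt_ps = 0.005          # time step from the parameter list above
    nsteps = 10000
    print(nsteps * dt_ps)  # 50.0 ps of simulated time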
Changing rlist from 0.935 to 0.956 for non-bonded 4x2 atom kernels
Changing nstlist from 10 to 40, rlist from 0.956 to 1.094
Using 1 MPI thread
Using 8 OpenMP threads
1 GPU auto-selected for this run.
Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node:
PP:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
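The Gaussian width follows from rcoulomb and ewald-rtol: GROMACS picks the Ewald splitting parameter beta such that erfc(beta * rc) is approximately ewald-rtol. Checking the value reported above against the input parameters:

    import math
    beta = 1.0 / 0.288146         # nm^-1, from the Gaussian width above
    print(math.erfc(beta * 0.9))  # ~1e-5, i.e. the ewald-rtol of the run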
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.111e-05
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018
Generated table with 1046 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1046 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1046 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Using GPU 4x4 nonbonded short-range kernels
Using a dual 4x2 pair-list setup updated with dynamic, rolling pruning:
outer list: updated every 40 steps, buffer 0.194 nm, rlist 1.094 nm
inner list: updated every 4 steps, buffer 0.011 nm, rlist 0.911 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
outer list: updated every 40 steps, buffer 0.304 nm, rlist 1.204 nm
inner list: updated every 4 steps, buffer 0.040 nm, rlist 0.940 nm
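The buffers in this dual-list setup are simply rlist minus the 0.9 nm interaction cut-off: the outer list must absorb 40 steps of particle drift, while the frequently pruned inner list only needs to cover 4. Verifying against the figures above:

    rcut = 0.9           # nm, max(rcoulomb, rvdw)
    print(1.094 - rcut)  # 0.194 nm outer buffer, as reported
    print(0.911 - rcut)  # 0.011 nm inner buffer, as reported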
Using Lorentz-Berthelot Lennard-Jones combination rule
Removing pbc first time
Initializing LINear Constraint Solver
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess
P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 116-122
-------- -------- --- Thank You --- -------- --------
The number of constraints is 13140
3504 constraints are involved in constraint triangles,
will apply an additional matrix expansion of order 6 for couplings
between constraints inside triangles
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------
There are: 124552 Atoms
There are: 11672 VSites
Constraining the starting coordinates (step 0)
Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 3.07e-05
Initial temperature: 299.88 K
Started mdrun on rank 0 Wed Jan 23 22:17:07 2019
Step Time
0 0.00000
Energies (kJ/mol)
Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
2.52212e+04 5.17017e+04 2.06788e+03 2.32931e+04 2.87705e+05
LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En.
2.00473e+05 -2.26452e+06 1.95926e+04 -1.65446e+06 3.22293e+05
Total Energy Conserved En. Temperature Pressure (bar) Constr. rmsd
-1.33217e+06 -1.33217e+06 3.12967e+02 4.47346e+02 6.07222e-05
Writing checkpoint, step 6000 at Wed Jan 23 22:32:12 2019
Step Time
10000 50.00000
Writing checkpoint, step 10000 at Wed Jan 23 22:42:16 2019
Energies (kJ/mol)
Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
2.39171e+04 5.14713e+04 1.90487e+03 2.33074e+04 2.88157e+05
LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En.
1.97225e+05 -2.25992e+06 1.95731e+04 -1.65436e+06 3.08572e+05
Total Energy Conserved En. Temperature Pressure (bar) Constr. rmsd
-1.34579e+06 -1.33699e+06 2.99642e+02 4.23996e+02 5.20327e-05
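The Temperature column can be recovered from the kinetic energy and the 247713 degrees of freedom listed under grpopts, via Ekin = (Ndf/2) * kB * T. Checking the step-10000 frame:

    kB = 0.0083144626   # kJ/(mol K), Boltzmann constant in GROMACS units
    nrdf = 247713       # degrees of freedom (nrdf under grpopts)
    ekin = 3.08572e+05  # kJ/mol, Kinetic En. at step 10000
    print(2 * ekin / (nrdf * kB))  # ~299.64 K, matching 2.99642e+02 above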
<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>
Statistics over 10001 steps using 101 frames
Energies (kJ/mol)
Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
2.40135e+04 5.16104e+04 1.97281e+03 2.32118e+04 2.88337e+05
LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En.
1.97950e+05 -2.26159e+06 1.94700e+04 -1.65502e+06 3.09169e+05
Total Energy Conserved En. Temperature Pressure (bar) Constr. rmsd
-1.34585e+06 -1.33423e+06 3.00222e+02 3.95776e+02 0.00000e+00
Total Virial (kJ/mol)
8.72490e+04 2.18022e+02 -2.04698e+01
2.17895e+02 8.71912e+04 -3.73210e+01
-2.03466e+01 -3.83831e+01 8.75039e+04
Pressure (bar)
3.96783e+02 -5.85625e+00 8.10214e-01
-5.85304e+00 4.01535e+02 -6.28920e-01
8.07118e-01 -6.02217e-01 3.89011e+02
M E G A - F L O P S A C C O U N T I N G
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
Computing: M-Number M-Flops % Flops
-----------------------------------------------------------------------------
Pair Search distance check 4723.039760 42507.358 0.1
NxN Ewald Elec. + LJ [F] 793216.982880 52352320.870 90.7
NxN Ewald Elec. + LJ [V&F] 8092.108144 865855.571 1.5
1,4 nonbonded interactions 562.216216 50599.459 0.1
Calc Weights 4087.128672 147136.632 0.3
Spread Q Bspline 87192.078336 174384.157 0.3
Gather F Bspline 87192.078336 523152.470 0.9
3D-FFT 398671.243138 3189369.945 5.5
Solve PME 100.010000 6400.640 0.0
Shift-X 34.192224 205.153 0.0
Angles 325.472544 54679.387 0.1
Propers 548.454840 125596.158 0.2
Impropers 40.564056 8437.324 0.0
Virial 13.763169 247.737 0.0
Stop-CM 13.894848 138.948 0.0
Calc-Ekin 272.720448 7363.452 0.0
Lincs 131.439420 7886.365 0.0
Lincs-Mat 3692.627456 14770.510 0.0
Constraint-V 1391.078160 11128.625 0.0
Constraint-Vir 12.719940 305.279 0.0
Settle 376.112800 121484.434 0.2
Virtual Site 3 21.012160 777.450 0.0
Virtual Site 3fd 19.880736 1888.670 0.0
Virtual Site 3fad 3.313456 583.168 0.0
Virtual Site 3out 57.217728 4977.942 0.0
Virtual Site 4fdn 16.486464 4187.562 0.0
-----------------------------------------------------------------------------
Total 57716385.269 100.0
-----------------------------------------------------------------------------
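The flop table shows where the arithmetic goes: the NxN Ewald Elec. + LJ [F] row alone accounts for nine tenths of all floating-point work, which is precisely the short-range part offloaded to the GPU in this run. Reproducing the percentage:

    # NxN force-kernel M-Flops over the table total
    print(100 * 52352320.870 / 57716385.269)  # ~90.7 %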
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 1 MPI rank, each using 8 OpenMP threads
Computing: Num Num Call Wall time Giga-Cycles
Ranks Threads Count (s) total sum %
-----------------------------------------------------------------------------
Vsite constr. 1 8 10001 1.000 28.733 0.1
Neighbor search 1 8 251 5.410 155.445 0.4
Launch GPU ops. 1 8 10001 1353.401 38888.866 89.6
Force 1 8 10001 6.941 199.443 0.5
PME mesh 1 8 10001 121.314 3485.853 8.0
Wait GPU NB local 1 8 10001 0.425 12.220 0.0
NB X/F buffer ops. 1 8 19751 6.577 188.988 0.4
Vsite spread 1 8 10102 1.572 45.169 0.1
Write traj. 1 8 2 0.505 14.506 0.0
Update 1 8 10001 5.537 159.111 0.4
Constraints 1 8 10003 6.719 193.054 0.4
Rest 0.691 19.863 0.0
-----------------------------------------------------------------------------
Total 1510.092 43391.251 100.0
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME spread 1 8 10001 40.884 1174.769 2.7
PME gather 1 8 10001 29.190 838.744 1.9
PME 3D-FFT 1 8 20002 48.085 1381.693 3.2
PME solve Elec 1 8 10001 3.060 87.920 0.2
-----------------------------------------------------------------------------
GPU timings
-----------------------------------------------------------------------------
Computing: Count Wall t (s) ms/step %
-----------------------------------------------------------------------------
Pair list H2D 251 0.377 1.500 0.0
X / q H2D 10001 18.341 1.834 0.0
Nonbonded F kernel 9900 55340232448.731 5589922469 50.0
Nonbonded F+ene k. 101 62.366 617.484 0.0
Pruning kernel 251 6.736 26.836 0.0
F D2H 10001 55340232519.377 5533469904 50.0
-----------------------------------------------------------------------------
Total 110680465055.927 1106693981 100.0
-----------------------------------------------------------------------------
*Dynamic pruning 4750 25.872 5.447 0.0
-----------------------------------------------------------------------------
Average per-step force GPU/CPU evaluation time ratio: 11066939811.612 ms/12.824 ms = 862973673.837
For optimal resource utilization this ratio should be close to 1
NOTE: The GPU has >25% more load than the CPU. This imbalance wastes
CPU resources.
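The per-kernel wall times above are physically impossible (the whole run lasted about 25 minutes), so the GPU/CPU ratio and the NOTE that follows it should be disregarded. The likely culprit, an assumption on my part, is broken OpenCL event timing in the beignet 1.3 runtime, which would also explain the 89.6 % of wall time charged to "Launch GPU ops." in the CPU table. The scale of the corruption:

    wall_s = 55340232448.731          # "Wall t (s)" reported for the F kernel
    print(wall_s / (365.25 * 86400))  # ~1754 years of alleged kernel time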
Core t (s) Wall t (s) (%)
Time: 12080.732 1510.092 800.0
(ns/day) (hour/ns)
Performance: 2.861 8.389
Finished mdrun on rank 0 Wed Jan 23 22:42:17 2019
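The headline performance figures follow directly from the simulated span and the total wall time in the cycle accounting; a quick check:

    sim_ns = 10000 * 0.005 / 1000   # 0.050 ns simulated
    wall_s = 1510.092               # total wall time from the accounting above
    ns_day = sim_ns * 86400 / wall_s
    print(ns_day)                   # ~2.861 ns/day
    print(24 / ns_day)              # ~8.389 hour/ns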