These notes target a Cray XC40/50 system, but they should be relatively easy to generalize to other systems.
The following assumes your shell is bash. Intel provides .csh scripts too.
module swap PrgEnv-cray PrgEnv-intel
module swap intel/18.0.3.222
# For the next step, `which ifort` should point you in the right direction
source /opt/intel/advisor_2018/advixe-vars.sh intel64
export LD_LIBRARY_PATH="/opt/intel/advisor_2018/lib64:${LD_LIBRARY_PATH}"
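Before submitting a job, it may be worth confirming that the tools actually ended up on your PATH after the steps above. The following is a minimal sketch; `require_tool` is a hypothetical helper name introduced here, not part of the Intel or Cray tooling.

```shell
#!/bin/bash
# require_tool is a hypothetical helper (not part of Advisor or the Cray PE).
# It succeeds if the named command is on PATH and prints a warning otherwise.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "warning: '$1' not found on PATH; check the module and source steps" >&2
    return 1
}

# Report on the two commands these notes rely on.
for tool in ifort advixe-cl; do
    require_tool "${tool}" && echo "found ${tool}: $(command -v "${tool}")"
done
```

If either tool is reported missing, revisit the module swaps and the path passed to `source`.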
The program must be compiled with debug symbols and dynamically linked. On Cray machines this means passing the following FCFLAGS
-g -dynamic
in addition to any other optimization flags being used.
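As a concrete illustration, a build line might look like the sketch below. It assumes the Cray `ftn` compiler wrapper, a hypothetical source file `main.f90`, and `-O2` standing in for whatever optimization flags you normally use.

```shell
#!/bin/bash
# -O2 is a placeholder for your usual optimization flags; main.f90 is a
# hypothetical source file used only for illustration.
FCFLAGS="-O2 -g -dynamic"

if command -v ftn >/dev/null 2>&1 && [ -f main.f90 ]; then
    # FCFLAGS is left unquoted on purpose so each flag becomes a separate argument.
    ftn ${FCFLAGS} -o a.out main.f90
else
    echo "would run: ftn ${FCFLAGS} -o a.out main.f90"
fi
```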
Two scripts are used. The first collects the survey data, which includes realistic timings; the second performs the loop trip count analysis, which causes very large runtime dilation. Run the survey first, followed by the trip count analysis.
#!/bin/bash
# survey.sh
# set this locally, or set the ADVIXE_PROJ_DIR environment variable in your environment, to choose
# where the Intel Advisor-xe sample & analysis files will go
export _local_proj_dir=${ADVIXE_PROJ_DIR:-./proj}
export PMI_RANK=${ALPS_APP_PE}
export PMI_NO_FORK=1 # Otherwise we'll be instrumenting ALPS
export PMI_NO_PREINITIALIZE=1
export PMI_MMAP_SYNC_WAIT_TIME=300
advixe-cl -collect survey -trace-mpi --no-auto-finalize -project-dir "${_local_proj_dir}" "$@"
#!/bin/bash
# tripcounts.sh
# set this locally, or set the ADVIXE_PROJ_DIR environment variable in your environment, to choose
# where the Intel Advisor-xe sample & analysis files will go
export _local_proj_dir=${ADVIXE_PROJ_DIR:-./proj}
export PMI_RANK=${ALPS_APP_PE}
export PMI_NO_FORK=1
export PMI_NO_PREINITIALIZE=1
export PMI_MMAP_SYNC_WAIT_TIME=300
advixe-cl -collect tripcounts -flop -trace-mpi -project-dir "${_local_proj_dir}" "$@"
Ensure both scripts are readable and executable with something like:
chmod +rx ./survey.sh ./tripcounts.sh
Then, to collect the data, pick a suitably small problem size, since:
- You will only be able to examine results on a single MPI rank at any given time
- The runtime dilation during the trip count phase can be quite large
In your batch script, or in an interactive job, ensure your environment is set up correctly, as shown above. Then perform the survey analysis, followed by the trip count analysis:
export ADVIXE_PROJ_DIR=/some/path/to/project/directory # if you don't want ./proj to be used
aprun -B ./survey.sh ./a.out # -B will grab parameters from PBS; you can set -n, -N, etc. explicitly instead
aprun -B ./tripcounts.sh ./a.out
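Since the survey should be run before the trip count analysis, the two launches can be chained so that the second only starts if the first succeeds. This is a sketch only: `collect_roofline` and the `LAUNCHER` variable are hypothetical names introduced here, and the default launcher simply mirrors the `aprun -B` lines above.

```shell
#!/bin/bash
# collect_roofline and LAUNCHER are hypothetical names used for illustration.
# LAUNCHER defaults to "aprun -B"; override it to set -n, -N, etc. explicitly.
collect_roofline() {
    local launcher="${LAUNCHER:-aprun -B}"
    # Run the survey first; start the tripcounts pass only if it succeeds.
    # ${launcher} is unquoted on purpose so "aprun -B" splits into two words.
    ${launcher} ./survey.sh "$@" && ${launcher} ./tripcounts.sh "$@"
}

# Example (inside a batch script or interactive job):
#   collect_roofline ./a.out
```

Chaining with `&&` avoids spending allocation time on the slow trip count pass when the survey has already failed.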
Advisor-xe will create the directory ./proj or ${ADVIXE_PROJ_DIR} if it does not exist and will place sampling/report data there. To ensure the analysis is performed for the same architecture as the data collection was performed on, use the --snapshot flag, on the compute node if needed. (This is in fact needed if you wish to analyze the results for the KNL partition.)
aprun -n 1 -b advixe-cl --snapshot \
--project-dir ${ADVIXE_PROJ_DIR:-./proj} \
--pack \
--cache-sources \
--cache-binaries \
-- ${ADVIXE_PROJ_DIR:-./proj}_snapshot
Some survey data can also be exported as a CSV table for analysis with another tool using:
advixe-cl --report survey \
--project-dir ${ADVIXE_PROJ_DIR:-./proj} \
--show-all-columns \
--format=csv \
--report-output ./proj.csv
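As one example of such follow-up analysis, the column names in the exported file can be listed from the shell before loading it elsewhere. This is a minimal sketch: `csv_columns` is a hypothetical helper, and it assumes the header is the first line of the file and that fields are comma-separated with no embedded commas, which you should verify against your own export.

```shell
#!/bin/bash
# csv_columns is a hypothetical helper, not an Advisor command.
# Assumes the first line is the header and fields contain no embedded commas.
csv_columns() {
    awk -F',' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$1"
}

# Example: csv_columns ./proj.csv
```

The printed index of each column can then be used to select fields with `cut` or `awk`.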
This article from Intel has the details on running Advisor-xe with MPI applications: https://software.intel.com/en-us/articles/analyzing-intel-mpi-applications-using-intel-advisor
With this setup you should be able to collect roofline data for your HPC programs if you have access to a recent Intel Parallel Studio (18.x+) and Intel Advisor-xe. If you have questions or want to share your experiences, please comment below and/or tweet them to me.