@Atlas7 /gflops.md Secret
Last active Aug 21, 2017

Intel Colfax Cluster - Estimate Theoretical Peak FLOPS for Intel Xeon Phi Processors

Intel Colfax Cluster - Notes - Index Page


This formula estimates the theoretical peak GFLOPS (giga floating-point operations per second) that optimized code may achieve:

Peak GFLOPS = 
  Cores x (Base GHz - AVX GHz) x 
  VPUs-per-core x FMA Ops-per-instruction x 
  Floating Point Performance
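
The formula can be sketched in Python as follows. This is a minimal illustration, not code from the Colfax course: the function name `peak_gflops` and its parameter names are my own, chosen to mirror the terms of the formula. Note that `avx_ghz` is the clock reduction under AVX workloads, not the AVX clock itself.

```python
def peak_gflops(cores, base_ghz, avx_ghz, vpus_per_core, fma_ops, fp_performance):
    """Theoretical peak GFLOPS, term by term as in the formula.

    avx_ghz is the frequency reduction for AVX workloads (in GHz),
    so the effective clock is base_ghz - avx_ghz.
    """
    effective_ghz = base_ghz - avx_ghz
    return cores * effective_ghz * vpus_per_core * fma_ops * fp_performance


# Example: Xeon Phi 7290 in single precision
# (72 cores, 1.5 GHz base, 0.2 GHz AVX reduction, 2 VPUs, 2 ops per FMA, 16 values).
print(f"{peak_gflops(72, 1.5, 0.2, 2, 2, 16):.1f}")  # prints 5990.4
```

Passing 8 instead of 16 as the last argument gives the double-precision figure, which is half the single-precision one.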

Here are the theoretical peak GFLOPS for two Xeon Phi processors (see also the notes below the table):

| Processor | Cores | Base GHz | AVX GHz | VPUs-per-core | FMA Ops-per-Instruction | Precision | Floating Point Performance | Peak GFLOPS |
|---|---|---|---|---|---|---|---|---|
| Xeon Phi 7290 | 72 | 1.5 | 0.2 | 2 | 2 | SP | 16 | 5990 |
| Xeon Phi 7290 | 72 | 1.5 | 0.2 | 2 | 2 | DP | 8 | 2995 |
| Xeon Phi 7210 | 64 | 1.3 | 0.2 | 2 | 2 | SP | 16 | 4506 |
| Xeon Phi 7210 | 64 | 1.3 | 0.2 | 2 | 2 | DP | 8 | 2253 |

Notes:

  • I obtained this formula from the instructor (Andrey Vladimirov, Head of HPC Research at Colfax International) during a Colfax Research Deepdive Series chat discussion on Aug 21st, 2017.
  • The processor's total core count and base frequency may be obtained from its spec page.
  • The processor frequency drops by 200 MHz (0.2 GHz) for AVX workloads, hence the subtraction of AVX GHz from Base GHz.
  • VPU stands for Vector Processing Unit. Each VPU can perform multiple floating point operations concurrently per cycle.
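  • FMA stands for Fused Multiply-Add. One FMA instruction performs a multiplication and an addition, so it counts as 2 floating-point operations. See the FMA operation article on Wikipedia for more info.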
  • Floating Point (FP) Precision may be either Single Precision (SP) or Double Precision (DP).
  • The Floating Point Performance value is the number of floating-point values each VPU can process per instruction, which depends on whether the precision is SP or DP. For a Xeon Phi 7210, it is 16 for SP, or 8 for DP.
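
The 16 and 8 in the last note can be derived from the vector width. The sketch below assumes 512-bit vector registers (AVX-512, which these Xeon Phi processors support) and the standard 32-bit and 64-bit sizes for single and double precision. The names `VECTOR_BITS`, `ELEMENT_BITS`, and `values_per_vector` are illustrative only.

```python
VECTOR_BITS = 512  # assumption: AVX-512 vector register width in bits

# Size of one floating-point value in bits, by precision.
ELEMENT_BITS = {"SP": 32, "DP": 64}


def values_per_vector(precision, vector_bits=VECTOR_BITS):
    """Number of floating-point values that fit in one vector register."""
    return vector_bits // ELEMENT_BITS[precision]


for precision in ("SP", "DP"):
    print(precision, values_per_vector(precision))
# prints:
# SP 16
# DP 8
```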

