PyTorch perf benchmark: Xeon Skylake (36 cores, 72 threads) vs Quadro P1000 (640 CUDA cores, 4 GB) on densenet121 fine-tuning

Env

Model

  • densenet121: 8M parameters
  • feature image: 224x224x3, train epoch: 352, eval epoch: 40

Result

  • Intel (Xeon Skylake CPU) takes 975.458 secs for one training epoch plus one validation epoch
  • Nvidia (Quadro P1000 GPU) takes 382.901 secs for one training epoch plus one validation epoch
  • Nvidia is about 2.55x faster, even though it is more than 10x cheaper.
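
The speedup figure is just the ratio of the two wall-clock times. A minimal sketch of that arithmetic follows; the variable names and the `speedup` helper are illustrative and not part of the original benchmark script.

```python
# Wall-clock times reported above, in seconds
# (one training epoch plus one validation epoch).
cpu_seconds = 975.458  # Xeon Skylake
gpu_seconds = 382.901  # Quadro P1000


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


ratio = speedup(cpu_seconds, gpu_seconds)
print(f"GPU speedup over CPU: {ratio:.2f}x")
```

This compares elapsed time only. The ">10x cheaper" claim is a separate price comparison and is not computed here.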

Note: when Nvidia is used, about 30 CPU threads run at roughly 27% utilization, apparently to distribute and aggregate(?) tasks.

Note: according to a CUDA benchmark, the GeForce GTX 1080 has about 3x the FLOPS of the Quadro P1000 :(
