@ds-hwang
Last active August 24, 2021 13:43
PyTorch perf benchmark: Xeon Skylake (36 cores, 72 threads) vs Quadro P1000 (640 CUDA cores, 4 GB) on densenet121 fine-tuning

Env

Model

  • densenet121: 8M parameters
  • feature image: 224x224x3, train epoch: 352, eval epoch: 40

Result

  • The Intel Xeon takes 975.458 s for one training epoch plus one validation epoch.
  • The Nvidia Quadro P1000 takes 382.901 s for one training epoch plus one validation epoch.
  • The Nvidia GPU is 2.55x faster, even though it is more than 10x cheaper.
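
The 2.55x figure follows directly from the two timings above. A minimal sketch of the arithmetic; the `speedup` helper and the variable names are illustrative and not part of the original benchmark script:

```python
# Timings reported above, in seconds, for one training epoch
# plus one validation epoch.
cpu_seconds = 975.458  # Xeon Skylake
gpu_seconds = 382.901  # Quadro P1000


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


ratio = speedup(cpu_seconds, gpu_seconds)
print(f"GPU speedup over CPU: {ratio:.2f}x")  # prints "GPU speedup over CPU: 2.55x"
```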

Note: when the Nvidia GPU is used, about 30 CPU threads run at 27% utilization, apparently to distribute and aggregate(?) tasks.

Note: according to a CUDA benchmark, the GeForce GTX 1080 delivers about 3x the FLOPS of the Quadro P1000 :(
