@AHEADer
Last active February 5, 2020 07:09
Benchmark

This is a speed benchmark for distributed training.

Environment

System configuration

  • Ubuntu xxx
  • CUDA xxx
  • NCCL xxx

Framework

  • Autobot xxx
  • TensorFlow xxx
  • PyTorch xxx
  • MXNet xxx

Profiling tools

  • cProfile
  • NVIDIA Nsight Systems
  • Profiling tools provided by each framework
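
As a rough illustration of the cProfile entry above, the sketch below profiles a few iterations of a placeholder training step and prints the most expensive calls. The `train_step` function and its workload are hypothetical stand-ins, not part of any framework listed here. cProfile only records Python-level calls on the host, so GPU kernel time would still need a tool such as NVIDIA Nsight Systems.

```python
import cProfile
import io
import pstats


def train_step(batch_size=32):
    """Hypothetical stand-in for one training iteration (CPU-only busy work)."""
    total = 0
    for i in range(batch_size * 1000):
        total += i * i
    return total


def profile_steps(num_steps=10):
    """Profile num_steps calls to train_step and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(num_steps):
        train_step()
    profiler.disable()

    # Collect the report in memory, sorted by cumulative time, top 5 entries.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


report = profile_steps()
print(report)
```

In a real run, `train_step` would be replaced by the framework's own training iteration, and the first few warm-up steps would probably be excluded from the profile.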

Testing models and experiments

Models

  • Image Classification: ResNet50, VGG16
  • Translation: GNMT-16
  • Video Captioning: S2VT

Experiments

Each experiment below should be run on all four deep learning frameworks.

  1. Different GPU placements (e.g. 4 GPUs on different nodes)
  2. With or without Horovod, and our Horovod vs. the official Horovod
  3. RDMA or socket
  4. Different parallel architectures
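
One way to keep track of the runs implied by the list above is to enumerate the combinations up front. The sketch below is only an assumption about how the matrix might be organized: the single-node placement label and the "no Horovod" label are illustrative, and the parallel-architecture dimension is left out because its options are not specified here.

```python
from itertools import product

# Frameworks and models as listed in this document.
FRAMEWORKS = ["Autobot", "TensorFlow", "PyTorch", "MXNet"]
MODELS = ["ResNet50", "VGG16", "GNMT-16", "S2VT"]

# Assumed labels for each experiment dimension (illustrative only).
GPU_PLACEMENTS = ["4 GPUs on one node", "4 GPUs on different nodes"]
HOROVOD_VARIANTS = ["no Horovod", "our Horovod", "official Horovod"]
TRANSPORTS = ["RDMA", "socket"]


def build_matrix():
    """Return one dict per (framework, model, placement, Horovod, transport) run."""
    return [
        {
            "framework": framework,
            "model": model,
            "placement": placement,
            "horovod": horovod,
            "transport": transport,
        }
        for framework, model, placement, horovod, transport in product(
            FRAMEWORKS, MODELS, GPU_PLACEMENTS, HOROVOD_VARIANTS, TRANSPORTS
        )
    ]


matrix = build_matrix()
print(len(matrix), "runs; first:", matrix[0])
```

Not every combination will necessarily be meaningful (for example, a transport choice may not matter for a single-node run), so the full product is better treated as an upper bound to prune than as a fixed plan.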

Benchmark results

TODO
