Training deep neural networks on multiple GPUs is common practice: it enables faster training times and makes it possible to train larger models that cannot fit on a single GPU. This module walks through the concepts and examples users need to develop their own multi-GPU training workflows using high-level frameworks.


Download the Presentation
➤  Using the RCCL Communication Collectives Library
The ROCm Communication Collectives Library (RCCL) is one of two widely used communication libraries in ROCm. It supports a range of widely used multi-GPU fabrics and is used in deep neural network training; a minimal sketch of its API follows the lab links below.
Watch Video
See an example
Download the Lab
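
As a rough illustration of what the lab covers, the sketch below performs a single-process all-reduce across two GPUs with RCCL. RCCL deliberately mirrors the NCCL API, so the ncclCommInitAll, ncclAllReduce, and ncclGroup* calls shown here are the real entry points; the device count, buffer size, omitted error checking, and the exact header path are simplifying assumptions for brevity.

    // Minimal single-process, two-GPU all-reduce sketch with RCCL.
    // Compile with: hipcc rccl_allreduce.c -lrccl
    // (Header path may be <rccl.h> on older ROCm installs.)
    #include <rccl/rccl.h>
    #include <hip/hip_runtime.h>
    #include <stdio.h>

    #define N 1024

    int main(void) {
        const int nDev = 2;              // assumes 2 visible GPUs
        int devs[2] = {0, 1};
        ncclComm_t comms[2];
        float *sendbuf[2], *recvbuf[2];
        hipStream_t streams[2];

        // Allocate buffers and a stream on each device.
        // Real training code would copy local gradients into sendbuf[i].
        for (int i = 0; i < nDev; ++i) {
            hipSetDevice(devs[i]);
            hipMalloc(&sendbuf[i], N * sizeof(float));
            hipMalloc(&recvbuf[i], N * sizeof(float));
            hipStreamCreate(&streams[i]);
        }

        // Create one communicator per device.
        ncclCommInitAll(comms, nDev, devs);

        // Sum the send buffers across all GPUs; every GPU gets the result.
        ncclGroupStart();
        for (int i = 0; i < nDev; ++i)
            ncclAllReduce(sendbuf[i], recvbuf[i], N, ncclFloat, ncclSum,
                          comms[i], streams[i]);
        ncclGroupEnd();

        // Wait for the collectives to finish, then clean up.
        for (int i = 0; i < nDev; ++i) {
            hipSetDevice(devs[i]);
            hipStreamSynchronize(streams[i]);
            hipFree(sendbuf[i]);
            hipFree(recvbuf[i]);
            hipStreamDestroy(streams[i]);
            ncclCommDestroy(comms[i]);
        }
        printf("AllReduce complete\n");
        return 0;
    }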


➤  Multi-GPU with MPI
The Message Passing Interface (MPI) is the other communication library commonly used with ROCm; it scales applications beyond a single machine to multiple nodes in HPC systems. This module walks through an example of scaling a program with MPI; a minimal sketch follows the lab links below.
Watch Video
See an example
Download the Lab
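
For orientation, here is a minimal MPI sketch, assuming an MPI implementation such as Open MPI is installed: each rank contributes one value to a global sum via MPI_Allreduce, the same collective pattern used to combine gradients across nodes. The rank-to-GPU binding mentioned in the comment is one common convention, not something MPI itself mandates.

    // Minimal MPI all-reduce sketch: each rank contributes its rank number,
    // and every rank receives the sum.
    // Compile with: mpicc mpi_allreduce.c
    // Run with e.g.: mpirun -np 4 ./a.out
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // In a real multi-GPU job, each rank is typically bound to one
        // GPU, e.g. hipSetDevice(rank % gpus_per_node).
        int local = rank;
        int global_sum = 0;
        MPI_Allreduce(&local, &global_sum, 1, MPI_INT, MPI_SUM,
                      MPI_COMM_WORLD);

        printf("rank %d of %d: global sum = %d\n", rank, size, global_sum);

        MPI_Finalize();
        return 0;
    }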


➤  Multi-GPU Summary
This final module summarizes the communication libraries and the core concepts of multi-GPU programming covered above.
Watch Video