NCCL plugin for Dolphin Interconnect PCIe adapters, providing low-latency, high-bandwidth inter-node GPU communication that can be used for distributed training of artificial neural networks with frameworks such as TensorFlow and PyTorch.
- Linux (tested on Ubuntu 18.04)
- NVIDIA GPU (Quadro or Tesla GPU for GPUDirect RDMA support)
- Dolphin Interconnect Solutions software stack and supported hardware
- CUDA (tested with 10.0)
- NCCL 2.4.*
- GCC (tested with 7.3.0)
- Autotools (autoconf, automake, libtool)
./autogen.sh
./configure
make
sudo make install
NCCL will automatically detect and load the plugin. Enable debug output
export NCCL_DEBUG=INFO
and you should see something like this in the terminal when you run your application:
NCCL INFO Trying to load SISCI
NCCL INFO NET/SISCI : adapter 0, node id 4
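To check for these lines without reading the terminal output by hand, the debug log can be captured to a file and searched. The following is a minimal sketch, not part of the plugin: the helper name `sisci_plugin_loaded`, the log file name `nccl.log`, and the program name `your_app` are all hypothetical, and it assumes the log contains the `NET/SISCI` tag shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given NCCL debug
# log contains a line tagged NET/SISCI, as in the sample output above.
sisci_plugin_loaded() {
    log_file="$1"
    grep -q "NET/SISCI" "$log_file"
}

# Example usage (your_app is a placeholder for your own program):
#   NCCL_DEBUG=INFO ./your_app 2>&1 | tee nccl.log
#   sisci_plugin_loaded nccl.log && echo "SISCI plugin active"
```

A non-zero exit status only means the tag was not found in that log; it does not by itself say why the plugin was not used.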
Use the NCCL_NET_GDR_LEVEL environment variable to control the use of GPUDirect RDMA (GDR). Run the NCCL tests to evaluate performance.
Make sure that the plugin is found by adding the library install path (defaults to /usr/local/lib) either to the LD_LIBRARY_PATH environment variable or to /etc/ld.so.conf.
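For the LD_LIBRARY_PATH route, one possible approach is sketched below. The helper name `add_lib_path` is hypothetical; it prepends a directory only if it is not already listed, so it is safe to call more than once. It assumes the default install path /usr/local/lib; adjust the argument if you installed elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH unless it
# is already listed, then export the variable.
add_lib_path() {
    dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Default install path mentioned above (an assumption if you changed it).
add_lib_path /usr/local/lib
```

The change only affects the current shell and processes started from it, so these lines need to be sourced (or placed in a shell startup file) rather than run as a separate script.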
If the GPU doesn't support GPUDirect RDMA, disable GDR by running:
export NCCL_NET_GDR_LEVEL=0