Fix data corruption with Tree/LL128 on systems with 1GPU:1NIC.
Fix hang with Collnet on bfloat16 on systems with less than one NIC per GPU.
Fix long initialization time.
Fix data corruption with Collnet when mixing multi-process and multi-GPU per process.
Fix crash when shared memory creation fails.
Fix Avg operation with Collnet/Chain.
Fix performance of alltoall at scale with more than one NIC per GPU.
Fix performance for DGX H800.
Fix race condition in connection progress causing a crash.
Fix network flush with Collnet.
Fix performance of aggregated allGather/reduceScatter operations.
Fix PXN operation when CUDA_VISIBLE_DEVICES is set.
Fix NVTX3 compilation issues on Debian 10.
7 lines
103 B
Makefile
##### version
NCCL_MAJOR := 2
NCCL_MINOR := 18
NCCL_PATCH := 3
NCCL_SUFFIX :=
PKG_REVISION := 1