2.11.4-1
Released 2021-09-09 07:06:23 +08:00 | 83 commits to master since this release
Add new API for creating a reduction operation which multiplies the input by a rank-specific scalar before doing an inter-rank summation (see: ncclRedOpCreatePreMulSum).
Improve CollNet (SHARP) performance of ncclAllReduce when captured in a CUDA Graph via user buffer registration.
Add environment variable NCCL_NET_PLUGIN="<suffix>" to allow the user to choose among multiple NCCL net plugins by substituting into "libnccl-net-<suffix>.so".
Fix memory leak of NVB connections.
Fix topology detection of IB Virtual Functions (SR-IOV).
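
The pre-multiplied sum described in the first note can be illustrated with a small CPU-only simulation. This is a sketch of the arithmetic only, not NCCL code: the function name `premul_sum_allreduce` and its list-based interface are hypothetical, and it assumes each rank holds one scalar and one equally sized buffer. A common use, assuming every scalar is set to 1/nranks, is computing an average in a single reduction.

```python
from typing import List, Sequence


def premul_sum_allreduce(
    rank_inputs: Sequence[Sequence[float]],
    rank_scalars: Sequence[float],
) -> List[List[float]]:
    """Simulate a pre-multiplied sum all-reduce on the CPU.

    Hypothetical helper for illustration: rank r contributes
    rank_scalars[r] * rank_inputs[r][i] to element i, and every
    rank receives the same summed result.
    """
    if not rank_inputs:
        raise ValueError("at least one rank is required")
    if len(rank_inputs) != len(rank_scalars):
        raise ValueError("one scalar per rank is required")
    count = len(rank_inputs[0])
    if any(len(buf) != count for buf in rank_inputs):
        raise ValueError("all ranks must contribute buffers of equal length")

    reduced = [0.0] * count
    for buf, scalar in zip(rank_inputs, rank_scalars):
        for i, value in enumerate(buf):
            # Scale by the rank-specific scalar first, then accumulate.
            reduced[i] += scalar * value

    # An all-reduce leaves an identical copy of the result on every rank.
    return [list(reduced) for _ in rank_inputs]


if __name__ == "__main__":
    inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    scalars = [0.5, 0.25, 0.25]
    print(premul_sum_allreduce(inputs, scalars)[0])  # prints [2.5, 3.5]
```

In the real library the scaling happens inside the reduction, with the operator created through ncclRedOpCreatePreMulSum; the simulation above only mirrors the resulting values.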