• 2.11.4-1

    Ghost released this 2021-09-09 07:06:23 +08:00

    Add a new API, ncclRedOpCreatePreMulSum, for creating a reduction operation that multiplies each rank's input by a rank-specific scalar before summing across ranks.
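    The arithmetic of a PreMulSum reduction can be modeled on the CPU as below. This is a minimal sketch, not NCCL code: the function name `pre_mul_sum` and the list-per-rank layout are hypothetical. In NCCL itself, each rank would create the operation with `ncclRedOpCreatePreMulSum`, passing its own scalar, use it in a collective such as ncclAllReduce, and release it with `ncclRedOpDestroy`.

```python
def pre_mul_sum(rank_inputs, rank_scalars):
    """CPU model of PreMulSum: out[i] = sum over ranks r of scalar[r] * input[r][i].

    Hypothetical helper for illustration only; real NCCL performs this on GPUs.
    rank_inputs:  one equal-length list per simulated rank.
    rank_scalars: one scalar per simulated rank.
    """
    if len(rank_inputs) != len(rank_scalars):
        raise ValueError("need exactly one scalar per rank")
    length = len(rank_inputs[0])
    if any(len(buf) != length for buf in rank_inputs):
        raise ValueError("all rank buffers must have the same length")
    out = [0.0] * length
    for buf, scalar in zip(rank_inputs, rank_scalars):
        for i, x in enumerate(buf):
            # Scale this rank's element by this rank's scalar, then accumulate.
            out[i] += scalar * x
    return out


if __name__ == "__main__":
    # Example: every rank uses 1/nranks, so the reduction yields an average.
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    nranks = len(grads)
    print(pre_mul_sum(grads, [1.0 / nranks] * nranks))  # [4.0, 5.0]
```

    Because each rank supplies its own scalar, the same mechanism covers non-uniform weightings, not just averages.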
    Improve CollNet (SHARP) performance of ncclAllReduce captured in a CUDA Graph by using user buffer registration.
    Add environment variable NCCL_NET_PLUGIN="<suffix>" to let users choose among multiple NCCL net plugins; the value is substituted into "libnccl-net-<suffix>.so".
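    The naming rule for the net plugin library can be sketched as follows. This is an illustration under stated assumptions, not NCCL's loader: the helper name is hypothetical, and treating an unset or empty variable as a fallback to "libnccl-net.so" is an assumption about the default behavior.

```python
import os

# Assumption: library name used when NCCL_NET_PLUGIN is unset or empty.
DEFAULT_PLUGIN = "libnccl-net.so"


def net_plugin_library(env=None):
    """Hypothetical helper: map NCCL_NET_PLUGIN to a plugin library file name."""
    env = os.environ if env is None else env
    suffix = env.get("NCCL_NET_PLUGIN", "")
    if not suffix:
        # Assumed fallback; not confirmed by the release note.
        return DEFAULT_PLUGIN
    # The variable's value is substituted into libnccl-net-<suffix>.so.
    return f"libnccl-net-{suffix}.so"


if __name__ == "__main__":
    # "mynet" is a hypothetical plugin suffix.
    print(net_plugin_library({"NCCL_NET_PLUGIN": "mynet"}))  # libnccl-net-mynet.so
```

    In practice the variable would be set in the job environment (for example, NCCL_NET_PLUGIN=mynet) so that the matching shared library is picked up at initialization.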
    Fix memory leak of NVB connections.
    Fix topology detection of IB Virtual Functions (SR-IOV).
