Add new API for creating a reduction operation which multiplies the input by a rank-specific scalar before doing an inter-rank summation (see: ncclRedOpCreatePreMulSum).
Improve CollNet (SHARP) performance of ncclAllReduce when captured in a CUDA Graph via user buffer registration.
Add environment variable NCCL_NET_PLUGIN="<suffix>" to allow user to choose among multiple NCCL net plugins by substituting into "libnccl-net-<suffix>.so".
Fix memory leak of NVB connections.
Fix topology detection of IB Virtual Functions (SR-IOV).
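To make two of the items above concrete, here is a minimal host-side sketch in C. It does not call NCCL. `premul_sum_model` and `net_plugin_name` are hypothetical helpers written for illustration only. The first models the arithmetic described for the pre-multiplied sum (each rank's input scaled by that rank's scalar, then summed across ranks). The second shows the string substitution of a suffix into "libnccl-net-<suffix>.so". How NCCL behaves when the variable is unset is not covered here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a pre-multiplied sum (not an NCCL call).
 * inputs holds nranks contiguous buffers of count floats each;
 * out[i] = sum over ranks r of scalars[r] * inputs[r * count + i]. */
static void premul_sum_model(const float *inputs, const float *scalars,
                             size_t nranks, size_t count, float *out) {
  for (size_t i = 0; i < count; ++i) {
    float acc = 0.0f;
    for (size_t r = 0; r < nranks; ++r) {
      acc += scalars[r] * inputs[r * count + i];
    }
    out[i] = acc;
  }
}

/* Hypothetical helper: writes "libnccl-net-<suffix>.so" into buf.
 * Returns 0 on success, -1 if suffix is NULL or empty, or if buf
 * is too small to hold the full name. */
static int net_plugin_name(const char *suffix, char *buf, size_t buflen) {
  if (suffix == NULL || suffix[0] == '\0') return -1;
  int n = snprintf(buf, buflen, "libnccl-net-%s.so", suffix);
  return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With two ranks holding {1, 2} and {3, 4} and scalars 0.5 and 2, the model yields {6.5, 9}. In a real program the reduction would instead be created through ncclRedOpCreatePreMulSum and passed to a collective such as ncclAllReduce.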
7 lines
103 B
Makefile
##### version
NCCL_MAJOR   := 2
NCCL_MINOR   := 11
NCCL_PATCH   := 4
NCCL_SUFFIX  :=
PKG_REVISION := 1
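The version variables above are typically combined into a single integer for compile-time comparisons. The sketch below assumes the packing rule used by the `NCCL_VERSION` macro in `nccl.h` (a four-digit form up to 2.8.x and a five-digit form afterwards). That rule is not stated in this file, so treat it as an assumption. `nccl_version_code` is a hypothetical helper name.

```c
/* Values copied from the version fragment above. */
#define NCCL_MAJOR 2
#define NCCL_MINOR 11
#define NCCL_PATCH 4

/* Hypothetical helper: packs major/minor/patch into one integer.
 * Assumed rule: versions up to 2.8.x use major*1000 + minor*100 + patch;
 * later versions use major*10000 + minor*100 + patch. */
static int nccl_version_code(int major, int minor, int patch) {
  if (major <= 2 && minor <= 8) {
    return major * 1000 + minor * 100 + patch;
  }
  return major * 10000 + minor * 100 + patch;
}
```

Under that assumption, `nccl_version_code(NCCL_MAJOR, NCCL_MINOR, NCCL_PATCH)` evaluates to 21104 for this release.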