nccl/makefiles
Ke Wen e11238b302 2.11.4-1
Add new API for creating a reduction operation which multiplies the input by a rank-specific scalar before doing an inter-rank summation (see: ncclRedOpCreatePreMulSum).
Improve CollNet (SHARP) performance of ncclAllReduce when captured in a CUDA Graph via user buffer registration.
Add environment variable NCCL_NET_PLUGIN="<suffix>" to allow user to choose among multiple NCCL net plugins by substituting into "libnccl-net-<suffix>.so".
Fix memory leak of NVB connections.
Fix topology detection of IB Virtual Functions (SR-IOV).
2021-09-08 16:06:23 -07:00
..
2021-07-08 14:30:14 -07:00
2019-04-05 13:05:45 -07:00
2021-09-08 16:06:23 -07:00