• NCCL 2.26.3-1

    Ghost released this 2025-04-23 04:50:40 +08:00 | 4 commits to master since this release

    Minimize the performance impact of the device kernel profiling support when
    the profiler plugin is not loaded.

    Reduce the overhead of CUDA graph capture, which increased in NCCL
    2.26.2 for large graphs.

    Fix the exchange of enhanced connection establishment (ECE) options to
    address potential slowdowns on networks utilizing RoCE.

    Test whether cuMem host allocations work and disable them if they do
    not. Such allocations have been enabled by default since NCCL 2.24 when
    the CUDA driver version is at least 12.6, but they rely on NUMA support,
    which is not available by default under Docker. We recommend invoking
    Docker with "--cap-add SYS_NICE" to enable it.
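
    As an illustration of the Docker recommendation above, the following is
    a minimal Python sketch that assembles a "docker run" command line with
    the SYS_NICE capability added. The helper name docker_run_command, the
    image name my-nccl-image:latest and the program ./my_nccl_app are
    hypothetical placeholders, not part of NCCL or of this release. GPU
    access flags are deliberately left out and would need to be added for a
    real deployment.

```python
import shlex


def docker_run_command(image, command, extra_caps=("SYS_NICE",)):
    """Build a `docker run` argument list that grants extra capabilities.

    `image` and `command` are placeholders chosen by the caller; only the
    `--cap-add SYS_NICE` option comes from the release note.
    """
    args = ["docker", "run", "--rm"]
    for cap in extra_caps:
        # Each capability is passed as its own `--cap-add <CAP>` pair.
        args += ["--cap-add", cap]
    args.append(image)
    args += list(command)
    return args


# Hypothetical image and program names, used for illustration only.
cmd = docker_run_command("my-nccl-image:latest", ["./my_nccl_app"])

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(cmd))
```

    The argument list can be passed directly to subprocess.run(cmd) instead
    of being printed, which avoids shell quoting issues.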

    Fix an initialization error when running with NCCL_NET_GDR_C2C=1 on
    multiple MNNVL domains with non-uniform network configurations across
    nodes.

    Fix the printing of sub-seconds in the debug log when using a custom
    NCCL_DEBUG_TIMESTAMP_FORMAT setting.
