• 2.8.3-1

    Ghost released this 2020-11-18 03:08:52 +08:00 | 91 commits to master since this release

    Optimization for Tree allreduce on A100.
    Improve aggregation performance.
    Use shared buffers for inter-node send/recv.
    Add NVTX profiling hooks.
    Accelerate alltoall connections by merging communication for all
    channels.
    Add support for one-hop communication through NVLink, for faster
    send/recv communication on cubemesh topologies like DGX-1.
    Improve alltoall scheduling to better balance intra-node and
    inter-node communication.
    Increase send/recv parallelism by 8x, each warp sending to or
    receiving from a different peer.
    Net: move to v4.
    Net: make flush operation asynchronous to accelerate alltoall.
    Net: define maximum number of requests.
    Fix hang when using LL128 protocol after 2^31 steps.
    Fix #379 : topology injection failing when using fewer GPUs than
    described in the XML.
    Fix #394 : protocol mismatch causing hangs or crashes when using
    one GPU per node.
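
Several items above concern alltoall built from send/recv pairs. As a rough illustration of how an alltoall can be decomposed into point-to-point steps, here is a minimal sketch in plain Python. It is not the library's scheduler: the rotation pattern and the names `alltoall_schedule` and `run_alltoall` are assumptions made for this example only. At step `shift`, every rank sends to the rank `shift` positions ahead, so each step is a permutation in which every rank sends once and receives once.

```python
def alltoall_schedule(n_ranks):
    """Return one list of (sender, receiver) pairs per step.

    Hypothetical rotation schedule: at step `shift`, rank r sends to
    (r + shift) % n_ranks. Step 0 is the local copy (r sends to itself).
    """
    return [
        [(rank, (rank + shift) % n_ranks) for rank in range(n_ranks)]
        for shift in range(n_ranks)
    ]


def run_alltoall(send_buffers):
    """Simulate an alltoall on in-memory lists.

    send_buffers[s][d] is the chunk rank s wants to deliver to rank d.
    The result satisfies recv_buffers[d][s] == send_buffers[s][d].
    """
    n_ranks = len(send_buffers)
    recv_buffers = [[None] * n_ranks for _ in range(n_ranks)]
    for step in alltoall_schedule(n_ranks):
        for sender, receiver in step:
            recv_buffers[receiver][sender] = send_buffers[sender][receiver]
    return recv_buffers


if __name__ == "__main__":
    send = [[f"{s}->{d}" for d in range(4)] for s in range(4)]
    for rank, row in enumerate(run_alltoall(send)):
        print(rank, row)
```

Because every step pairs each rank with exactly one destination and one source, no rank is idle or oversubscribed within a step. A real scheduler would also have to account for which peers are on the same node, which this sketch ignores.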
