9 Commits

Author SHA1 Message Date
Sylvain Jeaugey
299c554dcc
2.5.6-1 (#255)
Add LL128 Protocol.

Rewrite the topology detection and tree/ring creation (#179). Improve
tree performance by sending/receiving from different GPUs. Add
model-based tuning to switch between the different algorithms and
protocols.

Rework P2P/SHM detection in containers (#155, #248).

Detect duplicated devices and return an error (#231).

Add tuning for GCP
2019-11-19 14:57:39 -08:00
David Addison
f40ce73e89 NCCL 2.4.6-1
Added detection of IBM/Power NVLink bridge device.
    Add NUMA support to PCI distance calculations.
    Added NCCL_IGNORE_CPU_AFFINITY env var.
    Fix memory leaks; GithubIssue#180
    Compiler warning fix; GithubIssue#178
    Replace non-standard variable length arrays. GithubIssue#171
    Fix Tree+Shared Memory crash. GithubPR#185
    Fix LL cleanup hang during long running DL jobs.
    Fix NCCL_RINGS environment variable handling.
    Added extra checks to catch repeat calls to ncclCommDestroy() GithubIssue#191
    Improve bootstrap socket connection reliability at scale.
    Fix hostname hashing issue. GithubIssue#187
    Code cleanup to rename all non device files from *.cu to *.cc
2019-04-05 13:05:45 -07:00
Sylvain Jeaugey
1450d42675 2.4.2-1
Add tree algorithms for allreduce to improve performance at scale.
Add ncclCommAbort() and ncclCommGetAsyncError() to properly handle
network errors and be permit recover.
Detect initial CPU affinity and no longer escape it.
2019-01-29 15:19:27 -08:00
Christian Sigg
3e6afef473 Qualify nullptr_t with std:: 2018-12-13 14:18:09 -08:00
Christian Sigg
346fc49514 Two temporary workarounds for cuda-clang issues. 2018-12-13 14:17:58 -08:00
Sylvain Jeaugey
0d3a20f96d Add support for external network.
Dynamically load external network from libnccl-net.so.
Add init function in networks.
Move PCI scoring to net.cu, only ask transport to provide a path.
Simplify CUDA PCI path detection.
Add dummy external network
2018-11-26 16:24:31 -08:00
Sylvain Jeaugey
3c6e25210b
Generate nccl.h in build instead of src
Generating nccl.h in src makes source directories dirty after builds.
2018-11-09 14:00:41 -08:00
David Addison
b56650c7f5 2.3.7-1
Improved LL tuning for multi-node jobs.
Improved bootstrap for large job scaling.
Fixed a hang during bootstrap due to socket reuse.
Added operation name to the COLL INFO logging.
2018-10-24 14:44:59 -07:00
Sylvain Jeaugey
f93fe9bfd9 2.3.5-5
Add support for inter-node communication using sockets and InfiniBand/RoCE.
Improve latency.
Add support for aggregation.
Improve LL/regular tuning.
Remove tests as those are now at github.com/nvidia/nccl-tests .
2018-09-25 14:12:01 -07:00