178 Commits

Author SHA1 Message Date
Sylvain Jeaugey
b8a9a32ccb Add NCCL_NET flag to many debug lines. 2018-12-04 13:10:19 -08:00
Sylvain Jeaugey
cdae05b277 Improve INFO message when external network is not found.
Fix #162
2018-12-04 12:10:58 -08:00
David Addison
5fe2618c0e Fixed some compilation errors when TRACE=1 set 2018-11-29 14:12:14 -08:00
Sylvain Jeaugey
eed8218e17 Rework shared memory code to use SYSCHECK macros.
This is to handle EINTR/EAGAIN properly (issue #137), and also
make the code consistent with the rest.

Unfortunately posix_fallocate and mmap do not follow the classic
return code/errno pattern, so we need to write wrappers around those
functions.
2018-11-29 12:52:13 -08:00
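
The wrapper idea in the commit above can be sketched as follows. This is a minimal illustration, not the NCCL code: the names `fallocate_checked` and `mmap_checked` are hypothetical. It relies only on documented POSIX behavior: `posix_fallocate` returns the error number directly and does not set `errno`, and `mmap` signals failure with `MAP_FAILED` rather than -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical wrapper: posix_fallocate returns an error number instead of
 * setting errno, so convert it to the classic "-1 and errno" convention. */
static int fallocate_checked(int fd, off_t offset, off_t len) {
  int err = posix_fallocate(fd, offset, len);
  if (err != 0) {
    errno = err;
    return -1;
  }
  return 0;
}

/* Hypothetical wrapper: mmap reports failure through MAP_FAILED (errno is
 * already set), so return -1 on failure and pass the mapping back via *out. */
static int mmap_checked(void **out, size_t len, int prot, int flags,
                        int fd, off_t offset) {
  void *p = mmap(NULL, len, prot, flags, fd, offset);
  if (p == MAP_FAILED) return -1;
  *out = p;
  return 0;
}
```

With both calls normalized to return -1 and set `errno`, a generic "check for -1" macro can treat them like any other system call.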
Sylvain Jeaugey
302d538b73 Rework SYSCHECK macros to better handle retries.
SYSCHECKVAL was not retrying when a retry was needed. Since not all
calls are inside a loop, that means we could silently miss an
EINTR/EAGAIN return code.

Also rework the socket connection code and improve error reporting.
2018-11-29 12:52:13 -08:00
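
The retry behavior described in the commit above could look roughly like this. It is a sketch under assumptions, not the actual SYSCHECK/SYSCHECKVAL definitions: the macro name `SYSCALL_RETRY` and the helper `read_retrying` are made up for illustration. The point is that the retry loop lives inside the macro, so a call site that is not itself in a loop still cannot miss an EINTR/EAGAIN result.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Hypothetical macro (not NCCL's): run `call` again while it fails with a
 * transient error, and leave the final result in `retval`. */
#define SYSCALL_RETRY(retval, call)                                   \
  do {                                                                \
    (retval) = (call);                                                \
  } while ((retval) == -1 &&                                          \
           (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK))

/* Example call site: a single read() that transparently retries. */
static ssize_t read_retrying(int fd, void *buf, size_t len) {
  ssize_t n;
  SYSCALL_RETRY(n, read(fd, buf, len));
  return n;
}
```

On a non-blocking descriptor this loop spins until data is available, so a real implementation would likely bound or pace the retries; any other error is returned to the caller unchanged.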
Sylvain Jeaugey
61b50a63ef Improve net API description 2018-11-26 16:24:31 -08:00
Sylvain Jeaugey
98adf2fe11 Make network isend/irecv non blocking 2018-11-26 16:24:31 -08:00
Sylvain Jeaugey
0d3a20f96d Add support for external network.
Dynamically load external network from libnccl-net.so.
Add init function in networks.
Move PCI scoring to net.cu, only ask transport to provide a path.
Simplify CUDA PCI path detection.
Add dummy external network
2018-11-26 16:24:31 -08:00
Alex Sergeev
d7a58cfa58 Generate host-hash for P2P and SHM based on $(readlink /proc/self/ns/uts) + $(readlink /proc/self/ns/mnt) (#156) 2018-11-19 17:39:44 -08:00
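
A rough sketch of the approach named in the commit above: read the two namespace links and hash the concatenated targets, so that processes sharing the same UTS and mount namespaces get the same value. The function names and the djb2 hash are assumptions chosen for illustration; the hash NCCL actually uses is not shown in this log.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Illustrative string hash (djb2); an assumption, not NCCL's hash. */
static uint64_t djb2(const char *s) {
  uint64_t h = 5381;
  for (; *s; ++s) h = h * 33 + (unsigned char)*s;
  return h;
}

/* Hypothetical helper: concatenate the targets of the uts and mnt namespace
 * links and hash the result. Returns 0 on success, -1 if a link is unreadable. */
static int host_namespace_hash(uint64_t *out) {
  const char *links[] = {"/proc/self/ns/uts", "/proc/self/ns/mnt"};
  char id[512];
  size_t used = 0;
  for (int i = 0; i < 2; ++i) {
    /* readlink does not NUL-terminate; it returns the number of bytes written. */
    ssize_t n = readlink(links[i], id + used, sizeof(id) - used - 1);
    if (n < 0) return -1;
    used += (size_t)n;
  }
  id[used] = '\0';
  *out = djb2(id);
  return 0;
}
```

Two processes in the same container read identical link targets and therefore compute the same hash, while processes in different containers on one machine generally do not.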
Sylvain Jeaugey
3c6e25210b Generate nccl.h in build instead of src
Generating nccl.h in src makes source directories dirty after builds.
2018-11-09 14:00:41 -08:00
Ke Wen
21d9a877be Add official builds download link 2018-11-08 11:22:28 -08:00
Sylvain Jeaugey
f7d31919d7 Add instructions to install packaging toolchain
Address #143 and #150 : debuild not installed.
2018-11-05 11:42:33 -08:00
Sylvain Jeaugey
bed43524cc Add install target
Fix issue #145
2018-11-05 09:53:59 -08:00
David Addison
b56650c7f5 2.3.7-1
Improved LL tuning for multi-node jobs.
Improved bootstrap for large job scaling.
Fixed a hang during bootstrap due to socket reuse.
Added operation name to the COLL INFO logging.
v2.3.7-1
2018-10-24 14:44:59 -07:00
Obihörnchen
3202d6b393 Fix nccl-tests all_reduce_perf path
It's `all_reduce_perf` not `allreduce_perf`
2018-10-14 00:53:17 -07:00
Sylvain Jeaugey
f93fe9bfd9 2.3.5-5
Add support for inter-node communication using sockets and InfiniBand/RoCE.
Improve latency.
Add support for aggregation.
Improve LL/regular tuning.
Remove tests as those are now at github.com/nvidia/nccl-tests .
v2.3.5-5
2018-09-25 14:12:01 -07:00
Sylvain Jeaugey
286916a1a3 Merge pull request #119 from sclarkson/master
Fix tests: call cudaHostUnregister on the host pointer instead of the device pointer.
2017-11-28 18:41:26 -08:00
sclarkson
680a35c6b7 fix tests on maxwell 2017-11-11 19:22:06 -08:00
Sylvain Jeaugey
03d856977e Update README to link to NCCL2 2017-08-04 09:44:37 -07:00
Sylvain Jeaugey
4a33f66e27 Update README to link to NCCL2 part 3 2017-08-04 09:44:09 -07:00
Sylvain Jeaugey
d66fb63679 Update README to link to NCCL2 #2 2017-08-04 09:43:29 -07:00
Sylvain Jeaugey
80ae43b443 Update README to link to NCCL2 2017-08-04 09:42:25 -07:00
Sylvain Jeaugey
29a1a916dc Add support for CUDA9 half semantics 2017-06-14 11:20:24 -07:00
Sylvain Jeaugey
ccfc4567dc Merge pull request #78 from ilya-biryukov/master
Fix compilation error when compiling with 'clang -x cuda'.
2017-04-04 09:47:52 -07:00
Boris Fomitchev
649f04d077 Added Pascal nvcc flags, bumped version
v1.3.4-1
2017-03-24 11:58:14 -07:00
Ilya Biryukov
8241cd7b6e Fix compilation error when compiling with 'clang -x cuda'.
Functions vFetch and vStore are not found by ADL with clang,
so they need to be declared before usage in ReduceCopy.
2017-03-16 12:01:11 +01:00
Sylvain Jeaugey
7fef264bfa Bumping version to 1.3.3 2017-03-01 16:44:27 -08:00
Nathan Luehr
8996811936 Only enable peer access for ring neighbors.
This enables support for systems with more than 9 GPUs attached to a single PCIe root complex.
2017-03-01 16:42:38 -08:00
Sylvain Jeaugey
c219a183d0 Fix copy/paste typo in error message 2017-03-01 16:42:38 -08:00
Sylvain Jeaugey
8e1d6f9b60 Fix crash in Reduce when non-root ranks have invalid recvbuff 2017-03-01 16:42:38 -08:00
Sylvain Jeaugey
024d1e2678 Merge pull request #69 from cwhipkey/master
Qualify nullptr_t with std::
2017-02-08 09:17:50 -08:00
Chad Whipkey
5eab428294 Qualify nullptr_t with std::. 2017-02-08 07:06:31 -08:00
Sylvain Jeaugey
2a974f5ca2 Fix 1.3.2 compilation 2016-12-08 09:11:43 -08:00
Sylvain Jeaugey
648e9fbb58 Adding missing file 2016-12-05 18:06:24 -08:00
Sylvain Jeaugey
34d27771c6 1.3.2 release
Broadcast tuning
Better checking of inputs
Copy/reduce code simplification
2016-12-01 15:17:50 -08:00
Sylvain Jeaugey
1093821c33 Replace min BW by average BW in tests 2016-12-01 15:16:35 -08:00
Sylvain Jeaugey
ddddfba1c0 Merge pull request #54 from peterhj/peterhj-staticlib
Add a static library target "staticlib" to the Makefile.
2016-11-28 09:15:39 -08:00
Peter Jin
5765d608cc Add a static library target "staticlib" to the Makefile.
Rename the static library "libnccl_static.a" to disambiguate from the
dynamic libraries.
2016-11-24 11:31:03 -08:00
Kyle Fernandes, ne Jacobs
c2c515516b Remove irrelevant output from ncclReduce Fortran tests 2016-11-21 10:18:04 -08:00
Kyle Fernandes, ne Jacobs
9c18468fe2 Add Copyright header to Fortran bindings source files 2016-11-21 10:17:58 -08:00
Kyle Fernandes, ne Jacobs
5f2b32e45b Add Fortran bindings 2016-11-17 15:33:34 -08:00
Sylvain Jeaugey
534b9a1697 Bump to 1.3.1 2016-10-13 10:33:05 -07:00
Sylvain Jeaugey
b2781d0501 Fix primitives function prototype 2016-10-13 10:32:42 -07:00
Sylvain Jeaugey
bf7d1514f7 NVML (libwrap) : import the needed definitions 2016-10-13 10:28:59 -07:00
Sylvain Jeaugey
8bb06c94be Improved allreduce segmentation for small sizes 2016-10-07 12:42:23 -07:00
Sylvain Jeaugey
ca330b110a Add scan tests
v1.3.0-1
2016-09-22 11:58:33 -07:00
Sylvain Jeaugey
6c77476cc1 Make tests check for deltas and report bandwidth 2016-09-22 11:58:28 -07:00
Sylvain Jeaugey
cabd6848e4 Heavy code refactoring to remove a lot of code in collectives (~1000 lines).
Have all collectives use the same args, the same ring, and the same primitives for synchronization between threads with the same pattern.
2016-09-22 11:57:56 -07:00
Sylvain Jeaugey
e3dbc6110e Add profiling API 2016-09-22 11:56:51 -07:00
Sylvain Jeaugey
1d6715fe20 Fix MPI test path 2016-09-22 11:56:20 -07:00