255 Commits

Author SHA1 Message Date
Sylvain Jeaugey
ccfc4567dc Merge pull request #78 from ilya-biryukov/master
Fix compilation error when compiling with 'clang -x cuda'.
2017-04-04 09:47:52 -07:00
Boris Fomitchev
649f04d077 Added Pascal nvcc flags, bumped version (tag: v1.3.4-1) 2017-03-24 11:58:14 -07:00
Ilya Biryukov
8241cd7b6e Fix compilation error when compiling with 'clang -x cuda'.
Functions vFetch and vStore are not found by ADL with clang,
so they need to be declared before usage in ReduceCopy.
2017-03-16 12:01:11 +01:00
Sylvain Jeaugey
7fef264bfa Bumping version to 1.3.3 2017-03-01 16:44:27 -08:00
Nathan Luehr
8996811936 Only enable peer access for ring neighbors.
This enables support for systems with more than 9 GPUs attached to a single PCIe root complex.
2017-03-01 16:42:38 -08:00
Sylvain Jeaugey
c219a183d0 Fix copy/paste typo in error message 2017-03-01 16:42:38 -08:00
Sylvain Jeaugey
8e1d6f9b60 Fix crash in Reduce when non-root ranks have invalid recvbuff 2017-03-01 16:42:38 -08:00
Sylvain Jeaugey
024d1e2678 Merge pull request #69 from cwhipkey/master
Qualify nullptr_t with std::
2017-02-08 09:17:50 -08:00
Chad Whipkey
5eab428294 Qualify nullptr_t with std::. 2017-02-08 07:06:31 -08:00
Sylvain Jeaugey
2a974f5ca2 Fix 1.3.2 compilation 2016-12-08 09:11:43 -08:00
Sylvain Jeaugey
648e9fbb58 Adding missing file 2016-12-05 18:06:24 -08:00
Sylvain Jeaugey
34d27771c6 1.3.2 release
Broadcast tuning
Better checking of inputs
Copy/reduce code simplification
2016-12-01 15:17:50 -08:00
Sylvain Jeaugey
1093821c33 Replace min BW by average BW in tests 2016-12-01 15:16:35 -08:00
Sylvain Jeaugey
ddddfba1c0 Merge pull request #54 from peterhj/peterhj-staticlib
Add a static library target "staticlib" to the Makefile.
2016-11-28 09:15:39 -08:00
Peter Jin
5765d608cc Add a static library target "staticlib" to the Makefile.
Rename the static library "libnccl_static.a" to disambiguate from the
dynamic libraries.
2016-11-24 11:31:03 -08:00
Kyle Fernandes, ne Jacobs
c2c515516b Remove irrelevant output from ncclReduce Fortran tests 2016-11-21 10:18:04 -08:00
Kyle Fernandes, ne Jacobs
9c18468fe2 Add Copyright header to Fortran bindings source files 2016-11-21 10:17:58 -08:00
Kyle Fernandes, ne Jacobs
5f2b32e45b Add Fortran bindings 2016-11-17 15:33:34 -08:00
Sylvain Jeaugey
534b9a1697 Bump to 1.3.1 2016-10-13 10:33:05 -07:00
Sylvain Jeaugey
b2781d0501 Fix primitives function prototype 2016-10-13 10:32:42 -07:00
Sylvain Jeaugey
bf7d1514f7 NVML (libwrap) : import the needed definitions 2016-10-13 10:28:59 -07:00
Sylvain Jeaugey
8bb06c94be Improved allreduce segmentation for small sizes 2016-10-07 12:42:23 -07:00
Sylvain Jeaugey
ca330b110a Add scan tests (tag: v1.3.0-1) 2016-09-22 11:58:33 -07:00
Sylvain Jeaugey
6c77476cc1 Make tests check for deltas and report bandwidth 2016-09-22 11:58:28 -07:00
Sylvain Jeaugey
cabd6848e4 Heavy code refactoring to remove a lot of code in collectives (~1000 lines).
Have all collectives use the same args, the same ring, and the same primitives for synchronization between threads with the same pattern.
2016-09-22 11:57:56 -07:00
Sylvain Jeaugey
e3dbc6110e Add profiling API 2016-09-22 11:56:51 -07:00
Sylvain Jeaugey
1d6715fe20 Fix MPI test path 2016-09-22 11:56:20 -07:00
Sylvain Jeaugey
9ee6189bf9 Merge pull request #41 from jia-kai/master
Some minor fixes for compile/usage
2016-09-15 09:45:52 -07:00
Sylvain Jeaugey
939b0a4297 Merge pull request #45 from NVIDIA/cw-update-copyright-year
Update LICENSE.txt
2016-08-26 15:44:00 -07:00
Cliff Woolley
234c8c9ef3 Update LICENSE.txt 2016-08-26 15:39:21 -07:00
Sylvain Jeaugey
75bad643bd Updated LICENCE.txt 2016-08-26 15:08:20 -07:00
jiakai
47b0797fe1 pass devlist as const int* rather than int* in ncclCommInitAll 2016-08-19 19:00:14 +08:00
jiakai
ed401cc29b link library with -lrt; otherwise there is undefined reference to shm_open 2016-08-19 18:58:56 +08:00
Sylvain Jeaugey
b3a9e1333d Remove unneeded deb build script 2016-07-27 17:58:00 -07:00
Sylvain Jeaugey
428ec5b2a3 Merge remote-tracking branch 'github/master' into public 2016-07-25 10:53:01 -07:00
Nathan Luehr
55c42ad681 Fixed redundant contexts in multi-process apps
Change-Id: If787014450fd281304f0c7baf01d25963e40905d
2016-07-25 10:10:30 -07:00
Sylvain Jeaugey
7a1aa6b563 Improved Deb generation 2016-07-07 16:31:57 +02:00
Sylvain Jeaugey
9ae84f5d6b Fix version number 2016-06-16 17:07:42 -07:00
Sylvain Jeaugey
e51e922924 Add a debug level to NCCL and CUDA versions at init 2016-06-16 17:04:41 -07:00
Sylvain Jeaugey
9fcc523485 Increased version to 1.2.3 2016-06-15 19:18:13 -07:00
Sylvain Jeaugey
67d1ab9106 Packaging : Generate shlibs.local 2016-06-15 19:03:08 -07:00
Sylvain Jeaugey
da6d2009e0 Move deb to build directory 2016-06-15 18:20:10 -07:00
Sylvain Jeaugey
155132d336 Fix make install to use BUILDDIR 2016-06-15 18:20:02 -07:00
Sylvain Jeaugey
08ddfe03d2 Rework debian packaging 2016-06-15 18:18:44 -07:00
Sylvain Jeaugey
5d4716a8a3 Include link to blog post in README.md 2016-06-15 10:54:19 -07:00
Boris Fomitchev
aa8f669a3d Updating for .deb rebuild (tag: v1.2.3-1+cuda7.5) 2016-06-13 02:01:49 -07:00
Sylvain Jeaugey
d5e507fc7f Only call the CUDA runtime. That may fix #27. 2016-06-07 16:27:51 -07:00
Sylvain Jeaugey
620491a649 Merge remote-tracking branch 'github/master' into HEAD 2016-06-06 14:35:57 -07:00
Sylvain Jeaugey
7edfc57228 Make NCCL collectives work on communicators with only one rank 2016-06-06 14:35:00 -07:00
Sylvain Jeaugey
bd3cf73e6e Changed CURAND generator to work on a wider set of platforms. 2016-06-06 14:34:03 -07:00
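
Note on commit 8241cd7b6e: the message says that vFetch and vStore are not found by ADL with clang and must be declared before their use in ReduceCopy. The sketch below illustrates that kind of fix in plain C++. It is not the NCCL source: the signatures of vFetch and vStore, and the function reduceCopySketch, are hypothetical stand-ins chosen for illustration. The underlying rule is that an unqualified call inside a template is resolved by ordinary lookup at the template's definition plus argument-dependent lookup at instantiation, and pointers to fundamental types have no associated namespaces for ADL to search. Forward declarations placed ahead of the template make the helpers visible to ordinary lookup.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-ins for the vFetch/vStore helpers named in
// commit 8241cd7b6e. The real NCCL signatures are not shown in this log.
// Declaring them here, before the template that calls them, is the point of
// the sketch: ordinary unqualified lookup at the template definition now
// finds them, so the code no longer relies on ADL.
template <typename T> T vFetch(const T* ptr);
template <typename T> void vStore(T* ptr, T val);

// Simplified stand-in for ReduceCopy: element-wise sum of a and b into out.
template <typename T>
void reduceCopySketch(const T* a, const T* b, T* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    vStore(out + i, static_cast<T>(vFetch(a + i) + vFetch(b + i)));
  }
}

// Definitions may follow the template once the declarations above exist.
template <typename T> T vFetch(const T* ptr) { return *ptr; }
template <typename T> void vStore(T* ptr, T val) { *ptr = val; }
```

If the two forward declarations are removed and the template is instantiated with a pointer to a fundamental type such as int or float, a conforming compiler is expected to reject the calls, which matches the behaviour the commit message attributes to clang.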