yezhengmao/nccl

Author	SHA1	Message	Date
Sylvain Jeaugey	cdae05b277	Improve INFO message when external network is not found. Fix #162	2018-12-04 12:10:58 -08:00
David Addison	5fe2618c0e	Fixed some compilation errors when TRACE=1 set	2018-11-29 14:12:14 -08:00
Sylvain Jeaugey	eed8218e17	Rework shared memory code to use SYSCHECK macros. This is to handle EINTR/EAGAIN properly (issue #137), and also make the code consistent with the rest. Unfortunately posix_fallocate and mmap do not follow the classic return code/errno pattern, so we need to write wrappers around those functions.	2018-11-29 12:52:13 -08:00
Sylvain Jeaugey	302d538b73	Rework SYSCHECK macros to better handle retries. SYSCHECKVAL was not retrying when a retry was needed. Since not all calls are inside a loop, that means we could silently miss an EINTR/EAGAIN return code. Also rework the socket connection code and improve error reporting.	2018-11-29 12:52:13 -08:00
Sylvain Jeaugey	61b50a63ef	Improve net API description	2018-11-26 16:24:31 -08:00
Sylvain Jeaugey	98adf2fe11	Make network isend/irecv non blocking	2018-11-26 16:24:31 -08:00
Sylvain Jeaugey	0d3a20f96d	Add support for external network. Dynamically load external network from libnccl-net.so. Add init function in networks. Move PCI scoring to net.cu, only ask transport to provide a path. Simplify CUDA PCI path detection. Add dummy external network	2018-11-26 16:24:31 -08:00
Alex Sergeev	d7a58cfa58	Generate host-hash for P2P and SHM based on $(readlink /proc/self/ns/uts) + $(readlink /proc/self/ns/mnt) (#156)	2018-11-19 17:39:44 -08:00
Sylvain Jeaugey	3c6e25210b	Generate nccl.h in build instead of src. Generating nccl.h in src makes source directories dirty after builds.	2018-11-09 14:00:41 -08:00
David Addison	b56650c7f5	2.3.7-1 Improved LL tuning for multi-node jobs. Improved bootstrap for large job scaling. Fixed a hang during bootstrap due to socket reuse. Added operation name to the COLL INFO logging.	2018-10-24 14:44:59 -07:00
Sylvain Jeaugey	f93fe9bfd9	2.3.5-5 Add support for inter-node communication using sockets and InfiniBand/RoCE. Improve latency. Add support for aggregation. Improve LL/regular tuning. Remove tests as those are now at github.com/nvidia/nccl-tests.	2018-09-25 14:12:01 -07:00
Sylvain Jeaugey	29a1a916dc	Add support for CUDA9 half semantics	2017-06-14 11:20:24 -07:00
Ilya Biryukov	8241cd7b6e	Fix compilation error when compiling with 'clang -x cuda'. Functions vFetch and vStore are not found by ADL with clang, so they need to be declared before usage in ReduceCopy.	2017-03-16 12:01:11 +01:00
Nathan Luehr	8996811936	Only enable peer access for ring neighbors. This enables support for systems with more than 9 GPUs attached to a single PCIe root complex.	2017-03-01 16:42:38 -08:00
Sylvain Jeaugey	c219a183d0	Fix copy/paste typo in error message	2017-03-01 16:42:38 -08:00
Sylvain Jeaugey	8e1d6f9b60	Fix crash in Reduce when non-root ranks have invalid recvbuff	2017-03-01 16:42:38 -08:00
Chad Whipkey	5eab428294	Qualify nullptr_t with std::.	2017-02-08 07:06:31 -08:00
Sylvain Jeaugey	2a974f5ca2	Fix 1.3.2 compilation	2016-12-08 09:11:43 -08:00
Sylvain Jeaugey	648e9fbb58	Adding missing file	2016-12-05 18:06:24 -08:00
Sylvain Jeaugey	34d27771c6	1.3.2 release. Broadcast tuning. Better checking of inputs. Copy/reduce code simplification.	2016-12-01 15:17:50 -08:00
Sylvain Jeaugey	b2781d0501	Fix primitives function prototype	2016-10-13 10:32:42 -07:00
Sylvain Jeaugey	bf7d1514f7	NVML (libwrap) : import the needed definitions	2016-10-13 10:28:59 -07:00
Sylvain Jeaugey	8bb06c94be	Improved allreduce segmentation for small sizes	2016-10-07 12:42:23 -07:00
Sylvain Jeaugey	cabd6848e4	Heavy code refactoring to remove a lot of code in collectives (~1000 lines). Have all collectives use the same args, the same ring, and the same primitives for synchronization between threads with the same pattern.	2016-09-22 11:57:56 -07:00
Sylvain Jeaugey	e3dbc6110e	Add profiling API	2016-09-22 11:56:51 -07:00
Sylvain Jeaugey	9ee6189bf9	Merge pull request #41 from jia-kai/master. Some minor fixes for compile/usage	2016-09-15 09:45:52 -07:00
Sylvain Jeaugey	75bad643bd	Updated LICENCE.txt	2016-08-26 15:08:20 -07:00
jiakai	47b0797fe1	pass devlist as const int* rather than int* in ncclCommInitAll	2016-08-19 19:00:14 +08:00
Sylvain Jeaugey	428ec5b2a3	Merge remote-tracking branch 'github/master' into public	2016-07-25 10:53:01 -07:00
Nathan Luehr	55c42ad681	Fixed redundant contexts in multi-process apps. Change-Id: If787014450fd281304f0c7baf01d25963e40905d	2016-07-25 10:10:30 -07:00
Sylvain Jeaugey	e51e922924	Add a debug level to NCCL and CUDA versions at init	2016-06-16 17:04:41 -07:00
Sylvain Jeaugey	d5e507fc7f	Only call the CUDA runtime. That may fix #27.	2016-06-07 16:27:51 -07:00
Sylvain Jeaugey	7edfc57228	Make NCCL collectives work on communicators with only one rank	2016-06-06 14:35:00 -07:00
Sylvain Jeaugey	acb93d1aed	Removing unneeded includes	2016-06-02 17:33:43 -07:00
Sylvain Jeaugey	dba3ec9428	Fix random deadlock during ncclCommInitRank.	2016-04-19 10:47:27 -07:00
Nathan Luehr	5554a4c9f0	Fixed useRemoteRecv consistency issue. Change-Id: Ib093a8dc3bb093eddc89dad81d3fffa53c03a6a2 Reviewed-on: http://git-master/r/1013543 Reviewed-by: Cliff Woolley <jwoolley@nvidia.com> Tested-by: Przemek Tredak <ptredak@nvidia.com>	2016-02-18 13:45:42 -08:00
Nathan Luehr	9442285526	Fixed buffer overflow in ReduceOrCopy. Bug caused AllGathers and ReduceScatters of less than 8 bytes to fail in certain cases. Change-Id: I33e1beb50805bfdb457ae16a90e3f91c1b283b9b Reviewed-on: http://git-master/r/1011505 Reviewed-by: Przemek Tredak <ptredak@nvidia.com> Tested-by: Przemek Tredak <ptredak@nvidia.com>	2016-02-12 15:13:56 -08:00
Nathan Luehr	caa40b8dd3	Libwrap checks for LIB.so.1 if LIB.so not found. Change-Id: I6f07f887f828cb2259dcfd496a2ad707db898cf5 Reviewed-on: http://git-master/r/1000162 Reviewed-by: Przemek Tredak <ptredak@nvidia.com> Tested-by: Przemek Tredak <ptredak@nvidia.com>	2016-01-29 12:36:42 -08:00
Nathan Luehr	fe1a956715	Enabled support for char type to be unsigned. GCC on POWER arch defines char type as unsigned. Change-Id: Ic143cb058fe42414b1f6f1f45b02132c837726ae Reviewed-on: http://git-master/r/999614 Reviewed-by: Przemek Tredak <ptredak@nvidia.com> Tested-by: Przemek Tredak <ptredak@nvidia.com>	2016-01-28 13:38:18 -08:00
Sylvain Jeaugey	c05312f151	Moved tests to separate dir and improved MPI test. Test sources moved to test/ directory. MPI test displays PASS/FAIL and returns code accordingly. Change-Id: I058ebd1bd5202d8f38cc9787898b2480100c102b Reviewed-on: http://git-master/r/936086 Reviewed-by: Przemek Tredak <ptredak@nvidia.com> Tested-by: Przemek Tredak <ptredak@nvidia.com>	2016-01-28 12:56:36 -08:00
Nathan Luehr	5966316771	Added support for more than 8 GPUs. Change-Id: Iaa1841036a7bfdad6ebec99fed0adcd2bbe6ffad Reviewed-on: http://git-master/r/935459 Reviewed-by: Cliff Woolley <jwoolley@nvidia.com> Tested-by: Przemek Tredak <ptredak@nvidia.com>	2016-01-21 13:00:21 -08:00
Nathan Luehr	130ee246e2	Fixed deadlock in back-to-back reduce_scatters. Change-Id: I92d32b15e516a39710b676aee692ae9b70638937 Reviewed-on: http://git-master/r/935458 Reviewed-by: Przemek Tredak <ptredak@nvidia.com> Tested-by: Przemek Tredak <ptredak@nvidia.com>	2016-01-21 10:36:03 -08:00
Nathan Luehr	651a6edc5c	Fixed bug in MPI initialization.	2015-12-10 17:54:41 -08:00
Simon Layton	41ce4ca9fc	Add int64 and uint64 types for all algorithms and tests	2015-12-04 13:28:36 -05:00
Nathan Luehr	27d32ac5d9	Fixed a race condition in reduce and broadcast.	2015-11-19 11:11:52 -08:00
Nathan Luehr	0673d5f44f	Initial release.	2015-11-17 11:30:40 -08:00

96 Commits