Sylvain Jeaugey
|
cabd6848e4
|
Heavy code refactoring to remove a lot of code in collectives (~1000 lines).
Have all collectives use the same args, the same ring, and the same primitives for synchronization between threads with the same pattern.
|
2016-09-22 11:57:56 -07:00 |
|
Sylvain Jeaugey
|
75bad643bd
|
Updated LICENCE.txt
|
2016-08-26 15:08:20 -07:00 |
|
Sylvain Jeaugey
|
7edfc57228
|
Make NCCL collectives work on communicators with only one rank
|
2016-06-06 14:35:00 -07:00 |
|
Nathan Luehr
|
5966316771
|
Added support for more than 8 GPUs.
Change-Id: Iaa1841036a7bfdad6ebec99fed0adcd2bbe6ffad
Reviewed-on: http://git-master/r/935459
Reviewed-by: Cliff Woolley <jwoolley@nvidia.com>
Tested-by: Przemek Tredak <ptredak@nvidia.com>
|
2016-01-21 13:00:21 -08:00 |
|
Nathan Luehr
|
130ee246e2
|
Fixed deadlock in back-to-back reduce_scatters.
Change-Id: I92d32b15e516a39710b676aee692ae9b70638937
Reviewed-on: http://git-master/r/935458
Reviewed-by: Przemek Tredak <ptredak@nvidia.com>
Tested-by: Przemek Tredak <ptredak@nvidia.com>
|
2016-01-21 10:36:03 -08:00 |
|
Simon Layton
|
41ce4ca9fc
|
Add int64 and uint64 types for all algorithms and tests
|
2015-12-04 13:28:36 -05:00 |
|
Nathan Luehr
|
27d32ac5d9
|
Fixed a race condition in reduce and broadcast.
|
2015-11-19 11:11:52 -08:00 |
|
Nathan Luehr
|
0673d5f44f
|
Initial release.
|
2015-11-17 11:30:40 -08:00 |
|