Sylvain Jeaugey
|
34d27771c6
|
1.3.2 release
Broadcast tuning
Better checking of inputs
Copy/reduce code simplification
|
2016-12-01 15:17:50 -08:00 |
|
Sylvain Jeaugey
|
cabd6848e4
|
Heavy code refactoring to remove a lot of code in collectives (~1000 lines).
Have all collectives use the same args, the same ring, and the same primitives for synchronization between threads with the same pattern.
|
2016-09-22 11:57:56 -07:00 |
|
Sylvain Jeaugey
|
75bad643bd
|
Updated LICENCE.txt
|
2016-08-26 15:08:20 -07:00 |
|
Sylvain Jeaugey
|
e51e922924
|
Add a debug level to NCCL and CUDA versions at init
|
2016-06-16 17:04:41 -07:00 |
|
Nathan Luehr
|
5966316771
|
Added support for more than 8 GPUs.
Change-Id: Iaa1841036a7bfdad6ebec99fed0adcd2bbe6ffad
Reviewed-on: http://git-master/r/935459
Reviewed-by: Cliff Woolley <jwoolley@nvidia.com>
Tested-by: Przemek Tredak <ptredak@nvidia.com>
|
2016-01-21 13:00:21 -08:00 |
|
Nathan Luehr
|
0673d5f44f
|
Initial release.
|
2015-11-17 11:30:40 -08:00 |
|