6 Commits

Author SHA1 Message Date
Sylvain Jeaugey
34d27771c6 1.3.2 release
Broadcast tuning
Better checking of inputs
Copy/reduce code simplification
2016-12-01 15:17:50 -08:00
Sylvain Jeaugey
cabd6848e4 Heavy code refactoring to remove a lot of code in collectives (~1000 lines).
Have all collectives use the same args, the same ring, and the same primitives for synchronization between threads with the same pattern.
2016-09-22 11:57:56 -07:00
Sylvain Jeaugey
75bad643bd Updated LICENCE.txt 2016-08-26 15:08:20 -07:00
Sylvain Jeaugey
e51e922924 Add a debug level to NCCL and CUDA versions at init 2016-06-16 17:04:41 -07:00
Nathan Luehr
5966316771 Added support for more than 8 GPUs.
Change-Id: Iaa1841036a7bfdad6ebec99fed0adcd2bbe6ffad
Reviewed-on: http://git-master/r/935459
Reviewed-by: Cliff Woolley <jwoolley@nvidia.com>
Tested-by: Przemek Tredak <ptredak@nvidia.com>
2016-01-21 13:00:21 -08:00
Nathan Luehr
0673d5f44f Initial release. 2015-11-17 11:30:40 -08:00