Add support for A100 GPU and related platforms.
Add support for CUDA 11.
Add support for send/receive operations (beta).