NVIDIA cuBLASMp is a high-performance, multi-process, GPU-accelerated library for distributed basic dense linear algebra. cuBLASMp supports the 2D block-cyclic data layout and provides PBLAS-like C APIs. A companion library, CAL, provides utilities for managing communicators and synchronizing processes safely.
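
To make the block-cyclic layout concrete, the following is a minimal sketch of the index arithmetic along one dimension of the process grid, assuming 0-based indices. In a 2D layout the same arithmetic would be applied independently to rows and to columns. The helper names (`owner_of`, `local_index_of`, `local_count`) are hypothetical and are not part of the cuBLASMp or CAL APIs; `local_count` follows the same idea as the ScaLAPACK `NUMROC` utility.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers illustrating 1D block-cyclic index arithmetic.
 * These are NOT cuBLASMp functions. All indices are 0-based.
 *   g      : global row (or column) index
 *   block  : blocking factor along this dimension
 *   nprocs : number of processes along this dimension of the grid
 *   src    : process that owns the first block
 */

/* Process coordinate that owns global index g. */
static int64_t owner_of(int64_t g, int64_t block, int64_t nprocs, int64_t src)
{
    assert(block > 0 && nprocs > 0);
    return (src + g / block) % nprocs;
}

/* Index of global element g inside the owning process's local storage. */
static int64_t local_index_of(int64_t g, int64_t block, int64_t nprocs)
{
    assert(block > 0 && nprocs > 0);
    /* Full cycles completed before g, times block size, plus offset in block. */
    return (g / (block * nprocs)) * block + g % block;
}

/* Number of the n global rows (or columns) stored by process p. */
static int64_t local_count(int64_t n, int64_t block, int64_t p,
                           int64_t src, int64_t nprocs)
{
    assert(block > 0 && nprocs > 0);
    int64_t dist    = (nprocs + p - src) % nprocs; /* distance from source process */
    int64_t nblocks = n / block;                   /* number of full blocks */
    int64_t count   = (nblocks / nprocs) * block;  /* blocks every process receives */
    int64_t extra   = nblocks % nprocs;            /* leftover full blocks */

    if (dist < extra) {
        count += block;        /* this process gets one more full block */
    } else if (dist == extra) {
        count += n % block;    /* this process gets the trailing partial block */
    }
    return count;
}
```

For example, with 10 rows, a blocking factor of 2, and 3 processes along the row dimension, the five blocks are dealt out to processes 0, 1, 2, 0, 1, so the three processes would hold 4, 4, and 2 rows respectively.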