NVIDIA cuBLASMp is a high-performance, multi-process, GPU-accelerated library for distributed basic dense linear algebra. cuBLASMp supports the 2D block-cyclic data layout and provides PBLAS-like C APIs.
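
For readers unfamiliar with the 2D block-cyclic layout, the sketch below shows how a global matrix index could map to a process in a process grid and to a local index on that process. It is an illustration of the layout only, not part of the cuBLASMp API: the function names `owner_and_local` and `owner_2d` are hypothetical, indices are 0-based, and the first block is assumed to live on process row 0 and process column 0.

```python
def owner_and_local(g, block, nprocs):
    """Map a 0-based global index to (owning process, local index)
    in a 1D block-cyclic distribution.

    Assumption: the first block is stored on process 0.
    """
    blk = g // block                      # global block that contains g
    proc = blk % nprocs                   # blocks are dealt out round-robin
    local = (blk // nprocs) * block + g % block
    return proc, local


def owner_2d(i, j, mb, nb, nprow, npcol):
    """Apply the 1D mapping independently to rows and columns.

    mb x nb is the block size; nprow x npcol is the process grid.
    Returns ((process row, process column), (local row, local column)).
    """
    prow, li = owner_and_local(i, mb, nprow)
    pcol, lj = owner_and_local(j, nb, npcol)
    return (prow, pcol), (li, lj)


if __name__ == "__main__":
    # Element (5, 2) of a matrix split into 2 x 2 blocks on a 2 x 3 grid.
    print(owner_2d(5, 2, mb=2, nb=2, nprow=2, npcol=3))
```

Because rows and columns are distributed independently, each process ends up holding a scattered set of blocks rather than one contiguous tile, which tends to balance work across processes.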