An implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime.
conda install conda-forge::libcublas-static
The cuBLAS library provides a GPU-accelerated implementation of the Basic Linear Algebra Subprograms (BLAS).