Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
linear-algebra mpi cuda scalapack matrix-multiplication gpu-acceleration rocm matmul communication-optimal pdgemm
Updated Dec 11, 2024 - C++