Accelerated Matrix Multiply

Implements matrix multiplication in several ways: a tiled CUDA kernel, a hyper-optimized CUDA kernel, cuBLAS, and WMMA.
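
To illustrate the tiled CUDA variant, here is a minimal sketch of a shared-memory tiled kernel. It is not taken from this project: the names `tiledMatMul` and `matmulOnGpu` and the tile width `TILE = 16` are assumptions chosen for the example. It also assumes row-major `float` matrices, and it omits CUDA error checking for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical tile width; real implementations tune this per GPU.
constexpr int TILE = 16;

// Computes C (M x N) = A (M x K) * B (K x N), all row-major.
// Each thread block computes one TILE x TILE tile of C.
__global__ void tiledMatMul(const float* A, const float* B, float* C,
                            int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C for this thread
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C for this thread
    float acc = 0.0f;

    // Walk along the K dimension one tile at a time.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;

        // Stage one tile of A and one of B in shared memory,
        // padding with zeros where the tile runs past the matrix edge.
        As[threadIdx.y][threadIdx.x] =
            (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // everyone done reading before the next load
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}

// Hypothetical host wrapper: copies inputs to the device, launches the
// kernel, and returns the product as a host vector.
std::vector<float> matmulOnGpu(const std::vector<float>& A,
                               const std::vector<float>& B,
                               int M, int N, int K) {
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, sizeof(float) * M * K);
    cudaMalloc(&dB, sizeof(float) * K * N);
    cudaMalloc(&dC, sizeof(float) * M * N);
    cudaMemcpy(dA, A.data(), sizeof(float) * M * K, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), sizeof(float) * K * N, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    tiledMatMul<<<grid, block>>>(dA, dB, dC, M, N, K);

    std::vector<float> C(static_cast<size_t>(M) * N);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(C.data(), dC, sizeof(float) * M * N, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Staging tiles in shared memory means each element of A and B is read from global memory once per tile rather than once per output element, which is the main reason a tiled kernel outperforms a naive one. The cuBLAS and WMMA variants would replace the kernel body with library calls or Tensor Core fragment operations, respectively.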