Dev more bench #8

imrn99 · 2023-11-23T14:22:37Z

Added:

GEMM benchmark (uses the PoC)
hardcoded GEMM benchmark (uses rayon / std iterators directly)

Speedup seems to plateau at approximately 2 for the axpy & gemv benches, while it reaches the ideal value (16 on my laptop) when running gemm. Note that the ideal value is also reached in the first two benches when the kernel is artificially lengthened, for example with a sleep call.
The hardcoded gemm bench yields approximately the same results for parallel execution, but significantly better results for serial execution.
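
For readers who want to reproduce the comparison, here is a minimal sketch of what a hardcoded GEMM bench could look like. It is not the code from this PR: the function names (`gemm_row`, `gemm_serial`, `gemm_parallel`) are hypothetical, the matrices are assumed square and row-major, and `std::thread::scope` stands in for rayon so that the example compiles with the standard library alone.

```rust
use std::thread;
use std::time::Instant;

/// Computes one row of C = alpha * A * B + beta * C (row-major, n x n).
/// Shared by the serial and parallel versions so both do identical arithmetic.
fn gemm_row(n: usize, alpha: f64, a_row: &[f64], b: &[f64], beta: f64, c_row: &mut [f64]) {
    for (j, c_ij) in c_row.iter_mut().enumerate() {
        let dot: f64 = a_row
            .iter()
            .enumerate()
            .map(|(k, a_ik)| a_ik * b[k * n + j])
            .sum();
        *c_ij = alpha * dot + beta * *c_ij;
    }
}

/// Serial version built on std iterators only.
fn gemm_serial(n: usize, alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64]) {
    if n == 0 {
        return;
    }
    for (i, c_row) in c.chunks_mut(n).enumerate() {
        gemm_row(n, alpha, &a[i * n..(i + 1) * n], b, beta, c_row);
    }
}

/// Parallel version: contiguous blocks of rows are handed to scoped threads.
/// (Assumption: std threads replace rayon here to keep the sketch dependency-free.)
fn gemm_parallel(
    n: usize,
    alpha: f64,
    a: &[f64],
    b: &[f64],
    beta: f64,
    c: &mut [f64],
    n_threads: usize,
) {
    if n == 0 {
        return;
    }
    let n_threads = n_threads.max(1);
    // Ceiling division so that every row is assigned to some thread.
    let rows_per_thread = ((n + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        for (t, c_block) in c.chunks_mut(rows_per_thread * n).enumerate() {
            s.spawn(move || {
                for (r, c_row) in c_block.chunks_mut(n).enumerate() {
                    let i = t * rows_per_thread + r;
                    gemm_row(n, alpha, &a[i * n..(i + 1) * n], b, beta, c_row);
                }
            });
        }
    });
}

fn main() {
    let n = 256;
    let a = vec![1.0; n * n];
    let b = vec![2.0; n * n];
    let mut c_serial = vec![0.5; n * n];
    let mut c_parallel = c_serial.clone();
    let n_threads = thread::available_parallelism().map(|p| p.get()).unwrap_or(1);

    let start = Instant::now();
    gemm_serial(n, 1.0, &a, &b, 2.0, &mut c_serial);
    let t_serial = start.elapsed();

    let start = Instant::now();
    gemm_parallel(n, 1.0, &a, &b, 2.0, &mut c_parallel, n_threads);
    let t_parallel = start.elapsed();

    // Both versions use the same row kernel, so the results must match exactly.
    assert_eq!(c_serial, c_parallel);
    println!(
        "threads: {n_threads}, serial: {t_serial:?}, parallel: {t_parallel:?}, speedup: {:.2}",
        t_serial.as_secs_f64() / t_parallel.as_secs_f64()
    );
}
```

A single timed run like this is noisy; a proper bench harness would repeat the measurement and build in release mode before drawing conclusions about speedup.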

Considering these two points, could the compiler be optimizing away some things I missed?
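
One way to check whether the optimizer is removing work is to pass the kernel's inputs and outputs through `std::hint::black_box`. The sketch below is an assumption about how such a check could look, not code from this PR: `axpy` and `time_axpy` are hypothetical names, and `black_box` is documented as best effort only, so it reduces the risk of dead-code elimination rather than ruling it out.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// y <- a * x + y
fn axpy(a: f64, x: &[f64], y: &mut [f64]) {
    for (y_i, x_i) in y.iter_mut().zip(x) {
        *y_i += a * x_i;
    }
}

/// Times `reps` axpy calls on vectors of length `n`.
/// Returns the elapsed time and a checksum of the final `y`.
fn time_axpy(n: usize, reps: usize) -> (Duration, f64) {
    let x = vec![1.0; n];
    let mut y = vec![0.0; n];

    let start = Instant::now();
    for _ in 0..reps {
        // Hiding the arguments behind black_box discourages the compiler
        // from constant-folding the kernel or hoisting it out of the loop.
        axpy(black_box(2.0), black_box(x.as_slice()), black_box(y.as_mut_slice()));
    }
    let elapsed = start.elapsed();

    // Reading the result through black_box keeps the writes to `y` observable.
    let checksum: f64 = black_box(y.as_slice()).iter().sum();
    (elapsed, checksum)
}

fn main() {
    let (elapsed, checksum) = time_axpy(1_000_000, 10);
    println!("elapsed: {elapsed:?}, checksum: {checksum}");
}
```

If timings with and without `black_box` differ noticeably, that suggests the compiler was eliminating part of the measured work.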

speedup > n_cores ????

it seems the library creates much more overhead for serial execution than parallel
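
The two remarks above fit together if speedup is taken as serial time divided by parallel time: extra overhead in the serial baseline inflates the ratio, which can push it past the core count. The sketch below illustrates this with made-up timings; the numbers and the helper names `speedup` and `efficiency` are hypothetical, not measurements from these benches.

```rust
/// Speedup as the ratio of serial to parallel wall time.
fn speedup(t_serial: f64, t_parallel: f64) -> f64 {
    t_serial / t_parallel
}

/// Parallel efficiency: speedup divided by the number of cores.
fn efficiency(speedup: f64, n_cores: usize) -> f64 {
    speedup / n_cores as f64
}

fn main() {
    let n_cores = 16;
    // Hypothetical timings in seconds, chosen only to illustrate the effect.
    let t_parallel = 1.0;
    let t_serial_lean = 16.0; // a serial baseline with no extra overhead
    let t_serial_heavy = 24.0; // the same work plus serial-only overhead

    let s_lean = speedup(t_serial_lean, t_parallel);
    let s_heavy = speedup(t_serial_heavy, t_parallel);

    println!("lean baseline:  speedup {s_lean}, efficiency {}", efficiency(s_lean, n_cores));
    println!("heavy baseline: speedup {s_heavy}, efficiency {}", efficiency(s_heavy, n_cores));
}
```

An efficiency above 1 in such a comparison points at the baseline rather than at the parallel code, which is why timing against the hardcoded serial version gives a fairer reference.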

imrn99 added 7 commits November 23, 2023 09:20

gemm skeleton

471413d

completed gemm

fa9c601

speedup > n_cores ????

hardcoded gemm bench

e235853

it seems the library creates much more overhead for serial execution than parallel

grouped blas speedup benches in a folder

374ab49

fixed bench paths

044e245

update doc & readme

c1c63e5

fixed warnings when testing using parallel features

3d6169b

imrn99 merged commit d932467 into master Nov 23, 2023
4 checks passed

imrn99 deleted the dev_more_bench branch November 23, 2023 14:38
