Dev more bench #8

Merged
merged 7 commits into from
Nov 23, 2023

imrn99 (Owner) commented Nov 23, 2023

Added:

  • GEMM benchmark (uses the PoC)
  • hardcoded GEMM benchmark (uses rayon / std iterators directly)
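For readers unfamiliar with what a "hardcoded" GEMM over std iterators looks like, here is a minimal serial sketch. It assumes row-major, square `n x n` matrices stored in flat `f64` slices; the function name `gemm_serial` is hypothetical and this is not the code from this PR.

```rust
/// Hypothetical hardcoded GEMM (not the PR's code): computes C = A * B
/// for row-major n x n matrices stored in flat slices, serially.
fn gemm_serial(n: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    assert_eq!(c.len(), n * n);
    // Each chunk of `c` is one output row i.
    c.chunks_mut(n).enumerate().for_each(|(i, c_row)| {
        for (j, c_ij) in c_row.iter_mut().enumerate() {
            // Dot product of row i of A with column j of B.
            *c_ij = (0..n).map(|k| a[i * n + k] * b[k * n + j]).sum::<f64>();
        }
    });
}

fn main() {
    let n = 2;
    let a = vec![1.0, 2.0, 3.0, 4.0];
    let b = vec![5.0, 6.0, 7.0, 8.0];
    let mut c = vec![0.0; n * n];
    gemm_serial(n, &a, &b, &mut c);
    println!("{:?}", c); // prints [19.0, 22.0, 43.0, 50.0]
}
```

With rayon in scope (`use rayon::prelude::*;`), swapping `chunks_mut(n)` for `par_chunks_mut(n)` gives a row-parallel variant. The naive (i, j, k) loop order walks `b` column-wise, which is cache-unfriendly; it is kept here only for readability.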

The speedup seems to plateau at approximately 2 for the axpy and gemv benches, while it reaches ideal values (16 on my laptop) for gemm. Note that the first two benches also reach ideal values when the kernel is artificially lengthened, e.g. with a sleep.
The hardcoded gemm bench yields roughly the same results for parallel execution, but significantly better results for serial execution.
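One possible explanation for the ceiling, offered as an assumption rather than something these benches have measured, is memory bandwidth. Counting floating-point operations per matrix/vector element moved (each element read or written once, $n$ the problem size), for $y \leftarrow \alpha x + y$, $y \leftarrow A x$ and $C \leftarrow A B$:

```latex
I_{\text{axpy}} = \frac{2n}{3n} = \frac{2}{3}, \qquad
I_{\text{gemv}} = \frac{2n^2}{n^2 + 2n} \approx 2, \qquad
I_{\text{gemm}} = \frac{2n^3}{3n^2} = \frac{2n}{3}
```

Only gemm's intensity grows with $n$, so it can keep all cores busy, whereas axpy and gemv do a constant amount of work per element loaded and are likely limited by DRAM bandwidth, which typically saturates with only a few cores. A `sleep` inside the kernel raises the time per element without adding memory traffic, which would be consistent with ideal scaling in that case.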

Considering these two points, could the compiler be optimizing away something I missed?
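One way to rule out the compiler removing work is to pass the inputs and outputs of the timed kernel through `std::hint::black_box`. The sketch below assumes a plain timing loop around a hypothetical `axpy` function; it is not the PoC's bench harness. Note that `black_box` is documented as a best-effort hint, not a guarantee.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical axpy kernel: y <- alpha * x + y.
fn axpy(alpha: f64, x: &[f64], y: &mut [f64]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * xi;
    }
}

fn main() {
    let n = 1 << 20;
    let x = vec![1.0_f64; n];
    let mut y = vec![0.0_f64; n];
    let start = Instant::now();
    for _ in 0..100 {
        // black_box makes the arguments opaque to the optimizer, so the
        // repeated calls are less likely to be merged, hoisted or removed.
        axpy(black_box(2.0), black_box(x.as_slice()), black_box(y.as_mut_slice()));
    }
    // Reading y afterwards keeps the result observable.
    println!("elapsed: {:?}, y[0] = {}", start.elapsed(), y[0]);
}
```

If the timings do not change with `black_box` in place, dead-code elimination is probably not what caps the axpy/gemv speedup.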

imrn99 merged commit d932467 into master on Nov 23, 2023
4 checks passed
imrn99 deleted the dev_more_bench branch on November 23, 2023 14:38