Restructure pre-packed matrix layouts so generic GEMM code can be agnostic of panel layout #511

Merged: 3 commits, Jan 4, 2025

robertknight
Owner

#510 noted that the layout of pre-packed matrices required the generic GEMM code to know how data is laid out within a packed panel. This PR revises the layout so that this is no longer the case; see the last commit for details. After this change it will be possible to use a different panel layout for each kernel. For example, int8 kernels will use dot product instructions that require a different block layout than f32 kernels using FMA instructions.
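To illustrate why the two kinds of kernel want different panel layouts, here is a minimal sketch of two packing routines for one panel. The function names, the tile height `MR` and the group-of-4 depth layout are assumptions made for illustration; they are not taken from this PR and may not match the layouts the library actually uses.

```rust
/// Hypothetical register-tile height (rows per panel), for illustration only.
const MR: usize = 4;

/// Pack an `MR x depth` row-major panel for an FMA-style f32 kernel.
///
/// For each depth index `k`, the `MR` row values are stored contiguously, so
/// the kernel can load one column of the panel per depth step.
fn pack_panel_f32(a: &[f32], depth: usize) -> Vec<f32> {
    assert_eq!(a.len(), MR * depth);
    let mut out = Vec::with_capacity(MR * depth);
    for k in 0..depth {
        for row in 0..MR {
            out.push(a[row * depth + k]);
        }
    }
    out
}

/// Pack an `MR x depth` row-major panel for a dot-product-style int8 kernel.
///
/// Depth is processed in groups of 4 (an assumed group size). Within a group,
/// the 4 consecutive `k` values of each row are stored contiguously, so one
/// dot product instruction could consume them together. Depth is zero-padded
/// to a multiple of 4.
fn pack_panel_i8(a: &[i8], depth: usize) -> Vec<i8> {
    assert_eq!(a.len(), MR * depth);
    let groups = (depth + 3) / 4;
    let mut out = Vec::with_capacity(MR * groups * 4);
    for g in 0..groups {
        for row in 0..MR {
            for i in 0..4 {
                let k = g * 4 + i;
                out.push(if k < depth { a[row * depth + k] } else { 0 });
            }
        }
    }
    out
}
```

The two routines produce buffers with different element orders and different padding for the same logical panel. Code that slices a panel along the depth dimension would therefore need to know which layout is in use.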

Along the way I also added some comments about where the cache and register blocking sizes come from. These are explained in the referenced paper, but it is useful to have them more immediately accessible.

@robertknight robertknight changed the title Restructure prepacked matrix layouts so generic GEMM code can be agnostic of panel layout Restructure pre-packed matrix layouts so generic GEMM code can be agnostic of panel layout Jan 4, 2025
The cache and register blocking sizes are documented in the papers referenced in other comments in this module,
but add inline comments that are more immediately accessible.

Each kernel now manages a temporary tile of an appropriate size, so the generic
GEMM outer loops don't need to know the maximum size.
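A kernel-owned temporary tile could look roughly like the sketch below. The `ToyKernel` type, the `compute_tile` signature and the 4x2 tile size are hypothetical and chosen only for illustration; the real kernels in this PR may be structured differently.

```rust
/// Hypothetical register tile dimensions for this toy kernel.
const TILE_MR: usize = 4;
const TILE_NR: usize = 2;

/// Toy scalar kernel, standing in for a SIMD kernel.
struct ToyKernel;

impl ToyKernel {
    /// Accumulate `A_panel * B_panel` into the top-left
    /// `used_rows x used_cols` region of `out`.
    ///
    /// `a_panel` holds `TILE_MR` values per depth step and `b_panel` holds
    /// `TILE_NR` values per depth step.
    fn compute_tile(
        &self,
        out: &mut [f32],
        out_row_stride: usize,
        a_panel: &[f32],
        b_panel: &[f32],
        depth: usize,
        used_rows: usize,
        used_cols: usize,
    ) {
        assert!(used_rows <= TILE_MR && used_cols <= TILE_NR);

        // Temporary tile sized by the kernel itself. The caller never needs
        // to know `TILE_MR` or `TILE_NR` in order to allocate it.
        let mut tmp = [[0.0f32; TILE_NR]; TILE_MR];
        for k in 0..depth {
            for r in 0..TILE_MR {
                for c in 0..TILE_NR {
                    tmp[r][c] += a_panel[k * TILE_MR + r] * b_panel[k * TILE_NR + c];
                }
            }
        }

        // Copy only the valid part of the tile to the output. This handles
        // edge tiles where fewer than `TILE_MR` rows or `TILE_NR` columns
        // are in bounds.
        for r in 0..used_rows {
            for c in 0..used_cols {
                out[r * out_row_stride + c] += tmp[r][c];
            }
        }
    }
}
```

Because the tile lives inside the kernel, the outer loops only pass the output slice and the number of valid rows and columns.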

Divide pre-packed matrices into depth blocks with a size that matches the depth block size
(`kc`) used during computation.

This allows the generic GEMM code to be agnostic of the layout of panels
within a block, as it no longer needs to be able to slice panels along the depth
dimension. Instead it just uses a depth block index to pick a panel with the
pre-determined block size. This in turn gives the GEMM kernel freedom to choose
the layout within each panel.
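The indexing scheme described above can be sketched as follows. `PackedLayout`, its fields and the assumption that every panel is padded to a full `mr * kc` elements are hypothetical, introduced here only to show the idea; the actual buffer organization in the library may differ.

```rust
use std::ops::Range;

/// Hypothetical description of a pre-packed buffer that is divided into
/// depth blocks, each containing a fixed number of equally sized panels.
struct PackedLayout {
    /// Elements in one panel within one depth block. The kernel chooses how
    /// data is arranged inside these elements.
    panel_len: usize,
    /// Number of row panels in each depth block.
    row_panels: usize,
    /// Number of depth blocks of size `kc`.
    depth_blocks: usize,
}

impl PackedLayout {
    /// Assumes one stored element per (row, k) pair, with every panel padded
    /// to a full `mr x kc` size.
    fn new(rows: usize, depth: usize, mr: usize, kc: usize) -> Self {
        PackedLayout {
            panel_len: mr * kc,
            row_panels: (rows + mr - 1) / mr,
            depth_blocks: (depth + kc - 1) / kc,
        }
    }

    /// Total number of elements in the packed buffer.
    fn total_len(&self) -> usize {
        self.depth_blocks * self.row_panels * self.panel_len
    }

    /// Range of the packed buffer holding the given panel of the given
    /// depth block. The generic code uses only the two indices and never
    /// slices inside a panel.
    fn panel_range(&self, depth_block: usize, panel: usize) -> Range<usize> {
        assert!(depth_block < self.depth_blocks && panel < self.row_panels);
        let start = (depth_block * self.row_panels + panel) * self.panel_len;
        start..start + self.panel_len
    }
}
```

With this arrangement the outer loops pass `&packed[layout.panel_range(block, panel)]` to the kernel as an opaque slice, and only the kernel interprets its contents.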

This partly undoes #482.
@robertknight robertknight merged commit c8c1bc0 into main Jan 4, 2025
2 checks passed
@robertknight robertknight deleted the gemm-depth-block branch January 4, 2025 10:07