Make packing of GEMM inputs more flexible #510

Merged (1 commit) on Jan 3, 2025

@robertknight (Owner) commented Jan 3, 2025

Previously, the GEMM code assumed that data would be packed using the same element types as the LHS / RHS inputs, and with a fixed layout. To support new data types, especially int8, more flexibility will be needed.

  • int8 matmuls using dot product rather than FMA instructions will use a different blocked layout

  • For some architectures / data types it will make sense to expand inputs to a wider data type during packing rather than in the kernel

  • For some architectures / data types the kernel can operate on an unpacked LHS matrix, if that matrix has unit column stride. For others, packing will always be required.
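To make the three cases above more concrete, here is a minimal Rust sketch. It assumes a row-major int8 RHS matrix `b` with `k` rows and `n` columns, and packs it into column panels of width `NR` where each group of four consecutive K values for one column is contiguous. That is the kind of blocked layout a 4-way int8 dot-product instruction (which accumulates four 8-bit products into each 32-bit lane) tends to consume. It also shows widening int8 to `i32` during packing, and a unit-column-stride check for skipping LHS packing. The function names, `NR = 4`, and the exact layout are hypothetical illustrations, not the layouts this PR implements.

```rust
/// Hypothetical panel width; real kernels choose this per architecture.
const NR: usize = 4;
/// Consecutive K values consumed by one 4-way int8 dot-product step.
const K_GROUP: usize = 4;

/// Packs a row-major `k x n` int8 matrix into panels of `NR` columns.
/// Inside a panel, each column's K values are stored in groups of
/// `K_GROUP`, so a kernel can load 4 consecutive K values with one read.
/// Partial panels and partial K groups are zero-padded.
fn pack_b_dot_product(b: &[i8], k: usize, n: usize) -> Vec<i8> {
    assert_eq!(b.len(), k * n);
    let k_padded = k.div_ceil(K_GROUP) * K_GROUP;
    let n_panels = n.div_ceil(NR);
    let mut out = vec![0i8; n_panels * NR * k_padded];
    let mut idx = 0;
    for panel in 0..n_panels {
        for kg in (0..k_padded).step_by(K_GROUP) {
            for c in 0..NR {
                for kk in 0..K_GROUP {
                    let (row, col) = (kg + kk, panel * NR + c);
                    // Out-of-range positions keep their zero padding.
                    if row < k && col < n {
                        out[idx] = b[row * n + col];
                    }
                    idx += 1;
                }
            }
        }
    }
    out
}

/// Alternative for targets without int8 dot-product instructions:
/// widen each value while packing so the kernel can use plain i32 math.
fn pack_widened(a: &[i8]) -> Vec<i32> {
    a.iter().map(|&x| i32::from(x)).collect()
}

/// Hypothetical check: the LHS can be read in place only if the kernel
/// supports it and each row's K values are contiguous (unit column stride).
fn lhs_needs_packing(col_stride: usize, kernel_supports_unpacked: bool) -> bool {
    !(kernel_supports_unpacked && col_stride == 1)
}
```

For a 2x3 input `[1, 2, 3, 4, 5, 6]`, `pack_b_dot_product` yields one panel in which column 0 is stored as `1, 4, 0, 0`, followed by the other columns in the same form, and a final all-zero padding column.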

To enable this:

  • Add `Kernel` methods which return descriptors specifying the size and alignment required for packing a particular A or B input.

  • Modify the kernel interface to use opaque `[u8]` slices for packing buffer contents. Kernel implementations will cast these slices to slices of the element type they use internally.

  • Add a `PackingBuffer` struct which wraps a `Vec<u32>` buffer and provides an API for reserving space in the buffer and casting its contents to `[u8]` slices for the kernel and its packing methods.
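The three changes above fit together roughly as in the following Rust sketch. It assumes a hypothetical `PackedLayout` descriptor with `size` and `align` fields in bytes, a cut-down `Kernel` trait with hypothetical `packed_a_layout` and `pack_a` methods, and a hypothetical `alloc` method on `PackingBuffer`; the names and signatures in the actual PR may differ. Backing the buffer with `Vec<u32>` guarantees 4-byte alignment, so a kernel whose element type needs at most 4-byte alignment (such as `f32`, `i32` or `i8`) can safely reinterpret the bytes after checking alignment and length.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical descriptor a kernel returns for packing one input.
#[derive(Clone, Copy, Debug)]
struct PackedLayout {
    size: usize,  // bytes required
    align: usize, // required alignment in bytes
}

/// Packing buffer backed by `Vec<u32>`, so its start is 4-byte aligned.
struct PackingBuffer {
    buf: Vec<u32>,
}

impl PackingBuffer {
    fn new() -> Self {
        PackingBuffer { buf: Vec::new() }
    }

    /// Reserves `layout.size` zeroed bytes and returns them as opaque bytes.
    fn alloc(&mut self, layout: PackedLayout) -> &mut [u8] {
        assert!(layout.align <= align_of::<u32>(), "alignment not supported");
        self.buf.clear();
        self.buf.resize(layout.size.div_ceil(size_of::<u32>()), 0);
        // SAFETY: the pointer and length describe the Vec's initialized
        // words, and any initialized byte is a valid u8.
        let bytes = unsafe {
            std::slice::from_raw_parts_mut(
                self.buf.as_mut_ptr() as *mut u8,
                self.buf.len() * size_of::<u32>(),
            )
        };
        &mut bytes[..layout.size]
    }
}

/// Kernel-side cast from opaque bytes to the element type it packs.
fn as_f32_mut(bytes: &mut [u8]) -> &mut [f32] {
    assert_eq!(bytes.as_ptr() as usize % align_of::<f32>(), 0, "misaligned");
    assert_eq!(bytes.len() % size_of::<f32>(), 0, "length not a multiple of 4");
    // SAFETY: alignment and length are checked above, and every bit
    // pattern is a valid f32.
    unsafe {
        std::slice::from_raw_parts_mut(
            bytes.as_mut_ptr() as *mut f32,
            bytes.len() / size_of::<f32>(),
        )
    }
}

/// Hypothetical subset of a kernel interface: describe, then pack.
trait Kernel {
    fn packed_a_layout(&self, rows: usize, depth: usize) -> PackedLayout;
    fn pack_a(&self, out: &mut [u8], a: &[f32]);
}

struct F32Kernel;

impl Kernel for F32Kernel {
    fn packed_a_layout(&self, rows: usize, depth: usize) -> PackedLayout {
        PackedLayout { size: rows * depth * size_of::<f32>(), align: align_of::<f32>() }
    }

    fn pack_a(&self, out: &mut [u8], a: &[f32]) {
        // A real kernel would rearrange `a` into blocks; a plain copy keeps
        // the sketch short while still going through the byte-slice cast.
        as_f32_mut(out).copy_from_slice(a);
    }
}
```

A caller asks the kernel for a layout, reserves that many bytes from the buffer, and hands the opaque slice back to the kernel. Only the kernel knows the element type and arrangement it uses internally, which is what lets int8 or widened layouts coexist with `f32` ones.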

The `PackedAMatrix` and `PackedBMatrix` types still make assumptions about the internal layout of packed buffers, which will need to be removed. That will happen in subsequent commits.

Part of #347.

@robertknight force-pushed the flexible-gemm-packing branch from 936c06e to 42b75a1 on January 3, 2025 21:31
@robertknight force-pushed the flexible-gemm-packing branch from 42b75a1 to e4c2d3d on January 3, 2025 21:41
@robertknight marked this pull request as ready for review January 3, 2025 21:46
@robertknight merged commit 5b15d8f into main Jan 3, 2025 (2 checks passed)
@robertknight deleted the flexible-gemm-packing branch January 3, 2025 21:46