Make packing of GEMM inputs more flexible #510

robertknight · 2025-01-03T21:29:56Z

Previously the GEMM code assumed that data would be packed in the same format as the LHS / RHS input types, and with a fixed layout. To support new data types, especially int8, more flexibility will be needed.

- int8 matmuls using dot product rather than FMA instructions will use a different blocked layout.
- For some architectures / data types it will make sense to expand inputs to a wider data type during packing rather than in the kernel.
- For some architectures / data types the kernel can operate on an unpacked LHS matrix, if it has unit column stride. For others packing will always be required.
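The first point can be illustrated with a minimal sketch of one blocked layout an int8 dot-product kernel might use. The sketch assumes a 4-way dot-product instruction (as with Arm `SDOT` or x86 VNNI `VPDPBUSD`), which consumes four consecutive K values per output column, so packing stores those four values contiguously. The panel width `NR` and the function `pack_b_dot_product` are hypothetical and are not necessarily the layout this PR implements.

```rust
/// Hypothetical panel width (columns per packed panel); real kernels choose this per architecture.
const NR: usize = 4;
/// Number of K values consumed per column by one 4-way int8 dot-product instruction.
const K_GROUP: usize = 4;

/// Packs a row-major `k x n` int8 matrix `b` into column panels of width `NR`.
/// Within each panel, K is split into groups of `K_GROUP`, and for each column the
/// `K_GROUP` values of a group are stored contiguously. Missing rows and columns
/// are zero-padded so every panel and every group is full.
fn pack_b_dot_product(b: &[i8], k: usize, n: usize) -> Vec<i8> {
    assert_eq!(b.len(), k * n);
    let k_padded = k.div_ceil(K_GROUP) * K_GROUP;
    let n_panels = n.div_ceil(NR);
    let mut out = Vec::with_capacity(n_panels * NR * k_padded);
    for panel in 0..n_panels {
        for k_block in (0..k_padded).step_by(K_GROUP) {
            for col in panel * NR..(panel + 1) * NR {
                for kk in k_block..k_block + K_GROUP {
                    // Zero padding past the matrix edge contributes nothing to the dot product.
                    let val = if kk < k && col < n { b[kk * n + col] } else { 0 };
                    out.push(val);
                }
            }
        }
    }
    out
}
```

Compared with an FMA-oriented layout, where one K step holds `NR` consecutive column values, this layout lets a single instruction load four K steps for a column at once.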
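The second point, widening during packing, can be sketched as follows: int8 LHS values are converted to `i32` as they are copied into row panels, so the kernel's inner loop does not need to convert them. `MR` and `pack_a_widened` are hypothetical names, and the choice of `i32` as the wider type is an assumption for illustration.

```rust
/// Hypothetical number of LHS rows per packed panel.
const MR: usize = 2;

/// Packs a row-major `m x k` int8 matrix into row panels of height `MR`,
/// widening each element to `i32` during the copy. Within a panel, the `MR`
/// values for each column are stored contiguously; rows past `m` are zero-padded.
fn pack_a_widened(a: &[i8], m: usize, k: usize) -> Vec<i32> {
    assert_eq!(a.len(), m * k);
    let n_panels = m.div_ceil(MR);
    let mut out = Vec::with_capacity(n_panels * MR * k);
    for panel in 0..n_panels {
        for col in 0..k {
            for row in panel * MR..(panel + 1) * MR {
                let val = if row < m { i32::from(a[row * k + col]) } else { 0 };
                out.push(val);
            }
        }
    }
    out
}
```

The trade-off is a packed buffer four times larger than the int8 input in exchange for a simpler kernel.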
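The third point can be sketched as a decision made before the kernel runs: when the LHS has unit column stride it is passed through as-is along with its row stride, and otherwise it is copied into a contiguous row-major buffer. `LhsInput` and `prepare_lhs` are hypothetical names, and `f32` stands in for whatever element type a given kernel uses.

```rust
/// Hypothetical description of how a kernel will read the LHS (A) matrix.
enum LhsInput<'a> {
    /// Kernel reads the caller's data directly: rows are `row_stride` elements
    /// apart and elements within a row are adjacent (unit column stride).
    Unpacked { data: &'a [f32], row_stride: usize },
    /// Data was copied into a contiguous row-major buffer with `cols` columns.
    Packed { data: Vec<f32>, cols: usize },
}

impl LhsInput<'_> {
    /// Reads element (row, col) regardless of which representation is in use.
    fn get(&self, row: usize, col: usize) -> f32 {
        match self {
            LhsInput::Unpacked { data, row_stride } => data[row * row_stride + col],
            LhsInput::Packed { data, cols } => data[row * cols + col],
        }
    }
}

/// Skips packing when the input already has unit column stride; otherwise packs it.
fn prepare_lhs(
    data: &[f32],
    rows: usize,
    cols: usize,
    row_stride: usize,
    col_stride: usize,
) -> LhsInput<'_> {
    if col_stride == 1 {
        return LhsInput::Unpacked { data, row_stride };
    }
    let mut packed = Vec::with_capacity(rows * cols);
    for r in 0..rows {
        for c in 0..cols {
            packed.push(data[r * row_stride + c * col_stride]);
        }
    }
    LhsInput::Packed { data: packed, cols }
}
```

For example, a column-major (transposed) LHS takes the packing path, while a row-major LHS with padded rows is used directly.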

To enable this:

- Add `Kernel` methods that return descriptors specifying the size and alignment required to pack a particular A or B input.
- Modify the kernel interface to use opaque `[u8]` slices for packing buffer contents. Kernel implementations will cast these to slices of the type they use internally.
- Add a `PackingBuffer` struct which wraps a `Vec<u32>` buffer and provides an API for reserving space in the buffer and casting its contents to `[u8]` slices for the kernel and its packing methods.
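Here is a minimal sketch of what a size-and-alignment descriptor could look like. It assumes a kernel that packs A into row panels of height `MR` and B into column panels of width `NR`, padding partial panels. `PackedLayout`, `packed_a_layout`, `packed_b_layout` and the two example kernels are hypothetical, and the real `Kernel` trait's method names and signatures may differ.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical descriptor: bytes needed for a packed input and the required buffer alignment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PackedLayout {
    pub size: usize,
    pub align: usize,
}

/// Hypothetical subset of a GEMM kernel trait that reports packing requirements.
pub trait Kernel {
    /// Rows of A per packed panel.
    const MR: usize;
    /// Columns of B per packed panel.
    const NR: usize;
    /// Element type stored in packed buffers (may be wider than the input type).
    type Packed;

    /// Layout for packing a `rows x depth` block of A, padding the last panel to `MR` rows.
    fn packed_a_layout(&self, rows: usize, depth: usize) -> PackedLayout {
        let panels = rows.div_ceil(Self::MR);
        PackedLayout {
            size: panels * Self::MR * depth * size_of::<Self::Packed>(),
            align: align_of::<Self::Packed>(),
        }
    }

    /// Layout for packing a `depth x cols` block of B, padding the last panel to `NR` columns.
    fn packed_b_layout(&self, depth: usize, cols: usize) -> PackedLayout {
        let panels = cols.div_ceil(Self::NR);
        PackedLayout {
            size: panels * Self::NR * depth * size_of::<Self::Packed>(),
            align: align_of::<Self::Packed>(),
        }
    }
}

/// Example f32 kernel that packs values unchanged.
pub struct F32Kernel;
impl Kernel for F32Kernel {
    const MR: usize = 8;
    const NR: usize = 16;
    type Packed = f32;
}

/// Example int8 kernel that widens inputs to i32 during packing.
pub struct Int8WideningKernel;
impl Kernel for Int8WideningKernel {
    const MR: usize = 4;
    const NR: usize = 8;
    type Packed = i32;
}
```

Because the descriptor is expressed in bytes, generic GEMM code can size a buffer without knowing the packed element type or panel shape.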
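On the kernel side, the opaque `[u8]` interface can be handled by a checked reinterpretation of the byte slice. This sketch assumes that the element types involved (`u8`, `i8`, `i32`, `f32`) accept any bit pattern, which is why the hypothetical `Pod` marker trait is `unsafe` to implement. `cast_bytes_mut` is likewise a hypothetical name. The `bytemuck` crate's `try_cast_slice_mut` offers a vetted version of this pattern.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical marker for element types with no padding where every bit
/// pattern is a valid value.
///
/// # Safety
/// Implementors must uphold the property above.
pub unsafe trait Pod: Copy {}
unsafe impl Pod for u8 {}
unsafe impl Pod for i8 {}
unsafe impl Pod for i32 {}
unsafe impl Pod for f32 {}

/// Reinterprets an opaque packing buffer as a mutable slice of `T`.
///
/// Returns `None` if `bytes` is not aligned for `T` or its length is not a
/// multiple of `size_of::<T>()`.
pub fn cast_bytes_mut<T: Pod>(bytes: &mut [u8]) -> Option<&mut [T]> {
    let size = size_of::<T>();
    if size == 0 || bytes.len() % size != 0 {
        return None;
    }
    if bytes.as_ptr() as usize % align_of::<T>() != 0 {
        return None;
    }
    let len = bytes.len() / size;
    // SAFETY: the pointer is aligned for `T`, the region holds exactly `len`
    // values of `T`, `T: Pod` makes any byte pattern valid, and the result
    // reborrows `bytes` mutably, so nothing else can alias it.
    Some(unsafe { std::slice::from_raw_parts_mut(bytes.as_mut_ptr().cast::<T>(), len) })
}
```

Returning `None` rather than panicking lets a caller report a mismatch between the buffer it allocated and the layout the kernel requested.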
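Below is a minimal sketch of a buffer along the lines described. It assumes that backing storage with `Vec<u32>` is what guarantees 4-byte alignment, and that larger alignments are rejected. The method names `alloc`, `as_bytes` and `as_bytes_mut` are hypothetical and not necessarily this PR's API.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical packing buffer backed by `Vec<u32>`, so its storage is always
/// aligned to at least `align_of::<u32>()` bytes.
pub struct PackingBuffer {
    buf: Vec<u32>,
    /// Number of bytes reserved by the most recent `alloc` call.
    used_len: usize,
}

impl PackingBuffer {
    pub fn new() -> Self {
        PackingBuffer { buf: Vec::new(), used_len: 0 }
    }

    /// Reserves `size` zeroed bytes aligned to `align`, reusing the existing
    /// allocation when it is large enough. Panics if `align` exceeds what
    /// `u32` storage guarantees.
    pub fn alloc(&mut self, size: usize, align: usize) -> &mut [u8] {
        assert!(align <= align_of::<u32>(), "alignment {align} not supported");
        let words = size.div_ceil(size_of::<u32>());
        self.buf.clear();
        self.buf.resize(words, 0);
        self.used_len = size;
        self.as_bytes_mut()
    }

    /// Read-only view of the reserved bytes, e.g. to pass to a kernel's matmul routine.
    pub fn as_bytes(&self) -> &[u8] {
        // SAFETY: `u32` has no padding, any byte of it is a valid `u8`, `u8` has
        // alignment 1, and `used_len <= buf.len() * 4` by construction in `alloc`.
        unsafe { std::slice::from_raw_parts(self.buf.as_ptr().cast::<u8>(), self.used_len) }
    }

    /// Mutable view of the reserved bytes, e.g. for a kernel's packing method.
    pub fn as_bytes_mut(&mut self) -> &mut [u8] {
        // SAFETY: as for `as_bytes`; the `&mut self` borrow prevents aliasing.
        unsafe { std::slice::from_raw_parts_mut(self.buf.as_mut_ptr().cast::<u8>(), self.used_len) }
    }
}
```

In this sketch, a caller would pass the size and alignment from a kernel's packing descriptor to `alloc`, hand the returned bytes to the kernel's packing method, and later pass `as_bytes()` to the kernel's compute routine.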

The `PackedAMatrix` and `PackedBMatrix` types still make assumptions about the internal layout of packed buffers, which will need to be removed. That will happen in subsequent commits.

Part of #347.


robertknight force-pushed the flexible-gemm-packing branch from 936c06e to 42b75a1 on January 3, 2025 21:31

robertknight force-pushed the flexible-gemm-packing branch from 42b75a1 to e4c2d3d on January 3, 2025 21:41

robertknight marked this pull request as ready for review January 3, 2025 21:46

robertknight merged commit 5b15d8f into main Jan 3, 2025
2 checks passed

robertknight deleted the flexible-gemm-packing branch January 3, 2025 21:46

robertknight mentioned this pull request on Jan 4, 2025 in "Restructure pre-packed matrix layouts so generic GEMM code can be agnostic of panel layout #511" (merged).
