Make packing of GEMM inputs more flexible #510
Merged
Previously the GEMM code assumed that data would be packed in the same format as the LHS / RHS input types, and with a fixed layout. To support new data types, especially int8, more flexibility will be needed.
- int8 matmuls that use dot product rather than FMA instructions will use a different blocked layout.
- For some architectures / data types it will make sense to expand inputs to a wider data type during packing rather than in the kernel.
- For some architectures / data types the kernel can operate on an unpacked LHS matrix if it has unit column stride. For others, packing will always be required.
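To illustrate the first point, here is a minimal sketch of one possible blocked layout for an int8 RHS (B) matrix. It assumes a kernel built around a 4-way int8 dot-product instruction such as Arm's SDOT, which multiplies groups of four 8-bit values and accumulates each group into one 32-bit lane. The function names `pack_b_dot_product` and `dot4` and the panel width `NR` are hypothetical and do not describe rten's actual packing code.

```rust
/// Hypothetical helper (not rten's API): packs a row-major `k x n` int8 matrix
/// into column panels `NR` wide. Within a panel, K is split into groups of 4
/// and each column's 4 values for a group are stored contiguously, matching
/// the operand shape a 4-way int8 dot-product instruction consumes per lane.
/// K is zero-padded to a multiple of 4 and N to a multiple of `NR`.
pub fn pack_b_dot_product<const NR: usize>(b: &[i8], k: usize, n: usize) -> Vec<i8> {
    assert_eq!(b.len(), k * n, "b must hold k * n elements");
    let k_groups = (k + 3) / 4;
    let n_panels = (n + NR - 1) / NR;
    let mut out = vec![0i8; n_panels * k_groups * NR * 4];
    let mut idx = 0;
    for panel in 0..n_panels {
        for group in 0..k_groups {
            for c in 0..NR {
                let col = panel * NR + c;
                for i in 0..4 {
                    let row = group * 4 + i;
                    // Out-of-range positions keep the zero padding, which
                    // contributes nothing to the dot products.
                    if row < k && col < n {
                        out[idx] = b[row * n + col];
                    }
                    idx += 1;
                }
            }
        }
    }
    out
}

/// Scalar model of one dot-product lane: four i8 x i8 products summed into an i32.
pub fn dot4(acc: i32, a: [i8; 4], b: [i8; 4]) -> i32 {
    acc + (0..4).map(|i| a[i] as i32 * b[i] as i32).sum::<i32>()
}
```

With `NR = 2`, packing the 3 x 2 matrix `[[1, 2], [3, 4], [5, 6]]` gives `[1, 3, 5, 0, 2, 4, 6, 0]`: each column's K values sit next to each other, padded to four. A kernel that instead widens inputs during packing (the second point) could write `i16` or `i32` values in the same loop.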
To enable this:

- Add `Kernel` methods which return descriptors specifying the size and alignment required for packing a particular A or B input.
- Modify the kernel interface to use opaque `[u8]` slices for packing buffer contents. The kernel implementations will cast these to slices of the type they use internally.
- Add a `PackingBuffer` struct which wraps a `Vec<u32>` buffer and provides an API for reserving space in the buffer and casting its contents to `[u8]` slices for the kernel and its packing methods.

The `PackedAMatrix` and `PackedBMatrix` types still make assumptions about the internal layout of packed buffers, which will need to be removed. That will happen in subsequent commits.

Part of #347.
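The three changes listed under "To enable this" can be sketched together as follows. Only the names `Kernel`, `PackingBuffer`, `Vec<u32>` and `[u8]` come from the description; `PackedLayout`, `packed_b_layout`, `pack_b`, `alloc`, `cast_f32_mut` and `ToyF32Kernel` are hypothetical, and rten's real signatures may differ. The sketch assumes that the `u32` backing storage exists to guarantee 4-byte alignment, so kernels can reinterpret the bytes as `f32` or `i32` without misaligned access.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical descriptor: bytes and alignment a kernel needs to pack one input.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PackedLayout {
    pub size: usize,
    pub align: usize,
}

/// Hypothetical kernel trait: packing writes to opaque bytes, so each kernel
/// chooses its own element type and layout.
pub trait Kernel {
    fn packed_b_layout(&self, rows: usize, cols: usize) -> PackedLayout;
    fn pack_b(&self, out: &mut [u8], b: &[f32], rows: usize, cols: usize);
}

/// Packing buffer backed by `Vec<u32>`, so its start is at least 4-byte aligned.
#[derive(Default)]
pub struct PackingBuffer {
    buf: Vec<u32>,
    len_bytes: usize,
}

impl PackingBuffer {
    /// Reserves zero-filled space for `layout`, reusing the existing allocation.
    pub fn alloc(&mut self, layout: PackedLayout) -> &mut [u8] {
        assert!(layout.align <= align_of::<u32>(), "Vec<u32> only guarantees 4-byte alignment");
        self.buf.clear();
        self.buf.resize((layout.size + 3) / 4, 0);
        self.len_bytes = layout.size;
        self.as_bytes_mut()
    }

    pub fn as_bytes(&self) -> &[u8] {
        // SAFETY: the Vec<u32> holds at least `len_bytes` initialized bytes.
        unsafe { std::slice::from_raw_parts(self.buf.as_ptr().cast::<u8>(), self.len_bytes) }
    }

    pub fn as_bytes_mut(&mut self) -> &mut [u8] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.buf.as_mut_ptr().cast::<u8>(), self.len_bytes) }
    }
}

/// Reinterprets an aligned byte slice as the kernel's internal type (`f32` here).
fn cast_f32_mut(bytes: &mut [u8]) -> &mut [f32] {
    assert_eq!(bytes.as_ptr() as usize % align_of::<f32>(), 0, "misaligned");
    assert_eq!(bytes.len() % size_of::<f32>(), 0, "length not a multiple of 4");
    let len = bytes.len() / size_of::<f32>();
    // SAFETY: alignment and length are checked above; any bit pattern is a valid f32.
    unsafe { std::slice::from_raw_parts_mut(bytes.as_mut_ptr().cast::<f32>(), len) }
}

/// Toy kernel that packs a row-major `rows x cols` B matrix as column-major f32s.
pub struct ToyF32Kernel;

impl Kernel for ToyF32Kernel {
    fn packed_b_layout(&self, rows: usize, cols: usize) -> PackedLayout {
        PackedLayout { size: rows * cols * size_of::<f32>(), align: align_of::<f32>() }
    }

    fn pack_b(&self, out: &mut [u8], b: &[f32], rows: usize, cols: usize) {
        let out = cast_f32_mut(out);
        for c in 0..cols {
            for r in 0..rows {
                out[c * rows + r] = b[r * cols + c];
            }
        }
    }
}
```

Because the caller only sees a size and an alignment, a different kernel (for example an int8 one with a blocked layout) can request a different amount of space from the same `PackingBuffer` without changing the caller.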