Disable dp4a matmul on macOS, since Metal does not appear to have DP4A in its instruction set.
sushraja-msft committed Jan 20, 2025
1 parent 524da06 commit d56b9c3
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion onnxruntime/contrib_ops/webgpu/quantization/matmul_nbits.cc
@@ -782,9 +782,12 @@ Status MatMulNBits::ComputeInternal(onnxruntime::webgpu::ComputeContext& context

   const bool has_zero_points = zero_points != nullptr;
   const bool has_subgroup = context.Device().HasFeature(wgpu::FeatureName::Subgroups);
+  // macOS - Avoid using dp4a on Metal, as it does not appear to have native dp4a support.
+  // https://github.com/gpuweb/gpuweb/issues/2677#issuecomment-1713292226
+  const bool use_dp4a = has_subgroup && context.AdapterInfo().backendType != wgpu::BackendType::Metal;
   if (accuracy_level_ == 4 && block_size == 32 &&
       batch_count == 1 && components_a == 4 && K % 64 == 0 && N % 16 == 0 &&
-      !has_zero_points && has_subgroup && M >= kMinMForTileOptimization) {
+      !has_zero_points && use_dp4a && M >= kMinMForTileOptimization) {
     constexpr uint32_t kVec4Components = 4;
     constexpr uint32_t kVec2Components = 2;
     constexpr uint32_t kU32Components = 4;
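
The effect of the change can be sketched in isolation. The snippet below is a minimal, self-contained illustration of the gating condition and is not the onnxruntime code: `BackendType`, `MatMulShape`, `UseDp4a`, and `TakesDp4aPath` are hypothetical stand-ins, and the threshold is passed in as `min_m` because the value of `kMinMForTileOptimization` is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for wgpu::BackendType; only a few values are modeled.
enum class BackendType { Vulkan, D3D12, Metal };

// Hypothetical bundle of the values read by the condition in the diff.
struct MatMulShape {
  uint32_t M;
  uint32_t N;
  uint32_t K;
  uint32_t block_size;
  uint32_t batch_count;
  uint32_t components_a;
  int accuracy_level;
  bool has_zero_points;
};

// Mirrors the new `use_dp4a` flag: subgroups must be available and the
// backend must not be Metal.
inline bool UseDp4a(bool has_subgroup, BackendType backend) {
  return has_subgroup && backend != BackendType::Metal;
}

// Mirrors the full `if` condition after the change. `min_m` plays the role
// of kMinMForTileOptimization, whose actual value is an assumption here.
inline bool TakesDp4aPath(const MatMulShape& s, bool has_subgroup,
                          BackendType backend, uint32_t min_m) {
  return s.accuracy_level == 4 && s.block_size == 32 &&
         s.batch_count == 1 && s.components_a == 4 &&
         s.K % 64 == 0 && s.N % 16 == 0 &&
         !s.has_zero_points && UseDp4a(has_subgroup, backend) &&
         s.M >= min_m;
}
```

With an otherwise eligible shape, the same call takes the dp4a path on a non-Metal backend and skips it on Metal, which is the only behavioral difference this commit introduces.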