cuet.TensorProduct flattens, squeezes, and splits the descriptor so that it can use TensorProductUniform4x1d. However, GPUs have Tensor Cores. Does processing the data as 1D vectors actually reduce computational complexity, or have I misunderstood? Please advise.
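As a toy illustration of the point in question (this is plain NumPy, not the cuEquivariance internals, and the contraction path is made up for the example): flattening segmented operands into 1D vectors is a layout transformation, not a change in the amount of arithmetic. The same multiply-adds are performed either way.

```python
import numpy as np

# Toy tensor-product path "w[u,v] * x[u,i] * y[v,j] -> z[i,j]", evaluated
# once on 2D segments and once on flattened 1D views of the same buffers.
rng = np.random.default_rng(0)
u, v, i, j = 3, 4, 2, 5
w = rng.standard_normal((u, v))
x = rng.standard_normal((u, i))
y = rng.standard_normal((v, j))

# Segmented (2D) evaluation.
z_2d = np.einsum("uv,ui,vj->ij", w, x, y)

# Flattened (1D) evaluation: same data, indices computed by hand.
x1 = x.reshape(-1)            # length u*i, row-major
y1 = y.reshape(-1)            # length v*j, row-major
z1 = np.zeros(i * j)
for uu in range(u):
    for vv in range(v):
        for ii in range(i):
            for jj in range(j):
                z1[ii * j + jj] += w[uu, vv] * x1[uu * i + ii] * y1[vv * j + jj]

# Identical result; the flattening changed memory layout, not FLOP count.
assert np.allclose(z_2d.reshape(-1), z1)
```

The benefit of the 1D form is that many differently-shaped descriptors can be mapped onto one uniform kernel with contiguous memory access, which is what lets the backend keep only a few kernels.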
For now we have three kernels, and the frontend, as you saw, tries to reduce the given STP to one of these kernels. We are currently working on two things related to that:
improving the reduction of STPs to the available kernels
adding more kernels.
Can you provide details about the specific STP you are executing?
We are mainly focusing on the TensorProductUniform4x1d interface. Can you provide more information?
In addition, the frontend of cuEquivariance exposes four interfaces: TensorProductUniform3x1d, TensorProductUniform4x1d, FusedTensorProductOp3, and FusedTensorProductOp4, but you mentioned there are only three kernels. Do the backend kernels correspond to these four interfaces?