Does TensorProductUniform4x1d not utilize the Tensor core of GPUs #52

Open
LHJ1098826475 opened this issue Dec 24, 2024 · 3 comments

@LHJ1098826475

Hi

cuet.TensorProduct flattens, squeezes, and splits the descriptor so that it can use TensorProductUniform4x1d. However, GPUs have Tensor Cores. Does processing the data as 1d vectors leave them unused and reduce computational efficiency, or have I misunderstood? Please advise.
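For context, here is a minimal pure-Python sketch of what a "uniform 1d" segmented tensor product computes, under the assumption (not confirmed in this thread) that every segment of every operand is a 1d vector of the same size `u`, and each path accumulates a coefficient-scaled elementwise product of one segment per operand. The function name and layout are hypothetical, not the cuEquivariance API:

```python
# Hypothetical sketch of a uniform-1d segmented tensor product (STP).
# Every segment is a 1d vector of the same size u; each path picks one
# segment per operand and accumulates an elementwise product scaled by a
# coefficient -- there is no matrix multiply in the inner loop, which is
# why such a kernel has no obvious mapping onto Tensor Cores.

def uniform_1d_stp(paths, x, y, num_out_segments, u):
    """paths: list of (c, i, j, k) -- coefficient and segment indices.
    x, y: flat lists whose i-th segment is x[i*u:(i+1)*u], etc."""
    out = [0.0] * (num_out_segments * u)
    for c, i, j, k in paths:
        for t in range(u):
            out[k * u + t] += c * x[i * u + t] * y[j * u + t]
    return out

# Two paths writing into output segment 0, with segment size u = 2:
paths = [(1.0, 0, 0, 0), (0.5, 1, 0, 0)]
x = [1.0, 2.0, 3.0, 4.0]   # two segments of size 2
y = [10.0, 10.0]           # one segment of size 2
out = uniform_1d_stp(paths, x, y, num_out_segments=1, u=2)
# out[0] = 1.0*1*10 + 0.5*3*10 = 25.0
# out[1] = 1.0*2*10 + 0.5*4*10 = 40.0
```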

@mariogeiger
Collaborator

Hi,

For now we have 3 kernels, and the frontend, as you saw, tries to reduce the given STP to one of these kernels. We are currently working on two things related to that:

  • improve the reduction of STP to the available kernels
  • add more kernels.

Can you provide details about the specific STP you are executing?

@LHJ1098826475
Author

Hi,

We are mainly focusing on the TensorProductUniform4x1d interface. Can you provide more information?

In addition, the frontend of cuEquivariance shows four interfaces: TensorProductUniform3x1d, TensorProductUniform4x1d, FusedTensorProductOp3, and FusedTensorProductOp4, but you mentioned there are only three kernels. Do the backend kernels correspond to these four interfaces?

Thanks.

@mariogeiger
Collaborator

The third one is the symmetric contraction. You can see some explanations here.
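As a toy illustration of the term "symmetric contraction" (a sum over correlation orders of coefficients contracted with repeated powers of the same input, as in MACE-style models): the names, shapes, and coefficient layout below are hypothetical, not the cuEquivariance API.

```python
# Hypothetical toy "symmetric contraction": the output is a sum over
# correlation orders n of coefficients contracted with the n-fold product
# of entries of the same input vector. Because the product is symmetric,
# each coefficient is keyed by a sorted index tuple.

def symmetric_contraction(x, coeffs_by_order):
    """x: input vector.
    coeffs_by_order[n]: dict mapping a sorted index tuple of length n
    to a scalar coefficient."""
    out = 0.0
    for n, coeffs in coeffs_by_order.items():
        for idx, c in coeffs.items():
            prod = c
            for i in idx:
                prod *= x[i]   # multiply in one input entry per index
            out += prod
    return out

x = [2.0, 3.0]
coeffs = {
    1: {(0,): 1.0},      # linear term: 1.0 * x0
    2: {(0, 1): 0.5},    # quadratic term: 0.5 * x0 * x1
}
out = symmetric_contraction(x, coeffs)
# out = 1.0*2.0 + 0.5*2.0*3.0 = 5.0
```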
