CUDA speedup on smaller tensors? #3

Open
fiftysevendegreesofrad opened this issue Jun 18, 2021 · 0 comments

Hi, this module is great :-)

I'm wondering, however, whether there are any options on the table for reducing the fixed CUDA overheads and hence getting a speedup on smaller tensors, e.g. when perf.py is modified to interpolate fewer points:
X, Y = np.meshgrid(np.arange(-.5, 2.5, .1), np.arange(-.5, 2.5, .01))

I'm getting

Interpolating 9000 points on 300 by 300 grid
PyTorch took 1.319 +\- 0.235 ms
PyTorch Cuda took 1.322 +\- 0.869 ms
Scipy took 0.803 +\- 0.052 ms

Do you think there is some way to combine CUDA kernels to get the 20x speed boost on a tensor of this size?
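
In case it helps narrow this down, here is a rough sketch of how the fixed per-call cost could be separated from the per-point cost. It is not taken from perf.py: `time_ms` and `fixed_overhead_ms` are hypothetical helper names, and the straight-line model (time = overhead + per_point * n) is an assumption that will only hold approximately. Because CUDA kernel launches are asynchronous, the timer accepts an optional `sync` callable, which on a GPU would be `torch.cuda.synchronize`.

```python
import statistics
import time


def time_ms(fn, sync=None, warmup=3, repeats=20):
    """Return (mean, stdev) of fn's wall-clock time in milliseconds.

    sync is an optional zero-argument callable run before the clock starts
    and before it stops, so that asynchronously launched GPU work is
    included in the measurement (hypothetical helper, not part of perf.py).
    """
    for _ in range(warmup):  # untimed runs absorb one-off setup costs
        fn()
    if sync is not None:
        sync()
    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples), statistics.stdev(samples)


def fixed_overhead_ms(sizes, mean_times_ms):
    """Least-squares fit of time = overhead + per_point * n.

    Returns (overhead, per_point). Assumes the timings are roughly linear
    in the number of points, which is only an approximation.
    """
    n_mean = statistics.mean(sizes)
    t_mean = statistics.mean(mean_times_ms)
    sxx = sum((n - n_mean) ** 2 for n in sizes)
    sxy = sum((n - n_mean) * (t - t_mean)
              for n, t in zip(sizes, mean_times_ms))
    per_point = sxy / sxx
    return t_mean - per_point * n_mean, per_point
```

Timing the interpolator at a few point counts with `sync=torch.cuda.synchronize` and passing the means to `fixed_overhead_ms` would give an estimate of the intercept. If that intercept accounts for most of the ~1.3 ms at 9000 points, then launch overhead is the bottleneck and reducing the number of kernel launches is where a speedup would have to come from.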
