CUDA speedup on smaller tensors? #3

fiftysevendegreesofrad · 2021-06-18T16:15:32Z

Hi, this module is great :-)

I'm wondering, however, whether there are any options on the table for reducing the fixed CUDA overheads and so getting a speedup on smaller tensors. For example, modifying perf.py to interpolate fewer points:
X, Y = np.meshgrid(np.arange(-.5, 2.5, .1), np.arange(-.5, 2.5, .01))

I'm getting

Interpolating 9000 points on 300 by 300 grid
PyTorch took 1.319 +\- 0.235 ms
PyTorch Cuda took 1.322 +\- 0.869 ms
Scipy took 0.803 +\- 0.052 ms

Do you think there is some way to combine CUDA kernels to get the 20x speed boost on a tensor this size?
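
To make the question concrete, here is a rough sketch of why fixed overhead would matter at this size. It is a toy cost model, not a measurement: the function names, the per-launch overhead, the per-point cost, and the launch counts are all made-up assumptions, and nothing here reflects how the module actually launches its kernels.

```python
# Toy cost model (all numbers are hypothetical, chosen only for illustration).
# Each kernel launch pays a fixed cost; the useful work scales with the point count.

def estimated_time_us(n_points, n_launches, launch_overhead_us=10.0, per_point_us=0.001):
    """Estimated total time: fixed cost per launch plus cost proportional to the work."""
    return n_launches * launch_overhead_us + n_points * per_point_us


def overhead_fraction(n_points, n_launches, launch_overhead_us=10.0, per_point_us=0.001):
    """Share of the estimated time spent on launch overhead rather than useful work."""
    fixed = n_launches * launch_overhead_us
    return fixed / estimated_time_us(n_points, n_launches, launch_overhead_us, per_point_us)


SMALL = 9_000        # the point count from the run above
LARGE = 9_000_000    # hypothetical large workload for comparison
UNFUSED = 20         # hypothetical number of separate kernel launches
FUSED = 1            # hypothetical single combined kernel

small_unfused = estimated_time_us(SMALL, UNFUSED)
small_fused = estimated_time_us(SMALL, FUSED)
small_share = overhead_fraction(SMALL, UNFUSED)
large_share = overhead_fraction(LARGE, UNFUSED)

print(f"small, unfused: {small_unfused:.1f} us ({small_share:.0%} overhead)")
print(f"small, fused:   {small_fused:.1f} us")
print(f"large, unfused: {large_share:.0%} overhead")
```

Under these assumed numbers the fixed term dominates at 9000 points and becomes negligible for the large workload, which is why combining kernels looks like the only lever at small sizes.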
