Hi, this module is great :-)
I'm wondering, however, whether there are any options for reducing the fixed CUDA overheads and so getting a speedup on smaller tensors. For example, modifying perf.py to interpolate fewer points:
X, Y = np.meshgrid(np.arange(-.5, 2.5, .1), np.arange(-.5, 2.5, .01))
I'm getting
Interpolating 9000 points on 300 by 300 grid
PyTorch took 1.319 +\- 0.235 ms
PyTorch Cuda took 1.322 +\- 0.869 ms
Scipy took 0.803 +\- 0.052 ms
Do you think there is some way to combine CUDA kernels to get the 20x speed boost on a tensor of this size?
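For anyone reproducing the timings above, here is a minimal sketch of the kind of measurement I mean. I'm assuming the figures are a mean and standard deviation over repeated runs; `time_call` is a hypothetical helper of my own, not the code in perf.py, and the workload in the usage line is only a stand-in.

```python
import statistics
import time


def time_call(fn, repeats=100, warmup=10):
    """Return (mean_ms, std_ms) of wall-clock time for fn() over `repeats` runs.

    Hypothetical helper, not taken from perf.py. For a CUDA workload, fn
    should end with torch.cuda.synchronize() so that the asynchronous kernel
    launches have finished before the clock is read.
    """
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-off costs (caches, lazy initialisation)
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


if __name__ == "__main__":
    # Stand-in CPU workload; replace with the interpolation call being measured.
    mean_ms, std_ms = time_call(lambda: sum(range(10_000)))
    print(f"Stand-in workload took {mean_ms:.3f} +/- {std_ms:.3f} ms")
```

Timing the same call with an almost empty input should give a rough idea of how much of the roughly 1.3 ms is fixed per-call overhead rather than work that scales with the number of points.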