Inplace version of batched adjoint/transpose #502
Comments
Do you want to file a PR? You can create a |
Would It is also possible to rewrite the |
Since both |
I'm not familiar with |
Here's an untested translation to KernelAbstractions. I made some minor style changes to better match existing KA-using functions like …

```julia
using KernelAbstractions
using GPUArrays: AnyGPUArray

# In-place batched transpose: dst[j, i, k] = f(src[i, j, k]) for every batch index k.
# f = identity gives a batched transpose; f = adjoint gives a batched adjoint.
function batched_transpose_f!(f, dst::AnyGPUArray{<:Any, 3}, src::AnyGPUArray{<:Any, 3})
    axes(dst, 1) == axes(src, 2) && axes(dst, 2) == axes(src, 1) &&
        axes(src, 3) == axes(dst, 3) || throw(DimensionMismatch(string(f)))
    backend = KernelAbstractions.get_backend(src)
    # Launch one work item per element of src.
    _batched_transpose_f!(backend)(f, dst, src; ndrange=size(src))
    return dst
end

@kernel function _batched_transpose_f!(f::F, dst, @Const(src)) where F
    i, j, k = @index(Global, NTuple)
    @inbounds dst[j, i, k] = f(src[i, j, k])
end
```
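Since the kernel is untested, a CPU reference with the same index mapping can help check its output. Below is a minimal plain-Julia sketch (no GPU or KernelAbstractions needed); the name `batched_transpose_f_cpu!` is hypothetical and not part of NNlib. For `f = identity` the result should match `permutedims(src, (2, 1, 3))`, and for `f = adjoint` it additionally conjugates complex entries.

```julia
# Hypothetical CPU reference for the in-place batched transpose kernel.
# Same contract: dst[j, i, k] = f(src[i, j, k]) for every batch index k.
function batched_transpose_f_cpu!(f, dst::AbstractArray{<:Any,3}, src::AbstractArray{<:Any,3})
    axes(dst, 1) == axes(src, 2) && axes(dst, 2) == axes(src, 1) &&
        axes(dst, 3) == axes(src, 3) || throw(DimensionMismatch(string(f)))
    # i varies fastest, which follows src's column-major layout.
    for k in axes(src, 3), j in axes(src, 2), i in axes(src, 1)
        @inbounds dst[j, i, k] = f(src[i, j, k])
    end
    return dst
end

# Usage: transpose each of the two 3x5 slices into a 5x3 slice.
src = reshape(collect(1:30), 3, 5, 2)
dst = similar(src, 5, 3, 2)
batched_transpose_f_cpu!(identity, dst, src)
```

Comparing the GPU kernel's output (after copying it back to the host) against this reference on small random inputs is one way to gain some confidence before wiring it into `copy`.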
Isn't this |
Yes, but we still need to overload |
We are missing the in-place versions of batched adjoint/transpose. They are required to avoid GPU scalar indexing in `Base.copy`, e.g. `copy(batched_adjoint(CUDA.randn(3,5,2)))`. They can be implemented as: …

This requires an extra dependency on `GPUArrays`. I'm not sure where this code should go under `ext`.
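The underlying idea is that `copy` of a lazy batched adjoint should materialize its result with one bulk in-place operation rather than element-wise `getindex`, which is what triggers scalar indexing on GPU arrays. A CPU-only sketch of that overload pattern, under loud assumptions: `LazyBatchedAdjoint` and `batched_adjoint_inplace!` are hypothetical stand-ins, not NNlib's `BatchedAdjoint` or any NNlib API, and Base's `permutedims!` followed by an in-place `adjoint` broadcast takes the place of a dedicated GPU kernel.

```julia
# Hypothetical stand-in for a lazy batched-adjoint wrapper (NOT NNlib's BatchedAdjoint).
struct LazyBatchedAdjoint{T, S<:AbstractArray{T,3}} <: AbstractArray{T,3}
    parent::S
    LazyBatchedAdjoint(p::AbstractArray{T,3}) where {T} = new{T, typeof(p)}(p)
end

Base.size(A::LazyBatchedAdjoint) = (size(A.parent, 2), size(A.parent, 1), size(A.parent, 3))

# Element-wise access: the generic fallback path that would hit scalar indexing on a GPU.
Base.getindex(A::LazyBatchedAdjoint, i::Int, j::Int, k::Int) = adjoint(A.parent[j, i, k])

# Hypothetical in-place batched adjoint: swap the first two dims, then conjugate in place.
function batched_adjoint_inplace!(dst::AbstractArray{<:Any,3}, src::AbstractArray{<:Any,3})
    permutedims!(dst, src, (2, 1, 3))
    dst .= adjoint.(dst)
    return dst
end

# Overload copy so materialization is a bulk operation instead of a getindex loop.
Base.copy(A::LazyBatchedAdjoint) = batched_adjoint_inplace!(similar(A.parent, size(A)), A.parent)

# Usage: a 3x5x2 complex array becomes a materialized 5x3x2 batched adjoint.
p = reshape(complex.(1:30, 30:-1:1), 3, 5, 2)
B = copy(LazyBatchedAdjoint(p))
```

In a real package extension, the `copy` method would presumably be restricted to GPU-backed parents and call a GPU kernel instead; that restriction is an assumption of this sketch.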